|
qhat posted:I don't know why but there's a team in my company using some dynamodbs because they think a million records is apparently too much data to query on without nosql. This is the same team whose webpage load time I brought down from 15 seconds to half a second by adding a single index to one of the SQL tables they do have. Jesus Christ why do I have imposter syndrome ever then
|
# ? Apr 27, 2018 20:28 |
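For the lurkers: the single-index fix above is easy to demo. A sketch with sqlite3 — the table and column names here are invented, not the actual app's schema:

```python
import sqlite3

# Invented table standing in for the slow page's backing table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, region_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO listings (region_id, body) VALUES (?, ?)",
    [(i % 1000, "x") for i in range(100_000)],
)

query = "SELECT * FROM listings WHERE region_id = 42"

# Before: the planner has nothing to work with, so it scans the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

# The one-line fix: index the column the hot query filters on.
conn.execute("CREATE INDEX idx_listings_region ON listings (region_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(plan_before)  # a full scan of listings
print(plan_after)   # a search using idx_listings_region
```

Same query, but the plan goes from touching every row to touching only the matching ones — which is the whole 15s-to-0.5s story.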
|
Space Whale posted:Jesus Christ why do I have imposter syndrome ever then because you don't know how to wrap something simple in a bunch of crazy poo poo see: lightbulbs
|
# ? Apr 27, 2018 20:31 |
|
Space Whale posted:5000 tho and a tb of data? oh lol nvm. yeah we’re not brought in until the client is dealing with petabytes of data that has to be queried in milliseconds. nosql still has its advantages but at that level it’s definitely not that
|
# ? Apr 27, 2018 20:31 |
|
Sapozhnik posted:when you've got a working set that can't fit into a single instance's ram then you start thinking about application level sharding They probably don't have that much data in terms of absolute storage (I'd bet all poo poo like videos and images get thrown in a separate store and they just keep a ref/URI in the main DB), but the thing is you can get really loving far with just sharding.
|
# ? Apr 27, 2018 21:24 |
|
MononcQc posted:afaict whatsapp has over a billion users and most of their work is done by regular DBs (in fact bad ones like in Erlang that can't store more than what RAM fits -- unless they have patches not mentioned in their talks) and very clever sharding adapted to the replication and query schemes they need. So how does sharding actually work? Is there a load balancer or do the SQL servers just talk to each other?
|
# ? Apr 27, 2018 21:30 |
|
Space Whale posted:So how does sharding actually work? Is there a load balancer or do the SQL servers just talk to each other? Distributed Hashing and black magic or some poo poo
|
# ? Apr 27, 2018 21:35 |
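It's less black magic than it sounds. A minimal consistent-hash sketch of the "distributed hashing" idea — everything here is invented for illustration, not any particular database's implementation:

```python
import bisect
import hashlib

def _hash(key):
    # Stable hash so every client agrees on placement (Python's built-in
    # hash() is randomized per process, so it won't do here).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Keys map to the nearest node clockwise on a ring of hash points."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets many virtual points so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key):
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]

ring = HashRing(["db1", "db2", "db3"])
print(ring.node_for("user:12345"))  # one of db1/db2/db3, stable across calls
```

The point of the ring (vs. plain `hash(key) % n`) is that adding a node only moves roughly `1/n` of the keys instead of reshuffling nearly all of them.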
|
Janitor Prime posted:Distributed Hashing and black magic or some poo poo I thought that was red and green, since you're sharing the hash
|
# ? Apr 27, 2018 21:35 |
|
Space Whale posted:So how does sharding actually work? Is there a load balancer or do the SQL servers just talk to each other?

i mean there's a few different ways to do it

if your data has very clear logical divisions - say, every users data is 100% constrained to a geographic region - you can manually shard on that and just have a different server for each region/shard and have basically separate copies of your app for each region. see also: blizzard, a bunch of other online games where you have to specify whether you're logging into "North America" or "Korea" or whatever

some databases have it built in - you choose a sharding key and your queries all get routed to the correct server using the shard key specified in the query no matter which node they go to in the first place. some databases may provide options for cross-shard queries, but this varies

really "sharding" on its own doesn't imply a lot of detail, it just means "put your data into different buckets somehow so that each bucket is mostly independent", which is how all of the big-data systems like dynamo and cassandra work anyway. what matters is how much you do manually with knowledge of your application and query patterns vs. how much you rely on your database to do for you, which may or may not be well-suited to your application's actual workload

it probably won't be well-suited to your actual workload because designing your data model so that it does work well with dynamo/cassandra/bigtable/etc requires actual thought

*sharts*

Arcsech fucked around with this message at 21:46 on Apr 27, 2018

|
# ? Apr 27, 2018 21:42 |
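To make the "manual shard on a clear logical division" option concrete, a toy router in the spirit of the post above — the region names and connection strings are all made up:

```python
# Toy manual sharding: each region's data lives wholly on its own server,
# and the application picks a connection string from the shard key.
SHARDS = {
    "na": "postgres://na-db.internal/app",
    "eu": "postgres://eu-db.internal/app",
    "kr": "postgres://kr-db.internal/app",
}

def shard_for(region):
    """Every query must carry the shard key; it routes to exactly one server."""
    try:
        return SHARDS[region]
    except KeyError:
        raise ValueError(f"no shard owns region {region!r}")

def fan_out(regions):
    # Cross-shard queries don't come for free: the app has to hit each
    # shard itself and stitch the partial results back together.
    return [shard_for(r) for r in regions]

dsn = shard_for("eu")
```

This is the whole trick: as long as queries stay within one shard, each server is just a normal, boring database.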
|
please tell me the database term was not taken from ultima online
|
# ? Apr 27, 2018 21:53 |
|
hobbesmaster posted:please tell me the database term was not taken from ultima online

it seems as though this is a distinct possibility, given that the lore came from the need to create parallel independent instances of the game

https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/

per wikipedia there was a system for replicated data that had SHARD as an acronym for "System for Highly Available Replicated Data" that existed before ultima online

but, well, given that nobody's ever heard of SHARD-the-database and database engineers are turbonerds, what do you think is more likely
|
# ? Apr 27, 2018 22:01 |
|
most the time people complain about poor db performance, they really just need better schemas and queries
|
# ? Apr 27, 2018 22:27 |
|
hobbesmaster posted:please tell me the database term was not taken from ultima online
|
# ? Apr 27, 2018 22:37 |
|
uncurable mlady posted:most the time people complain about poor db performance, they really just need better schemas and queries Unless you're Google, this is always the case.
|
# ? Apr 27, 2018 23:08 |
|
Do web developers or devops do any work on the side? I'm about to start selling some of my personal hours to the highest bidder, but elance makes me think my experience doing anything well is a liability because I'm not scrambling to do it fast / shittily / in bulk. If good self employment strategies for extra cash belong in another thread I can post there instead.
|
# ? Apr 28, 2018 00:22 |
|
qhat posted:Unless you're Google, this is always the case. eh, i wouldn't be quite that reductive
|
# ? Apr 28, 2018 00:25 |
|
Arcsech posted:i mean there's a few different ways to do it So if you have a lot of, say, MLS data and real estate poo poo you could... split by those divisions?
|
# ? Apr 28, 2018 00:46 |
|
Cheekio posted:Do web developers or devops do any work on the side?
|
# ? Apr 28, 2018 02:34 |
|
Relational databases definitely still have a place in the world, but work best as a view of the data, not the source of truth. Sure you can shard things out almost indefinitely, but when you want to run analytics on your entire dataset, or subsets of that dataset, you're stuck stitching together partial results from thousands of databases.

The better approach (depending on your use case of course) is to try and have a single event stream that can put data in multiple places, the database being one of them. A snappy UI can be driven off a SQL database while larger analytics can come from BigQuery/Redshift/Spark jobs/Beam jobs etc etc. Important changes made in the UI can be broadcast to that event stream (as well as the local database it's using, as some sort of "unofficial" change, to make the change seem immediate).

Using a single (or multiple) relational databases might be ok for awhile, but it doesn't take "Google" levels of data for this to be a problem. I work at a medium sized company trying to transition to this model, away from thousands of postgres instances across about 100 physical servers (with really beefy specs). The Postgres servers are fine (well, ok, not really) at delivering reports over relatively short time periods, but we constantly get requests from customers for reports across DB boundaries, or for long periods of time, and that really interferes with our transactional load.

Googling for Whatsapp's architecture diagram reveals they do this: data gets put in a relational database but also in riak and probably in a bunch of other places they don't list on the public documents.
|
# ? Apr 28, 2018 05:43 |
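The "single event stream feeding multiple stores" idea above boils down to a fan-out. A toy sketch, with everything invented (real versions sit on Kafka/Kinesis/etc., not an in-process list):

```python
# Toy event bus: one publish, many stores, each keeping its own copy.
class EventBus:
    def __init__(self):
        self._sinks = []

    def subscribe(self, sink):
        """Register a store (SQL view, warehouse loader, ...) as a callable."""
        self._sinks.append(sink)

    def publish(self, event):
        """Fan the event out to every subscribed store."""
        for sink in self._sinks:
            sink(event)

# Stand-ins for "the SQL DB driving the snappy UI" and "the analytics store".
ui_db, warehouse = [], []
bus = EventBus()
bus.subscribe(ui_db.append)
bus.subscribe(warehouse.append)

# One change broadcast from the UI lands in both places.
bus.publish({"type": "user_updated", "user_id": 7})
```

Each consumer can then build whatever shape suits it: normalized rows for the UI, columnar files for analytics.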
|
sounds like you didn't shard correctly, op
|
# ? Apr 28, 2018 06:53 |
|
Hitting up a buddy who works at a big name company in Vancouver, seeing if he can get me a job there. It would be a huge pay increase if I were to, I reckon. Long live nepotism.
|
# ? Apr 28, 2018 08:12 |
|
Vancouver Washington or B.C. tho
|
# ? Apr 28, 2018 15:23 |
|
ADINSX posted:Relational databases definitely still have a place in the world, but work best as a view of the data, not the source of truth. we do the same except cassandra and a lot of beefy mysql (😱) rds instances
|
# ? Apr 28, 2018 15:25 |
|
ADINSX posted:Relational databases definitely still have a place in the world, but work best as a view of the data, not the source of truth. Do this it makes GDPR compliance real fun.
|
# ? Apr 28, 2018 16:03 |
|
Space Whale posted:Vancouver Washington or B.C. tho BC.
|
# ? Apr 28, 2018 16:29 |
|
tk posted:Do this it makes GDPR compliance real fun.

Yeah this is the wrench that gets thrown into the model. It's fun to talk about everything as an event and the ability to recreate the exact same state by re-streaming all events from the beginning of time. But what happens when a customer leaves? Or they have a retention policy? We're working on cleanup jobs that just go through all the views and remove the relevant data; basically you have to violate the "immutable" part of the events. In an ideal world there would be "delete" events that would remove records from the views... but you still need to remove them from any copy of the event stream itself, so what can you do.

I guess the main point of the post was that saying "only google cares about this" is not true; it's not hard to get too much data for a cluster of databases to no longer be the best solution. I don't even think it's wrong for startups to design things from the onset this way. Maybe it's a little resume-driven development, but in their eyes they either grow to that scale or die anyway, so might as well plan to succeed.

I'm gonna be interviewing with a satellite imaging company next week. They produce about 5-10TB of imagery a day and deal with even more serious "big data" problems, so that's pretty exciting.
|
# ? Apr 28, 2018 16:33 |
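A toy sketch of the cleanup-job-vs-immutability tension described above — the event shapes and customer names are invented:

```python
# An "immutable" event log, plus the erasure request that breaks the illusion.
events = [
    {"type": "signup", "customer": "acme", "email": "a@acme.test"},
    {"type": "signup", "customer": "bigco", "email": "b@bigco.test"},
    {"type": "erase_customer", "customer": "acme"},  # GDPR erasure request
]

def build_view(events):
    """Replay events into a view; an erase event removes the customer."""
    view = {}
    for ev in events:
        if ev["type"] == "signup":
            view[ev["customer"]] = {"email": ev["email"]}
        elif ev["type"] == "erase_customer":
            view.pop(ev["customer"], None)
    return view

def scrub_stream(events, customer):
    # The part that violates immutability: personal data has to come out of
    # the stream itself, not just the views, or a replay resurrects it.
    return [ev for ev in events if ev.get("customer") != customer]

view = build_view(events)              # acme no longer appears in the view
events = scrub_stream(events, "acme")  # ...and its events are gone from the log
```

A delete event alone only fixes the views; without the scrub step, re-streaming from the beginning of time brings the erased data right back.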
|
ADINSX posted:Using a single (or multiple) relational databases might be ok for awhile, but it doesn't take "Google" levels of data for this to be a problem. I work at a medium sized company trying to transition to this model, away from thousands of postgres instances across about 100 physical servers (with really beefy specs). The Postgres servers are fine (well, ok, not really) at delivering reports over relatively short time periods, but we constantly get requests from customers for reports across DB boundaries, or for long periods of time, and that really interferes with our transactional load.

with very few exceptions, sql databases are built and sold for oltp workloads. always have been. reporting and analysis are weak, at best. you are not going to do olap on a database designed for oltp, and vice versa
|
# ? Apr 28, 2018 18:19 |
|
i am also curious what you consider "beefy." it's 2018: you can order an off-the-shelf x86 server with 384 cores and 48 tb of ram. if you have deep pockets, petabytes of ram and thousands of cores are an option
|
# ? Apr 28, 2018 18:22 |
|
Cheekio posted:Do web developers or devops do any work on the side? I'm about to start selling some of my personal hours to the highest bidder, but elance makes me think my experience doing anything well is a liability because I'm not scrambling to do it fast / shittily / in bulk.

elance and friends are a complete waste of time

the buyers are idiots and the only successful sellers are in low-CoL countries so they will work for $5/hr

don't even consider it unless you live in rural india and you have a very high tolerance for bullshit
|
# ? Apr 28, 2018 18:24 |
|
I think we're saying the same thing? Their main strength is OLTP but of course you can get away with running analytics on them as well, especially when the data is small. When our company was building this out, it was the mid aughts and it was a team of people who thought the database could solve everything (I've only been here for 2 years so this is second hand). So we have OLTP, OLAP and even a nightmarish web of triggers to create business objects as events are inserted into the database. Its a real triple threat and now machines are falling over.

As for how beefy the servers are, not that beefy by those standards. Ram on the order of dozens of gigabytes, no idea about the number of cores. As I understand it most of them were acquired several years ago and are getting on in age, leaving the company with a decision: Buy a new round of hardware, or move most of the data to cloud services.
|
# ? Apr 28, 2018 18:30 |
|
ADINSX posted:As for how beefy the servers are, not that beefy by those standards. Ram on the order of dozens of gigabytes, no idea about the number of cores. So not beefy at all, then.
|
# ? Apr 28, 2018 18:32 |
|
please don't server shame
|
# ? Apr 28, 2018 18:34 |
|
ADINSX posted:I think we're saying the same thing? Their main strength is OLTP but of course you can get away with running analytics on them as well, especially when the data is small. When our company was building this out, it was the mid aughts and it was a team of people who thought the database could solve everything (I've only been here for 2 years so this is second hand). So we have OLTP, OLAP and even a nightmarish web of triggers to create business objects as events are inserted into the database. Its a real triple threat and now machines are falling over.

yeah this is not a fundamental problem in sql databases, this is a problem at your company. y'all should... stop doing that.

ADINSX posted:As for how beefy the servers are, not that beefy by those standards. Ram on the order of dozens of gigabytes, no idea about the number of cores. As I understand it most of them were acquired several years ago and are getting on in age, leaving the company with a decision: Buy a new round of hardware, or move most of the data to cloud services.

i think my smallest server at work has 768 gb. they are 1U boxes that we use for dumb cloud-type poo poo

you should definitely buy a new round of hardware, first, to buy you time for the multi-year transition to cloud services. that ain't exactly like flipping a switch or something
|
# ? Apr 28, 2018 18:40 |
|
Notorious b.s.d. posted:i am also curious what you consider "beefy." it's 2018: you can order an off-the-shelf x86 server with 384 cores and 48 tb of ram spending a quarter of a million dollars or more on a beast like that just seems like all sorts of bad idea.
|
# ? Apr 28, 2018 18:41 |
|
Sapozhnik posted:spending a quarter of a million dollars or more on a beast like that just seems like all sorts of bad idea.

a quarter of a million dollars would be a rounding error on my group's technology spend. not my company's spend. just my group. commodity hardware is really, really cheap

for the cost of 1 fte, you could have four beefy database servers in two redundant pairs (bearing in mind they will be amortized out across three or four years on a lease)

Notorious b.s.d. fucked around with this message at 18:45 on Apr 28, 2018

|
# ? Apr 28, 2018 18:42 |
|
vast acres of ram are always nice to have, but how can you make effective use of that many cores tied to a single system bus?

your budget is enviable but also not indicative of anything in one direction or the other. any fool can shovel money into a furnace.
|
# ? Apr 28, 2018 18:45 |
|
Sapozhnik posted:vast acres of ram are always nice to have but how can you make effective use that many cores tied to a single system bus? obviously reaching across sockets on a NUMA system is a lot slower than local access but it is much, much, much, much, much faster than reaching out across a network
|
# ? Apr 28, 2018 18:46 |
|
Notorious b.s.d. posted:yeah this is not a fundamental problem in sql databases this is a problem at your company. y'all should... stop doing that.

Yes... I know... we are in the process of stopping. It's not a fundamental problem with relational databases, but it is a fundamental problem when you say "Relational databases can be used for anything so long as you shard them and use indexes". Which is how we started talking about this.

Notorious b.s.d. posted:i think my smallest server at work has 768 gb. they are 1U boxes that we use for dumb cloud-type poo poo

We're about a year into the transition, and even if we weren't it's not up to me, I'm only on the software side of things.
|
# ? Apr 28, 2018 18:46 |
|
Sapozhnik posted:your budget is enviable but also not indicative of anything in one direction or the other. any fool can shovel money into a furnace.

it is a reminder that not all the world is a startup struggling to justify a larger ec2 instance type

technology in general is insanely expensive, and commodity hardware is very, very cheap compared to either the cost of labor or the business value generated

spending a quarter million on a beefy database server isn't just a good idea, it's often a no-brainer -- why invest any time or effort into designing a replacement for an oltp database when you can just upgrade the servers
|
# ? Apr 28, 2018 18:47 |
|
Notorious b.s.d. posted:you should definitely buy a new round of hardware, first, to buy you time for the multi-year transition to cloud services. that ain't exactly like flipping a switch or something current job refuses to accept this idea or the reality of their situation. they’re absolutely certain they can just dump two decades of legacy crap into aws if they just try harder.
|
# ? Apr 28, 2018 19:33 |
|
jony neuemonic posted:current job refuses to accept this idea or the reality of their situation. they’re absolutely certain they can just dump two decades of legacy crap into aws if they just try harder. Because they realize once the immediate fire is put out the organization will go right back to creating a bunch of small new fires
|
# ? Apr 28, 2018 19:37 |