Mao Zedong Thot posted:Fire ur Sr developer that can't be assed to do their job ive made my comments about this at work already but the reality is that his core job he is doing exceptionally well, he just has only windows experience
|
|
# ? Jul 22, 2018 20:39 |
|
yeah i dont know that mongo is any worse than other document stores; mongo is the only one i've used seriously. my issue is more with relational data being backed by a document store, that's the problem. it's an absolute nightmare to maintain.
|
# ? Jul 22, 2018 20:40 |
|
tef's posts always remind me that i need to learn more about distributed systems.
|
# ? Jul 22, 2018 20:52 |
|
MALE SHOEGAZE posted:tef's posts always remind me that i need to never work on distributed systems.
|
# ? Jul 22, 2018 20:53 |
|
[quote="eschaton posted:Rust hacking suggestion: take your functor thingy and profile the compiler and see if you can figure out why the slow part is slow, and what to do to fix it eschaton" post="486351738"] also spend your time after this job writing tools hacking on Rust itself would be a good start so would hacking on Swift, or writing new static analyses for clang, or helping with the SBCL POWER9/ppc64le port, that sort of thing [/quote] yeah working on understanding the rust compiler issue as we speak. it's taking me a while to isolate a good test case but i'm 100% sure it's related to generators. but i probably wont make a ton of progress until i'm done with this job. i would love to work on swift stuff too so i'll see what's going on! thanks for the recs!
|
# ? Jul 22, 2018 20:58 |
|
or that.
|
# ? Jul 22, 2018 20:59 |
|
gonadic io posted:tef's posts always remind me that i need to never work on distributed systems. here's the thing, if it involves multiple copies of things being kept in sync, you're out of luck
|
# ? Jul 22, 2018 21:04 |
|
maybe i'm brain weird, or maybe it was the things i did before programming (maths), but like distributed systems aren't especially hard or confusing. it is more frustrating than regular programming because no amount of neat or clever tricks work and it all comes down to doing it the hard or slow way. yes, atomic clocks count as doing it the hard and slow way, spanner still uses interval arithmetic for timestamps and paxos. as almost every step in the process can fail in a ridiculous manner, and there is no way to bypass things, almost every optimisation you make to 'make it faster' will usually crash the system. all it boils down to is being very thorough with error handling and saying no a whole bunch of times. being a pessimistic poo poo, that's all. oh and learning all of the last 50-60 years of work or something, because you need experience with the failures to prepare for them properly
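to make the 'interval arithmetic for timestamps' bit concrete, here's a rough sketch of truetime-style commit wait, not spanner's actual API, with all names and the 7ms bound made up for illustration: the clock hands back an uncertainty interval, and the commit isn't acknowledged until the latest possible current time has passed the chosen timestamp.

```python
import time
from dataclasses import dataclass

# Hypothetical uncertainty bound; real deployments derive this from GPS/atomic clock hardware.
CLOCK_UNCERTAINTY_S = 0.007


@dataclass
class TTInterval:
    earliest: float
    latest: float


def tt_now() -> TTInterval:
    """Return an interval guaranteed to contain the true wall-clock time."""
    t = time.time()
    return TTInterval(t - CLOCK_UNCERTAINTY_S, t + CLOCK_UNCERTAINTY_S)


def commit(write) -> float:
    """Pick a commit timestamp, then wait out the uncertainty before acknowledging.

    The 'hard and slow way': correctness comes from waiting, not from a clever trick.
    """
    commit_ts = tt_now().latest           # a timestamp no earlier than true time
    write(commit_ts)                       # persist/replicate the write (e.g. via paxos)
    while tt_now().earliest < commit_ts:   # commit wait: commit_ts must be in the past everywhere
        time.sleep(0.001)
    return commit_ts
```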
|
# ? Jul 22, 2018 21:32 |
|
Have we figured out if PK has been buried alive in the Sonoran desert yet?
|
# ? Jul 22, 2018 21:35 |
|
cinci zoo sniper posted:the most common use cases are to airflow can do that; we use it at work and i hate it, but i havent used it for long enough to know if it is because it is bad or just that i'm too new at it
|
# ? Jul 22, 2018 22:27 |
|
they're basically unavoidable. even ye olde cgi web sites are a simple distributed system. just imagine that all of your various services that need to talk to each other are impatient users mashing the Submit Reply button over and over
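a rough sketch of the 'mashing Submit Reply' problem, names made up: the caller retries freely, the handler deduplicates on a client-supplied idempotency key, and only one reply ever gets created.

```python
import uuid

# Hypothetical in-memory dedup table; a real service would persist this alongside the write.
_seen: dict[str, dict] = {}


def submit_reply(idempotency_key: str, body: str) -> dict:
    """Handle a request that the impatient caller may send many times."""
    if idempotency_key in _seen:
        return _seen[idempotency_key]       # duplicate: return the original result, do nothing new
    reply = {"id": str(uuid.uuid4()), "body": body}
    _seen[idempotency_key] = reply          # record alongside the side effect (ideally atomically)
    return reply


# The caller mashes the button; only one reply is ever created.
key = str(uuid.uuid4())
first = submit_reply(key, "lol")
second = submit_reply(key, "lol")
assert first == second
```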
|
# ? Jul 22, 2018 22:50 |
|
MALE SHOEGAZE posted:tef's posts always remind me that i need to learn more about distributed systems. i work with a distributed system and the thing that sucks the most is how easy it is to go up your own rear end covering edge cases and failure points. it gets absurd and never gets simpler and that's why tef's posts own.
|
# ? Jul 22, 2018 23:25 |
|
jony neuemonic posted:i put my money where my mouth is and chucked dapper and mediatr at a project and folks, my needs are extremely suited. dapper is extremely good but what's mediatr?
|
# ? Jul 22, 2018 23:30 |
|
Some storage we'll use on a high-availability product will be backed by just regular human readable files because it will be more important for an operator to be able to edit things by hand or crush it with an external file if all poo poo goes to hell than any level of performance that would be gained otherwise
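a rough sketch of that trade-off, assuming plain json on disk and made-up names: the write path is just temp file, fsync, rename, so an operator can read the file, hand-edit it, or drop a replacement over it when everything is on fire.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write human-readable state atomically: temp file + fsync + rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2, sort_keys=True)  # indented so a human can edit it
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX: readers see the old or new file, never half of one
    except BaseException:
        os.unlink(tmp)
        raise


def load_state(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```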
|
# ? Jul 22, 2018 23:40 |
|
cinci zoo sniper posted:returning to topic of job scheduling: anyone knows alternatives to rundeck? my job needs a gui shell for cron jobs with basic tooling like logging or execution emails, since one of our senior developers still doesn’t know how to work with ssh tunnels or command line linux Jenkins? Add your scripts as projects and tell him to hit "build now" if he wants any of them run. Bonus is that you could extract the jobs out of cron itself as well and put 'em in Jenkins and it's trivial to install.
|
# ? Jul 22, 2018 23:40 |
|
MononcQc posted:Some storage we'll use on a high-availability product will be backed by just regular human readable files because it will be more important for an operator to be able to edit things by hand or crush it with an external file if all poo poo goes to hell than any level of performance that would be gained otherwise thats why our message bus passes pointers to files in s3. any performance loss (and there's not a lot) is offset by a) being able to sort of replay messages and b) diagnose failed payloads
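a rough sketch of that claim-check setup, assuming boto3 and a made-up bucket name: the payload lands in s3, only a pointer goes on the bus, and the same key can be fetched again later to replay or inspect a failed message.

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "example-payloads"  # hypothetical bucket name


def publish(bus_send, payload: dict) -> str:
    """Store the payload in S3 and put only a pointer on the message bus."""
    key = f"messages/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode())
    bus_send(json.dumps({"s3_bucket": BUCKET, "s3_key": key}))  # the message is just a pointer
    return key


def consume(message: str) -> dict:
    """Dereference the pointer; the same key can be read again to replay or debug."""
    ptr = json.loads(message)
    obj = s3.get_object(Bucket=ptr["s3_bucket"], Key=ptr["s3_key"])
    return json.loads(obj["Body"].read())
```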
|
# ? Jul 22, 2018 23:47 |
|
Powerful Two-Hander posted:dapper is extremely good but what's mediatr? this thing. it makes for a really nice application structure, especially if you do feature folders too. you can do a kind of half-assed cqrs thing without much effort that’s still more readable and easy to work with than typical mvc.
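mediatr itself is a C# library, so this is only a loose python sketch of the shape it gives you, not mediatr's actual API and with all names made up: each feature is a small request object plus one handler, and callers go through a mediator instead of calling services directly.

```python
from dataclasses import dataclass

# --- feature folder: features/get_order.py (hypothetical layout) ---

@dataclass
class GetOrder:              # a query object: what the caller wants, nothing about how
    order_id: int


class GetOrderHandler:       # the one place that knows how to answer it
    def __init__(self, db):
        self.db = db

    def handle(self, request: GetOrder):
        return self.db.find_order(request.order_id)


# --- the mediator: routes a request to its registered handler ---

class Mediator:
    def __init__(self):
        self._handlers = {}

    def register(self, request_type, handler):
        self._handlers[request_type] = handler

    def send(self, request):
        return self._handlers[type(request)].handle(request)


# a controller/endpoint just does: mediator.send(GetOrder(order_id=42))
```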
|
# ? Jul 23, 2018 00:07 |
|
tef posted:yeah, tbh it started out no more reliable than the filesystem, which, eh, isn't so bad the biggest issue is that distributing the software and even the software written to operate said software is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did. imo we're still missing an open source distributed transaction log (as a service) with a protocol that can encode sql semantics with serializable isolation at low 1000s tps and an order of magnitude or so more reads. everything else is just ux and routing.
|
# ? Jul 23, 2018 00:25 |
|
FamDav posted:the biggest issue is that distributing the software and even the software written to operate said software is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did. from the outset, no-one needs a distributed transaction log, and the overhead of setup and learning to use it outweighs 'make it work now'. when you do try to build one, you find 'one-size-fits-all' breaks down entirely, and usually the one machine install or debugging setup is completely poo poo. what mongo did was make the single machine usecase work for free, then wait for the phone to ring when people needed sharding. if it's hard to get running in 5 minutes no-one will keep trying unless forced to. programmers will write some tool or product they wished they had from the beginning, but not the product they'd have used at the time. like, if a programmer makes a startup, they make a database startup, then they pivot into monitoring or analytics, in one order or another. no-one at a startup wants to pay for things, the cheap bastard mentality will last for years after they started. or, even if you make the thing people need, people don't want that. even so, building off the shelf distributed systems products is hard because it requires tight integration around the neighbouring component. a distributed transaction log will force clients to handle failures, emit heartbeats, timeouts; the client needs to be fit into the system in order to guarantee reliability. the first problem the log will end up having to handle is deduplication, so it will need to keep an index in sync with writes in order to remove them or update them. it's the same problem as updating a distributed transaction in place, somewhat. then you still need to go off and integrate into the cloud and handle provisioning and ha and all of that crud tef fucked around with this message at 01:39 on Jul 23, 2018 |
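a toy, single-process sketch of the deduplication point, names made up: the moment the log keeps a key index in sync with appends so it can reject stale or duplicate writes, it is maintaining a view of itself, which is most of the way to being a database.

```python
class DedupingLog:
    """Append-only log that also maintains a key -> latest-offset index."""

    def __init__(self):
        self.entries = []        # the log itself
        self.latest = {}         # the index that has to stay in sync with every append

    def append(self, key: str, value, expected_offset=None) -> int:
        current = self.latest.get(key)
        if expected_offset is not None and current != expected_offset:
            raise ValueError(f"stale write for {key}: index says {current}")
        offset = len(self.entries)
        self.entries.append((key, value))
        self.latest[key] = offset    # the update-in-place problem, hiding inside the log
        return offset

    def get(self, key: str):
        offset = self.latest[key]
        return self.entries[offset][1]
```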
# ? Jul 23, 2018 01:23 |
|
almost nerd sniped into 'what every big rear end store ends up looking like' but, it's not every big rear end site but a lot of them
|
# ? Jul 23, 2018 02:11 |
|
it isn't a log, it'll be a database. like, with cassandra, mysql, dynamodb, kafka, people end up building roughly the same thing:
first: key-blob store. dump some stuff in a table, pull it back out as an object
then: secondary indexes. the blob is json, and we want a view by timestamp, too. people normally accept some delay with replication here
then: caching atop the store. we have a lot of reads for the newest version. all of this is partitioned by the way, somehow
then: other forms of indexing. so a full text index, for example, connected up to the writes from the store, or schema free indexing, as seen in documentdb
then: stored procedures. when a write happens, we want to do something, mostly once, but we'll put some code in to handle exceptions
then: batch procedures. ok, so some of these things *need to happen overnight*, some *need to happen every 30 seconds*, on all of the rows
then: warehousing, cold storage. fine, so, we need to aggregate stats, logs, historic data, so we write old versions of the data to other tables to make room for them
then: a write buffer. ok so incoming writes are hard now but ....
then: application logic/rpc. call a method on an object, something reads the state, does stuff, and tries to issue an update, etc
yes things will be connected up with logs, but in the end, you'll have a database as the primary store, if only for deduplication. so so much of this can be done with off the shelf parts, and honestly, using mysql or kafka is quite a big off the shelf part, but a large part of it is wiring it into your infrastructure, making machines pop back up, etc, that's the real shitshow
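a rough sketch of the first couple of steps on that list, assuming sqlite and made-up names: a key-blob table plus a secondary index by timestamp that has to be kept in step on every write.

```python
import json
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blobs (key TEXT PRIMARY KEY, body TEXT NOT NULL, updated_at REAL NOT NULL);
    CREATE INDEX blobs_by_time ON blobs (updated_at);  -- the 'view by timestamp' secondary index
""")


def put(key: str, obj: dict) -> None:
    """Step one: dump some stuff in a table, pull it back out as an object."""
    db.execute(
        "INSERT INTO blobs (key, body, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET body = excluded.body, updated_at = excluded.updated_at",
        (key, json.dumps(obj), time.time()),
    )


def get(key: str) -> dict:
    (body,) = db.execute("SELECT body FROM blobs WHERE key = ?", (key,)).fetchone()
    return json.loads(body)


def newest(n: int) -> list[dict]:
    """Step two: the read path that the timestamp index exists for."""
    rows = db.execute("SELECT body FROM blobs ORDER BY updated_at DESC LIMIT ?", (n,)).fetchall()
    return [json.loads(body) for (body,) in rows]
```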
|
# ? Jul 23, 2018 02:44 |
|
thank you tef for continuing to rant in yospos sincerely, no yoscasm
|
# ? Jul 23, 2018 02:50 |
|
i'm just mad all the time
|
# ? Jul 23, 2018 02:53 |
|
tef posted:from the outset, no-one needs a distributed transaction log, and the overhead of setup and learning to use it outweighs 'make it work now' so as far as anecdotes go, the one I use at work takes about 15 minutes to get set up locally. alternatively, you can just use the in-memory test fixture for unit tests. there's a cli for provisioning logs for production use. it provisions in about a second. quote:even so, building off the shelf distributed systems products is hard because it requires tight integration around the neighbouring component the implication of what i wrote was that - much in the same way kubernetes is open source but - people will end up paying for hosted versions in most cases because so much other stateful software reduces down to it, and can be implemented in terms of it. tef posted:it isn't a log, it'll be a database, so I was vague due to phone posting, but by "right protocol" i was implying that the API of the distributed log made it straightforward to implement isolation in the log itself. that is quote:first: key-blob store serializable multi-key transactions should be straightforward to propose to the log assuming you have a materialized view of the log for read purposes. and if you have a materialized view of the log anyways, that ends up helping for horizontal scale out of reads. quote:then: other forms of indexing and really, these are a combination of "backup log and/or materialized views to an object store" with support for some kind of read through and "materialize log to some other datastore"
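a toy, single-process sketch of the 'propose multi-key transactions, read from a materialized view' idea, names made up and nothing like any real product's API: a proposal carries the positions it read at, and it is rejected if any of those keys were overwritten since.

```python
class LedgerNode:
    """Toy ledger: a list of committed transactions plus a materialized key/value view."""

    def __init__(self):
        self.log = []      # committed transactions, in order
        self.view = {}     # key -> (value, position of last write)

    def read(self, key):
        value, pos = self.view.get(key, (None, -1))
        return value, pos

    def propose(self, reads, writes) -> bool:
        """reads: {key: position observed}; writes: {key: new value}.

        Serializable-ish check: commit only if no read key was overwritten since it was read.
        """
        for key, seen_pos in reads.items():
            _, current_pos = self.view.get(key, (None, -1))
            if current_pos != seen_pos:
                return False                      # conflict: someone wrote after we read
        pos = len(self.log)
        self.log.append(writes)
        for key, value in writes.items():
            self.view[key] = (value, pos)         # keep the materialized view in step with the log
        return True


node = LedgerNode()
_, pos_a = node.read("a")
_, pos_b = node.read("b")
ok = node.propose({"a": pos_a, "b": pos_b}, {"a": 90, "b": 10})
```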
|
# ? Jul 23, 2018 03:20 |
|
like i had an unfinished rant about "don't use a database as a queue, or the other way around" again. so people have noticed "don't use a database as a queue". it sucks for two big reasons. the first is that jobs are often ephemeral, or that a lot of jobs are active for five minutes then never read again. the second is that when people do put jobs in a database, they use transactions to manage them, and one long running job breaks everything. the idea is that someone will transactionally delete a database entry, and hold open the transaction so as to have a form of error handling, retry. it ends up sucking for two reasons. first: it turns out that hundreds of workers trying to find the first unlocked record sucks, unless you use SKIP LOCKED. second: that whole insert and delete thing works better on mysql, but really bad on postgres, which is append only under the covers, and so deletes or updates take longer to reclaim data. that and if anyone has a long running transaction it gets hosed. the thing is, "using a queue as a database" happens too. like, what they'll do is have 'worker' then 'queue1', then another worker, another queue, and a final third worker, chained up. or, they'll have some retry mechanism based on re-inserting things into the queue, or they have some tracking of offsets, or people throw the word transactional around. what happens here is that you have some process with some states it can be in, "step 1, step 2, step 3", and the various workers involved in each step. instead of storing 'item a is at step 1', it is in the step1 queue, and so forth. instead of having one database to track this, you now have as many as you have steps, and it only tracks the items that successfully came out of one worker and went into another, rarely tracking where things are when they're not in a queue. the thing is, it can be made to work, just pay amazon a lot of money and make the devs watch a dashboard and try and maybe turn things on and off until it works and the clog is passed through the pipeline again. it isn't the queue, it's the persistence. it isn't the queue, it's that the queue is being used to represent state. so yeah, you end up building a scheduler and a process table to run stuff reliably rather than just a handful of pipes
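the SKIP LOCKED point in concrete form, a rough sketch assuming psycopg2, postgres 9.5+, and a made-up jobs table: each worker claims one unlocked row and only commits once the job is done, so a crashed worker's job goes back in the pool.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string


def claim_and_run_one_job(run) -> bool:
    """Claim one pending job with FOR UPDATE SKIP LOCKED so workers don't fight over rows."""
    with conn:                      # commits on success, rolls back (job stays pending) on error
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, payload FROM jobs
                WHERE status = 'pending'
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
                """
            )
            row = cur.fetchone()
            if row is None:
                return False
            job_id, payload = row
            run(payload)            # keep this short; a long job here means a long transaction
            cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,))
            return True
```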
|
# ? Jul 23, 2018 03:20 |
|
maybe it would help if i referred to it as a distributed ledger instead of a log? i realize by using the words log and transaction together i'm inviting comparison to kafka, which is the wrong comparison.
|
# ? Jul 23, 2018 03:22 |
|
FamDav posted:so as far as anecdotes go, the one I use at work takes about 15 minutes to get set up locally. alternatively, you can just use the in-memory test fixture for unit tests. nice quote:the implication of what i wrote was that - much in the same way kubernetes is open source but - people will end up paying for hosted versions in most cases because so much other stateful software reduces down to it, and can be implemented in terms of it. i hear you, but i just don't think that 'logs' are the best way to think about it, even if a lot of what a database does is log stuff like log structured filesystems are great too, but the real problem is managing partitions / rebalancing but that's also the same as multi key transactions quote:so I was vague due to phone posting, but by "right protocol" i was implying that the API of the distributed log made it straightforward to implement isolation in the log itself. that is the thing is that's more than a log but everyone loves log quote:serializable multi-key transactions should be straightforward to propose to the log assuming you have a materialized view of the log for read purposes. and if you have a materialized view of the log anyways, that ends up helping for horizontal scale out of reads. at this point i'm like 'yeah ok you get my point about it's a database' quote:and really, these are a combination of "backup log and/or materialized views to an object store" with support for some kind of read through and "materialize log to some other datastore" yeah, but they end up in the wishlist every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious
|
# ? Jul 23, 2018 03:29 |
|
FamDav posted:the biggest issue is that distributing the software and even the software written to operate said software is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did. Bloomberg's comdb2 https://github.com/bloomberg/comdb2 is more-or-less capable of this. I have no idea how bad it is to administer, since it was provided as a managed service internally, but it does what it says on the tin - serializable SQL transactions at 1k+ tps with a multi-datacenter cluster. They use it for drat near everything. As long as you don't try to use it for OLAP-type stuff (which everyone at BB did/does because it's all they have lol), it works pretty good.
|
# ? Jul 23, 2018 03:39 |
|
tef posted:nice so i think we're in agreement that it's not a log in the kafka sense (oh god), but instead a ledger that supports multi-key serializable transactions (i emphasize serializable with enough sophistication to support index/range queries). i also want to clarify that i expect most application developers to never actually interact with the ledger directly, but instead either use client libraries that implement some higher level logic or much more likely services implemented on top of it. a zookeeper API shim, a cluster manager, etc. etc. poo poo, i've seen someone replace mysql's storage engine (it's not a useful implementation, though) quote:yeah, but they end up in the wishlist the nice part of a ledger is that, so long as you can scale reads from it, you can apply it into pretty much whatever you want and however you want. want to put it in a db but don't want to tombstone anything? just process deletes as setting the deleted_at column. want to put every mutation as a distinct row in your data warehouse? you can do that. the application of data into another system is arguably the most straightforward part of all this. read the stream, process the update, write to your state management table that you've read up to point X. quote:every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious that sounds depressing. also event sourcing most anywhere. Rhusitaurion posted:Bloomberg's comdb2 https://github.com/bloomberg/comdb2 is more-or-less capable of this. I have no idea how bad it is to administer, since it was provided as a managed service internally, but it does what it says on the tin - serializable SQL transactions at 1k+ tps with a multi-datacenter cluster. They use it for drat near everything. As long as you don't try to use it for OLAP-type stuff (which everyone at BB did/does because it's all they have lol), it works pretty good. see above on materializing different views. the dataplane is conflated with the data such that you can't just move the data into a form that is better suited for your use-case.
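a toy sketch of the 'apply the ledger however you want' point, names made up: one consumer materializes the stream into rows where a delete just sets deleted_at, and it tracks how far it has read so it can resume.

```python
class SoftDeleteMaterializer:
    """Applies ledger entries into a table-like dict, never actually deleting rows."""

    def __init__(self):
        self.rows = {}           # key -> {"value": ..., "deleted_at": ...}
        self.applied_up_to = -1  # 'you've read up to point X', persisted in a real system

    def apply(self, ledger):
        for pos, entry in enumerate(ledger):
            if pos <= self.applied_up_to:
                continue                               # already applied, safe to re-read the stream
            key, op, value, ts = entry
            if op == "put":
                self.rows[key] = {"value": value, "deleted_at": None}
            elif op == "delete":
                if key in self.rows:
                    self.rows[key]["deleted_at"] = ts  # soft delete instead of a tombstone
            self.applied_up_to = pos


ledger = [("a", "put", 1, 100), ("a", "delete", None, 101)]
m = SoftDeleteMaterializer()
m.apply(ledger)
```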
|
# ? Jul 23, 2018 03:53 |
tef posted:i'm just mad all the time
|
|
# ? Jul 23, 2018 04:38 |
|
The computer disease
|
# ? Jul 23, 2018 04:52 |
it’s not that bad, i just wish everything related to computers was different at my work
|
|
# ? Jul 23, 2018 05:09 |
|
tef posted:i'm just mad all the time if ranting in the pos helps with that then yes please more but if we wind up exacerbating it then don’t, we’re not worth your mental health
|
# ? Jul 23, 2018 05:10 |
|
tef posted:every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious Sorry, excuse my ignorance as I'm a terrible programmer, but isn't uuids/keys going to be how you track your state for event sourcing? I thought kafka was for event sourcing...
|
# ? Jul 23, 2018 05:14 |
|
eschaton posted:if ranting in the pos helps with that then yes please more i have been appreciating tefs rants on distributed systems for a while now, even with about half of them resulting in me going "oh god that's us". and now also, yet another terrible medical software programmer checking in. e: vvv Night Shade fucked around with this message at 06:15 on Jul 23, 2018 |
# ? Jul 23, 2018 05:29 |
|
Finster Dexter posted:Sorry, excuse my ignorance as I'm a terrible programmer, but isn't uuids/keys going to be how you track your state for event sourcing? I thought kafka was for event sourcing... event sourcing is the next nosql. it's bad
|
# ? Jul 23, 2018 06:03 |
|
hey i want to throw together a tool for sending commands to the transputer board. unfortunately the system it’s in runs windows 95 and I don’t actually really know anything about pre-.net windows pogromming. do I use MFC or what? I have visual studio 6 installed
|
# ? Jul 23, 2018 06:28 |
normally this is the case where you curl up and succumb to the sweet embrace of the void
|
|
# ? Jul 23, 2018 06:36 |
|
tef posts make me feel like a huge fuckin idiot
|
# ? Jul 23, 2018 06:55 |
|
redleader posted:posts make me feel like a huge fuckin idiot
|
# ? Jul 23, 2018 07:02 |