cinci zoo sniper
Mar 15, 2013




Mao Zedong Thot posted:

Fire ur Sr developer that can't be assed to do their job

ive made my comments about this at work already, but the reality is that he does his core job exceptionally well, he just only has windows experience


DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
yeah i dont know that mongo is any worse than other document stores; mongo is the only one i've used seriously.

my issue is more with relational data being backed by a document store, that's the problem. it's an absolute nightmare to maintain.

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
tef's posts always remind me that i need to learn more about distributed systems.

gonadic io
Feb 16, 2011

>>=

MALE SHOEGAZE posted:

tef's posts always remind me that i need to never work on distributed systems.

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
[quote="

eschaton posted:

Rust hacking suggestion: take your functor thingy and profile the compiler and see if you can figure out why the slow part is slow, and what to do to fix it

eschaton" post="486351738"]
also spend your time after this job writing tools

hacking on Rust itself would be a good start

so would hacking on Swift, or writing new static analyses for clang, or helping with the SBCL POWER9/ppc64le port, that sort of thing
[/quote]

yeah working on understanding the rust compiler issue as we speak. it's taking me a while to isolate a good test case but i'm 100% sure it's related to generators. but i probably wont make a ton of progress until i'm done with this job.

i would love to work on swift stuff too so i'll see what's going on! thanks for the recs!

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder

or that.

tef
May 30, 2004

-> some l-system crap ->

gonadic io posted:

tef's posts always remind me that i need to never work on distributed systems.

here's the thing, if it involves multiple copies of things being kept in sync, you're out of luck

tef
May 30, 2004

-> some l-system crap ->
maybe i'm brain weird, or maybe it was the things i did before programming (maths), but like, distributed systems aren't especially hard or confusing, it's just more frustrating than regular programming because

no amount of neat or clever tricks work and it all comes down to doing it the hard or slow way

yes, atomic clocks count as doing it the hard and slow way, spanner still uses interval arithmetic for timestamps and paxos

as almost every step in the process can fail in a ridiculous manner, and there is no way to bypass things

almost every optimisation you make to 'make it faster' will usually crash the system


all it boils down to is being very thorough with error handling and saying no a whole bunch of times

being a pessimistic poo poo, that's all


oh and learning all of the last 50-60 years of work or something, because you need experience with the failures to prepare for them properly

Schadenboner
Aug 15, 2011

by Shine
Have we figured out if PK has been buried alive in the Sonoran desert yet?

:ohdear:

Corla Plankun
May 8, 2007

improve the lives of everyone

cinci zoo sniper posted:

the most common use cases are to

1) read log after failed execution
2) manually run the job before schedule

airflow can do that; we use it at work and i hate it, but i havent used it for long enough to know if it is because it is bad or just that i'm too new at it
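for anyone who hasn't touched it, a bare-bones airflow DAG for a cron-style job looks roughly like the sketch below (this assumes airflow 2.x; the dag id and script path are made up, not anything from the thread). every run gets its own log in the web ui, and you can trigger it ahead of schedule from the same place, which covers both use cases above.

code:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_export",                  # hypothetical job name
    start_date=datetime(2018, 7, 1),
    schedule_interval="0 2 * * *",            # same cron expression as before
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    BashOperator(
        task_id="run_export",
        bash_command="python /opt/jobs/export_reports.py",   # hypothetical script
    )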

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

they're basically unavoidable. even ye olde cgi web sites are simple distributed systems. just imagine that all of your various services that need to talk to each other are impatient users mashing the Submit Reply button over and over

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

MALE SHOEGAZE posted:

tef's posts always remind me that i need to learn more about distributed systems.

i work with a distributed system and the thing that sucks the most is how easy it is to go up your own rear end covering edge cases and failure points. it gets absurd and never gets simpler and that's why tef's posts own.

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


jony neuemonic posted:

i put my money where my mouth is and chucked dapper and mediatr at a project and folks, my needs are extremely suited.

dapper is extremely good but what's mediatr?

MononcQc
May 29, 2007

Some storage we'll use on a high-availability product will be backed by just regular human readable files because it will be more important for an operator to be able to edit things by hand or crush it with an external file if all poo poo goes to hell than any level of performance that would be gained otherwise :toot:
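a rough python sketch of that idea (not MononcQc's actual implementation, and the path is made up): write the state as pretty-printed json with an atomic rename, so an operator can read it, hand-edit it, or just drop a replacement file over it.

code:

import json
import os
import tempfile

STATE_PATH = "/etc/myapp/state.json"   # hypothetical location

def save_state(state: dict) -> None:
    # pretty-printed so a human can read and edit it under pressure
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(STATE_PATH))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)
        f.write("\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, STATE_PATH)   # atomic rename: readers never see a half-written file

def load_state() -> dict:
    with open(STATE_PATH) as f:
        return json.load(f)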

abigserve
Sep 13, 2009

this is a better avatar than what I had before

cinci zoo sniper posted:

returning to the topic of job scheduling: does anyone know alternatives to rundeck? my job needs a gui shell for cron jobs with basic tooling like logging or execution emails, since one of our senior developers still doesn’t know how to work with ssh tunnels or command line linux

Jenkins?

Add your scripts as projects and tell him to hit "build now" if he wants any of them run. Bonus is that you could extract the jobs out of cron itself as well and put 'em in Jenkins and it's trivial to install.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

MononcQc posted:

Some storage we'll use on a high-availability product will be backed by just regular human readable files because it will be more important for an operator to be able to edit things by hand or crush it with an external file if all poo poo goes to hell than any level of performance that would be gained otherwise :toot:

thats why our message bus passes pointers to files in s3. any performance loss (and there's not a lot) is offset by a) being able to sort of replay messages and b) diagnose failed payloads
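the shape of that pattern, as a rough boto3 sketch (the bucket, queue url, and function names are made up, not their actual system): the real payload goes to s3 and only a small pointer travels on the bus, so failed payloads can be pulled back out and inspected or replayed later.

code:

import json
import uuid

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "example-payloads"                                              # hypothetical
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example"   # hypothetical

def publish(payload: dict) -> str:
    key = f"payloads/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode())
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"bucket": BUCKET, "key": key}))
    return key   # the object outlives the message, which is what makes replay and diagnosis possible

def consume_one():
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
    for msg in resp.get("Messages", []):
        ptr = json.loads(msg["Body"])
        payload = json.loads(s3.get_object(Bucket=ptr["bucket"], Key=ptr["key"])["Body"].read())
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        return payload
    return None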

jony neuemonic
Nov 13, 2009

Powerful Two-Hander posted:

dapper is extremely good but what's mediatr?

this thing. it makes for a really nice application structure, especially if you do feature folders too. you can do a kind of half-assed cqrs thing without much effort that’s still more readable and easy to work with than typical mvc.
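mediatr itself is a c# library, so this isn't its api, but a hand-rolled python sketch of the same shape (one request type maps to exactly one handler, and callers only ever send a request) looks something like this:

code:

from dataclasses import dataclass

_handlers = {}   # request type -> single handler, which is the whole trick

def handles(request_type):
    def register(fn):
        _handlers[request_type] = fn
        return fn
    return register

def send(request):
    return _handlers[type(request)](request)

# one "feature folder" is basically a request + handler pair
@dataclass
class GetUser:            # query side
    user_id: int

@dataclass
class RenameUser:         # command side
    user_id: int
    new_name: str

@handles(GetUser)
def get_user(req: GetUser):
    return {"id": req.user_id, "name": "example"}   # would really hit the db via dapper etc.

@handles(RenameUser)
def rename_user(req: RenameUser):
    pass   # write side returns nothing, which is the half-assed cqrs split

print(send(GetUser(user_id=1)))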

FamDav
Mar 29, 2008

tef posted:

yeah, tbh it started out no more reliable than the filesystem, which, eh, isn't so bad


people resent it because it represents the truth that good engineering is useless without good docs, support, or onboarding, and mongodb only did the latter at first, and started acquiring the former only recently. honestly, any database under a decade old won't be reliable. then again what about basho and riak, that well engineered software that had more conference attendees than users.

the thing that gets me is the people who smarmed about a 'real' database, when none of them know what isolation level transactions run at, dump poo poo into persistent message queues 'to handle failure', and then slap a dht'd memcache atop and do stale reads. it's fine, i mean i'm not saying mongodb is good but frankly you can build a lossy system out of everything and everyone already has.

the people making GBS threads on mongo are the sort of people who then go and use etcd, consul, and a variety of home brew key value stores backed by mmap, complete with a homebrew replication system that's baked atop raft, which as it turns out is designed for teaching, not performance. poo poo on mongo for all i care, it has a crappy name, poor transactional support, and well, god i tried to do a join once and it was awful


but like, you do realise 'good and smart' engineers have repeatedly tried and failed to build something better in that market space

the biggest issue is that distributing the software, and even the software written to operate said software, is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did.

imo we're still missing an open source distributed transaction log (as a service) with a protocol that can encode sql semantics with serializable isolation at low 1000s tps and an order of magnitude or so more reads. everything else is just ux and routing.

tef
May 30, 2004

-> some l-system crap ->

FamDav posted:

the biggest issue is that distributing the software, and even the software written to operate said software, is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did.

imo we're still missing an open source distributed transaction log (as a service) with a protocol that can encode sql semantics with serializable isolation at low 1000s tps and an order of magnitude or so more reads. everything else is just ux and routing.

from the outset, no-one needs a distributed transaction log, and the overhead of setup and learning to use it outweighs 'make it work now'

when you do try to build one, you find 'one-size-fits-all' breaks down entirely, and usually the one machine install or debugging setup is completely poo poo

what mongo did was make the single machine usecase work for free, then wait for the phone to ring when people needed sharding,

if it's hard to get running in 5 minutes no-one will keep trying unless forced to


programmers will write some tool or product they wished they had from the beginning, but not the product they'd have used at the time

like, if a programmer makes a startup, they make a database startup, then they pivot into monitoring or analytics, in one order or another

no-one at a startup wants to pay for things, the cheap bastard mentality will last for years after they start

or, even if you make the thing people need, people don't want that



even so, building off the shelf distributed systems products is hard because it requires tight integration with the neighbouring components

a distributed transaction log will force clients to handle failures, emit heartbeats, timeouts, the client needs to be fit into the system in order to guarantee reliability

the first problem the log will end up having to handle is deduplication, so it will need to keep an index in sync with writes in order to remove them or update them, it's the same problem as updating a distributed transaction in place, somewhat

then you still need to go off and integrate into the cloud and handle provisioning and ha and all of that crud

tef fucked around with this message at 01:39 on Jul 23, 2018

tef
May 30, 2004

-> some l-system crap ->
almost nerd sniped into 'what every big rear end store ends up looking like' but, it's not every big rear end site but a lot of them

tef
May 30, 2004

-> some l-system crap ->
it isn't a log, it'll be a database,

like, with cassandra, mysql, dynamodb, kafka people end up building roughly the same thing

first: key-blob store

dump some stuff in a table, pull it back out as an object

then: secondary indexes

the blob is json, and we want a view by timestamp, too. people normally accept some delay with replication here

then: caching atop the store

we have a lot of reads for the newest version

all of this is partitioned by the way, somehow

then: other forms of indexing

so a full text index, for example, connected up to the writes from the store

or schema free indexing, as seen in documentdb

then: stored procedures

when a write happens, we want to do something, mostly once but we'll put some code in to handle exceptions

then: batch procedures

ok, so some of these things *need to happen overnight* some *need to happen every 30 seconds*, on all of the rows

then: warehousing, cold storage

fine, so, we need to aggregate stats, logs, historic data, so we write old versions of the data to other tables to make room for them

then: a write buffer

ok so incoming writes are hard now but ....

then: application logic/rpc

call a method on an object, something reads the state, does stuff, and tries to issue an update, etc



yes things will be connected up with logs, but in the end, you'll have a database as the primary store, if only for deduplication

so so much of this can be done with off the shelf parts, and honestly, using mysql or kafka is quite a big off the shelf part, but a large part of it is wiring it into your infrastructure, making machines pop back up, etc, that's the real shitshow
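a toy single-process sketch of just the first two steps in that list (a key-blob store, then a by-timestamp secondary index that is allowed to lag), with made-up names:

code:

import json
import time
from collections import defaultdict

primary = {}                      # key -> json blob (the key-blob store)
changes = []                      # append-only change feed the indexer consumes
index_by_ts = defaultdict(list)   # timestamp -> keys (the secondary index)
indexed_upto = 0                  # how far the indexer has read into the feed

def put(key: str, doc: dict) -> None:
    doc = dict(doc, _ts=int(time.time()))
    primary[key] = json.dumps(doc)
    changes.append((key, doc["_ts"]))   # the index only finds out about this later

def get(key: str) -> dict:
    return json.loads(primary[key])

def run_indexer_once() -> None:
    # in a real system this is a separate replicating process,
    # which is exactly why reads of the index can be stale
    global indexed_upto
    for key, ts in changes[indexed_upto:]:
        index_by_ts[ts].append(key)
    indexed_upto = len(changes)

def keys_since(ts: int) -> list:
    return [k for bucket, ks in index_by_ts.items() if bucket >= ts for k in ks]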

DaTroof
Nov 16, 2000

CC LIMERICK CONTEST GRAND CHAMPION
There once was a poster named Troof
Who was getting quite long in the toof
thank you tef for continuing to rant in yospos

sincerely, no yoscasm

tef
May 30, 2004

-> some l-system crap ->
i'm just mad all the time

FamDav
Mar 29, 2008

tef posted:

from the outset, no-one needs a distributed transaction log, and the overhead of setup and learning to use it outweighs 'make it work now'

when you do try to build one, you find 'one-size-fits-all' breaks down entirely, and usually the one machine install or debugging setup is completely poo poo

what mongo did was make the single machine usecase work for free, then wait for the phone to ring when people needed sharding,

if it's hard to get running in 5 minutes no-one will keep trying unless forced to

so as far as anecdotes go, the one I use at work takes about 15 minutes to get set up locally. alternatively, you can just use the in-memory test fixture for unit tests.

there's a cli for provisioning logs for production use. it provisions in about a second.

quote:

even so, building off the shelf distributed systems products is hard because it requires tight integration with the neighbouring components

a distributed transaction log will force clients to handle failures, emit heartbeats, timeouts, the client needs to be fit into the system in order to guarantee reliability

the first problem the log will end up having to handle is deduplication, so it will need to keep an index in sync with writes in order to remove them or update them, it's the same problem as updating a distributed transaction in place, somewhat

then you still need to go off and integrate into the cloud and handle provisioning and ha and all of that crud

the implication of what i wrote was that - much in the same way kubernetes is open source - people will end up paying for hosted versions in most cases because so much other stateful software reduces down to it, and can be implemented in terms of it.

tef posted:

it isn't a log, it'll be a database,

like, with cassandra, mysql, dynamodb, kafka people end up building roughly the same thing

so I was vague due to phone posting, but by "right protocol" i was implying that the API of the distributed log made it straightforward to implement isolation in the log itself. that is

quote:

first: key-blob store

then: secondary indexes

then: caching atop the store

serializable multi-key transactions should be straightforward to propose to the log assuming you have a materialized view of the log for read purposes. and if you have a materialized view of the log anyways, that ends up helping for horizontal scale out of reads.
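a toy sketch of that propose-against-a-materialized-view idea (single process, made-up names, not any real system's api): each transaction carries the versions it read from the view, and the log only appends it if those versions are still current, which is what makes the multi-key writes serializable.

code:

log = []     # committed transactions, in order
view = {}    # key -> (version, value), i.e. the materialized view of the log

def read(key):
    return view.get(key, (0, None))

def propose(read_set: dict, write_set: dict) -> bool:
    # read_set: key -> version observed; write_set: key -> new value
    for key, seen_version in read_set.items():
        if view.get(key, (0, None))[0] != seen_version:
            return False                  # someone committed in between; caller retries
    log.append(write_set)
    for key, value in write_set.items():
        version, _ = view.get(key, (0, None))
        view[key] = (version + 1, value)
    return True

# seed two keys, then move 10 between them in one multi-key transaction
propose({}, {"alice": 100, "bob": 0})
va, a = read("alice")
vb, b = read("bob")
assert propose({"alice": va, "bob": vb}, {"alice": a - 10, "bob": b + 10})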

quote:

then: other forms of indexing

then: warehousing, cold storage

and really, these are a combination of "backup log and/or materialized views to an object store" with support for some kind of read through and "materialize log to some other datastore"

tef
May 30, 2004

-> some l-system crap ->
like i had an unfinished rant about "don't use a database as a queue, or the other way around"

again



so people have noticed "don't use a database as queue". it sucks for two big reasons

the first is that jobs are often ephemeral, or that a lot of jobs are active for five minutes then never read again

the second is that when people do put jobs in a database, they use transactions to manage them, and one long running job breaks everything

the idea is that someone will transactionally delete a database entry, and hold the transaction open so as to have a form of error handling and retry

it ends up sucking for two reasons

first: it turns out that hundreds of workers trying to find the first unlocked record sucks, unless you use SKIP LOCKED

second: that whole insert and delete thing works better on mysql, but really bad on postgres, which is append only under the covers, and so deletes or updates take longer to reclaim data

that and if anyone has a long running transaction it gets hosed
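for reference, the skip locked version of the pattern looks roughly like this (assuming postgres and psycopg2; the table, schema, and worker function are made up):

code:

import psycopg2

conn = psycopg2.connect("dbname=example")   # hypothetical database

# assumes: CREATE TABLE jobs (id bigserial primary key, payload text, done_at timestamptz)
CLAIM = """
    SELECT id, payload
    FROM jobs
    WHERE done_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED   -- other workers skip rows someone has already claimed
"""

def run(payload):
    print("processing", payload)   # stand-in for the actual work

def work_one_job() -> bool:
    with conn:                       # one transaction per job: claim, run, mark done
        with conn.cursor() as cur:
            cur.execute(CLAIM)
            row = cur.fetchone()
            if row is None:
                return False
            job_id, payload = row
            run(payload)             # if this raises, the rollback releases the lock and the job retries
            cur.execute("UPDATE jobs SET done_at = now() WHERE id = %s", (job_id,))
    return True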


the thing is, "using a queue as a database" happens too

like, what they'll do is have 'worker' then 'queue1', then another worker, another queue, and final third worker, chained up

or, they'll have some retry mechanism based on re-inserting things into the queue

or they have some tracking of offsets or people throw the word transactional around

what happens here is that you have some process with some states it can be in "step 1, step 2, step 3", and the various workers involved in each step

instead of storing 'item a is at step1', it is in the step1 queue, and so forth

instead of having one database to track this, you now have as many as you have steps, and it only tracks the items that successfully came out of one worker and went into another, rarely tracking where things are when they're not in a queue

the thing is, it can be made to work, just pay amazon a lot of money and make the devs watch a dashboard and try and maybe turn things on and off until it works and the clog is passed through the pipeline again

it isn't the queue, it's the persistence, it isn't the queue, it's that the queue is being used to represent state

so yeah

you end up building a scheduler and a process table to run stuff reliably rather than just a handful of pipes

FamDav
Mar 29, 2008
maybe it would help if i referred to it as a distributed ledger instead of a log? i realize by using the words log and transaction together i'm inviting comparison to kafka, which is the wrong comparison.

tef
May 30, 2004

-> some l-system crap ->

FamDav posted:

so as far as anecdotes go, the one I use at work takes about 15 minutes to get set up locally. alternatively, you can just use the in-memory test fixture for unit tests.

there's a cli for provisioning logs for production use. it provisions in about a second.

nice

quote:

the implication of what i wrote was that - much in the same way kubernetes is open source - people will end up paying for hosted versions in most cases because so much other stateful software reduces down to it, and can be implemented in terms of it.

i hear you, but i just don't think that 'logs' are the best way to think about it, even if a lot of what a database does is log stuff

like log structured filesystems are great too, but the real problem is managing partitions / rebalancing

but that's also the same as multi key transactions

quote:

so I was vague due to phone posting, but by "right protocol" i was implying that the API of the distributed log made it straightforward to implement isolation in the log itself. that is

the thing is that's more than a log but everyone loves log

quote:

serializable multi-key transactions should be straightforward to propose to the log assuming you have a materialized view of the log for read purposes. and if you have a materialized view of the log anyways, that ends up helping for horizontal scale out of reads.

at this point i'm like 'yeah ok you get my point about it's a database'

quote:

and really, these are a combination of "backup log and/or materialized views to an object store" with support for some kind of read through and "materialize log to some other datastore"

yeah, but they end up in the wishlist


every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious

Rhusitaurion
Sep 16, 2003

One never knows, do one?

FamDav posted:

the biggest issue is that distributing the software, and even the software written to operate said software, is at most 40% of delivering a reliable product in this space. with so few products delivering adequate resources and support, it's pretty obvious how mongo was able to do as well as it did.

imo we're still missing an open source distributed transaction log (as a service) with a protocol that can encode sql semantics with serializable isolation at low 1000s tps and an order of magnitude or so more reads. everything else is just ux and routing.

Bloomberg's comdb2 https://github.com/bloomberg/comdb2 is more-or-less capable of this. I have no idea how bad it is to administer, since it was provided as a managed service internally, but it does what it says on the tin - serializable SQL transactions at 1k+ tps with a multi-datacenter cluster. They use it for drat near everything. As long as you don't try to use it for OLAP-type stuff (which everyone at BB did/does because it's all they have lol), it works pretty good.

FamDav
Mar 29, 2008

tef posted:

nice


i hear you, but i just don't think that 'logs' are the best way to think about it, even if a lot of what a database does is log stuff

like log structured filesystems are great too, but the real problem is managing partitions / rebalancing

but that's also the same as multi key transactions


the thing is that's more than a log but everyone loves log


at this point i'm like 'yeah ok you get my point about it's a database'

so i think we're in agreement that it's not a log in the kafka sense (oh god), but instead a ledger that supports multi-key serializable transactions (i emphasize serializable with enough sophistication to support index/range queries).

i also want to clarify that i expect most application developers to never actually interact with the ledger directly, but instead either use client libraries that implement some higher level logic or, much more likely, services implemented on top of it. a zookeeper API shim, a cluster manager, etc. etc. poo poo, i've seen someone replace mysql's storage engine (it's not a useful implementation, though)

quote:

yeah, but they end up in the wishlist

the nice part of a ledger is that, so long as you can scale reads from it, you can apply it into pretty much whatever you want and however you want. want to put it in a db but don't want to tombstone anything? just process deletes as setting the deleted_at column. want to put every mutation as a distinct row in your data warehouse? you can do that.

the application of data into another system is arguably the most straightforward part of all this. read the stream, process the update, write to your state management table that you've read up to point X.
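a toy sketch of that read-apply-record loop (sqlite standing in for the downstream store, and the ledger is just a list here): each entry and the "read up to point X" watermark commit in the same transaction, so a crash and restart never double-applies.

code:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE applied (stream TEXT PRIMARY KEY, last_offset INTEGER NOT NULL);
    INSERT INTO applied VALUES ('users-ledger', -1);
""")

ledger = [   # pretend this came off the distributed ledger
    (0, ("upsert", 1, "alice")),
    (1, ("upsert", 2, "bob")),
    (2, ("delete", 1, None)),    # deletes become deleted_at updates, no tombstoning
]

def apply_pending():
    (last,) = db.execute("SELECT last_offset FROM applied WHERE stream='users-ledger'").fetchone()
    for offset, (op, user_id, name) in ledger:
        if offset <= last:
            continue
        with db:   # entry + watermark commit atomically
            if op == "upsert":
                db.execute("INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name))
            elif op == "delete":
                db.execute("UPDATE users SET deleted_at = datetime('now') WHERE id = ?", (user_id,))
            db.execute("UPDATE applied SET last_offset = ? WHERE stream='users-ledger'", (offset,))

apply_pending()
print(db.execute("SELECT * FROM users").fetchall())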

quote:

every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious

that sounds depressing. also event sourcing most anywhere.

Rhusitaurion posted:

Bloomberg's comdb2 https://github.com/bloomberg/comdb2 is more-or-less capable of this. I have no idea how bad it is to administer, since it was provided as a managed service internally, but it does what it says on the tin - serializable SQL transactions at 1k+ tps with a multi-datacenter cluster. They use it for drat near everything. As long as you don't try to use it for OLAP-type stuff (which everyone at BB did/does because it's all they have lol), it works pretty good.

see above on materializing different views. the dataplane is conflated with the data such that you can't just move the data into a form that is better suited for your use-case.

cinci zoo sniper
Mar 15, 2013




tef posted:

i'm just mad all the time

Mao Zedong Thot
Oct 16, 2008



The computer disease

cinci zoo sniper
Mar 15, 2013




it’s not that bad, i just wish everything related to computers was different at my work

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

tef posted:

i'm just mad all the time

if ranting in the pos helps with that then yes please more

but if we wind up exacerbating it then don’t, we’re not worth your mental health

Finster Dexter
Oct 20, 2014

Beyond is Finster's mad vision of Earth transformed.

tef posted:

every time i look at a kafka log and i find uuids or keys in the messages i am deeply suspicious

Sorry, excuse my ignorance as I'm a terrible programmer, but isn't uuids/keys going to be how you track your state for event sourcing? I thought kafka was for event sourcing...

Night Shade
Jan 13, 2013

Old School

eschaton posted:

if ranting in the pos helps with that then yes please more

but if we wind up exacerbating it then don’t, we’re not worth your mental health

i have been appreciating tefs rants on distributed systems for a while now even with about half of them resulting in me going "oh god that's us and now i have to fix it before it breaks horribly".

also, yet another terrible medical software programmer checking in

e: vvv :rip:

Night Shade fucked around with this message at 06:15 on Jul 23, 2018

FamDav
Mar 29, 2008

Finster Dexter posted:

Sorry, excuse my ignorance as I'm a terrible programmer, but isn't uuids/keys going to be how you track your state for event sourcing? I thought kafka was for event sourcing...

event sourcing is the next nosql













its bad

Luigi Thirty
Apr 30, 2006

Emergency confection port.

hey i want to throw together a tool for sending commands to the transputer board. unfortunately the system it’s in runs windows 95 and I don’t actually really know anything about pre-.net windows pogromming

do I use MFC or what

I have visual studio 6 installed

cinci zoo sniper
Mar 15, 2013




normally this is the case where you curl up and succumb to the sweet embrace of the void

redleader
Aug 18, 2005

Engage according to operational parameters
tef posts make me feel like a huge fuckin idiot


Night Shade
Jan 13, 2013

Old School

redleader posted:

posts make me feel like a huge fuckin idiot
