MononcQc
May 29, 2007

my fav bit of the kafka architecture is whenever someone notices that oh yeah topic compaction is not enough to guarantee reliable long term storage (i.e. re-partitioning fucks with all the keys and therefore linear history of entries) so you need another canonical data source to act as a kind of backup, and so what you do is put a consumer that materializes the views in a DB.

But that's nice because you can use the DB for some direct querying. Except for some stateful component doing stream analysis over historical data; every time that component restarts, you need to sync the whole state to build the thing afresh, but doing this from a DB is not super simple so you do it from Kafka, but since Kafka can't necessarily tell you it has all the data and the DB is the one that's canonically right, you end up building ad-hoc diffs between a DB and a live stream for every restart
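The "ad-hoc diff between a DB and a live stream on every restart" pattern can be sketched in a few lines. This is a pure-Python toy with in-memory stand-ins; the function names and data are invented for illustration, and a real setup would be replaying a Kafka consumer against an actual DB:

```python
# Toy sketch of the restart-time reconciliation described above.

def replay_stream(events):
    """Fold a key/value event stream into latest-value-per-key state."""
    state = {}
    for key, value in events:
        state[key] = value
    return state

def diff_against_db(stream_state, db_rows):
    """Find keys where the stream replay disagrees with the canonical DB."""
    mismatched = {}
    for key, db_value in db_rows.items():
        if stream_state.get(key) != db_value:
            mismatched[key] = (stream_state.get(key), db_value)
    # keys the replay saw that the canonical DB never materialized
    missing_from_db = set(stream_state) - set(db_rows)
    return mismatched, missing_from_db

events = [("a", 1), ("b", 2), ("a", 3)]   # live stream since some offset
db = {"a": 3, "b": 1, "c": 9}             # canonical materialized view
state = replay_stream(events)
mismatched, missing = diff_against_db(state, db)
# "b" disagrees (stream says 2, DB says 1) and "c" never showed up in the
# replay window, so now you get to decide which source to trust, per key
```

The painful part isn't the diff itself, it's that nothing tells you *which* side is right for each mismatched key.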

And there's like no good solution, you just cover your ears and hope you never make it to that point because you know you'll be hosed janitoring and reconciliating two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew and they left 3 months ago


Shaggar
Apr 26, 2006
sounds like the solution is to design your db properly and then not use kafka

MononcQc
May 29, 2007

scans the whole table sequentially on every instance boot

this is what this db was meant for

gonadic io
Feb 16, 2011

>>=
luckily we don't give a poo poo about our data soooooo

like if any telemetry packets from devices get lost we don't care - the device resends often. if any commands from servers get lost we don't care - the controller will still see that the device isn't in the desired state and keep resending until its telemetry changes
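That desired-state controller loop is a nice pattern, and it only works because the command is idempotent. A toy sketch (the lossy-channel model and all names are invented for illustration):

```python
# Sketch of "keep resending until telemetry converges": lost commands don't
# matter because the controller just sends the full desired state again.
import random

def run_controller(desired, device_state, drop_rate=0.5, max_rounds=100):
    """Resend desired state until telemetry matches it."""
    rng = random.Random(42)  # deterministic drops for the example
    rounds = 0
    while device_state != desired and rounds < max_rounds:
        rounds += 1
        if rng.random() > drop_rate:     # command survived the lossy link
            device_state = dict(desired)
        # else: command lost; telemetry still shows old state, so retry

    return device_state, rounds

final, rounds = run_controller({"power": "on"}, {"power": "off"})
# converges despite drops, because resending is always safe
```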

gonadic io
Feb 16, 2011

MononcQc posted:

And there's like no good solution, you just cover your ears and hope you never make it to that point because you know you'll be hosed janitoring and reconciliating two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew and they left 3 months ago

as one of the last scala devs here, driven out by the golang crowd but still writing critical architecture,

MononcQc
May 29, 2007

hell if I don't know the feeling there

AggressivelyStupid
Jan 9, 2012

terrible programming thread:

Powerful Two-Hander posted:

so yeah, that was a waste of my life.

prisoner of waffles
May 8, 2007

Ah! well a-day! what evil looks
Had I from old and young!
Instead of the cross, the fishmech
About my neck was hung.

Powerful Two-Hander posted:

so yeah, that was a waste of my life.

ha, there should be chick tracts but for programmers

Finster Dexter
Oct 20, 2014

Beyond is Finster's mad vision of Earth transformed.

cinci zoo sniper posted:

he smugly proclaimed that he will take care of document storage question and we will be given a special sql interface to work with xml documents without storing them in rdbms

I think postgres' XML data type does exactly this. But then you have to store your xml inside an icky rdbms

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

prisoner of waffles posted:

ha, there should be chick tracts but for programmers

so which one would be analogous to the one where the guy rapes his daughter but everyone sweeps it under the rug because he learns to love jesus?

i'm guessing it would have to do with javascript.

gonadic io
Feb 16, 2011

CRIP EATIN BREAD posted:

so which one would be analogous to the one where the guy rapes his daughter but everyone sweeps it under the rug because he learns to love jesus?

i'm guessing it would have to do with javascript.

No that's the last dev on a critical legacy project

CRIP EATIN BREAD
Jun 24, 2002

bus factor: kiddly diddler

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


prisoner of waffles posted:

ha, there should be chick tracts but for programmers

there's a hans reiser joke here I'm sure of it...

cinci zoo sniper
Mar 15, 2013




Finster Dexter posted:

I think postgres' XML data type does exactly this. But then you have to store your xml inside an icky rdbms

i mean that’s every rdbms these days that offers xml functionality. the question is whether xml needs to exist in the sql environment at all, which is what that guy alluded to: it doesn’t have to

CRIP EATIN BREAD
Jun 24, 2002


Powerful Two-Hander posted:

there's a hans reiser joke here I'm sure of it...

found the art:

simble
May 11, 2004

MononcQc posted:

my fav bit of the kafka architecture is whenever someone notices that oh yeah topic compaction is not enough to guarantee reliable long term storage (i.e. re-partitioning fucks with all the keys and therefore linear history of entries) so you need another canonical data source to act as a kind of backup, and so what you do is put a consumer that materializes the views in a DB.

But that's nice because you can use the DB for some direct querying. Except for some stateful component doing stream analysis over historical data; every time that component restarts, you need to sync the whole state to build the thing afresh, but doing this from a DB is not super simple so you do it from Kafka, but since Kafka can't necessarily tell you it has all the data and the DB is the one that's canonically right, you end up building ad-hoc diffs between a DB and a live stream for every restart

And there's like no good solution, you just cover your ears and hope you never make it to that point because you know you'll be hosed janitoring and reconciliating two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew and they left 3 months ago

counterpoint: just don't do this. use kafka for streaming. there are a few use cases where you'd want to store data in a compacted topic and imo it's only for things that are directly related to supporting your kafka cluster (like schemas, if you're using avro).

also the idea of repartitioning a compacted topic sounds like another nightmare that i simply would never do. i mean what's the real argument for having a large number of partitions (or increasing the number of partitions) for compacted data anyway? just have a few (<=5) and replicate it a few times.
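Worth spelling out what compaction actually promises: at least the latest record per key survives, not the full history. A pure-Python toy of what a compacted log converges to (no real Kafka involved):

```python
# Toy log compaction: keep only the last record per key, preserving the
# relative order of the surviving records.

def compact(log):
    last_index = {}
    for i, (key, _value) in enumerate(log):
        last_index[key] = i
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

log = [("user1", "v1"), ("user2", "v1"), ("user1", "v2"), ("user1", "v3")]
compacted = compact(log)
# user1's history is gone; only the latest value per key remains
```

Which is exactly why it's fine for "current schema per subject" and not fine as a linear history of entries.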

Bloody
Mar 3, 2013

i dont know what kafka and avro are and im pretty sure i am not really missing out on anything as a result

CRIP EATIN BREAD
Jun 24, 2002

avro is a cool serialization format that is binary and has a schema, and allows forward and backward compatibility.

its use case is generally limited to situations where you control both ends, but it works great.

kafka is just plain cool but basically its just a stream processing platform that owns.
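The forward/backward compatibility bit can be shown with a toy of Avro-style schema resolution: the reader resolves the writer's data against its own schema, filling in defaults for fields the writer didn't know about and dropping fields it doesn't care about. This is schema resolution in miniature, not the real Avro wire format:

```python
# Toy Avro-style schema resolution. reader_schema maps field name -> default.

def resolve(record, reader_schema):
    """Unknown writer fields are dropped (forward compat);
    missing fields get the reader's defaults (backward compat)."""
    return {name: record.get(name, default)
            for name, default in reader_schema.items()}

new_reader = {"id": 0, "email": "n/a"}    # reader on schema v2, with defaults

old_writer = {"id": 1}                                   # written with v1
old_result = resolve(old_writer, new_reader)             # 'email' defaulted

new_writer = {"id": 2, "email": "a@b.c", "extra": True}  # written with v3
new_result = resolve(new_writer, new_reader)             # 'extra' dropped
```

This is also why it's limited to situations where you control (or at least distribute schemas to) both ends: the reader needs the writer's schema to decode the bytes at all.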

Bloody
Mar 3, 2013

how is that different from protobufs

Bloody
Mar 3, 2013

what kind of streams is it good for processing

mystes
May 31, 2006

big streams

gonadic io
Feb 16, 2011

Bloody posted:

how is that different from protobufs

Avro is much more json-oriented (but a binary version) whereas pb is much more byte-oriented.

MononcQc
May 29, 2007

simble posted:

counterpoint: just don't do this. use kafka for streaming. there are a few use cases where you'd want to store data in a compacted topic and imo it's only for things that are directly related to supporting your kafka cluster (like schemas, if you're using avro).

also the idea of repartitioning a compacted topic sounds like another nightmare that i simply would never do. i mean what's the real argument for having a large number of partitions (or increasing the number of partitions) for compacted data anyway? just have a few (<=5) and replicate it a few times.

yeah basically that's the fine usage: you don't repartition, and you don't treat kafka as a canonical data store of any kind. You treat it as a kind of special queue that gets a couple of weeks of persistence for multiple readers and then you're good. It's just that sooner or later if you read distsys literature you'll see someone saying you could use kafka for atomic broadcast which means data replication central but that is only true as long as you never have to repartition anything ever.

Repartitions basically require you to manually stop all writing, read all entries in the existing partitions, and republish each value in the new partitions with higher "stamps" before syncing all clients to start publishing to the new partitions as well; my understanding is that there is no standard tooling around it and the general advice seems to be "oh yeah you shouldn't have built your cluster that way"

So the best path forwards with Kafka is to not treat it as a thing where you care enough about its data in the long term to land you in that situation.
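The manual dance described above (stop writes, drain, republish with higher stamps) looks roughly like this as a pure-Python toy. Hash partitioning by key is an assumption standing in for Kafka's default partitioner, and all names are invented:

```python
# Toy manual repartition: drain old partitions, replay in stamp order, and
# republish into the new layout with stamps higher than anything old.
import zlib

def partition_for(key, n_partitions):
    # deterministic stand-in for a key-hash partitioner
    return zlib.crc32(key.encode()) % n_partitions

def repartition(old_partitions, new_count, stamp):
    new_partitions = [[] for _ in range(new_count)]
    drained = [rec for part in old_partitions for rec in part]
    drained.sort(key=lambda rec: rec[2])       # replay in old stamp order
    for key, value, _old_stamp in drained:
        new_partitions[partition_for(key, new_count)].append((key, value, stamp))
        stamp += 1
    return new_partitions

old = [[("k1", "v1", 0), ("k2", "v1", 1)], [("k3", "v1", 2)]]
new = repartition(old, 3, stamp=10)   # every record lands with a stamp >= 10
```

And of course the part the sketch skips, coordinating the write stop and the client cutover across every producer and consumer, is the part with no standard tooling.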

FamDav
Mar 29, 2008
can you not backup to s3 and does kafka not have read-through from s3

MononcQc
May 29, 2007

FamDav posted:

can you not backup to s3 and does kafka not have read-through from s3

if you use insertion order to define a "happens-before" relationship with data points, as is often recommended for atomic broadcast implementations, then changing the partitions means you change the time relationships between each "key" in an overall stream: if you read all of partition 1 before partition 2, or if you read most of 1 before 2, and the canonical value for "key" is in 1, then you may crush newer state of "key" with older state in your materialized view.

If you're using timestamps you already had no clearly defined "happens-before" relationship so who cares (for high-frequency events that is)
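The "crush newer state with older state" failure mode is easy to reproduce in a toy. Made-up data for illustration: after a repartition, two records for the same key end up in different partitions, and a sequential replay applies the old one last:

```python
# Toy materialized view built by replaying partitions sequentially,
# last-write-wins, ignoring stamps (the bug described above).

def materialize_sequential(partitions):
    view = {}
    for part in partitions:            # partition 0 fully, then partition 1...
        for key, value, stamp in part:
            view[key] = (value, stamp)
    return view

# key "k" was written at stamp 1 (old) and stamp 2 (new); a repartition left
# the *newer* record in partition 0 and the older one in partition 1
partitions = [[("k", "new", 2)], [("k", "old", 1)]]
view = materialize_sequential(partitions)
# the stale value wins, because partition order (not stamp order) decided
# who wrote last
```

The within-a-partition ordering guarantee is intact the whole time; it just stops meaning anything across partitions once a key has lived in more than one.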

simble
May 11, 2004

MononcQc posted:

Repartitions basically require you to manually stop all writing, read all entries in the existing partitions, and republish each value in the new partitions with higher "stamps" before syncing all clients to start publishing to the new partitions as well; my understanding is that there is no standard tooling around it and the general advice seems to be "oh yeah you shouldn't have built your cluster that way"

So the best path forwards with Kafka is to not treat it as a thing where you care enough about its data in the long term to land you in that situation.

if I was ever put into this situation, I would likely create a new topic with a new partition count, and write a simple consumer/producer (or use kafka connect) to shovel the messages into the new topic. this way order would be preserved.

then when they're ready, consumers and producers can switch to the new topic and with monitoring, you can tell when the old topic is effectively not used and then eol/delete it.

the reality is that the only time you should need to repartition data in kafka is if you need to increase parallelism for a particular consumer group. it could happen, for sure, but it should be a relatively rare event.

redleader
Aug 18, 2005

Engage according to operational parameters

Powerful Two-Hander posted:

so yeah, that was a waste of my life

mods new thread title please

MononcQc
May 29, 2007

simble posted:

if I was ever put into this situation, I would likely create a new topic with a new partition count, and write a simple consumer/producer (or use kafka connect) to shovel the messages into the new topic. this way order would be preserved.

then when they're ready, consumers and producers can switch to the new topic and with monitoring, you can tell when the old topic is effectively not used and then eol/delete it.

the reality is that the only time you should need to repartition data in kafka is if you need to increase parallelism for a particular consumer group. it could happen, for sure, but it should be a relatively rare event.

Right. That's the reasonable way to do it. You do need some coordination around the transfer as well, it's just kind of funny to imagine it being a thing everyone has to reinvent every time they gotta scale up.

Incidentally, you should probably have to scale up way less often if you don't have the expectation that kafka acts as your persistent data store; you can just lose older data and not bother with scaling up as long as the throughput is there. However, if you treat it as a persistent store, you may have to scale according to storage space and/or throughput. So your data cardinality + the storage may impact it.

Really, not assuming it stores your data forever is doing yourself a favor, operationally speaking.

Nomnom Cookie
Aug 30, 2009



you can do a lot of things with Kafka but only a few of them are a good idea. I blame confluent pushing it for every use case possible

simble
May 11, 2004

Kevin Mitnick P.E. posted:

you can do a lot of things with Kafka but only a few of them are a good idea. I blame confluent pushing it for every use case possible

i wholeheartedly agree with both of these points. if i hear my boss say ksql one more time....

FlapYoJacks
Feb 12, 2009
Terrible programming you say?

Here's a static analysis of a Java application that my team is inheriting from India. :v:

simble
May 11, 2004

:sever:

gonadic io
Feb 16, 2011

ratbert90 posted:

Terrible programming you say?

Here's a static analysis of a Java application that my team is inheriting from India. :v:



Lmao what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show

Also sorry for your loss

CRIP EATIN BREAD
Jun 24, 2002


ratbert90 posted:

Terrible programming you say?

Here's a static analysis of a Java application that my team is inheriting from India. :v:



weird, I JUST (like 15 minutes ago) finished setting up a new project here at work to push results and block builds through Sonar and...



gonadic io posted:

Lmao what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show

Also sorry for your loss

its sonar, and yes it does

FlapYoJacks
Feb 12, 2009

gonadic io posted:

Lmao what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show

Also sorry for your loss

SonarQube, and oh yes it does.

Luckily most of the 1k vulnerabilities are "Use a goddamned logger"

However, there are about 10 plaintext passwords in the java files. :v:

Edit*

I am firm in the camp of "Let's re-architect and rewrite from scratch." Management didn't want to hear it though (My boss is with me though.)
Now we have some solid actual analytics as to "this poo poo sucks and here is why."

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Props to that one dev that wrote 15 unit tests even though it was obvious that noone else gave a poo poo.

FlapYoJacks
Feb 12, 2009

Jabor posted:

Props to that one dev that wrote 15 unit tests even though it was obvious that noone else gave a poo poo.

And they all pass! :v:

toiletbrush
May 17, 2010

ratbert90 posted:

I am firm in the camp of "Let's re-architect and rewrite from scratch." Management didn't want to hear it though (My boss is with me though.)
Now we have some solid actual analytics as to "this poo poo sucks and here is why."
Did management at least have the common courtesy to emptily promise that they'd give you the time and resources to refactor away the tech debt?

FlapYoJacks
Feb 12, 2009

toiletbrush posted:

Did management at least have the common courtesy to emptily promise that they'd give you the time and resources to refactor away the tech debt?

Fun fact about my current boss:
He not only promised that on the current project we would get time to re-architect, he delivered on that promise.

We had 6 full months to rewrite and re-architect an (admittedly far better) application, and in the end, it's the nicest software with full test coverage and end to end tests that scales well.

He will fight tooth and nail for us, and our team has earned a rep of being fixers now. :v:

This is why India was so incredibly scared of us taking over even a tiny bit of their code, because they knew what I am doing would happen. They had been ignoring my requests for Jenkins access for months, and then they got a new DevOps guy in America. A bottle of scotch and 1 day later, I had the code, how they built it, and how they deployed it, and I am now starting to ask questions that they don't want asked.

Questions such as:
Why do we have a 92MB sql file in an application you swear is microservices based? :v:

FlapYoJacks fucked around with this message at 23:43 on Oct 29, 2018


Beamed
Nov 26, 2010

Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.


ratbert90 posted:

Fun fact about my current boss:
He not only promised that on the current project we would get time to re-architect, he delivered on that promise.

We had 6 full months to rewrite and re-architect an (admittedly far better) application, and in the end it's the nicest software with full test coverage and end to end tests that scales well.

He will fight tooth and nail for us, and our team has earned a rep of being fixers now. :v:

meanwhile the company you left is still millions of dollars in the hole right?

happy endings are so heartwarming
