|
my fav bit of the kafka architecture is whenever someone notices that oh yeah, topic compaction is not enough to guarantee reliable long-term storage (i.e. re-partitioning fucks with all the keys and therefore the linear history of entries), so you need another canonical data source to act as a kind of backup, and so what you do is put a consumer that materializes the views in a DB. That's nice because you can use the DB for some direct querying.

Except then there's some stateful component doing stream analysis over historical data; every time that component restarts, you need to sync the whole state to build the thing afresh, but doing this from a DB is not super simple, so you do it from Kafka. But since Kafka can't necessarily tell you it has all the data, and the DB is the one that's canonically right, you end up building ad-hoc diffs between a DB and a live stream for every restart.

And there's like no good solution, you just cover your ears and hope you never make it to that point, because you know you'll be hosed janitoring and reconciling two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew, and they left 3 months ago
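The compaction half of this is easy to see in a toy model (pure Python, no broker; `compact` is an illustrative stand-in for what the log cleaner eventually does, not Kafka code): only the latest record per key survives, so the linear history of updates is unrecoverable from the compacted topic alone.

```python
def compact(log):
    """Toy log compaction: keep only the last (key, value) record per
    key, preserving the relative order of the surviving records."""
    last_index = {}
    for i, (key, _value) in enumerate(log):
        last_index[key] = i
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

log = [("user:1", "v1"), ("user:2", "v1"), ("user:1", "v2"), ("user:1", "v3")]
compact(log)
# -> [("user:2", "v1"), ("user:1", "v3")]
# The v1 -> v2 transition for user:1 is gone, which is why a separate
# "canonical" store ends up bolted on for anything history-shaped.
```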
|
# ? Oct 29, 2018 14:16 |
|
sounds like the solution is to design your db properly and then not use kafka
|
# ? Oct 29, 2018 14:19 |
|
scans the whole table sequentially on every instance boot. this is what this db was meant for
|
# ? Oct 29, 2018 14:21 |
|
luckily we don't give a poo poo about our data soooooo like if any telemetry packets from devices get lost we don't care - the device resends often. if any commands from servers get lost we don't care - the controller will still see that the device isn't in the desired state and keep resending until its telemetry changes
|
# ? Oct 29, 2018 14:27 |
|
MononcQc posted:And there's like no good solution, you just cover your ears and hope you never make it to that point, because you know you'll be hosed janitoring and reconciling two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew, and they left 3 months ago as one of the last scala devs here, driven out by the golang crowd but still writing critical architecture,
|
# ? Oct 29, 2018 14:30 |
|
hell if I don't know the feeling there
|
# ? Oct 29, 2018 14:31 |
|
terrible programming thread: Powerful Two-Hander posted:so yeah, that was a waste of my life.
|
# ? Oct 29, 2018 14:53 |
|
Powerful Two-Hander posted:so yeah, that was a waste of my life. ha, there should be chick tracts but for programmers
|
# ? Oct 29, 2018 14:58 |
|
cinci zoo sniper posted:he smugly proclaimed that he will take care of document storage question and we will be given a special sql interface to work with xml documents without storing them in rdbms I think postgres' XML data type does exactly this. But then you have to store your xml inside an icky rdbms
|
# ? Oct 29, 2018 15:31 |
|
prisoner of waffles posted:ha, there should be chick tracts but for programmers so which one would be analogous to the one where the guy rapes his daughter but everyone sweeps it under the rug because he learns to love jesus? i'm guessing it would have to do with javascript.
|
# ? Oct 29, 2018 15:43 |
|
CRIP EATIN BREAD posted:so which one would be analogous to the one where the guy rapes his daughter but everyone sweeps it under the rug because he learns to love jesus? No that's the last dev on a critical legacy project
|
# ? Oct 29, 2018 15:45 |
|
bus factor: kiddly diddler
|
# ? Oct 29, 2018 15:47 |
|
prisoner of waffles posted:ha, there should be chick tracts but for programmers there's a hans reiser joke here I'm sure of it...
|
# ? Oct 29, 2018 15:57 |
Finster Dexter posted:I think postgres' XML data type does exactly this. But then you have to store your xml inside an icky rdbms i mean that’s every rdbms these days that offers xml functionality; the question is whether xml exists at all in the sql environment, which is what that guy alluded to: that it doesn’t have to
|
|
# ? Oct 29, 2018 16:00 |
|
Powerful Two-Hander posted:there's a hans reiser joke here I'm sure of it... found the art:
|
# ? Oct 29, 2018 16:00 |
|
MononcQc posted:my fav bit of the kafka architecture is whenever someone notices that oh yeah, topic compaction is not enough to guarantee reliable long-term storage (i.e. re-partitioning fucks with all the keys and therefore the linear history of entries), so you need another canonical data source to act as a kind of backup, and so what you do is put a consumer that materializes the views in a DB. counterpoint: just don't do this. use kafka for streaming. there are a few use cases where you'd want to store data in a compacted topic and imo it's only for things that are directly related to supporting your kafka cluster (like schemas, if you're using avro). also the idea of repartitioning a compacted topic sounds like another nightmare that i simply would never do. i mean what's the real argument for having a large number of partitions (or increasing the number of partitions) for compacted data anyways? just have a few (<=5) and replicate it a few times.
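For what it's worth, the usual argument for a high partition count is consumer parallelism: each partition is consumed by at most one member of a consumer group, so partitions cap how many consumers can do useful work. A toy sketch of that cap (simple round-robin, illustrative only, not Kafka's actual assignor):

```python
def assign(partitions, consumers):
    """Round-robin a list of partition ids over consumer names; extra
    consumers beyond the partition count end up with nothing to do."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
# -> {"c1": [0], "c2": [1], "c3": [2], "c4": []}
# c4 idles: 3 partitions can feed at most 3 consumers in a group.
```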
|
# ? Oct 29, 2018 17:07 |
|
i dont know what kafka and avro are and im pretty sure i am not really missing out on anything as a result
|
# ? Oct 29, 2018 17:46 |
|
avro is a cool serialization format that is binary and has a schema, and allows forward and backward compatibility. its use case is generally limited to situations where you control both ends, but it works great. kafka is just plain cool, but basically it's just a stream processing platform that owns.
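A rough sketch of the compatibility idea (pure Python, not the real Avro resolution algorithm; the schema and field names here are made up): the reader's schema carries defaults, so records written before a field existed can still be decoded by newer code.

```python
# Reader's schema; "email" was added after old records were written,
# with a default so old data stays readable (backward compatibility).
reader_schema = {
    "fields": [
        {"name": "id", "default": None},
        {"name": "email", "default": ""},
    ]
}

def resolve(record, schema):
    """Project a decoded record onto the reader's schema, filling any
    missing fields from their declared defaults."""
    return {
        f["name"]: record.get(f["name"], f["default"])
        for f in schema["fields"]
    }

old_record = {"id": 42}  # written before "email" existed
resolve(old_record, reader_schema)
# -> {"id": 42, "email": ""}
```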
|
# ? Oct 29, 2018 17:49 |
|
how is that different from protobufs
|
# ? Oct 29, 2018 18:06 |
|
what kind of streams is it good for processing
|
# ? Oct 29, 2018 18:07 |
|
big streams
|
# ? Oct 29, 2018 18:12 |
|
Bloody posted:how is that different from protobufs Avro is much more json-oriented (but in a binary encoding) whereas pb is much more byte-oriented.
|
# ? Oct 29, 2018 18:15 |
|
simble posted:counterpoint: just don't do this. use kafka for streaming. there are a few use cases where you'd want to store data in a compacted topic and imo it's only for things that are directly related to supporting your kafka cluster (like schemas, if you're using avro).

yeah basically that's the fine usage: you don't repartition, and you don't treat kafka as a canonical data store of any kind. You treat it as a kind of special queue that gets a couple of weeks of persistence for multiple readers and then you're good.

It's just that sooner or later, if you read distsys literature, you'll see someone saying you could use kafka for atomic broadcast, which means data replication central, but that is only true as long as you never have to repartition anything ever. Repartitions basically require you to manually stop all writing, read all entries in existing partitions, and republish each value in the new partitions with higher "stamps" before syncing all clients to start publishing in the new partitions as well; my understanding is that there is no standard tooling around it, and the general advice seems to be "oh yeah you shouldn't have built your cluster that way."

So the best path forward with Kafka is to not treat it as a thing where you care enough about its data in the long term to land you in that situation.
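That manual dance, as a toy model (no broker; `hash_key` and the stamp scheme are illustrative stand-ins, not standard Kafka tooling): drain the old partitions in order and republish into the new layout with stamps above everything already written, so replayed records win over anything previously materialized.

```python
def hash_key(key):
    # Stand-in for the producer's partitioner; must stay stable
    # across the migration so each key lands in one new partition.
    return sum(key.encode())

def repartition(old_partitions, new_count, stamp_floor):
    """Drain old partitions in order and republish each (key, value)
    into new_count partitions by key hash, stamping every record
    strictly above stamp_floor."""
    new_partitions = [[] for _ in range(new_count)]
    stamp = stamp_floor
    for partition in old_partitions:
        for key, value, _old_stamp in partition:
            stamp += 1
            new_partitions[hash_key(key) % new_count].append((key, value, stamp))
    return new_partitions

old = [[("a", 1, 10), ("b", 2, 11)], [("c", 3, 12)]]
repartition(old, new_count=3, stamp_floor=100)
# -> [[("c", 3, 103)], [("a", 1, 101)], [("b", 2, 102)]]
# Writers must stay stopped until clients cut over to the new layout.
```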
|
# ? Oct 29, 2018 19:20 |
|
can you not back up to s3 and does kafka not have read-through from s3
|
# ? Oct 29, 2018 19:24 |
|
FamDav posted:can you not back up to s3 and does kafka not have read-through from s3 if you use insertion order to define a "happens-before" relationship with data points, as is often recommended for atomic broadcast implementations, then changing the partitions means you change the time relationships between each "key" in an overall stream. If you read all of partition 1 before partition 2, or if you read most of 1 before 2, and the canonical value for "key" is in 1, then you may crush newer state of "key" with older state in your materialized view. If you're using timestamps, you already had no clearly defined "happens-before" relationship, so who cares (for high-frequency events, that is)
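A minimal illustration of that crush, assuming a consumer that rebuilds its view by replaying partitions one after another (toy model, not real client code):

```python
def materialize(partitions):
    """Fold partitions into a key -> value view, last write wins in
    replay order (the only order a freshly restarted consumer has)."""
    view = {}
    for partition in partitions:
        for key, value in partition:
            view[key] = value
    return view

# "key" was updated to v2 after v1, but the repartition put v2 in the
# partition that happens to be replayed first.
partitions = [[("key", "v2")], [("key", "v1")]]
materialize(partitions)
# -> {"key": "v1"}  (the stale value wins in the rebuilt view)
```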
|
# ? Oct 29, 2018 19:35 |
|
MononcQc posted:Repartitions basically require you to manually stop all writing, read all entries in existing partitions, and republish each value in the new partitions with higher "stamps" before syncing all clients to start publishing in the new partitions as well; my understanding is that there is no standard tooling around it, and the general advice seems to be "oh yeah you shouldn't have built your cluster that way."

if I was ever put into this situation, I would likely create a new topic with a new partition count, and write a simple consumer/producer (or use kafka connect) to shovel the messages into the new topic. this way order would be preserved. then when they're ready, consumers and producers can switch to the new topic and, with monitoring, you can tell when the old topic is effectively not used and then eol/delete it.

the reality is that the only time you should need to repartition data in kafka is if you need to increase parallelism for a particular consumer group. it could happen, for sure, but it should be a relatively rare event.
|
# ? Oct 29, 2018 20:17 |
|
Powerful Two-Hander posted:so yeah, that was a waste of my life. mods, new thread title please
|
# ? Oct 29, 2018 21:15 |
|
simble posted:if I was ever put into this situation, I would likely create a new topic with a new partition count, and write a simple consumer/producer (or use kafka connect) to shovel the messages into the new topic. this way order would be preserved.

Right. That's the reasonable way to do it. You do need some coordination around the transfer as well; it's just kind of funny to imagine it being a thing everyone has to reinvent every time they gotta scale up.

Incidentally, you should probably have to scale up way less often if you don't have the expectation that kafka acts as your persistent data store; you can just lose older data and not bother with scaling up as long as the throughput is there. However, if you treat it as a persistent store, you may have to scale according to storage space and/or throughput, so your data cardinality + the storage may impact it. Really, not assuming it stores your data forever is doing yourself a favor operationally speaking.
|
# ? Oct 29, 2018 21:22 |
|
you can do a lot of things with Kafka but only a few of them are a good idea. I blame confluent pushing it for every use case possible
|
# ? Oct 29, 2018 21:29 |
|
Kevin Mitnick P.E. posted:you can do a lot of things with Kafka but only a few of them are a good idea. I blame confluent pushing it for every use case possible i wholeheartedly agree with both of these points. if i hear my boss say ksql one more time....
|
# ? Oct 29, 2018 21:36 |
|
Terrible programming you say? Here's a static analysis of a Java application that my team is inheriting from India.
|
# ? Oct 29, 2018 21:46 |
|
|
# ? Oct 29, 2018 21:50 |
|
ratbert90 posted:Terrible programming you say? Lmao, what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show. Also sorry for your loss
|
# ? Oct 29, 2018 22:26 |
|
ratbert90 posted:Terrible programming you say? weird, I JUST (like 15 minutes ago) finished setting up a new project here at work to push results and block builds through Sonar and... gonadic io posted:Lmao, what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show. it's sonar, and yes it does
|
# ? Oct 29, 2018 22:27 |
|
gonadic io posted:Lmao, what analysis is that? Does it work with scala? Not that we'd be anywhere near that horror show. SonarQube, and oh yes it does. Luckily, most of the 1k vulnerabilities are "Use a goddamned logger". However, there are about 10 plaintext passwords in the java files. Edit* I am firm in the camp of "Let's re-architect and rewrite from scratch." Management didn't want to hear it though (My boss is with me though.) Now we have some solid actual analytics as to "this poo poo sucks and here is why."
|
# ? Oct 29, 2018 22:32 |
|
Props to that one dev that wrote 15 unit tests even though it was obvious that no one else gave a poo poo.
|
# ? Oct 29, 2018 23:22 |
|
Jabor posted:Props to that one dev that wrote 15 unit tests even though it was obvious that no one else gave a poo poo. And they all pass!
|
# ? Oct 29, 2018 23:25 |
|
ratbert90 posted:I am firm in the camp of "Let's re-architect and rewrite from scratch." Management didn't want to hear it though (My boss is with me though.) Did management at least have the common courtesy to emptily promise that they'd give you the time and resources to refactor away the tech debt?
|
# ? Oct 29, 2018 23:38 |
|
toiletbrush posted:Did management at least have the common courtesy to emptily promise that they'd give you the time and resources to refactor away the tech debt?

Fun fact about my current boss: he not only promised that on the current project we would get time to re-architect, but he delivered on that promise. We had 6 full months to rewrite and re-architect an (admittedly far better) application, and in the end, it's the nicest software with full test coverage and end-to-end tests that scales well. He will fight tooth and nail for us, and our team has earned a rep of being fixers now.

This is why India was so incredibly scared of us taking over even a tiny bit of their code, because they knew what I am doing would happen. They had been ignoring my requests for Jenkins access for months, and then they got a new DevOps guy in America. A bottle of scotch and 1 day later, I had the code, how they built, and how they deployed, and I am now starting to ask questions that they don't want to ask.

Questions such as: why do we have a 92MB sql file in an application you swear is microservices based?

FlapYoJacks fucked around with this message at 23:43 on Oct 29, 2018 |
# ? Oct 29, 2018 23:41 |
|
ratbert90 posted:Fun fact about my current boss: meanwhile the company you left is still millions of dollars in the hole right? happy endings are so heartwarming
|
# ? Oct 29, 2018 23:43 |