MononcQc
May 29, 2007

https://twitter.com/anne_biene/status/1020410128510660608

The build tool I maintain got highlighted as a negative example in a programming class :(


MononcQc
May 29, 2007

denormalisation bites again!

MononcQc
May 29, 2007

Some storage we'll use on a high-availability product will be backed by plain human-readable files, because it's more important for an operator to be able to edit things by hand, or clobber the whole thing with an external file if all poo poo goes to hell, than to get whatever level of performance a fancier format would buy :toot:

MononcQc
May 29, 2007

the only sustainable way I found to blog was to just post a thing there when it's either:

a) a talk transcript
b) a rant I keep repeating over and over until I'm tired and then I just link to the thing

b) is worth it because if you're a talkative person with a lot of things to rant about, it ends up saving time. Everything else ended up being kind of meh, and either felt like chasing clicks (I got rid of all tracking to fix that), or dumping half-finished tutorials.

MononcQc
May 29, 2007

I think most functional languages that have pattern matching with guard sequences support that feature.

MononcQc
May 29, 2007

NihilCredo posted:

here's a real life scenario: products have VAT rates. those VAT rates can occasionally change over time, meaning either that a product can be moved to a different VAT rate, or that a VAT rate itself can be increased or decreased by a percentage point or two (let's keep it simple and say it's one VAT rate per product (it's not)).

is there a unit testing book that will teach me how to structure an invoicing system in such a way that I can test that it won't apply the wrong VAT rate when an invoice is created, edited, modified, or duplicated, even if the VAT rate for a product changes at some point?

Can you define what you mean by "wrong rate"? Like, basically, should the VAT be denormalized and stored at the time of the invoice rather than mutating with the current rate the entire way through?

MononcQc
May 29, 2007

I mean there are many ways you can test them, but it comes closer to "integration" testing: if the rule is "take a snapshot of the current VAT, then let the current rate mutate while the invoice stays unchanged", that arguably depends on one component controlling the current VAT (or a per-date history table) and another handling the invoices themselves. So to check the interaction (or lack thereof) between both components, I'd imagine this stuff would not initially fit well under unit tests.

But if you wanted to unit test it, you'd have to isolate everything to ignore the filesystem, network, and other components, so you could use a mock: the first call reports a current VAT of say 10%, the second call 11%, then 12%, and so on. Make sure the value you have on the invoice is at a given % value, call the mock to read/change the current value so it is now higher, then check the invoice again and ensure it was unchanged. That shows the isolation between the current rate and the rate-at-invoice-time. Anyway, that would be the simple way to go without caring too much about whether it's clean or not.
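
A rough Go sketch of that mock-based check, with every name (RateSource, Invoice, the test itself) invented for illustration:

code:
package invoice

import "testing"

// RateSource stands in for whatever component reports the current VAT.
type RateSource interface {
	CurrentVAT(product string) float64
}

// mockRates is the mock: the test mutates .rate to simulate the VAT moving.
type mockRates struct{ rate float64 }

func (m *mockRates) CurrentVAT(string) float64 { return m.rate }

// Invoice snapshots the rate once, at creation time.
type Invoice struct {
	amount float64
	vat    float64
}

func NewInvoice(product string, amount float64, rates RateSource) *Invoice {
	return &Invoice{amount: amount, vat: rates.CurrentVAT(product)}
}

func (i *Invoice) Total() float64 { return i.amount * (1 + i.vat) }

func TestInvoiceIgnoresLaterRateChanges(t *testing.T) {
	rates := &mockRates{rate: 0.10}
	inv := NewInvoice("widget", 100, rates)
	before := inv.Total() // computed against the 10% snapshot

	rates.rate = 0.11 // the "current" VAT moves on...
	rates.rate = 0.12 // ...and again

	if after := inv.Total(); after != before {
		t.Fatalf("invoice total mutated with the current rate: %v -> %v", before, after)
	}
}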

But yeah, having mostly finished writing a testing book, a big problem is that interesting tests require interesting codebases to demonstrate stuff, and having to introduce an interesting system just to test it after the fact is a hell of a lot of work (both for reader and author), and it's hard to make it interesting to the reader. You've got to dump a bunch of code going "this is not actually relevant to the lesson in this chapter but you have to go through it anyway, sorry!"

MononcQc
May 29, 2007

This is where purity lessons from Haskell are interesting to borrow from. If code handles time, make that pure as much as possible. Whatever date or value is to be used, someone hands it to you.

If you have a file from which you read content, read the contents externally and only pass in the bytes/string/stream to whatever consumes it.

This makes it really easy to then test these components by passing them all the edge cases without needing to control the environment strictly, though you'll need some kind of higher-level view that deals with all the side-effect injection.
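
A minimal Go sketch of the idea (all names are mine, not from the post): the pure parts take a time.Time or an io.Reader, and only a thin layer on top ever touches the clock or the disk.

code:
package purity

import (
	"io"
	"strings"
	"time"
)

// Pure: the caller hands in the current time, so a test can pass any date.
func Expired(now, deadline time.Time) bool {
	return now.After(deadline)
}

// Pure-ish: the caller hands in a stream; a test can pass strings.NewReader("a\nb\n").
func CountLines(r io.Reader) (int, error) {
	b, err := io.ReadAll(r)
	if err != nil {
		return 0, err
	}
	return strings.Count(string(b), "\n"), nil
}

// The thin impure layer is the only place that reads the clock (or would
// open files); everything underneath is testable with plain values.
func ExpiredNow(deadline time.Time) bool {
	return Expired(time.Now(), deadline)
}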

It's not always realistic or workable everywhere, but there's often an interesting way to go about these things if you want to ease testing and analysis.

MononcQc
May 29, 2007

question for low level devs: do you ever struggle picking the size of the integers you'll use? Do you go like "hm, will I ever need to multiply these numbers, better go for a bigger unit" or something, or just go "gently caress it, we'll readjust if we need to later, let's just care about overflow behavior"?

I realized that I have not frequently needed to make these decisions at a conscious level and I have no good heuristic.

MononcQc
May 29, 2007

JawnV6 posted:

do you have a specific use case? aside from high school dimensional analysis or hw-constraints, there's not much to it that'll apply much higher

Nah not really. I mean I've been programming in languages with bignums for most of my career, and I mostly just winged it when I handled binary protocols. You end up picking widths based on how many types you want to represent, and for sizes something like "this is a 32-bit unsigned integer as a size tag; if you need more, you just gotta set this additional continuation flag", and I realized I had no good mechanism to choose aside from trying to guess.

MononcQc
May 29, 2007

the entire idiocy of go's error handling is not that you need to check the return values here and there; Maybe/Result types have you do that in Haskell or Rust all the time, and so do Erlang's and Elixir's tagged tuples.

The stupid part of it is that go will not be mad if you don't check them and will happily keep on trucking into undefined program behaviour and do whatever it can without bailing out. Then the goroutine may vanish, but it will do so silently and undetectably for the most part, so you're stuck with a bunch of partly-failed and leaked goroutines with possibly busted memory, but from the outside, the little binary looks like it's doing fantastic
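
A tiny illustration of that point: this compiles and runs without a peep even though both errors get dropped on the floor.

code:
package main

import "os"

func main() {
	// os.Remove returns an error, but nothing forces you to look at it;
	// the compiler and go vet are both fine with this line.
	os.Remove("/tmp/does-not-exist")

	// Explicitly discarding the error also passes silently; f is nil here,
	// and actually using it later would just panic at runtime.
	f, _ := os.Open("/tmp/also-missing")
	_ = f
}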

MononcQc
May 29, 2007

MALE SHOEGAZE posted:

I agree, the problem is not with the semantics of go's error handling. returning errors as values is good, in my experience.

go just does nothing to make dealing with errors-as-values nice, so it sucks.

Returning errors as values is good, but you need a good, uniform way for a more remote observer to unambiguously get contextual information about the origin of the error. A lot of "return errors as values" does a very good job at efficiently isolating a bad condition and letting you handle it locally, but does a terrible job at carrying the right context when local action is hard to take.

Imagine a routine handling 15,000 different files that returns something like 'Error(enoent)' or 'Error(eperm)' rather than actual useful context. Now you have no great way to gain insight into what went wrong unless you go reconstruct the context around the failure by hand, through investigation. You have to figure out the environment at the time, the possible areas of code that could have carried that information, and so on.
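
To make that concrete, a Go sketch of the difference using the third-party github.com/pkg/errors package (the one discussed a couple of posts down); the function names and the "batch" detail are invented:

code:
package files

import (
	"os"

	"github.com/pkg/errors"
)

// Bare: the caller gets the low-level cause, but nothing about which step
// of which batch was being run or why this file mattered.
func processBare(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// ... actual work ...
	return nil
}

// Wrapped: errors.Wrapf attaches a message (and a stack trace) at the point
// of failure, so a remote observer gets the context without reconstructing
// the environment by hand.
func processWrapped(batch int, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return errors.Wrapf(err, "batch %d: opening input %s", batch, path)
	}
	defer f.Close()
	// ... actual work ...
	return nil
}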

This brings you to the same step as Prolog returning 'false' without having an explanation as to why, something people have hated about Prolog for decades.

Invariably, better error handling has the programmer manually assemble context and carry it around. Either you need to know ahead of time what callers outside of your realm will need in terms of information, or you end up grabbing most of what you can and manually assembling stack traces together, and now you're not much better off than with checked exceptions, just with a terser construct in most cases, except that you need the entire community to agree on the format that makes sense to track.

Go can't even do the latter because lol generics.

MononcQc
May 29, 2007

prisoner of waffles posted:

uh where would https://dave.cheney.net/2016/06/12/stack-traces-and-the-errors-package fit (or not fit) into your picture of the stacktraceless error value?

like, in one's own code you can auto-capture stack traces, but other people's libraries might not have stack traces or might be recording them in an incompatible format?

code:
interface HasStackTrace { ... }

The problems there are social:
  • the error package is optional
  • the error package is not guaranteed to be unique
  • the error package must be used by everyone ubiquitously
  • the error package may not be called, and there are no warnings or constructs about it
  • it ends up relying on "jam everything you can into a string" because even what you wrap cannot unambiguously be typed in a portable manner
Rust for example solves a bunch of these by having idiomatic, language-provided mechanisms for error type composition, macros around them to make them less cumbersome (try!) and so on.

I think it's still a bit painful when it comes to annotating the exceptions (I haven't used Rust enough) since there seems to be an ongoing discussion about this and all kinds of crates that promise to make it easier. Whereas a language where you have exceptions just handles it all for you. The try! macro is the closest thing I can think of to abstracting away the "bail out until you are in a context that knows what to do with exceptional conditions" without explicitly doing it, but requires you to annotate everything that requires it for the compiler not to yell.

I don't have enough experience with it, but it really feels like a kind of way to deal with what would otherwise be checked exceptions. Like "I know this may fail, this is not the level of abstraction at which I care."

prisoner of waffles posted:

tbqh it sounds like rust error handling is Strictly Better

Go has, unambiguously, the worst error handling of any programming language since C that I can think of (I don't know mumps well enough).

MononcQc
May 29, 2007

DONT THREAD ON ME posted:

and the arguments against namespaces largely amount to "extra typing and things to remember and prefixes are basically the same thing," but prefixes are not the same because i can't own a prefix, but i can own a namespace.

There's an interesting example from the Erlang community: back when "all deps are fetched from github" was the norm, there were about 5 different postgres drivers, some of which had up to 20 forks at various stages of development.

Once the community started adopting hex as a package manager (which has no namespaces), people started collaborating to get patches upstream to the packages so that they could use the one easy name rather than a bunch of aliases and git repos.

So the lack of namespaces had a kind of indirect social effect of forcing people to collaborate rather than just fix whatever they needed that week.

MononcQc
May 29, 2007

DONT THREAD ON ME posted:

yeah, i get that argument and I do see it in effect in the community. but i'd also point out that the erlang community is much smaller than rust community (currently) intends to be, and i think that makes a difference.

in other news, i think i'm thinking about doing a roguelike.

yeah I think namespaces are unavoidable, but it’s not a bad idea to bootstrap a community without them.

MononcQc
May 29, 2007

I'm a user who always used hg but got forced onto git out of tiredness of assholes asking to migrate all the things to git and github. Because of network effects, it ended up being easier to just switch than to keep using software I like and think is good, and it is loving bad

also the commands I can never use without the docs are 'find' and 'tar' (if tar -xvf <tar.gz file> won't cut it)


E: I use find * | grep <pattern I am looking for>

MononcQc
May 29, 2007

I've actually used git bisect a few times and it helps to generally have commits that work and build so you can properly test things.

Aside from that welp

MononcQc
May 29, 2007

I like mercurial's approach of tracking which repositories (or branches) are public and which are private, and only allowing history rewrites on stuff that is not marked as public (in order to avoid breaking the workflow of others)

MononcQc
May 29, 2007

I know literally nothing about graphics programming and all of that poo poo amazes me.

MononcQc
May 29, 2007

DONT THREAD ON ME posted:

so, i need some life balance, but i also want to do something that i'm really excited about.

tl;dr i dont loving know but i'm really liking the games stuff i'm doing atm, i can't imagine anything more rewarding than participating in world building.

You're making the assumption that you will have the same excitement in a job working directly on poo poo you like, but part of why people are able to have fun doing their side projects is that they do them on their own terms, at their own pace, in the order they like. The big risk is that you'll take something you enjoy and kill it by handing over all of your excitement to someone else who is mostly in charge of the schedule, who picks who works with you, chooses the priorities, and only asks your advice in passing to know when you're going to be done so they can introduce something else.

Be careful about too easily dismissing the context and environment that let you enjoy the things you do in your free time. They're really important variables that have a huge impact on what you can find fun or not.

MononcQc
May 29, 2007

I found more job satisfaction switching to a place that is a bit less challenging and gives fewer bragging rights, but also doesn't ask me to be on call, do overtime, or check emails outside of working hours, so that my free time is truly mine. I've been buying back a level of energy I didn't have in prior jobs, just because they took up so much head-space, and caring about it super hard made it even worse.

MononcQc
May 29, 2007

my car/bicycle database setup

MononcQc
May 29, 2007

I use leader if a server is elected (as in quorum systems), and master if it's a title bestowed upon the server by an operator. The leader or master is backed by followers, which may replace it some day. "Replicas" for the followers works as well, but carries a meaning of "never gonna be made leader or master and is only there for redundancy". I don't really know if there's a stigma attached to it, but I always figured that 'master/follower' would never hit the same sensitive spot that 'master/slave' usually does since it re-contextualizes 'master' a bit. So far no complaints.

Primary/secondary can work well, but mostly in a tight failover/takeover scenario with paired hardware that doesn't really change topology: there is an expectation that you have two computers, and the task runs mainly on one, the primary. If the primary fails, the system fails over to the secondary. As soon as the primary comes back up, it may take over the work again, and the secondary sits tight doing little.

MononcQc
May 29, 2007

if you get a 2xx it worked and you can move on
if you get a 4xx you hosed up and unless you fix things you're hosed
if you get a 5xx the server hosed up and you may retry while hoping you got idempotence right

I've seen application content-types that would do negotiation that way. Rather than using OPTIONS, which few endpoints support, you'd send the message with the new format (application/whatever+some-spec) and, if it failed, roll it down to an older version until you hit a non-400 answer, then cache that content-type for a period of time before retrying the newer one.
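
A sketch of that fallback dance in Go with invented media type names; a real client would also remember the negotiated type for a while, as described above.

code:
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// Newest format first; all of these names are hypothetical.
var mediaTypes = []string{
	"application/whatever+v3+json",
	"application/whatever+v2+json",
	"application/whatever+json",
}

// postNegotiated tries the newest content-type first and walks down the
// list on any 4xx, returning the first response that isn't a client error.
func postNegotiated(url string, body []byte) (*http.Response, string, error) {
	for _, ct := range mediaTypes {
		resp, err := http.Post(url, ct, bytes.NewReader(body))
		if err != nil {
			return nil, "", err
		}
		if resp.StatusCode < 400 || resp.StatusCode >= 500 {
			return resp, ct, nil // cache ct for a while before re-probing
		}
		resp.Body.Close() // 4xx: the server didn't like this version, try an older one
	}
	return nil, "", fmt.Errorf("no acceptable media type among %v", mediaTypes)
}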

MononcQc
May 29, 2007

If it's not a website I'd just 4xx with a content body.

The old webmachine diagram is the best way to go about structuring a REST framework for actual resource usage.

MononcQc
May 29, 2007

Chalks posted:

In my case it is a website, and since my developers both write and consume the api (primarily) I'm also interested in identifying our fuckups on both ends by reporting on 5xx and 4xx errors, so that's part of my motivation. A bunch of 404 errors are a sign that someone's coded something wrong in the client site rather than someone keeps doing searches that return no results.

yeah, though trying to create a user that already exists, by that map, should yield a 409 Conflict response telling you the resource you're attempting to create is already there.

MononcQc
May 29, 2007

I'ma return a bunch of 100 Continue just because I can, during any request while it's being uploaded, because the first revision of the HTTP/1.1 spec allowed it even though the two following ones mandate doing something else (but ask clients to still support it just in case they talk to an older 1.1 server). You will get your 200 at some other point.

Also, if you can find a host that does limited or no validation of protocol negotiation upgrades (as used for websockets or HTTP/2 without TLS), you can legitimately set up a trivial upgrade mechanism that downgrades to raw TCP. With minimal glue, any cloud host using layer-7 LBs that allows this (but may forbid other stuff) can be used as a SOCKS proxy even if they disable CONNECT requests and other standard proxy poo poo :ssh:
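
For the upgrade-to-raw-TCP part, a bare-bones Go sketch using net/http's Hijacker; the "raw-tcp" protocol token is made up, and the part where you splice the socket to whatever upstream you want is left out.

code:
package main

import "net/http"

func upgradeHandler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Upgrade") != "raw-tcp" { // hypothetical protocol name
		http.Error(w, "expected Upgrade: raw-tcp", http.StatusUpgradeRequired)
		return
	}
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, bufrw, err := hj.Hijack()
	if err != nil {
		return // couldn't take over the connection; bail out
	}
	defer conn.Close()

	// Finish the HTTP part by hand; after this, conn is just a TCP socket.
	bufrw.WriteString("HTTP/1.1 101 Switching Protocols\r\n" +
		"Connection: Upgrade\r\nUpgrade: raw-tcp\r\n\r\n")
	bufrw.Flush()

	// From here on, read/write conn directly, e.g. pipe it to an upstream
	// you dialed yourself.
}

func main() {
	http.HandleFunc("/tunnel", upgradeHandler)
	http.ListenAndServe(":8080", nil)
}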

MononcQc
May 29, 2007

redleader posted:

so what do you return if the request is well formed but invalid (because of rules like, i dunno, the caller passes an email address without an @)? i tend to use 400, but that diagram seems to reserve 400 for malformed requests and doesn't seem to offer any advice for this situation

imo http response codes often feel juuust not quite right for rest apis

400 is the general catch-all if nothing else fits. Some folks would say "your content-type isn't application/json, it's application/my-custom-format+json, and if the e-mail is invalid, that's a breakage of the expected document type, so you get a 400, and it's fair since you didn't submit the right content-type". I tend to think that this is kind of acceptable as an interpretation. There's a difference between the content-encoding and the expected content itself, but you can take that leeway to force 400s onto people submitting "syntactically valid but semantically invalid" requests.

In that chart you have a bit of an annoyance because the 400 check comes before the content-length checks, which isn't great since you may need the body to be parsed in order to return the 400. I think the first few steps could have their order switched to reflect that at no cost. If you look at other frameworks, they tend to be more liberal with 400s while keeping the same workflow overall. https://ninenines.eu/docs/en/cowboy/2.4/guide/rest_flowcharts/ is an alternative flowchart representation for a similar framework, but it works more tightly from the HTTP verbs. You'll find that any unsuccessful callback applying a PUT or POST change (AcceptCallback in the chart) will return a 400. If the code crashes, it generates a 5xx instead.
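
For what that looks like in practice, a small Go handler that returns 400 for both the syntactically broken and the "well-formed but semantically invalid" cases; the email rule mirrors the example above, everything else is invented.

code:
package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

type signup struct {
	Email string `json:"email"`
}

func createUser(w http.ResponseWriter, r *http.Request) {
	var s signup
	if err := json.NewDecoder(r.Body).Decode(&s); err != nil {
		// Malformed body: the uncontroversial 400.
		http.Error(w, `{"error":"malformed JSON"}`, http.StatusBadRequest)
		return
	}
	if !strings.Contains(s.Email, "@") {
		// Well-formed but semantically invalid: still a 400 under the
		// relaxed reading (some folks reach for 422 instead).
		http.Error(w, `{"error":"email must contain @"}`, http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusCreated)
}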

anatoliy pltkrvkay posted:

can we talk about how wonderful the 506 word salad is

quote:

The 506 status code indicates that the server has an internal
configuration error: the chosen variant resource is configured to
engage in transparent content negotiation itself, and is therefore
not a proper end point in the negotiation process

explain what the gently caress that means without context

The 'variants' here refer to either different content-types (e.g. a text could be served as plain text, HTML, or PDF) or languages (French, English, Spanish). This was a really big hopeful spot around the semantic web, where browsers and servers could negotiate the best and most appropriate content for everyone based on their local preferences and capabilities.

The variants would be returned in alternate headers through a 300 Multiple Choices response when various acceptable representations are in place. The plan was also that this could be facilitated by 'smart caching proxies' doing all that clever negotiation on behalf of dumber backend servers. This is the critical bit, because it can yield a scenario where a 'variant resource' itself re-forwards to one of the proxies or to other variant resources. When that happens, you have a circular dependency that can never resolve.

That is when 506 is in theory supposed to be used.

Nobody uses it because the semantic web is mostly dead, and when multiple variants exist servers just pick the top one according to accept headers as the one result to return.

Plus, HTTPS/TLS everywhere drastically reduces the number of fancy proxies people get, aside from within a larger org where certs may be shared on one front-end more or less safely, and HTTP/2 requiring TLS+ALPN almost end-to-end makes it even harder (few servers seem to support HTTP/1.1 -> HTTP/2 upgrade paths without ALPN negotiation), so it's not really poised to make a comeback either.

MononcQc fucked around with this message at 14:21 on Sep 21, 2018

MononcQc
May 29, 2007

the talent deficit posted:

basically mean you did something wrong and trying again won't work (unless it's a 429 Too Many Requests, which should really be a 5xx error) so go figure your poo poo out.

Heh, that one is tricky. It was intended for rate-limiting, which could be based on a per-customer setting or account. The RFCs recommend setting a 'Retry-After: <seconds>' header on the response, which would mean "uh, you have breached your end of the contract, take some time off". Under that interpretation I guess it makes sense to make it a 429 rather than a 5xx, since the blame lies with the client.

The equivalent for a server-side overload would be 503 (Service unavailable), which can also specify a retry-after header for the client. The distinction lets you know who's to blame for the overload, basically.

429 is a good way to show that the status codes don't necessarily point at "can or can't retry", but rather "who's to blame for a query that failed". It just happens that often, if the client is to blame, they can't safely retry. 404 is kind of funnier there because it blames the client for asking for an unknown resource, but maybe the server should be to blame for not having it?
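
A quick Go sketch of that blame split, with made-up quota/load checks:

code:
package main

import "net/http"

func limited(w http.ResponseWriter, r *http.Request) {
	if clientOverQuota(r) {
		// Client's fault: they breached their end of the contract.
		w.Header().Set("Retry-After", "30")
		http.Error(w, "rate limit exceeded", http.StatusTooManyRequests) // 429
		return
	}
	if serverOverloaded() {
		// Server's fault: same Retry-After hint, different blame.
		w.Header().Set("Retry-After", "5")
		http.Error(w, "overloaded, try later", http.StatusServiceUnavailable) // 503
		return
	}
	w.Write([]byte("ok"))
}

// Placeholders for whatever per-account accounting or load measurement the
// real system would use.
func clientOverQuota(*http.Request) bool { return false }
func serverOverloaded() bool             { return false }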

MononcQc
May 29, 2007

turn it into a hashmap of functions and do a dispatch based on that (if it finds the function)
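
Something like this, in Go terms (all names invented):

code:
package main

import "fmt"

// A dispatch table: name -> function. Adding a case is adding an entry.
var handlers = map[string]func(args []string) error{
	"start": func(args []string) error { fmt.Println("starting", args); return nil },
	"stop":  func(args []string) error { fmt.Println("stopping", args); return nil },
}

func dispatch(cmd string, args []string) error {
	h, ok := handlers[cmd]
	if !ok {
		return fmt.Errorf("unknown command %q", cmd)
	}
	return h(args)
}

func main() {
	if err := dispatch("start", []string{"worker-1"}); err != nil {
		fmt.Println(err)
	}
}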

MononcQc
May 29, 2007


This one (422 Unprocessable Entity) is a WEBDAV extension, which has kind of been obsoleted by the latest HTTP RFC 7231 (https://tools.ietf.org/html/rfc7231#section-6.5.1).

Basically the order goes:

RFC 2068 (HTTP/1.1 v1):
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.

RFC 2616 (HTTP/1.1 v2, unchanged here):
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.

RFC 4918 (HTTP/1.1 // WEBDAV):
The 422 (Unprocessable Entity) status code means the server
understands the content type of the request entity (hence a
415(Unsupported Media Type) status code is inappropriate), and the
syntax of the request entity is correct (thus a 400 (Bad Request)
status code is inappropriate) but was unable to process the contained
instructions. For example, this error condition may occur if an XML
request body contains well-formed (i.e., syntactically correct), but
semantically erroneous, XML instructions.


RFC 7231 (HTTP/1.1 v3, final form evolved from HTTPBis efforts):
The 400 (Bad Request) status code indicates that the server cannot or
will not process the request due to something that is perceived to be
a client error (e.g., malformed request syntax, invalid request
message framing, or deceptive request routing).

WEBDAV worked around the restrictive nature of previous RFCs that really focused on 'syntax' wholesale. But the latest spec relaxed what 400 allowed so that you should feel alright using it now, since it is really redefined as "the server thinks the client is to blame therefore 400". In spirit, that was the reason WEBDAV defined a new status. It is still valid to use, but its rationale (400 is too strict) no longer really applies.

HTTP revisions are kind of poo poo; at least 4 distinct standards all use 1.1 as the version and coexist.

MononcQc
May 29, 2007

anatoliy pltkrvkay posted:

thankfully we now have HTTP/2, aka "content-length mismatches are now a fatal error, get hosed"

not that the mismatches didn't cause issues in http/1.1, but half the time it would cause request/response handling to only degrade somewhat and still more or less succeed

im curious if the [hard fail in http/2](https://httpwg.org/specs/rfc7540.html#malformed) was actually necessary to implement the stream multiplexing stuff (decent chance that it is, idk, don't want to wander through a bunch of old ietf mailing lists) or just that one of the authors was so fed up with handling weird length mismatch gremlin bugs that they wrote the spec that way (much more amusing)

Well, we're not all out of it for HTTP. The HTTPBis group currently has a bunch of active documents for what appear to be HTTP/2 improvements. However, mandating lowercase header names with binary lengths everywhere should definitely help prevent some brittle implementations.

Soricidus posted:

we’ve learned the hard way that the so-called robustness principle is a terrible idea that inevitably leads to security disasters. eg a different kind of content body not matching content length was literally the bug behind heartbleed. maybe it’s harmless in an http context but it’s better to just force terrible programmers to fix our terrible code and not take the risk.

For HTTP the problem happens with proxies; the smarter the proxies (that still make that mistake), the worse it is. Basically, the trick is to have one request body hide another request inside of it, such that a proxy and a server will interpret the boundaries differently.

You want the caching proxy to think it's handling a single request while a second one hides inside the body, which the server will also respond to. That second response is then used to poison the cache on the proxy so that it returns that value instead of the original one. So if you sent a request for the website header, a static image with a long cache time, but managed to get the proxy to cache an unrelated document under it, you can break or corrupt webpages. In the worst cases, you can couple it with XSS (or PHP sites calling out to any external script) to get a bunch of intermediary proxies to cache your poisoned content on your behalf.

For a while corporations would use caching proxies on their edge to save on upstream traffic, and it could be an interesting way for an attacker within the org to gently caress up a lot of stuff for people there, without the website even knowing about it.

With non-caching proxies it's not as bad, but it's still a risk if people rely on the proxy to enforce some rules (i.e. disabling CONNECT requests so people don't use you as a forwarding proxy, enforcing some rate-limiting policies, etc.)

MononcQc
May 29, 2007

how come everyone wanting to interrupt a topic in this thread does it to talk about rust-related things?

also this is my mandatory mention that modal editing is the best thing I've ever had as a computer toucher in order to deal with tendonitis. Fewer chorded keyboard shortcuts because I don't need to hold the ctrl key just means I can write for 12 hours in a day instead of 3 before feeling pain.

MononcQc
May 29, 2007

Janitor Prime posted:

Like why are smart quotes a thing, I've never read a rational discussion about why ' and " need to have those loving ligatures for any reason

' is an apostrophe, not a quote. ‘ and ’ are quotation marks (in some languages), which have an orientation the same way parentheses or brackets do.

If you work in publishing or printing, the three of them are distinct and must be set correctly.

Of course, a lot of systems that do "smart quotes" end up replacing the apostrophe with a right quote, so welp.

MononcQc
May 29, 2007

my fav bit of the kafka architecture is whenever someone notices that oh yeah, topic compaction is not enough to guarantee reliable long-term storage (e.g. re-partitioning fucks with all the keys and therefore the linear history of entries), so you need another canonical data source to act as a kind of backup, and so what you do is put a consumer in place that materializes the views in a DB.

But that's nice because you can use the DB for some direct querying. Except for some stateful component doing stream analysis over historical data: every time that component restarts, you need to sync the whole state to build the thing afresh. Doing this from a DB is not super simple, so you do it from Kafka, but since Kafka can't necessarily tell you it has all the data and the DB is the one that's canonically right, you end up building ad-hoc diffs between a DB and a live stream on every restart.

And there's like no good solution, you just cover your ears and hope you never make it to that point, because you know you'll be hosed janitoring and reconciling two data sources that don't necessarily have a good way to talk to each other aside from some small component/microservice written in a language only 1 person knew, and they left 3 months ago

MononcQc
May 29, 2007

scans the whole table sequentially on every instance boot

this is what this db was meant for

MononcQc
May 29, 2007

hell if I don't know the feeling there

MononcQc
May 29, 2007

simble posted:

counterpoint: just don't do this. use kafka for streaming. there are a few use cases where you'd want to store data in a compacted topic and imo its only for things that are directly related to supporting your kafka cluster (like schemas, if you're using avro).

also the idea of repartitioning a compacted topic sounds like another nightmare that i simply would never do. i mean whats the real argument for having a large number of partitions (or increasing the number of partitions) for compacted data anyways? just have a few (<=5) and replicate it a few times.

yeah, basically that's the fine usage: you don't repartition, and you don't treat kafka as a canonical data store of any kind. You treat it as a kind of special queue that gets a couple of weeks of persistence for multiple readers and then you're good. It's just that sooner or later, if you read distsys literature, you'll see someone saying you could use kafka for atomic broadcast, which makes it data-replication central, but that is only true as long as you never have to repartition anything ever.

Repartitions basically require you to manually stop all writing, read all entries in the existing partitions, and republish each value in the new partitions with higher "stamps" before syncing all clients to start publishing to the new partitions as well; my understanding is that there is no standard tooling around it and the general advice seems to be "oh yeah, you shouldn't have built your cluster that way"

So the best path forwards with Kafka is to not treat it as a thing where you care enough about its data in the long term to land you in that situation.

MononcQc
May 29, 2007

FamDav posted:

can you not backup to s3 and does kafka not have read-through from s3

if you use insertion order to define a "happens-before" relationship between data points, as is often recommended for atomic broadcast implementations, then changing the partitions means you change the time relationships between each "key" in an overall stream: if you read all of partition 1 before partition 2, or if you read most of 1 before 2, and the canonical value for "key" is in 1, then you may clobber newer state of "key" with older state in your materialized view.

If you're using timestamps you already had no clearly defined "happens-before" relationship so who cares (for high-frequency events that is)


MononcQc
May 29, 2007

simble posted:

if I was ever put into this situation, I would likely create a new topic with a new partition count, and write a simple consumer/producer (or use kafka connect) to shovel the messages into the new topic. this way order would be preserved.

then when they're ready, consumers and producers can switch to the new topic and with monitoring, you can tell when the old topic is effectively not used and then eol/delete it.

the reality is that the only time you should need to repartition data in kafka is if you need to increase parallelism for a particular consumer group. it could happen, for sure, but it should be a relatively rare event.

Right. That's the reasonable way to do it. You do need some coordination around the transfer as well, it's just kind of funny to imagine it being a thing everyone has to reinvent every time they gotta scale up.
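
A rough sketch of such a shovel in Go, assuming the segmentio/kafka-go client (any client would do; topic names and brokers are placeholders):

code:
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "orders-5p",          // old topic
		GroupID: "repartition-shovel", // so progress survives restarts
	})
	defer r.Close()

	w := kafka.NewWriter(kafka.WriterConfig{
		Brokers:  []string{"localhost:9092"},
		Topic:    "orders-20p",  // new topic with more partitions
		Balancer: &kafka.Hash{}, // keep each key in a single partition
	})
	defer w.Close()

	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		// Republishing in read order keeps per-key ordering in the new
		// topic; coordinating when producers and consumers cut over is
		// the part everyone still hand-rolls.
		if err := w.WriteMessages(ctx, kafka.Message{Key: m.Key, Value: m.Value}); err != nil {
			log.Fatal(err)
		}
	}
}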

Incidentally, you should probably have to scale up way less often if you don't have the expectation that kafka acts as your persistent data store; you can just lose older data and not bother with scaling up as long as the throughput is there. However, if you treat it as a persistent store, you may have to scale according to storage space and/or throughput. So your data cardinality plus the storage may impact it.

Really, not assuming it stores your data forever is doing yourself a favor, operationally speaking.
