|
carry on then posted:i work on middleware/developer tools so i only have access to little 3-controller/3-worker clusters, how big is a decent production kube cluster, anyways?

depends on how much effort you put into tweaking cluster autoscaler and how long it's been since someone looked at the ec2 bill. I believe the practical limit is a few thousand nodes (upstream only claims support for about 5,000) before the pod scheduler just can't handle its poo poo anymore
|
# ¿ Sep 13, 2021 06:40 |
|
dads friend steve posted:literally nothing about k8s is simple. it cannot possibly be simple; there are a million moving parts to even the smallest cluster

it's definitely not all essential complexity. one of the founding design principles was "we can just have everything watch etcd and keep the entire cluster consistent that way". of course now you are basically making every single worker node and any other api client a raft participant, which doesn't have the best scaling properties. the solution implemented was caches, and the seasoned distributed systems architect can guess the rest
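for the curious, this is roughly what one of those watches looks like from the client side. a minimal client-go sketch, not taken from anyone's cluster; the default namespace and the ~/.kube/config path are just assumptions:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// load a kubeconfig from the default ~/.kube/config location
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// open a long-lived watch on pods; every client doing this is another
	// event stream the apiserver (fronting etcd) has to fan out
	w, err := cs.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		fmt.Println(ev.Type) // ADDED / MODIFIED / DELETED / BOOKMARK / ERROR
	}
}
```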
|
# ¿ Sep 13, 2021 19:55 |
|
if I had those needs I would not mention them anywhere that children, the elderly, or the immunocompromised could be exposed, but I’m just courteous like that
|
# ¿ Sep 14, 2021 06:07 |
|
you can also run kube-proxy in ipvs mode now but the default is still iptables
|
# ¿ Sep 14, 2021 11:04 |
|
operators are very powerful but it should be rare that a cluster owner is writing an operator. if you own a cluster, and you're managing it with ansible (ugh wtf), then why on earth would you make an operator backed by ansible. just use ansible. the point of the operator is that you use go or something to do stuff your normal tooling can't handle well
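to make the "just write go" shape concrete, here's a rough skeleton using controller-runtime. it's a sketch only: labelling Nodes and the example.com/blessed key are invented for illustration, the point is the watch-and-reconcile loop:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// NodeLabeler is a toy reconciler: it watches Nodes and stamps a label on
// them, standing in for "stuff your normal tooling can't handle well"
type NodeLabeler struct {
	client.Client
}

func (r *NodeLabeler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var node corev1.Node
	if err := r.Get(ctx, req.NamespacedName, &node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	// hypothetical label key, purely for the example
	if node.Labels["example.com/blessed"] != "true" {
		node.Labels["example.com/blessed"] = "true"
		if err := r.Update(ctx, &node); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	err = ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		Complete(&NodeLabeler{Client: mgr.GetClient()})
	if err != nil {
		panic(err)
	}
	// blocks, reacting to Node events until the process is signalled to stop
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```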
|
# ¿ Oct 17, 2021 22:45 |
|
validating and mutating webhooks are a great way to brick your cluster. I guess they do other stuff too but I’ve mostly observed them causing problems
|
# ¿ Nov 23, 2021 22:10 |
|
Progressive JPEG posted:for validators at least, might be able to use "failurePolicy: Ignore" to fail open if the pod is down. for example ive been using (and happy with) opa/gatekeeper and that's their default

linkerd provides a mutating webhook and that works fine. it's not modifying your containers or anything
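for reference, the fail-open knob lives on the webhook configuration object itself. a rough sketch using the admissionregistration/v1 go types; the webhook name, namespace, and service are made up:

```go
package main

import (
	"encoding/json"
	"fmt"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Ignore = fail open: if the webhook backend is down, admission proceeds
	// instead of blocking every create/update in scope
	failOpen := admissionregistrationv1.Ignore
	sideEffects := admissionregistrationv1.SideEffectClassNone

	cfg := admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "example-validator"},
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name:                    "validate.example.com", // hypothetical
			FailurePolicy:           &failOpen,
			SideEffects:             &sideEffects,
			AdmissionReviewVersions: []string{"v1"},
			ClientConfig: admissionregistrationv1.WebhookClientConfig{
				Service: &admissionregistrationv1.ServiceReference{
					Namespace: "example-system",
					Name:      "example-validator",
				},
			},
		}},
	}

	out, _ := json.MarshalIndent(cfg, "", "  ")
	fmt.Println(string(out))
}
```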
|
# ¿ Nov 24, 2021 05:29 |
|
operator framework doesn't support it because it's a pretty deranged thing to want
|
# ¿ Jan 6, 2022 06:37 |
|
like when you say "maybe utilizing static internal cluster state info" i start thinking that an operator probably isn't what you actually want to make, because the whole point of an operator is that it sits around watching events on a resource and reacting to them. if you don't care about getting updates from the apiserver then what you're making is not an operator, more or less by definition
|
# ¿ Jan 6, 2022 06:41 |
|
imagine a container crashing on a worker node, forever. that's the kubernetes vision of the future
|
# ¿ Jan 6, 2022 22:29 |
|
ate poo poo on live tv posted:Don't forget that whoever implemented your docker orchestration system doesn't understand that routing is a thing so all the IPs have to be in the same network. This is why you can only ever have one Availability Zone in AWS, it's just impossible to work in any other way.

this is a weirdly specific gripe that i don't understand
|
# ¿ Jan 7, 2022 05:09 |
|
ate poo poo on live tv posted:Which part don't you understand? Why writing the code so that layer 2 adjacency is required is bad? Or something else.

ah, you are responsible for some application that was written by people holding the packets wrong, and then cloud is made for people who don't even know what vlans are, so that causes a lot of pain. i think i get it now
|
# ¿ Jan 7, 2022 20:26 |
|
sending raw ethernet frames directly to cluster peers "for performance". like if you didn't hear a skinny 23-year-old white guy talking earnestly about routing overhead when you read that sentence, you're still a junior
|
# ¿ Jan 7, 2022 21:33 |
|
dads friend steve posted:what the gently caress

not literally, but that level of brain damage, yes. have you not done thing with packet before
|
# ¿ Jan 8, 2022 01:42 |
|
CMYK BLYAT! posted:i tried explaining it to our salespeople via a metaphor about how an operating system manages sharing CPU time/memory space and providing hardware abstractions for applications on a single machine, but for a machine cluster, and am pretty sure our sales team has no better understanding of kubernetes as a result

ok so you know you don't have to use operator framework. again, if you're not watching for changes it's not an operator. it's just code that reads from the cluster and then writes to the cluster and then terminates.

replace the helm crap with some more CRDs, run a script to turn the new resources into whatever is the result of a deployment, and when everyone realizes that this is stupid you have a clear path to handle those CRDs with the current controller/a new one. it sounds like these concerns are separable so maybe you want to end up with two controllers but anyway, why are you stuck on operator framework. just don't use it until you need to use it
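a sketch of what that "reads from the cluster, writes to the cluster, terminates" script could look like with plain client-go; the team-a namespace and the annotation key are stand-ins, not anything real:

```go
// a one-shot job in the spirit of "reads from the cluster, writes to the
// cluster, then terminates" -- no watch loop, no operator framework.
// namespace, annotation key, and kubeconfig handling are illustrative only.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// read: whatever exists right now, no subscription to future changes
	deps, err := cs.AppsV1().Deployments("team-a").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// write: patch each object, then the process simply exits
	for _, d := range deps.Items {
		patch := []byte(`{"metadata":{"annotations":{"example.com/processed":"true"}}}`)
		_, err := cs.AppsV1().Deployments("team-a").Patch(
			context.TODO(), d.Name, types.MergePatchType, patch, metav1.PatchOptions{})
		if err != nil {
			panic(err)
		}
		fmt.Println("patched", d.Name)
	}
}
```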
|
# ¿ Jan 8, 2022 21:51 |
|
my homie dhall posted:i think kubernetes is only complex if you need persistent storage or if you do something stupid like install a service mesh

I sincerely wish for everyone who believes that k8s isn't complex to not run into one of the many, many "edge cases" that make k8s complex. edge cases in scare quotes because they weren't until k8s showed up
|
# ¿ Sep 8, 2022 17:05 |
|
terraform 0.15 is bearable but it's still poo poo
|
# ¿ Sep 9, 2022 01:43 |
|
my homie dhall posted:there's no way for the control plane to be HA and operate in this manner because it means removing pods would require synchronizing the entire cluster. the way this person apparently wants kube to work is literally not possible lol

for removing a pod to be successful the entire cluster must synchronize. this is implied by how services work in kubernetes. all that is required for pod deletion to be safe is a mechanism for nodes to indicate how far behind their kube-proxy is. this would place an incredible write load on etcd because kubernetes was designed by muppets, but it would still be HA
|
# ¿ Sep 10, 2022 02:47 |
|
btw if you really want to see some poo poo, overload your apiserver. cluster dns flapping, pods getting requests for 30 seconds after they terminate, it's real fun
|
# ¿ Sep 10, 2022 02:50 |
|
my homie dhall posted:what is supposed to happen if a node is temporarily unavailable or slow to update?

if a node fucks off into hyperspace then we mark it down. if it's slow to update then we wait until it updates
|
# ¿ Sep 10, 2022 06:41 |
|
my homie dhall posted:you realize synchronization means strong consistency, right?

hmm does it. I feel like in this case we can use the fact that the pod is dying to make things work without getting too fancy. look at this:

- at resourceVersion m, a pod is deleted
- at resourceVersion n, we observe that every ready node's kube-proxy is synced up to at least m
- with some hand waving about kubelet and kube-proxy, we can now say that it's no longer possible for a ready node to send requests to the pod, and it won't be possible at any future resourceVersion
- kubelet kills the pod

I think it's worth noting that this only works if there's some way to keep unready nodes from receiving incoming traffic, and usually that way is "ask a properly built load balancer to handle that", which tickles me
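spelled out as code, the check would be something like this. to be clear, this is a sketch of the hypothetical mechanism from the post, not anything kubernetes actually ships; the annotation where each node reports its kube-proxy's synced resourceVersion is invented, and resourceVersion is treated as an integer with the same hand waving as above:

```go
package proposal

import (
	"context"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// hypothetical annotation: each node reports the last resourceVersion its
// kube-proxy has fully applied. nothing in upstream kubernetes does this.
const syncedRVAnnotation = "example.com/kube-proxy-synced-resource-version"

// safeToKill reports whether every *ready* node has observed the pod
// deletion that happened at resourceVersion m. unready nodes are assumed
// not to be forwarding traffic, so they don't block deletion.
func safeToKill(ctx context.Context, cs kubernetes.Interface, m int64) (bool, error) {
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, n := range nodes.Items {
		if !nodeReady(n) {
			continue
		}
		rv, err := strconv.ParseInt(n.Annotations[syncedRVAnnotation], 10, 64)
		if err != nil || rv < m {
			return false, nil // some ready node hasn't caught up yet; keep waiting
		}
	}
	return true, nil // every ready node is past m; kubelet can kill the pod
}

func nodeReady(n corev1.Node) bool {
	for _, c := range n.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```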
|
# ¿ Sep 10, 2022 06:52 |
|
my homie dhall posted:note that this is equivalent to "send a message to every node, commit once all of them reply to confirm" which I think is obviously not HA. you essentially have a distributed state machine that you need kept consistent (here meaning past a certain revision) across all nodes that can only move forward if all nodes are available

nah fam, if a node isn't available we can assume it's also not forwarding requests and so doesn't need to block deletion. the actual problem is nodes that are available but not making progress. I contend that getting stuck is correct in this case, and if your infra team doesn't like getting pages about it then they shouldn't have pushed so hard for kubernetes
|
# ¿ Sep 10, 2022 15:04 |
|
distortion park posted:I should point out that idk if the problem I originally posted is impossible to solve in general, but it definitely didn't occur using ECS Fargate and definitely did running the same system on eks. This was a pretty small system with light but consistent load

ECS was built by people who knew how to design for scale and it shows
|
# ¿ Sep 10, 2022 20:41 |
|
nrook posted:there is very little I wouldn't do to avoid figuring out how to run two python servers with different versions of the same deps on the same server at the same time

the way to do this is to use docker
|
# ¿ Sep 12, 2022 05:16 |
|
nudgenudgetilt posted:or venvs...

until three months later when one app has upgraded and needs Python 3.9+ and the other app breaks on anything newer than 3.6, so now you're managing interpreters too. just use docker
|
# ¿ Sep 12, 2022 05:21 |
|
freeasinbeer posted:I'm a heretic but I really like terraform workspaces for making GBS threads out identical EKS clusters. makes dealing with 20+ clusters across multiple regions way easier than the folder per env, or worse per cluster. locals and sane defaults can go a long way imo. works ok with the kustomize provider.

you're using terraform as a templating layer then, basically. I wouldn't trust something running inside the cluster to manage the infrastructure gunk that kube needs to be usable in production, and terraform is already there, so why not

the thing about workspaces is that you could have a module, a list, and a for_each, and achieve the same result in a much more easily discoverable way. some might object to managing 20 eks clusters in a single state file, but to those people I say that glory shuns a coward
|
# ¿ Dec 2, 2022 05:01 |
|
you can also put the control plane inside the vpc though
|
# ¿ Dec 17, 2022 01:31 |