|
The Fool posted:First day at new job today Me too! First day as an SRE. I’m documenting which service is used by which version of our application so we can label our k8s deployments because literally this is not documented anywhere. We only have two versions of our applications, new and old, and nobody knows what belongs to what (in fairness there's lots of shared functionality... but these services sure aren't very "micro") The Iron Rose fucked around with this message at 18:08 on Nov 30, 2020 |
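Here is a minimal sketch of that inventory. It assumes the inventory is kept as a hand-written map from each application version to the services it uses. The service names and the `app-version` label key are made up for illustration. The function inverts the map so every service ends up with exactly one label value, `new`, `old`, or `shared`, which could then go on its Deployment.

```python
from collections import defaultdict

# Hypothetical inventory: application version -> services that version uses.
INVENTORY = {
    "new": {"auth", "billing", "search"},
    "old": {"auth", "billing", "reports"},
}


def version_labels(inventory):
    """Map each service to the k8s labels it should carry.

    A service used by exactly one version gets that version as its label
    value; a service used by more than one version is labeled "shared".
    """
    users = defaultdict(set)
    for version, services in inventory.items():
        for service in services:
            users[service].add(version)

    labels = {}
    for service in sorted(users):
        versions = users[service]
        value = next(iter(versions)) if len(versions) == 1 else "shared"
        labels[service] = {"app-version": value}  # hypothetical label key
    return labels


for service, label in version_labels(INVENTORY).items():
    print(service, label)
```

Labeling shared services explicitly, rather than assigning them to one version, keeps the "which version owns this?" question visible instead of hiding it.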
# ? Nov 30, 2020 18:01 |
|
|
# ? Apr 25, 2024 12:22 |
|
drat y'all, that's a lot of start dates. Hope your new jobs go great!
|
# ? Nov 30, 2020 18:25 |
|
Congrats on your first day The Fool, The Iron Rose and BaseballPCHiker. I today. Flirted with the idea of pulling 2 jobs for a month or 2 but really can’t bring myself to do that. New role starts on Jan 2nd. First official role as Teamlead.
|
# ? Nov 30, 2020 18:28 |
|
Congratulations to all of you getting a fresh start. Wish you the best of luck, and if worst comes to worst try turning it off and back on again.
|
# ? Nov 30, 2020 18:47 |
|
Just remember, SRE stands for "Simply Reboot Everything"
|
# ? Nov 30, 2020 18:49 |
|
Thanks everyone! First time in a full-time blue team security role so I'm really excited to get started.
|
# ? Nov 30, 2020 18:49 |
Matt Zerella posted:Just remember, SRE stands for "Simply Reboot Everything" The Iron Rose posted:(in fairness there's lots of shared functionality... but these services sure aren't very "micro") Look, we jammed this poo poo in k8s so it's microservices now. At least that's what we told our boss. i am a moron fucked around with this message at 18:52 on Nov 30, 2020 |
|
# ? Nov 30, 2020 18:50 |
|
Matt Zerella posted:Just remember, SRE stands for "Simply Reboot Everything" wtf I'm an SRE now
|
# ? Nov 30, 2020 18:54 |
|
Thanks everyone. I’m currently browsing the internal wiki while waiting for my dev account to be provisioned.
|
# ? Nov 30, 2020 18:54 |
|
the internal wiki doesn't require you to sign in?
|
# ? Nov 30, 2020 19:10 |
|
Biowarfare posted:the internal wiki doesn't require you to sign in? I have a regular user account, so I can sign in to my laptop and a bunch of other company resources, but don’t have any privileged accounts yet so I can’t do any actual work.
|
# ? Nov 30, 2020 19:13 |
Also I think it was Sickening who I was bitching to about APIMs earlier in the thread. I've since discovered APIMs can do VNET stuff now. Or I missed it before. But anyways, they aren't as bad as I thought. It also looks like it's been that way for a while, so my username again becomes a self-fulfilling prophecy. Never trust anything I say.
|
|
# ? Nov 30, 2020 19:16 |
|
The Iron Rose posted:First day as an SRE. The Fool posted:First day at new job today the cloud is going to kill your jobs
|
# ? Nov 30, 2020 19:27 |
|
Methanar posted:the cloud is going to kill Look someone’s gotta build skynet why should it not be me?
|
# ? Nov 30, 2020 19:29 |
|
Methanar posted:the cloud is going to kill your jobs Stop with the low-content nonsense.
|
# ? Nov 30, 2020 19:41 |
|
The Iron Rose posted:Look someone’s gotta build skynet why should it not be me? Sending a reprogrammed terraforminator back in time to get you to build it all in US-East-1.
|
# ? Nov 30, 2020 20:06 |
|
luminalflux posted:Unfortunately we're in the position where Vault actually ticks a lot of boxes for us: I've been mulling this over for a little bit, and it reminds me of a brief period where I worked at a place that had been running on the same tech and the same really sick Level 3 colo deal for like 20 years. This place had a heavily used internal CA, extremely robust internal dns schema that every device participated in, everything did rsyslog, all device configs checked into/updated from CVS, etc. Development published rpms to an internal yum repository, node metadata went into internal txt records, stuff like that. It's like if you took a snapshot of every tool used to run a linux application stack in the year 2003 and spent all of your time from then until now refining your usage of those tools.

Working here definitely created this "well, why not use <the thing that has existed for 20+ years>" attitude that I approach every problem with -- our internal CA was a few machines that had access to the root keypair and openssl command line tools installed. Certificate distribution is handled by ansible's "delegate_to", where so long as sufficient credentials are provided, tasks run on a random CA machine to generate and sign a certificate, but in the context of the consuming machine(s) so that results from the signing process are available for use in followup automation. The benefit of using ansible for this is that the new, "modern" process is exactly the same as the old process that someone checked into cvs in 2003, except ansible does it for you instead of just describing how to do it.

The roles that configure applications (including mysql in theory, in practice usually postgres) iterate over the servers and correctly apply credentials and certificates in such a way that does not cause application downtime. Usually this is really simple and you can just apply "serial: 50%", but sometimes it gets more complicated, like the kafka scenario I mentioned earlier where we needed to make assertions about prometheus metrics during the deployment.

I'm of the firm stance that the only time you should consider using a hashicorp tool is when your rate of change or implementation complexity is so high that the relevant old piece of technology is untenable. If your ssl certificate would expire in 5 minutes, yeah, definitely use vault for that. If you can't tolerate internal txt record TTLs in your application, yeah use some kind of key value store. If you just want every web server to have unique database credentials, that's easy and has been easy for quite some time, vault is not a requirement for it.
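The "serial: 50%" rolling behaviour can be sketched in plain Python. This sketch assumes Ansible's usual rule for percentage batches: round the batch size down, never go below one host, and let the last batch hold whatever is left. It is not Ansible's code. `apply` and `healthy` are hypothetical callables, and `healthy` stands in for something like the Prometheus assertions mentioned for the kafka case.

```python
from typing import Callable, List, Sequence


def serial_batches(hosts: Sequence[str], serial: str) -> List[List[str]]:
    """Split hosts into rolling-update batches.

    `serial` is an absolute count ("3") or a percentage ("50%").
    Percentages round down but never go below one host (assumed to
    match Ansible's behaviour); the final batch takes the remainder.
    """
    if serial.endswith("%"):
        size = len(hosts) * int(serial[:-1]) // 100
    else:
        size = int(serial)
    size = max(size, 1)
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def rolling_update(
    hosts: Sequence[str],
    serial: str,
    apply: Callable[[str], None],
    healthy: Callable[[List[str]], bool],
) -> List[str]:
    """Apply changes batch by batch, stopping if a health gate fails."""
    done: List[str] = []
    for batch in serial_batches(hosts, serial):
        for host in batch:
            apply(host)  # e.g. push new credentials/certs to this host
        if not healthy(batch):  # e.g. assert on metrics before moving on
            raise RuntimeError(f"health check failed after batch {batch}")
        done.extend(batch)
    return done
```

The health gate runs after each batch rather than after each host. That matches the idea of only touching half the fleet at a time and checking before touching the rest.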
|
# ? Nov 30, 2020 20:08 |
|
Biowarfare posted:You can afford to run a cluster of mysql on raw ec2 with provisioned iops mounts but can't afford 400/m? Using provisioned IOPS is dumb for the most part - you can get cheaper performance just by allocating large gp2 volumes since you get 3 IOPS per GB up to 16k. If you need more than 5.5TB storage you can consider using software raid - we did this for a minute. Now our main DBs are on i3 instances using instance storage, with ProxySQL and Orchestrator to handle leader promotion if an instance goes away.

12 rats tied together posted:The roles that configure applications (including mysql in theory, in practice usually postgres) iterate over the servers and correctly apply credentials and certificates in such a way that does not cause application downtime. Usually this is really simple and you can just apply "serial: 50%", but sometimes it gets more complicated, like the kafka scenario I mentioned earlier where we needed to make assertions about prometheus metrics during the deployment. We have somewhere around 200 production instances, and we ran into limits early on running ansible from a control machine for doing deploys/updates. We've moved to a model where ansible runs locally on each instance (it checks out the ansible repo and runs against localhost) and pretty much runs a full converge like you would with chef-solo.
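The gp2 arithmetic works out as shown below. This assumes the gp2 baseline rules as AWS documented them at the time: 3 IOPS per GiB, a 100 IOPS floor, and a 16,000 IOPS ceiling. It ignores burst credits. `instance_iops_cap` is a hypothetical parameter for the per-instance EBS limit, which varies by instance type. The RAID 0 sum is the ideal case.

```python
from typing import Iterable, Optional

# gp2 baseline rules as documented by AWS at the time (burst credits ignored).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for one gp2 volume of the given size."""
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)


def smallest_gp2_for_max_iops() -> int:
    """Smallest volume size (GiB) that reaches the gp2 IOPS ceiling."""
    return -(-GP2_MAX_IOPS // GP2_IOPS_PER_GIB)  # ceiling division


def raid0_baseline_iops(
    sizes_gib: Iterable[int], instance_iops_cap: Optional[int] = None
) -> int:
    """Ideal aggregate baseline IOPS of a RAID 0 stripe across gp2 volumes.

    Striping adds the volumes' IOPS together, but the instance's own EBS
    limit (hypothetical parameter here) still caps the total.
    """
    total = sum(gp2_baseline_iops(size) for size in sizes_gib)
    return total if instance_iops_cap is None else min(total, instance_iops_cap)


print(gp2_baseline_iops(1_000))                  # 3000
print(smallest_gp2_for_max_iops())               # 5334
print(raid0_baseline_iops([5_334] * 6, 80_000))  # 80000
```

Under these assumptions a single volume tops out at 5,334 GiB, which is about 5.2 TiB or 5.7 TB. That is presumably the ballpark behind the "5.5TB" figure. Past that point, striping more volumes only helps until the instance-level limit is hit.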
|
# ? Nov 30, 2020 20:36 |
|
That's a little curious; the largest playbook I've run is I think 2048 machines (a priam cassandra cluster), but even with such a high number of nodes ansible will only touch whatever the fork count is at once. You do need a pretty powerful machine for a high (>60) fork count, though.

luminalflux posted:If you need more than 5.5TB storage you can consider using software raid - we did this for a minute. I was vehemently against this implementation at first until the requesting engineer gave me the price comparison between software raid0 and provisioned iops. It actually ended up working great, too, except the ansible part.
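The point about forks is that the fork count, not the inventory size, sets the concurrency. This is a rough Python analogy, not Ansible's implementation (Ansible uses worker processes, not threads). A pool capped at `forks` workers gets through every host, but never works on more than `forks` at once. `run_with_forks` and its arguments are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_with_forks(
    hosts: Sequence[str], task: Callable[[str], T], forks: int
) -> Tuple[List[T], int]:
    """Run task(host) for every host, at most `forks` at a time.

    Returns the results in host order plus the peak number of hosts that
    were being worked on simultaneously.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def tracked(host: str) -> T:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            return task(host)
        finally:
            with lock:
                in_flight -= 1

    with ThreadPoolExecutor(max_workers=forks) as pool:
        results = list(pool.map(tracked, hosts))
    return results, peak


hosts = [f"cassandra-{i:04d}" for i in range(2048)]
results, peak = run_with_forks(hosts, str.upper, forks=60)
print(len(results), peak)  # all 2048 hosts handled; peak never exceeds 60
```

With 2048 hosts and 60 forks, the control machine's load is governed by the 60, which is why a beefier controller matters for high fork counts.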
|
# ? Nov 30, 2020 21:30 |
|
The Iron Rose posted:Look someone’s gotta build skynet why should it not be me? For a while I was working on new cluster creation automation for a high-powered storage and analytics PaaS system. If we had somehow hooked the automation into our hardware ordering channel we could have basically made a supercomputer self-replicating.
|
# ? Nov 30, 2020 21:38 |
|
12 rats tied together posted:That's a little curious, the largest playbook I've run is I think 2048 machines (a priam cassandra cluster), but even with such a high number of nodes ansible will only touch whatever the fork count is at once. You do need a pretty powerful machine for a high (>60) fork count, though. I haven't fully dug into why it breaks, but we've gone down this route with a controller running a playbook that runs ansible running the local playbook on each host. Mutterings I've heard from previous SREs were "it was too finicky and broke a lot", but that might just be bad ansible. We've moved on to the point where we deploy baked AMIs into autoscaling groups with Spinnaker, and each node just runs ansible locally from cloud-init on boot.

quote:The thing I am least proud of at $last-job is doing basically this for an aerospike cluster, where we would IIRC create up to a 38 volume software raid0 across gp2 volumes. I had to add an ansible assertion to make sure that the role wasn't running with more volumes than can be attached to an ec2 instance, and then we had some funky looping to create valid device indexes given that we might be creating more devices than there are alphabet letters. I was under the impression that the max EBS bandwidth you can wring out of an instance is 80k on the newest c5 instances so i'm curious how that worked out
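The "more devices than there are alphabet letters" problem is the same one spreadsheet column names solve: count a..z, then aa, ab, and so on (bijective base-26). This is a sketch under several assumptions. The `/dev/xvd` prefix, skipping index 0 for the root device, and the `max_volumes=40` default are all hypothetical. The set of device names EC2 actually accepts is narrower and depends on the platform, so check it against AWS's device-naming docs before using anything like this.

```python
import string
from typing import List


def device_suffix(index: int) -> str:
    """Spreadsheet-style letters: 0 -> 'a', 25 -> 'z', 26 -> 'aa', 27 -> 'ab'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters.append(string.ascii_lowercase[rem])
    return "".join(reversed(letters))


def device_names(
    count: int, prefix: str = "/dev/xvd", start: int = 1, max_volumes: int = 40
) -> List[str]:
    """Names for `count` data volumes, refusing to exceed an attachment limit.

    `prefix`, `start` (1 skips 'a', assumed to be the root device) and
    `max_volumes` are placeholders; real limits depend on instance type.
    """
    if count > max_volumes:
        raise ValueError(f"{count} volumes exceeds the limit of {max_volumes}")
    return [prefix + device_suffix(start + i) for i in range(count)]


names = device_names(38)
print(names[0], names[24], names[25], names[-1])
# /dev/xvdb /dev/xvdz /dev/xvdaa /dev/xvdam
```

The limit check plays the role of the ansible assertion described above: it fails before anything is attached instead of halfway through.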
|
# ? Nov 30, 2020 22:26 |
|
luminalflux posted:I was under the impression that the max EBS bandwidth you can wring out of an instance is 80k on the newest c5 instances so i'm curious how that worked out These were actually i3.8xlarges so according to the chart the performance was likely even worse than 80k, unfortunately the requesting engineer and their team were extremely siloed from standard ops so I don't have specifics, just that there were no complaints. There was actually a single complaint -- this team didn't trust ops basically at all, so instead of using an autoscaling group and adjusting thresholds based on need, and instead of asking for launch-triggered provisioning, their plan was to request that we provision and configure a poo poo ton of them (4x standard need, I think?) and they were going to turn off the ones they weren't using, and then manually turn them on when needed. Somehow this plan missed the fact that i3 instances have ephemeral root volumes, so, they did this the first time, destroyed all of their clusters, and didn't notice until they needed the capacity and turned on a bunch of blank instances with 40 attached volumes. After that was resolved I understand that they were happy with the clusters indefinitely but that ops hated, and probably still hates, the ansible.
|
# ? Nov 30, 2020 22:44 |
|
Someone was asking a few weeks ago if we preferred content in text or video form, and I responded that I definitely preferred to read text. Someone that hates me must have read that, because all of the up to date new employee stuff is in the form of recordings of Teams presentations.
|
# ? Nov 30, 2020 22:54 |
|
i watched an architecture overview video today which was significantly more comprehensible than the agonizing wiki document for it. It was mostly useful because the presenter would go off on occasional joking asides that provided pivotal context.
|
# ? Nov 30, 2020 22:56 |
|
I’m watching a series of videos on how the application deployment pipelines are designed and the guy is clearing his throat every 10 seconds.
|
# ? Nov 30, 2020 23:02 |
|
in unrelated news, my first ever stackoverflow post from 2 years and 11 months ago (that I answered myself) got a reply today, thanking me for coming up with a solution and expanding on it! Was a nice warm fuzzy feeling to know this completely random useful thing was paying dividends years later. well, nice and fuzzy until i read what was deeply amateurish powershell but you can't have everything
|
# ? Nov 30, 2020 23:10 |
|
Digging through your stack overflow history can be cringeworthy...
|
# ? Dec 1, 2020 00:20 |
|
Or your SA posting history, for that matter.
|
# ? Dec 1, 2020 00:22 |
|
The Iron Rose posted:well, nice and fuzzy until i read what was deeply amateurish powershell but you can't have everything I still get stackexchange upvotes or rep or whatever from some very amateur powershell advice (that was functionally correct though explained wrong) from 2011.. It's the gift that keeps on giving. man that reminds me I should check into the unanswered questions on there occasionally.
|
# ? Dec 1, 2020 00:34 |
|
The Fool posted:I’m watching a series of videos on how the application deployment pipelines are designed and the guy is clearing his throat every 10 seconds. Jfc, the guy is eating while recording
|
# ? Dec 1, 2020 00:35 |
|
When I do my yearly watch-through of our architecture videos I feel stupid, not because the concepts are complex but because everything is arranged and hacked together so nonsensically. e: That reminds me of today: an FTP proxy server stopped working for us. Well it turns out instead of working like a normal proxy where it's basically transparent and hides the source IP nope... we have a user account that logs into the proxy server itself and then transfers the file to the proxy and then to our server. Also, we are using this proxy when FTPing files from our own company, in our own DC for... reasons. The network engineer was like "you guys are crazy." Sometimes I don't know how everything isn't on fire. Woof Blitzer fucked around with this message at 00:52 on Dec 1, 2020 |
# ? Dec 1, 2020 00:46 |
|
The Fool posted:Jfc, the guy is eating while recording Watched a girl get Panda Express delivered during a call today and she just scarfed the noodles with no shame
|
# ? Dec 1, 2020 01:11 |
|
my kind of girl
|
# ? Dec 1, 2020 01:26 |
|
Internet Explorer posted:Or your SA posting history, for that matter.
|
# ? Dec 1, 2020 01:41 |
|
Internet Explorer posted:Or your SA posting history, for that matter. One of these days an SA post is going to feature into a muckraking ad for a congressman. Content: I have a new area of work, and I get to learn new and exciting things! Yay! My mentor for this position happens to be 16 hours ahead of me, so now I can't even start a meeting with him until like, 6 PM my time. My winter just got some seriously long nights...
|
# ? Dec 1, 2020 04:25 |
|
I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around.
|
# ? Dec 1, 2020 05:21 |
|
Gabriel S. posted:I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around. I don't know how far this goes into politics chat rules but apparently a teenager won a Kansas state house primary with everything from sexual harassment to revenge porn of a 13 year old.
|
# ? Dec 1, 2020 05:30 |
|
Gabriel S. posted:I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around. Have you been in a coma since around October/November 2016? Because hoo-boy is there some stuff to catch you up on...
|
# ? Dec 1, 2020 11:15 |
|
Also afaik old MySpace and Xanga content is effectively wiped from existence.
|
# ? Dec 1, 2020 13:05 |
|
Neddy Seagoon posted:Have you been in a coma since around October/November 2016? Because hoo-boy is there some stuff to catch you up on... Internet 1.0 was the best internet.
|
# ? Dec 1, 2020 15:24 |