The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

The Fool posted:

First day at new job today :peanut:

Also, new place is a terraform enterprise customer

Me too! First day as an SRE.

I’m documenting which service is used by which version of our application so we can label our k8s deployments because literally this is not documented anywhere.


We only have two versions of the application, new and old, and nobody knows what belongs to what :suicide:

(in fairness there's lots of shared functionality... but these services sure aren't very "micro")
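For anyone doing the same exercise, here is a rough sketch of turning a "which version uses which service" inventory into Kubernetes labels. The service names, the `example.com` prefix and the `used-by-<version>` key scheme are all made up for illustration; the only real rule leaned on is the Kubernetes limit on label key names (63 characters, alphanumeric at both ends).

```python
import re

# Kubernetes label key names: 1-63 characters, alphanumeric at both ends,
# with '-', '_' and '.' allowed in between.
LABEL_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?$")

# Hypothetical inventory: which app versions use which service.
USED_BY = {
    "billing": {"old", "new"},   # shared by both versions
    "legacy-reports": {"old"},
    "search": {"new"},
}


def labels_for(service: str, prefix: str = "example.com") -> dict[str, str]:
    """Return one 'used-by-<version>' label per app version that uses the service."""
    labels = {}
    for version in sorted(USED_BY[service]):
        name = f"used-by-{version}"
        if not LABEL_NAME.match(name):
            raise ValueError(f"{name!r} is not a valid label key name")
        labels[f"{prefix}/{name}"] = "true"
    return labels
```

One label per version (rather than a single comma-separated value) is used here because label values cannot contain commas, and it keeps selectors simple for services shared by both versions.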

The Iron Rose fucked around with this message at 18:08 on Nov 30, 2020

Internet Explorer
Jun 1, 2005





drat y'all, that's a lot of start dates.

Hope your new jobs go great!

LochNessMonster
Feb 3, 2005

I need about three fitty


Congrats on your first day The Fool, The Iron Rose and BaseballPCHiker.

I :yotj: today. Flirted with the idea of pulling 2 jobs for a month or 2 but really can’t bring myself to do that.

New role starts on Jan 2nd. First official role as Teamlead.

Dick Trauma
Nov 30, 2007

God damn it, you've got to be kind.
Congratulations to all of you getting a fresh start. Wish you the best of luck, and if worst comes to worst try turning it off and back on again.

Matt Zerella
Oct 7, 2002

Norris'es are back baby. It's good again. Awoouu (fox Howl)
Just remember, SRE stands for "Simply Reboot Everything"

BaseballPCHiker
Jan 16, 2006

Thanks everyone! First time in a full-time blue team security role so I'm really excited to get started.

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

Matt Zerella posted:

Just remember, SRE stands for "Simply Reboot Everything"

:captainpop:

The Iron Rose posted:

(in fairness there's lots of shared functionality... but these services sure aren't very "micro")

Look, we jammed this poo poo in k8s so it's microservices now. At least that's what we told our boss.

i am a moron fucked around with this message at 18:52 on Nov 30, 2020

Woof Blitzer
Dec 29, 2012

[-]

Matt Zerella posted:

Just remember, SRE stands for "Simply Reboot Everything"

wtf I'm an SRE now

The Fool
Oct 16, 2003


Thanks everyone.

I’m currently browsing the internal wiki while waiting for my dev account to be provisioned.

Impotence
Nov 8, 2010
Lipstick Apathy
the internal wiki doesn't require you to sign in?

The Fool
Oct 16, 2003


Biowarfare posted:

the internal wiki doesn't require you to sign in?

I have a regular user account, so I can sign in to my laptop and a bunch of other company resources, but don’t have any privileged accounts yet so I can’t do any actual work.

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"
Also I think it was Sickening who I was bitching about APIMs to earlier in the thread. I've since discovered APIMs can do VNET stuff now. Or I missed it before. But anyways, they aren't as bad as I thought. It also looks like it's been that way for a while, so my username again becomes a self-fulfilling prophecy. Never trust anything I say.

Methanar
Sep 26, 2013

by the sex ghost

The Iron Rose posted:

First day as an SRE.


The Fool posted:

First day at new job today

the cloud is going to kill your jobs

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Methanar posted:

the cloud is going to kill

Look someone’s gotta build skynet why should it not be me?

Internet Explorer
Jun 1, 2005





Methanar posted:

the cloud is going to kill your jobs

Stop with the low-content nonsense.

Matt Zerella
Oct 7, 2002

Norris'es are back baby. It's good again. Awoouu (fox Howl)

The Iron Rose posted:

Look someone’s gotta build skynet why should it not be me?

Sending a reprogrammed terraforminator back in time to get you to build it all in US-East-1.

12 rats tied together
Sep 7, 2006

luminalflux posted:

Unfortunately we're in the position where Vault actually ticks a lot of boxes for us:
  • We need an internal CA that doesn't suck, and a way to distribute certs. AWS ACM Private CA costs $400/mo. Per Region.
  • We run MySQL on EC2 instances (See my Percona Live talk "When RDS doesn't live up to it's promises" (cancelled due to covid!)) and want to rotate MySQL (and other) credentials

I've been mulling this over for a little bit, and it reminds me of a brief period where I worked at a place that had been running on the same tech and the same really sick Level 3 colo deal for like 20 years. This place had a heavily used internal CA, extremely robust internal dns schema that every device participated in, everything did rsyslog, all device configs checked into/updated from CVS, etc. Development published rpms to an internal yum repository, node metadata went into internal txt records, stuff like that.

It's like if you took a snapshot of every tool used to run a linux application stack in the year 2003 and spent all of your time from then until now refining your usage of those tools. Working here definitely created this "well, why not use <the thing that has existed for 20+ years>" attitude that I approach every problem with -- our internal CA was a few machines that had access to the root keypair and openssl command line tools installed. Certificate distribution is handled by ansible's "delegate_to", where so long as sufficient credentials are provided, tasks run on a random CA machine to generate and sign a certificate, but in the context of the consuming machine(s) so that results from the signing process are available for use in followup automation.

The benefit of using ansible for this is that the new, "modern" process is exactly the same as the old process that someone checked into cvs in 2003, except ansible does it for you instead of just describing how to do it.
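As a rough illustration of the signing step a CA machine might run in a setup like this, here is a sketch that only builds the openssl command line. The file names are hypothetical, and this is not the actual playbook described above, just the plain `openssl x509 -req` form of "sign this CSR with the CA keypair".

```python
def signing_command(csr: str, cert_out: str, ca_cert: str, ca_key: str,
                    days: int = 365) -> list[str]:
    """Build an openssl invocation that signs `csr` with the CA keypair."""
    if days <= 0:
        raise ValueError("days must be positive")
    return [
        "openssl", "x509", "-req",
        "-in", csr,            # certificate signing request from the consumer
        "-CA", ca_cert,        # CA certificate
        "-CAkey", ca_key,      # CA private key, only present on the CA hosts
        "-CAcreateserial",     # create the serial number file if it is missing
        "-out", cert_out,      # signed certificate to hand back
        "-days", str(days),
        "-sha256",
    ]
```

On the CA host the list could be passed to `subprocess.run(cmd, check=True)`; with ansible, the equivalent task would be the one that gets delegated to the CA machine.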

The roles that configure applications (including mysql in theory, in practice usually postgres) iterate over the servers and correctly apply credentials and certificates in such a way that does not cause application downtime. Usually this is really simple and you can just apply "serial: 50%", but sometimes it gets more complicated, like the kafka scenario I mentioned earlier where we needed to make assertions about prometheus metrics during the deployment.
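To make the "serial: 50%" idea concrete, here is a small sketch of how a percentage turns into rolling batches. It is not Ansible's code; the rounding (round down, never below one host per pass) is an assumption about how the percentage is applied, and the host names are invented.

```python
def serial_batches(hosts: list[str], percent: int) -> list[list[str]]:
    """Split hosts into rolling batches sized as a percentage of all hosts."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    # Assumption: round down, but never go below one host per pass.
    size = max(1, len(hosts) * percent // 100)
    # The final batch holds the remainder if the hosts do not divide evenly.
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]
```

With four database hosts and 50%, half the fleet is reconfigured while the other half keeps serving, which is why the simple case causes no downtime.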

I'm of the firm stance that the only time you should consider using a hashicorp tool is when your rate of change or implementation complexity is so high that the relevant old piece of technology is untenable. If your ssl certificate would expire in 5 minutes, yeah, definitely use vault for that. If you can't tolerate internal txt record TTLs in your application, yeah use some kind of key value store.

If you just want every web server to have unique database credentials, that's easy and has been easy for quite some time, vault is not a requirement for it.
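A minimal sketch of the "unique database credentials per web server" part, assuming the credentials are generated centrally and then templated out to each host. The `web_<host>` naming scheme is hypothetical; the 32-character check reflects MySQL's user name length limit.

```python
import secrets

MAX_USER_LEN = 32  # MySQL user names are limited to 32 characters


def per_host_credentials(hosts: list[str], app: str = "web") -> dict[str, tuple[str, str]]:
    """Return {host: (db_user, db_password)} with a fresh random password per host."""
    creds = {}
    for host in hosts:
        user = f"{app}_{host}"
        if len(user) > MAX_USER_LEN:
            raise ValueError(f"user name {user!r} is longer than {MAX_USER_LEN} characters")
        # 24 random bytes, URL-safe base64 encoded (32 characters).
        creds[host] = (user, secrets.token_urlsafe(24))
    return creds
```

The same mapping would drive both sides: the grants created on the database and the config file rendered on each web server.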

luminalflux
May 27, 2005



Biowarfare posted:

You can afford to run a cluster of mysql on raw ec2 with provisioned iops mounts but can't afford 400/m?

Using provisioned IOPS is dumb for the most part - you can get cheaper performance just by allocating large gp2 volumes since you get 3 IOPS per GB up to 16k. If you need more than 5.5TB storage you can consider using software raid - we did this for a minute. Now our main DBs are on i3 instances using instance storage, with ProxySQL and Orchestrator to handle leader promotion if an instance goes away.
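The gp2 arithmetic worked out as a quick sketch. The constants are the published gp2 figures as I understand them (3 IOPS per GiB, a 100 IOPS floor, a 16,000 IOPS cap); burst credits are ignored entirely.

```python
import math

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(GP2_MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))


def smallest_size_for(iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline reaches the requested IOPS."""
    if iops > GP2_MAX_IOPS:
        raise ValueError("a single gp2 volume tops out at 16,000 IOPS")
    if iops <= GP2_MIN_IOPS:
        return 1  # every volume gets at least the 100 IOPS floor
    return math.ceil(iops / GP2_IOPS_PER_GIB)
```

By this math a single volume stops gaining IOPS at about 5,334 GiB, which is in the same ballpark as the 5.5TB figure; past that point, striping several volumes is the only way to get more out of gp2.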


12 rats tied together posted:

The roles that configure applications (including mysql in theory, in practice usually postgres) iterate over the servers and correctly apply credentials and certificates in such a way that does not cause application downtime. Usually this is really simple and you can just apply "serial: 50%", but sometimes it gets more complicated, like the kafka scenario I mentioned earlier where we needed to make assertions about prometheus metrics during the deployment.

We have somewhere around 200 production instances, and we ran into limits early on running ansible from a control machine for doing deploys/updates. We've moved to a model where ansible runs locally on each instance (it checks out the ansible repo and runs against localhost) and pretty much runs a full converge like you would with chef-solo.
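Roughly what the "check out the repo and run against localhost" model boils down to, as a sketch that just builds the two commands. The repo URL, checkout directory and `site.yml` playbook name are placeholders, not details of the actual setup described above.

```python
def local_converge_commands(repo_url: str, checkout_dir: str,
                            playbook: str = "site.yml") -> list[list[str]]:
    """Commands for a chef-solo style local run: fetch the repo, then converge."""
    return [
        # Shallow clone of the ansible repo onto the instance.
        ["git", "clone", "--depth", "1", repo_url, checkout_dir],
        # Run the playbook against this machine only, without SSH.
        ["ansible-playbook", "--connection", "local", "--inventory", "localhost,",
         f"{checkout_dir}/{playbook}"],
    ]
```

Ansible also ships an `ansible-pull` command built around the same idea, which may be simpler than wiring the two steps together by hand.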

12 rats tied together
Sep 7, 2006

That's a little curious, the largest playbook I've run is I think 2048 machines (a priam cassandra cluster), but even with such a high number of nodes ansible will only touch whatever the fork count is at once. You do need a pretty powerful machine for a high (>60) fork count, though.
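The fork count behaves like a bounded worker pool: however many hosts are in the play, only that many are being worked on at any moment. A toy sketch of the idea, using threads for brevity (Ansible itself uses worker processes) and a default of 5 to match Ansible's default fork count:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_with_forks(hosts: list[str], task: Callable[[str], str],
                   forks: int = 5) -> list[str]:
    """Run task(host) for every host with at most `forks` running at once.

    Results come back in the same order as `hosts`.
    """
    if forks < 1:
        raise ValueError("forks must be at least 1")
    with ThreadPoolExecutor(max_workers=forks) as pool:
        return list(pool.map(task, hosts))
```

With 2048 hosts and 60 forks the controller still only juggles 60 connections at a time; the cost of a high fork count is memory and CPU on the control machine, not per-host load.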

luminalflux posted:

If you need more than 5.5TB storage you can consider using software raid - we did this for a minute.
The thing I am least proud of at $last-job is doing basically this for an aerospike cluster, where we would IIRC create up to a 38 volume software raid0 across gp2 volumes. I had to add an ansible assertion to make sure that the role wasn't running with more volumes than can be attached to an ec2 instance, and then we had some funky looping to create valid device indexes given that we might be creating more devices than there are alphabet letters.

I was vehemently against this implementation at first until the requesting engineer gave me the price comparison between software raid0 and provisioned iops. It actually ended up working great, too, except the ansible part.
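For the curious, the "more devices than alphabet letters" looping can be sketched like this. It is a guess at the general shape, not the original role: the `/dev/xvd` prefix, the starting letter and the `max_attachable` limit are all parameters to be filled in for the instance type, and whether EC2 accepts a given name depends on its device naming rules.

```python
import string


def device_suffix(index: int) -> str:
    """0 -> 'a', 25 -> 'z', 26 -> 'aa', 27 -> 'ab', ... (spreadsheet-column style)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = string.ascii_lowercase
    suffix = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        suffix = letters[rem] + suffix
    return suffix


def device_names(count: int, max_attachable: int,
                 prefix: str = "/dev/xvd", first: str = "f") -> list[str]:
    """Device names for `count` volumes, refusing to exceed the attach limit."""
    if count > max_attachable:
        raise ValueError(f"{count} volumes requested, only {max_attachable} can be attached")
    start = ord(first) - ord("a")
    return [prefix + device_suffix(start + i) for i in range(count)]
```

The `ValueError` plays the role of the ansible assertion: fail before creating anything rather than halfway through attaching volumes.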

Paladine_PSoT
Jan 2, 2010

If you have a problem Yo, I'll solve it

The Iron Rose posted:

Look someone’s gotta build skynet why should it not be me?

For a while I was working on new cluster creation automation for a high powered storage and analytics PaaS system. If we had somehow hooked the automation into our hardware ordering channel we could have basically made a supercomputer self-replicating.

luminalflux
May 27, 2005



12 rats tied together posted:

That's a little curious, the largest playbook I've run is I think 2048 machines (a priam cassandra cluster), but even with such a high number of nodes ansible will only touch whatever the fork count is at once. You do need a pretty powerful machine for a high (>60) fork count, though.

I haven't fully dug into why it breaks, but we've gone down this route with a controller running a playbook that runs ansible running the local playbook on each host. Mutterings I've heard from previous SREs were "it was too finicky and broke a lot", but that might just be bad ansible.

We've moved on to the point where we do baked AMIs into autoscaling groups with Spinnaker, and each node just runs ansible locally from cloud-init on boot.

quote:

The thing I am least proud of at $last-job is doing basically this for an aerospike cluster, where we would IIRC create up to a 38 volume software raid0 across gp2 volumes. I had to add an ansible assertion to make sure that the role wasn't running with more volumes than can be attached to an ec2 instance, and then we had some funky looping to create valid device indexes given that we might be creating more devices than there are alphabet letters.

I was vehemently against this implementation at first until the requesting engineer gave me the price comparison between software raid0 and provisioned iops. It actually ended up working great, too, except the ansible part.

I was under the impression that the max EBS bandwidth you can wring out of an instance is 80k on the newest c5 instances so i'm curious how that worked out

12 rats tied together
Sep 7, 2006

luminalflux posted:

I was under the impression that the max EBS bandwidth you can wring out of an instance is 80k on the newest c5 instances so i'm curious how that worked out

These were actually i3.8xlarges so according to the chart the performance was likely even worse than 80k, unfortunately the requesting engineer and their team were extremely siloed from standard ops so I don't have specifics, just that there were no complaints.

There was actually a single complaint -- this team didn't trust ops basically at all, so instead of using an autoscaling group and adjusting thresholds based on need, and instead of asking for launch-triggered provisioning, their plan was to request that we provision and configure a poo poo ton of them (4x standard need, I think?) and they were going to turn off the ones they weren't using, and then manually turn them on when needed.

Somehow this plan missed the fact that i3 instances have ephemeral root volumes, so, they did this the first time, destroyed all of their clusters, and didn't notice until they needed the capacity and turned on a bunch of blank instances with 40 attached volumes. After that was resolved I understand that they were happy with the clusters indefinitely but that ops hated, and probably still hates, the ansible.

The Fool
Oct 16, 2003


Someone was asking a few weeks ago if we preferred content in text or video form, and I responded that I definitely preferred to read text.

Someone that hates me must have read that, because all of the up to date new employee stuff is in the form of recordings of Teams presentations.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
i watched an architecture overview video today which was significantly more comprehensible than the agonizing wiki document for it

it was mostly useful because the presenter would go off on occasional joking asides that provided pivotal context.

The Fool
Oct 16, 2003


I’m watching a series of videos on how the application deployment pipelines are designed and the guy is clearing his throat every 10 seconds.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
in unrelated news, my first ever stackoverflow post from 2 years and 11 months ago (that I answered myself) got a reply today, thanking me for coming up with a solution and expanding on it! Was a nice warm fuzzy feeling to know this completely random useful thing was providing dividends years later.

well, nice and fuzzy until i read what was deeply amateurish powershell but you can't have everything

Paladine_PSoT
Jan 2, 2010

If you have a problem Yo, I'll solve it

Digging through your stack overflow history can be cringeworthy...

Internet Explorer
Jun 1, 2005





Or your SA posting history, for that matter.

Zaepho
Oct 31, 2013

The Iron Rose posted:

well, nice and fuzzy until i read what was deeply amateurish powershell but you can't have everything

I still get stackexchange upvotes or rep or whatever from some very amateur powershell advice (that was functionally correct though explained wrong) from 2011. It's the gift that keeps on giving. Man, that reminds me I should check into the unanswered questions on there occasionally.

The Fool
Oct 16, 2003


The Fool posted:

I’m watching a series of videos on how the application deployment pipelines are designed and the guy is clearing his throat every 10 seconds.

Jfc, the guy is eating while recording

Woof Blitzer
Dec 29, 2012

[-]
When I do my yearly watch-through of our architecture videos I feel stupid, not because the concepts are complex but because everything is arranged and hacked together so nonsensically.

e: That reminds me of today: an FTP proxy server stopped working for us. Well it turns out instead of working like a normal proxy where it's basically transparent and hides the source IP nope... we have a user account that logs into the proxy server itself and then transfers the file to the proxy and then to our server. Also, we are using this proxy when FTPing files from our own company, in our own DC for... reasons. The network engineer was like "you guys are crazy."

Sometimes I don't know how everything isn't on fire.

Woof Blitzer fucked around with this message at 00:52 on Dec 1, 2020

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

The Fool posted:

Jfc, the guy is eating while recording

Watched a girl get Panda Express delivered during a call today and then she just scarfed the noodles with no shame

klosterdev
Oct 10, 2006

Na na na na na na na na Batman!
my kind of girl

Wibla
Feb 16, 2011

Internet Explorer posted:

Or your SA posting history, for that matter.

:drat:

Paladine_PSoT
Jan 2, 2010

If you have a problem Yo, I'll solve it

Internet Explorer posted:

Or your SA posting history, for that matter.

One of these days an SA post is going to feature into a muckraking ad for a congressman.

Content: I have a new area of work, and I get to learn new and exciting things! Yay!

My mentor for this position happens to be 16 hours ahead of me, so now I can't even start a meeting with him until like, 6 PM my time. My winter just got some seriously long nights...

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around.

Impotence
Nov 8, 2010
Lipstick Apathy

Gabriel S. posted:

I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around.

I don't know how far this goes into politics chat rules but apparently a teenager won a Kansas state house primary with everything from sexual harassment to revenge porn of a 13 year old.

Neddy Seagoon
Oct 12, 2012

"Hi Everybody!"

Gabriel S. posted:

I really wonder how anyone is going to run for politics in the future when we've got stuff like MySpace, Xanga, Facebook, Twitter, etc. floating around.

Have you been in a coma since around October/November 2016? Because hoo-boy is there some stuff to catch you up on...

Spring Heeled Jack
Feb 25, 2007

If you can read this you can read
Also afaik old MySpace and Xanga content is effectively wiped from existence.

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


Neddy Seagoon posted:

Have you been in a coma since around October/November 2016? Because hoo-boy is there some stuff to catch you up on...

Internet 1.0 was the best internet.
