Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

devmd01 posted:

“Tell me about a time you caused an outage”
WHICH TIME

(I plugged a T1 into an ethernet port on a phone system once)

RFC2324
Jun 7, 2012

http 418

I've never caused an outage.

I've hosed up a bunch in ways that would have been outages if they happened outside the maintenance window tho. Only had to extend the window because of my poo poo twice 😂

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
I don't think I've ever caused an outage either.

There was one time where I made a typo in our MDM-d wifi profile, which knocked everyone offline till they connected to our guest network and got the corrected profile 30 seconds later.

I have no doubt that, now that I'm working with distributed systems, outages will become a more common affair.


also thanks for the interview questions y'all! I stole a few of them. also added "what do you do when you don't know how to solve something" in the forlorn hope we hire someone who's confident enough to google their own problems rather than escalating every time.



vvvv oh that does remind me i briefly brought down our domain controllers because i forgot to check the IP address after rebooting post-patching. briefly lost DNS for an hour or two, but fortunately only like 3 IT people were in the office and a static IP address later all was well with the world.

The Iron Rose fucked around with this message at 17:33 on Dec 8, 2020

klosterdev
Oct 10, 2006

Na na na na na na na na Batman!
I blew up one of our sites updating a Server 08R2 machine last year. Had to drive over and static IP/DNS everything, managed to get into the bootlooping server with directory services restore mode, (not even safe mode worked) pulled the data, uploaded it to a server near the site, set up a share, applied the existing security groups to it, and configured automapping to point to the new location.

Runcible Cat
May 28, 2007

Ignoring this post

RFC2324 posted:

I've never caused an outage.

I've hosed up a bunch in ways that would have been outages if they happened outside the maintenance window tho. Only had to extend the window because of my poo poo twice 😂

I've toasted the odd router/switch with a firmware update, but that's not my fault as such, it's on the manufacturers :colbert:. And a client's DNS records got blanked moving to a new registrar, but I'd kept a backup so it was a trivial fix.

I did once manage to gently caress up and create a group policy that would have locked out all admin accounts (I think - it was something similarly ghastly), but I managed to realise that before I logged out of the one I was logged in on and reversed it.

I'm a fanatical documenter and backer-up of settings and it's been a godsend over the years.

RFC2324
Jun 7, 2012

http 418

Gats Akimbo posted:

I've toasted the odd router/switch with a firmware update, but that's not my fault as such, it's on the manufacturers :colbert:. And a client's DNS records got blanked moving to a new registrar, but I'd kept a backup so it was a trivial fix.

I did once manage to gently caress up and create a group policy that would have locked out all admin accounts (I think - it was something similarly ghastly), but I managed to realise that before I logged out of the one I was logged in on and reversed it.

I'm a fanatical documenter and backer-up of settings and it's been a godsend over the years.

I'm pretty fanatic about making sure I 100% understand the commands I am issuing, particularly after mirroring a new drive in the wrong direction and wiping the partition table on the old production drive

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!



You guys never pulled one of those on a production DB?

RFC2324
Jun 7, 2012

http 418

Bob Morales posted:



You guys never pulled one of those on a production DB?

I've always been stupid careful with databases because I don't understand why they are so complicated.

And like I said, I've screwed up stuff, but only during maintenance windows when it didn't count as an outage.

Paladine_PSoT
Jan 2, 2010

If you have a problem Yo, I'll solve it

RFC2324 posted:

I've never caused an outage.

Oof... I have... indirectly.

It's not my fault someone scheduled a load test in a datacenter that hosted all of a particular application for all of europe. It's also not my fault our load test ground the network to a screeching halt for 6 hours.

Methanar
Sep 26, 2013

by the sex ghost
lol if you've never had to debug an outage you caused while in the backseat of your boss's car as he road trips the SRE team from virginia to new york

RFC2324
Jun 7, 2012

http 418

Paladine_PSoT posted:

Oof... I have... indirectly.

It's not my fault someone scheduled a load test in a datacenter that hosted all of a particular application for all of europe. It's also not my fault our load test ground the network to a screeching halt for 6 hours.

poo poo, I lied, I have caused one outage.

I didn't realize taking a core dump of a java app would freeze the app while the dump ran, and was told to try and get one of a problem we had before the problem took down everything. I was also given access to prod, but not dev or test (not that the issue existed outside prod).

ssb
Feb 16, 2006

WOULD YOU ACCOMPANY ME ON A BRISK WALK? I WOULD LIKE TO SPEAK WITH YOU!!


The Iron Rose posted:

this is a great question that I will be stealing in... two hours or so


What are some of y'all's favourite sysadmin generalist questions to ask? I really like "explain how the internet works". Gives them an opportunity to go up and down the OSI model, or dip in and out of areas where they're most comfortable.

I like things that are conceptual and show that they understand how to do the less-technically-specific things of the job. It's great if they remember what DNS stands for and can recite LVM commands off the top of their head, but both of those things are easy to quick reference so long as you understand DNS and/or LVM for example. Here's a couple (somewhat tweaked) examples that we've used before (job was a senior linux engineer/admin). I think I mentioned a couple of these in the thread around the post you quoted:

* When designing an IT service or solution for a customer, what are some of the considerations that go into your design?

(wanted to hear that they had an idea to design enterprise solutions factoring in scale, HA needs, cost, security, vendor requirements, etc)

* When do you choose to write a script rather than implement a solution by hand?

(script could include config management stuff, basically anything was OK here so long as it made sense and showed that they at least knew how to do this stuff)

* You have to do a critical security patch off-schedule - think heartbleed type severity. Please outline at a high level the technical steps and change management processes you'd follow.

(they have to have some concept of test before prod, understand how change management works at least to some degree, and they should mention how the patch is deployed and show some degree of understanding how to do that on umpteen servers (yum, config management, etc etc))
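
The "test before prod" ordering in that last answer can be sketched as a small rollout planner. This is only an illustration under stated assumptions: the host names, the two-stage `test`/`prod` split, and the `plan_rollout` helper are all hypothetical, and the actual patching (yum, config management, etc.) is left out.

```python
from typing import Iterable

# Assumed stage order: everything tagged "test" is patched before anything tagged "prod".
STAGE_ORDER = ("test", "prod")


def plan_rollout(hosts: Iterable[tuple[str, str]], batch_size: int = 2) -> list[list[str]]:
    """Return batches of host names, test hosts first, then prod in small batches.

    `hosts` is an iterable of (hostname, environment) pairs (hypothetical inventory format).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    inventory = list(hosts)  # copy so the input can be iterated once per stage
    plan: list[list[str]] = []
    for stage in STAGE_ORDER:
        names = sorted(name for name, env in inventory if env == stage)
        # Split each stage into batches so a bad patch only hits a few hosts at a time.
        for start in range(0, len(names), batch_size):
            plan.append(names[start:start + batch_size])
    return plan
```

Each batch would be patched and verified before moving to the next, which is where the change management sign-off would fit.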

devmd01
Mar 7, 2006

Elektronik
Supersonik

Methanar posted:

lol if you've never had to debug an outage you caused while in the backseat of your boss's car as he road trips the SRE team from virginia to new york

I once did a point of sale software push while sitting in the back seat of the company jet waiting for our clearance to takeoff. I was working on it in the waiting area as the final plane prep was being done and I managed to forget to hand in my rental car keys. Oops.

Thanks Ants
May 21, 2004

#essereFerrari


My go-to is when I caused 40 doors to lock shut for two hours because they all failed 'safe' during a firmware update that I didn't know was about to happen.

Renegret
May 26, 2007

THANK YOU FOR CALLING HELP DOG, INC.

YOUR POSITION IN THE QUEUE IS *pbbbbbbbbbbbbbbbbt*


Cat Army Sworn Enemy
My only outage wasn't very interesting but I did learn an important lesson about documentation when the RCA team tried to go at me for causing one even though the only reason they found out in the first place is because I documented what happened.

Mustache Ride
Sep 11, 2001



I app banned explorer.exe across the entire enterprise once.

ConfusedUs
Feb 24, 2004

Bees?
You want fucking bees?
Here you go!
ROLL INITIATIVE!!





Mustache Ride posted:

I app banned explorer.exe across the entire enterprise once.

Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH!

...he didn't realize that same image was used in a ton of our promotional materials.

That was a fun conference call.

RFC2324
Jun 7, 2012

http 418

ConfusedUs posted:

Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH!

...he didn't realize that same image was used in a ton of our promotional materials.

That was a fun conference call.

amazing

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

ConfusedUs posted:

Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH!

...he didn't realize that same image was used in a ton of our promotional materials.

That was a fun conference call.

I know of a company where the admin did something similar, forgot about it, and tubgirl got into a distributor's catalog.

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady

Thomamelas posted:

Like someone who pre-buys crypto currency for this situation would put the wallet anywhere other than a share that gets hit.
It turned out that the rest of the entire team were enthusiastic alt-coin traders, so this guy was looking for someone else who would take his side in lunchtime arguments. Apparently just acknowledging that it was too volatile for you to really rule out being able to cover the decryption bill was

Also lol at you thinking they had a share that wouldn't get hit. 445 blocked at the border and nowhere else.

sfwarlock
Aug 11, 2007
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Internet Explorer
Jun 1, 2005





Hit F5 on my Reddit tab.

Methanar
Sep 26, 2013

by the sex ghost

sfwarlock posted:

My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

go get a snack from the kitchen: time for a break

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

sfwarlock posted:

My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Put eight hours down on my time card and take off, that's what I call a win/win baby.

angry armadillo
Jul 26, 2010
My 2 go to outages are:

1. The time I got a call from a senior manager saying a large area of our workplace had gone offline - basically there are 6 buildings, each with their own data cab. Apparently the previous IT manager decided to pull fibres down there himself, then pull some random Cat5 network of cross links between the 6 buildings. My boss and I had never fully got to the bottom of where all those cables went until this day... It transpires I'd been in the server room and must have butt smashed the panel causing a fibre tail to break and in turn the 6 buildings all went offline. We very quickly had to cable test from cab to cab to work out where everything went and managed to find a way to get it all back on - my boss told the senior manager a rat had chewed a cable somewhere... thanks boss! Lesson learned - be careful in server rooms!

2. Our old web proxy did an AD sync once a day but also had to be purged manually if you changed anyone's username, no one apart from the Head of IT had access to the proxy and it was always a bit tricky to get him to actually do anything like press the purge button - he (wrongly) believed the AD overnight sync fixed all issues and would make you wait. Someone got married and I could not convince anyone the purge button needed to be pressed even though I knew I was right... so, that meant I had this user complaining their internet access wouldn't work... In my infinite wisdom, I decided the best way to resolve the proxy issue would be to delete her AD account and recreate her with the correct name, so it would sync straight in to the proxy and et voila - internet access restored!

Whilst I worked around a problem, it was very soon after that I became intimately acquainted with exchange mailbox restores.

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady

Internet Explorer posted:

Hit F5 on my Reddit tab.
Same but twitter.

Impotence
Nov 8, 2010
Lipstick Apathy
I ask them if they know which of the MITM hardware appliances has crashed, because it's probably one of the invasive proxy middleboxes and not the ISP going down

Collateral Damage
Jun 13, 2009

As far as major outages go...

Cut off an entire floor's network access by fatfingering a trunk command on a switch stack, which also dropped my own remote access to the switch stack and forced me to barge in on a board meeting with a laptop and serial cable. (the network closet for that floor was attached to the biggest meeting room)

Nuked a production database because our test and prod environments weren't airgapped and it wasn't obvious that you could accidentally connect to the prod database from the test app server. Fortunately it had locally stored backups and was restored within an hour with no data loss.
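
One cheap guard against that kind of test/prod mix-up is to make destructive scripts refuse any database host that is not explicitly listed as a test target. A minimal sketch, assuming a hypothetical allowlist and host name (nothing here comes from the actual environment described above):

```python
# Hypothetical allowlist: the only hosts a destructive test script may touch.
ALLOWED_TEST_HOSTS = frozenset({"db-test-01.example.internal"})


class WrongEnvironmentError(RuntimeError):
    """Raised when a destructive operation targets a host outside the allowlist."""


def require_test_target(db_host: str, allowed: frozenset = ALLOWED_TEST_HOSTS) -> str:
    """Return the normalised host name if it is an allowed test host, else raise."""
    host = db_host.strip().lower()
    if host not in allowed:
        raise WrongEnvironmentError(f"refusing to run against {host!r}: not a known test host")
    return host
```

Calling `require_test_target(...)` before any drop or truncate step fails closed: an unknown host is treated as production. It does not replace separating the networks, it only makes the mistake louder.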

AlexDeGruven
Jun 29, 2007

Watch me pull my dongle out of this tiny box


I don't recall any actual outages at my hand outside of "oops, this app is offline for a minute, now it's back", but I've caused a lot of mild panic over my multiple decades behind the keyboard.

Most recently, I was trying to tftp boot a box to rebuild it and the tftp process kept loving up. I tried it about 6 times and someone from the monitoring team came over and said "We keep seeing blips on the prod server, can you see what's up?", and before I even looked I knew I fat-fingered the IP. So every time I tried to trigger the tftp boot, it caused an IP conflict for a moment. But nothing actually went down, and there was no actual external effects as it was 7am, and not enough people were online to notice.
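
A fat-fingered address like that can be caught before the boot is triggered by checking the candidate IP against addresses already known to be in use. A rough sketch using Python's standard `ipaddress` module; the `is_safe_build_address` helper and the example addresses (from the 192.0.2.0/24 documentation range) are assumptions for illustration, and the in-use list would have to come from whatever inventory or IPAM exists:

```python
import ipaddress


def is_safe_build_address(candidate: str, in_use: list[str]) -> bool:
    """Return True if `candidate` does not collide with any address in `in_use`.

    Addresses are parsed first, so a malformed string raises ValueError
    instead of silently passing the check.
    """
    addr = ipaddress.ip_address(candidate)
    taken = {ipaddress.ip_address(a) for a in in_use}
    return addr not in taken
```

For example, `is_safe_build_address("192.0.2.10", ["192.0.2.10"])` is rejected, so the rebuild never announces the prod server's address.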

Data Graham
Dec 28, 2009

📈📊🍪😋



As someone who worked for FireEye back in the day (~2006-08) the recent headlines about the breach are pretty lol, and also :stare:

When I was there the company was days away from going under, and laid me off along with the entire QA staff in a hail-mary to keep the doors open long enough to get the product we were working on out the door and qualify for like one more round of venture funding. Apparently it worked

CaptainJuan
Oct 15, 2008

Thick. Juicy. Tender.

Imagine cutting into a Barry White Song.
ran a sql query that contained a cartesian product on the live db and filled up a server drive, bringing down part of our LOB web app

that was a fun 20 minutes
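
For anyone who hasn't been bitten by this yet, here is a small illustration of why a cartesian product blows up. It uses an in-memory SQLite database with made-up `customers` and `orders` tables (hypothetical, not the LOB app's schema) and compares the row count of a join with no condition against the same join with one:

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, sql: str) -> int:
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


def demo() -> tuple[int, int]:
    """Return (rows from the cartesian product, rows from the proper join)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers (id) VALUES (?)", [(i,) for i in range(100)])
    conn.executemany(
        "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
        [(i, i % 100) for i in range(1000)],
    )
    # No join condition: every customer is paired with every order.
    cartesian = count_rows(conn, "SELECT COUNT(*) FROM customers, orders")
    # With the join condition: one row per order.
    joined = count_rows(
        conn,
        "SELECT COUNT(*) FROM customers c JOIN orders o ON o.customer_id = c.id",
    )
    conn.close()
    return cartesian, joined
```

With 100 customers and 1,000 orders the unconditioned query produces 100 x 1,000 rows, while the proper join produces 1,000. Scale both tables up and the intermediate result is what fills the drive.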

ConfusedUs
Feb 24, 2004

Bees?
You want fucking bees?
Here you go!
ROLL INITIATIVE!!





sfwarlock posted:

My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him the phone number if that's a thing, open it manually for him if it isn't).

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

ConfusedUs posted:

Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him the phone number if that's a thing, open it manually for him if it isn't).

:vince:

ssb
Feb 16, 2006

WOULD YOU ACCOMPANY ME ON A BRISK WALK? I WOULD LIKE TO SPEAK WITH YOU!!


ConfusedUs posted:

Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him the phone number if that's a thing, open it manually for him if it isn't).

Are you looking for a change to a new underpaid job? Because you have passed the interview.

ConfusedUs
Feb 24, 2004

Bees?
You want fucking bees?
Here you go!
ROLL INITIATIVE!!





shortspecialbus posted:

Are you looking for a change to a new underpaid job? Because you have passed the interview.

No. I just got a pseudo promotion without a raise but a much better title and no actual change in duties, so I'm good, thanks.

sfwarlock
Aug 11, 2007

sfwarlock posted:

My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in".

Then I asked this question five-six years back, and the person leaned back in his chair, smiled benevolently, and said with a gentle smile, "I would calm her down by explaining that the internet is not down, it's her connection to the internet that is having issues. Otherwise, we would have world-wide chaos and panic. I'd have her go back and reboot, and I'd be along in a couple minutes to make sure she was okay."

ConfusedUs posted:

Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him the phone number if that's a thing, open it manually for him if it isn't).

10/10. Always scope the issue.
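
The scoping logic in the quoted answer boils down to two quick checks, which could be sketched like this. The `Scope` names and the `scope_report` helper are hypothetical, and treating "neither of us can load a page" as site-wide is an added assumption; the quoted post only covers the case where both checks succeed.

```python
from enum import Enum


class Scope(Enum):
    SINGLE_USER = "single user"
    SITE_WIDE = "site-wide"
    UNCLEAR = "unclear"


def scope_report(my_page_loads: bool, neighbour_page_loads: bool) -> Scope:
    """Classify an 'internet is down' report from two quick checks.

    my_page_loads: a webpage loads on my own machine.
    neighbour_page_loads: the person next to me can load one too.
    """
    if my_page_loads and neighbour_page_loads:
        # We both have internet, so it's just the reporter: open a ticket for them.
        return Scope.SINGLE_USER
    if not my_page_loads and not neighbour_page_loads:
        # Assumption beyond the quoted post: nobody nearby works, treat it as a site outage.
        return Scope.SITE_WIDE
    # Mixed results need more checks before deciding.
    return Scope.UNCLEAR
```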

AlexDeGruven
Jun 29, 2007

Watch me pull my dongle out of this tiny box


"Just provision another TB on the db box, that should be trivial"

Application people are the worst.

Context: This is a physical box with on-board storage that was not put on the SAN because the nature of the app was that it was going away in a couple of years after we bought this hardware. They now want to more than triple the DB space. Because of course they do.

sfwarlock posted:

So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in".

Then I asked this question five-six years back, and the person leaned back in his chair, smiled benevolently, and said with a gentle smile, "I would calm her down by explaining that the internet is not down, it's her connection to the internet that is having issues. Otherwise, we would have world-wide chaos and panic. I'd have her go back and reboot, and I'd be along in a couple minutes to make sure she was okay."

Whenever I got a call with "Is the Internet down?", I would respond "Well, I certainly hope not" and laugh to myself because I'm a loving neckbeard only missing a fedora sometimes.

lament.cfg
Dec 28, 2006

we have such posts
to show you




sfwarlock posted:

So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in".

Then I asked this question five-six years back, and the person leaned back in his chair, smiled benevolently, and said with a gentle smile, "I would calm her down by explaining that the internet is not down, it's her connection to the internet that is having issues. Otherwise, we would have world-wide chaos and panic. I'd have her go back and reboot, and I'd be along in a couple minutes to make sure she was okay."


10/10. Always scope the issue.

:thunk:

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

sfwarlock posted:

My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Sarcastically respond: THE WHOLE INTERNET? OMFG IS IT ON CNN WHAT IS JEFF BEZOS DOING. Get the gently caress out of my office, Carl.

Internet Explorer
Jun 1, 2005





Honestly, "the internet is down" doesn't even bother me. It's not hard to understand what they mean when they say it. I'm sure many of us have even said "the internet is down" either at home or at work.

Now, "the system is down" is a different story. That one makes me cringe. No, your home page not loading does not mean the system is down. Please don't go around saying that.

https://www.youtube.com/watch?v=JwZwkk7q25I

Internet Explorer fucked around with this message at 17:51 on Dec 9, 2020
