|
devmd01 posted:“Tell me about a time you caused an outage” (I plugged a T1 into an ethernet port on a phone system once)
|
# ? Dec 8, 2020 17:12 |
|
|
|
I've never caused an outage. I've hosed up a bunch in ways that would have been outages if they happened outside the maintenance window tho. Only had to extend the window because of my poo poo twice 😂
|
# ? Dec 8, 2020 17:23 |
|
I don't think I've ever caused an outage either. There was one time where I made a typo in our MDM'd wifi profile, which knocked everyone offline till they connected to our guest network and got the corrected profile 30 seconds later. I have no doubt that, now that I'm working with distributed systems, outages will become a more common affair.

also thanks for the interview questions y'all! I stole a few of them. also added "what do you do when you don't know how to solve something" in the forlorn hope we hire someone who's confident enough to google their own problems rather than escalating every time.

vvvv oh that does remind me i briefly brought down our domain controllers because i forgot to check the IP address after rebooting post-patching. briefly lost DNS for an hour or two, but fortunately only like 3 IT people were in the office and a static IP address later all was well with the world.
# ? Dec 8, 2020 17:26 |
|
I blew up one of our sites updating a Server 2008 R2 machine last year. Had to drive over and static IP/DNS everything. I managed to get into the bootlooping server with Directory Services Restore Mode (not even safe mode worked), pulled the data, uploaded it to a server near the site, set up a share, applied the existing security groups to it, and configured automapping to point to the new location.
|
# ? Dec 8, 2020 17:30 |
|
RFC2324 posted:
I've never caused an outage.

I've toasted the odd router/switch with a firmware update, but that's not my fault as such, it's on the manufacturers. And a client's DNS records got blanked moving to a new registrar, but I'd kept a backup so it was a trivial fix. I did once manage to gently caress up and create a group policy that would have locked out all admin accounts (I think; it was something similarly ghastly), but I realised that before I logged out of the one I was logged in on and reversed it. I'm a fanatical documenter and backer-up of settings and it's been a godsend over the years.
|
# ? Dec 8, 2020 18:11 |
|
Gats Akimbo posted:
I've toasted the odd router/switch with a firmware update, but that's not my fault as such, it's on the manufacturers. And a client's DNS records got blanked moving to a new registrar, but I'd kept a backup so it was a trivial fix.

I'm pretty fanatical about making sure I 100% understand the commands I am issuing, particularly after mirroring a new drive in the wrong direction and wiping the partition table on the old production drive.
|
# ? Dec 8, 2020 18:17 |
|
You guys never pulled one of those on a production DB?
|
# ? Dec 8, 2020 18:35 |
|
Bob Morales posted:
I've always been stupid careful with databases because I don't understand why they are so complicated. And like I said, I've screwed up stuff, but only during maintenance windows when it didn't count as an outage.
|
# ? Dec 8, 2020 18:52 |
|
RFC2324 posted:
I've never caused an outage.

Oof... I have... indirectly. It's not my fault someone scheduled a load test in a datacenter that hosted all of a particular application for all of Europe. It's also not my fault our load test ground the network to a screeching halt for 6 hours.
|
# ? Dec 8, 2020 19:16 |
|
lol if you've never had to debug an outage you caused while in the backseat of your boss's car as he road trips the SRE team from virginia to new york
|
# ? Dec 8, 2020 19:25 |
|
Paladine_PSoT posted:
Oof... I have... indirectly.

poo poo, I lied, I have caused one outage. I didn't realize taking a core dump of a Java app would freeze the app while the dump ran, and I was told to try to get one of a problem we had before the problem took down everything. I was also given access to prod, but not dev or test (not that the issue existed outside prod).
|
# ? Dec 8, 2020 19:25 |
|
The Iron Rose posted:
this is a great question that I will be stealing in... two hours or so

I like things that are conceptual and show that they understand how to do the less-technically-specific things of the job. It's great if they remember what DNS stands for and can recite LVM commands off the top of their head, but both of those things are easy to quick-reference so long as you understand DNS and/or LVM. Here are a couple of (somewhat tweaked) examples that we've used before (the job was a senior Linux engineer/admin). I think I mentioned a couple of these in the thread around the post you quoted:

* When designing an IT service or solution for a customer, what are some of the considerations that go into your design? (I wanted to hear that they had an idea of how to design enterprise solutions, factoring in scale, HA needs, cost, security, vendor requirements, etc.)
* When do you choose to write a script rather than implement a solution by hand? (The script could include config management stuff; basically anything was OK here so long as it made sense and showed that they at least knew how to do this stuff.)
* You have to do a critical security patch off-schedule, think Heartbleed-type severity. Please outline at a high level the technical steps and change management processes you'd follow. (They have to have some concept of test before prod, understand how change management works at least to some degree, and mention how the patch is deployed, showing some understanding of how to do that on umpteen servers: yum, config management, etc.)
|
# ? Dec 8, 2020 20:10 |
|
Methanar posted:
lol if you've never had to debug an outage you caused while in the backseat of your boss's car as he road trips the SRE team from virginia to new york

I once did a point-of-sale software push while sitting in the back seat of the company jet waiting for our clearance to take off. I was working on it in the waiting area as the final plane prep was being done and I managed to forget to hand in my rental car keys. Oops.
|
# ? Dec 8, 2020 20:16 |
|
My go-to is when I caused 40 doors to lock shut for two hours because they all failed 'safe' during a firmware update that I didn't know was about to happen.
|
# ? Dec 8, 2020 20:24 |
|
My only outage wasn't very interesting, but I did learn an important lesson about documentation when the RCA team tried to go after me for causing one, even though the only reason they found out in the first place was because I documented what happened.
|
# ? Dec 8, 2020 21:25 |
|
I app banned explorer.exe across the entire enterprise once.
|
# ? Dec 8, 2020 21:51 |
Mustache Ride posted:
I app banned explorer.exe across the entire enterprise once.

Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH! ...he didn't realize that same image was used in a ton of our promotional materials. That was a fun conference call.
|
|
# ? Dec 8, 2020 21:59 |
|
ConfusedUs posted:
Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH!

amazing
|
# ? Dec 8, 2020 22:26 |
|
ConfusedUs posted:
Back in the early 2000s I worked for a small game company who found that one of their game images was being hotlinked frequently all over the internet. This was back in the day when this was a Real Big Concern for everyone. The sysadmin decided to replace that image with tubgirl as a passive-aggressive way of forcing everyone to STOP USING OUR loving BANDWIDTH!

I know of a company where the admin did something similar, forgot about it, and tubgirl got into a distributor's catalog.
|
# ? Dec 8, 2020 23:25 |
|
Thomamelas posted:
Like someone who pre-buys crypto currency for this situation would put the wallet anywhere other than a share that gets hit.

Also lol at you thinking they had a share that wouldn't get hit. Port 445 blocked at the border and nowhere else.
|
# ? Dec 9, 2020 00:00 |
|
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?
|
# ? Dec 9, 2020 05:59 |
|
Hit F5 on my Reddit tab.
|
# ? Dec 9, 2020 06:15 |
|
sfwarlock posted:
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

go get a snack from the kitchen: time for a break
|
# ? Dec 9, 2020 06:48 |
sfwarlock posted:
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Put eight hours down on my time card and take off, that's what I call a win/win baby.
|
|
# ? Dec 9, 2020 07:24 |
|
My 2 go-to outages are:

1. The time I got a call from a senior manager saying a large area of our workplace had gone offline. Basically there are 6 buildings, each with their own data cab. Apparently the previous IT manager decided to pull fibres down there himself, then pull some random Cat5 network of cross links between the 6 buildings. My boss and I had never fully got to the bottom of where all those cables went until this day... It transpires I'd been in the server room and must have butt-smashed the panel, causing a fibre tail to break, and in turn the 6 buildings all went offline. We very quickly had to cable test from cab to cab to work out where everything went and managed to find a way to get it all back on. My boss told the senior manager a rat had chewed a cable somewhere... thanks boss! Lesson learned: be careful in server rooms!

2. Our old web proxy did an AD sync once a day but also had to be purged manually if you changed anyone's username. No one apart from the Head of IT had access to the proxy, and it was always a bit tricky to get him to actually do anything like press the purge button; he (wrongly) believed the overnight AD sync fixed all issues and would make you wait. Someone got married and I could not convince anyone the purge button needed to be pressed even though I knew I was right... so that meant I had this user complaining their internet access wouldn't work... In my infinite wisdom, I decided the best way to resolve the proxy issue would be to delete her AD account and recreate her with the correct name, so it would sync straight into the proxy and et voila: internet access restored! Whilst I worked around a problem, it was very soon after that I became intimately acquainted with Exchange mailbox restores.
|
# ? Dec 9, 2020 10:35 |
|
Internet Explorer posted:Hit F5 on my Reddit tab.
|
# ? Dec 9, 2020 10:44 |
|
I ask them if they know which of the MITM hardware appliances has crashed, because it's probably one of the invasive proxy middleboxes and not the ISP going down
|
# ? Dec 9, 2020 10:45 |
|
As far as major outages go...

Cut off an entire floor's network access by fatfingering a trunk command on a switch stack, which also dropped my own remote access to the switch stack and forced me to barge in on a board meeting with a laptop and serial cable (the network closet for that floor was attached to the biggest meeting room).

Nuked a production database because our test and prod environments weren't airgapped and it wasn't obvious that you could accidentally connect to the prod database from the test app server. Fortunately it had locally stored backups and was restored within an hour with no data loss.
|
# ? Dec 9, 2020 12:25 |
|
I don't recall any actual outages at my hand outside of "oops, this app is offline for a minute, now it's back", but I've caused a lot of mild panic over my multiple decades behind the keyboard. Most recently, I was trying to TFTP boot a box to rebuild it and the TFTP process kept loving up. I tried it about 6 times, and someone from the monitoring team came over and said "We keep seeing blips on the prod server, can you see what's up?" Before I even looked, I knew I'd fat-fingered the IP. So every time I tried to trigger the TFTP boot, it caused an IP conflict for a moment. But nothing actually went down, and there were no actual external effects, as it was 7am and not enough people were online to notice.
|
# ? Dec 9, 2020 13:52 |
As someone who worked for FireEye back in the day (~2006-08), the recent headlines about the breach are pretty lol. Also, when I was there the company was days away from going under, and laid me off along with the entire QA staff in a hail-mary to keep the doors open long enough to get the product we were working on out the door and qualify for like one more round of venture funding. Apparently it worked.
|
|
# ? Dec 9, 2020 14:21 |
|
ran a SQL query that contained a cartesian product on the live db and filled up a server drive, bringing down part of our LOB web app. that was a fun 20 minutes
|
# ? Dec 9, 2020 15:43 |
sfwarlock posted:
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him the phone number if that's a thing, open it manually for him if it doesn't).
|
|
# ? Dec 9, 2020 16:24 |
ConfusedUs posted:Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him phone number if that's a thing, open it manually for him if it doesn't).
|
|
# ? Dec 9, 2020 16:25 |
|
ConfusedUs posted:
Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him phone number if that's a thing, open it manually for him if it doesn't).

Are you looking for a change to a new underpaid job? Because you have passed the interview.
|
# ? Dec 9, 2020 16:45 |
shortspecialbus posted:
Are you looking for a change to a new underpaid job? Because you have passed the interview.

No. I just got a pseudo promotion without a raise but a much better title and no actual change in duties, so I'm good, thanks.
|
|
# ? Dec 9, 2020 17:05 |
|
sfwarlock posted:
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in". Then I asked this question five or six years back, and the person leaned back in his chair, smiled benevolently, and said, "I would calm her down by explaining that the internet is not down, it's her connection to the internet that is having issues. Otherwise, we would have worldwide chaos and panic. I'd have her go back and reboot, and I'd be along in a couple minutes to make sure she was okay."

ConfusedUs posted:
Wonder why I have internet--which I obviously do because I'm checking email--and they don't. Load a webpage to be sure. Ask my buddy next to me if he does. If we've got internet, it's not a site outage, it's just this guy. Then arrange for a ticket to be opened for the guy who ran in (give him phone number if that's a thing, open it manually for him if it doesn't).

10/10. Always scope the issue.
|
# ? Dec 9, 2020 17:19 |
|
"Just provision another TB on the db box, that should be trivial" Application people are the worst.

Context: This is a physical box with on-board storage that was not put on the SAN because the app was supposed to go away a couple of years after we bought this hardware. They now want to more than triple the DB space. Because of course they do.

sfwarlock posted:
So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in".

Whenever I got a call with "Is the Internet down?", I would respond "Well, I certainly hope not" and laugh to myself because I'm a loving neckbeard only missing a fedora sometimes.
|
# ? Dec 9, 2020 17:20 |
|
sfwarlock posted:So I used to think that there were two kinds of IT people: those who would start at the user's workstation and troubleshoot "out" and those that would start at a high level and troubleshoot "in".
|
# ? Dec 9, 2020 17:21 |
|
sfwarlock posted:
My favorite interview question is: you're sitting at your desk, catching up on all the email that you should have read at some point, when one of the staff runs in and tells you that the internet is down. What do you do first?

Sarcastically respond: THE WHOLE INTERNET? OMFG IS IT ON CNN WHAT IS JEFF BEZOS DOING. Get the gently caress out of my office, Carl.
|
# ? Dec 9, 2020 17:21 |
|
|
|
Honestly, "the internet is down" doesn't even bother me. It's not hard to understand what they mean when they say it. I'm sure many of us have even said "the internet is down" either at home or at work. Now, "the system is down" is a different story. That one makes me cringe. No, your home page not loading does not mean the system is down. Please don't go around saying that. https://www.youtube.com/watch?v=JwZwkk7q25I
# ? Dec 9, 2020 17:39 |