Virtualization Megathread V2: VMs inside VMs

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Virtualization Megathread V2: VMs inside VMs

YOLOsubmarine: Oct 19, 2004; When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

mayodreams posted:

We only have 1 vCenter instance that serves both of our data centers, and since we use distributed switches, I'd like to have a failover instance in the other DC.

Literally 3 weeks after I started here, we lost our vCenter server and had to start from scratch and manually move the vms through scripting to the ~~new~~ distributed switches while praying we didn't have a host reboot or crash.

This is what backups are for. HA is for availability, not recovery, which is why it traditionally hasn't been critical for vCenter: availability is not the biggest concern for most customers.

# ? Nov 18, 2016 05:07

some kinda jackal: Feb 25, 2003

Am I missing something incredibly obvious about moving VMs into VM folders on the new VCSA HTML5 UI or is it not supported? Every time I try to drag a VM to a VM folder in a datacenter I get a black (x) which I assume tells me there's something wrong.

All said, I still much prefer the HTML5 UI since I really only do basic tasks.

# ? Nov 18, 2016 05:07

mayodreams: Jul 4, 2003; Hello darkness,
my old friend

big money big clit posted:

This is what backups are for. HA is for availability, not recovery, which is why it traditionally hasn't been critical for vCenter: availability is not the biggest concern for most customers.

It's being backed up to its own instance of VDP

:smithicide:

# ? Nov 18, 2016 05:40

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

theperminator posted:

What issues have you seen with LACP? We recently deployed a new cluster and went with LACP for the first time so I'm hoping it will be stable...
We did have issues already but they turned out to be a driver issue for the Intel i40e NIC Driver, an update cured it.

Short/long timeout mismatches, because vdSwitches default to long timeouts and you can't configure it without ssh-ing in to each host after startup. LACP drops negotiation on at least one link at least daily and VMware has absolutely no idea why it is doing it. It's been a trip. This is with the updated Intel driver; I did that because the default driver flooded the logs with nonsense.

# ? Nov 18, 2016 16:06

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

We run two clusters with their storage cross-replicating and standing up HA vCenter instances has been a priority so in the event that we have to implement our DR plan because the Bad Thing happened we don't have to monkey around with getting the management server online first. Saves an hour or two on the time to recovery which is important when we're shooting for having all services at least online inside 6 hours (though some things will need DB restores out of the hourly backups to get them more current). You should probably have similar concerns if you are running some kind of DR plan with site recovery.

# ? Nov 18, 2016 16:11

Thanks Ants: May 21, 2004; #essereFerrari

mayodreams posted:

It's being backed up to its own instance of VDP

They seem to have addressed this in the latest VCSA

# ? Nov 18, 2016 17:30

mayodreams: Jul 4, 2003; Hello darkness,
my old friend

Our VMware environment was less than ideal when I came in. We are almost finished with rebuilding all of the hosts with 6.0U2 OEM specific ESXi images rather than a mix of vanilla and OEM 5.5, as well as getting everything on the SAME version of ESXi, which was also an issue.

The majority of the guests are CentOS 6.x and were using E1000 NICs and did not have VMware Tools running. I had to campaign HARD to get that initiative going, and a few vMotions where the VMs were unresponsive for a bit helped expedite that.

We also weren't updating firmware on the hosts so that was magical too.

Our environment was the epitome of someone who had a home lab and was tasked with building out a whole infrastructure based on that experience.

# ? Nov 18, 2016 18:21

YOLOsubmarine: Oct 19, 2004; When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

BangersInMyKnickers posted:

We run two clusters with their storage cross-replicating and standing up HA vCenter instances has been a priority so in the event that we have to implement our DR plan because the Bad Thing happened we don't have to monkey around with getting the management server online first. Saves an hour or two on the time to recovery which is important when we're shooting for having all services at least online inside 6 hours (though some things will need DB restores out of the hourly backups to get them more current). You should probably have similar concerns if you are running some kind of DR plan with site recovery.

Veeam, Zerto, vSphere Replication, storage replication, etc. all provide methods to get vCenter back online relatively quickly. Or simply run vCenter from the DR site if it's a single vCenter environment, so there's no failover to be done. vCenter HA is a synchronous replication cluster, so 10ms latency is the maximum allowed between all of the nodes, which is going to be a limiting factor for a lot of places that don't have fiber direct to their DR site. It's certainly a nice thing to have, but it's rarely been the thing that makes or breaks a DR plan.

# ? Nov 18, 2016 18:36

mayodreams: Jul 4, 2003; Hello darkness,
my old friend

More vSphere 6.5 notes:

1) The thick client won't connect to VCSA 6.5 anymore. It authenticates and errors out trying to load the data.
2) I just got burned by the root password being locked out after migration. I had to edit the grub loader (VERY hard to get into even with a boot delay) to load to bash, mount / with RW, and then do a passwd command. The root password thing burned me going to 5.5 --> 6.x because of the 90 day time out. VMware KB is not updated for VCSA which is different than 5.x/6.x because they moved to Photon instead of SLES.
3) Chrome doesn't work any better than with 6.0u2. The admin interface for VCSA would kick me out right after loading. Firefox works fine.

# ? Nov 18, 2016 23:03

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

There seems to be something goofy going on with password input on the appliance during setup. Neither worked properly when I did an auto-generated password out of keepass (20 char, w/ symbols) but knocking it back to 12 alphanum made it happy. No idea.

# ? Nov 19, 2016 00:12

theperminator: Sep 16, 2009; by Smythe; Fun Shoe

BangersInMyKnickers posted:

Short/long timeout mismatches, because vdSwitches default to long timeouts and you can't configure it without ssh-ing in to each host after startup. LACP drops negotiation on at least one link at least daily and VMware has absolutely no idea why it is doing it. It's been a trip. This is with the updated Intel driver; I did that because the default driver flooded the logs with nonsense.

gently caress me, I wish we'd gone with just active/standby like our old cluster.

mayodreams posted:

We also weren't updating firmware on the hosts so that was magical too.

Dell firmware upgrades have caused so much pain for me that I usually err on the side of being conservative with firmware updates unless it's a security update or the release notes show something relevant to an issue we are having.

Equallogic 6.x firmwares were worse every time, from rebooting after 365 days of uptime to getting stuck in a reboot loop after upgrading.

theperminator fucked around with this message at 12:27 on Nov 19, 2016

# ? Nov 19, 2016 12:16

Moey: Oct 22, 2010; I LIKE TO MOVE IT

BangersInMyKnickers posted:

There seems to be something goofy going on with password input on the appliance during setup. Neither worked properly when I did an auto-generated password out of keepass (20 char, w/ symbols) but knocking it back to 12 alphanum made it happy. No idea.

VCSA?

# ? Nov 19, 2016 12:42

mayodreams: Jul 4, 2003; Hello darkness,
my old friend

theperminator posted:

gently caress me, I wish we'd gone with just active/standby like our old cluster.

Dell firmware upgrades have caused so much pain for me that I usually err on the side of being conservative with firmware updates unless it's a security update or the release notes show something relevant to an issue we are having.

Equallogic 6.x firmwares were worse every time, from rebooting after 365 days of uptime to getting stuck in a reboot loop after upgrading.

NOT doing Cisco UCS and HP ProLiant updates has caused purple screens for me at 2 different stops.

# ? Nov 19, 2016 15:13

devmd01: Mar 7, 2006; Elektronik
Supersonik

Moey posted:

VCSA?

The vcenter server appliance. It's pretty great, I'm switching all of our vcenter instances to it.

# ? Nov 19, 2016 16:59

some kinda jackal: Feb 25, 2003

e: nvm

# ? Nov 19, 2016 18:34

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Moey posted:

VCSA?

Yep.

# ? Nov 20, 2016 16:36

Salt Fish: Sep 11, 2003; Cybernetic Crumb

Are there any companies that provide full installation and support for VMware on 3rd party hardware (i.e. at a datacenter) or, even better, any companies that white label such support? I reached out to VMware themselves with my business contact and haven't heard back in a week, and all my searching has turned up very few specifics. I might be bad at reading partnership agreements but it seems like most support providers are keeping their cards face-down regarding their dos and don'ts.

# ? Nov 21, 2016 16:37

devmd01: Mar 7, 2006; Elektronik
Supersonik

Talk to your preferred VAR and make them do the legwork. CDW, etc have huge consulting/services arms as well.

# ? Nov 21, 2016 17:10

devmd01: Mar 7, 2006; Elektronik
Supersonik

mayodreams posted:

More vSphere 6.5 notes:

1) The thick client won't connect to VCSA 6.5 anymore. It authenticates and errors out trying to load the data.

Boo. I use the web client for day to day but for initial host setup the fat client is way faster. I'd script them out with PowerCLI but I don't have homogeneous hosts.

# ? Nov 21, 2016 17:15

Potato Salad: Oct 23, 2014; nobody cares

Salt what kind of oddball hardware are you looking at? HPC perchance?

# ? Nov 21, 2016 17:36

mayodreams: Jul 4, 2003; Hello darkness,
my old friend

devmd01 posted:

Boo. I use the web client for day to day but for initial host setup the fat client is way faster. I'd script them out with PowerCLI but I don't have homogeneous hosts.

The thick client won't connect to vCenter 6.5, but it still works with ESXi 6.5 directly to the host. Although the host UI is default now on port 80/443 instead of the landing page.

# ? Nov 21, 2016 18:09

Salt Fish: Sep 11, 2003; Cybernetic Crumb

Potato Salad posted:

Salt what kind of oddball hardware are you looking at? HPC perchance?

Custom built would be ideal because that has the highest margin, but I'm open to vendor specific hardware if it's required. I reached out to CDW and they were super into it, so I'm getting a call together to explore what that would look like for us. I basically want whitelabel management on my hardware/DC space that I can deploy for other companies.

# ? Nov 21, 2016 18:29

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Something in 6.5 broke the hell out of my storage network. I can no longer ping my NetApp SVMs that are presenting NFS vols to mount, but I can ping the vmkernel ports for other hosts on the network. From a working 6.0 host, esxcfg-route -n gives me this:

code:

Neighbor         MAC Address        Interface      Expiry    Type
192.168.20.30    00:50:56:63:8c:ce  vmk1           19m58s    Unknown
192.168.20.10    02:a0:98:7e:7c:f3  vmk1           13m15s    Unknown
192.168.20.11    02:a0:98:7e:7c:0f  vmk1           13m22s    Unknown

.10 and .11 are NetApp IPs

buuuuut on my new 6.5 host

code:

Neighbor         MAC Address        Interface      Expiry    Type
192.168.20.31    00:50:56:68:c7:ec  vmk1            9m51s    Unknown
192.168.20.10    (incomplete)       vmk1           1193046h28Unknown
192.168.20.11    (incomplete)       vmk1           1193046h28Unknown
192.168.20.32    00:50:56:6d:22:5c  vmk1           16m31s    Unknown

How do you get an incomplete mac address? This must be something unique to how NetApp handles their cluster-mode stuff.

# ? Nov 21, 2016 18:48

unknown: Nov 16, 2002; Ain't got no stinking title yet!

(incomplete) means that the system didn't get (or parse) the arp reply (i.e. the arp request was created and sent, but nothing has come back yet, so that's its way of saying "in limbo").

Edit: That's a weird expiry timer. Try deleting the arp entries in case they're corrupted entries?
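
For anyone who wants to spot those "in limbo" entries without eyeballing the table, here is a minimal Python sketch. It assumes the text is laid out like the esxcfg-route -n output posted above (the sample is copied from the 6.5 host); the function name unresolved_neighbors is made up for illustration and is not part of any VMware tooling.

```python
import re

# Sample copied from the 6.5 host output earlier in the thread.
SAMPLE = """\
Neighbor         MAC Address        Interface      Expiry    Type
192.168.20.31    00:50:56:68:c7:ec  vmk1            9m51s    Unknown
192.168.20.10    (incomplete)       vmk1           1193046h28Unknown
192.168.20.11    (incomplete)       vmk1           1193046h28Unknown
192.168.20.32    00:50:56:6d:22:5c  vmk1           16m31s    Unknown
"""

# Six colon-separated hex octets, e.g. 00:50:56:68:c7:ec
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


def unresolved_neighbors(table_text):
    """Return the IPs whose MAC column is not a complete MAC address."""
    unresolved = []
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        ip, mac = fields[0], fields[1]
        if not MAC_RE.match(mac):
            unresolved.append(ip)
    return unresolved


if __name__ == "__main__":
    for ip in unresolved_neighbors(SAMPLE):
        print(f"no ARP reply recorded for {ip}")
```

It only reads the first two columns, so the mangled expiry column on the incomplete rows doesn't matter.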

# ? Nov 21, 2016 19:14

xtal: Jan 9, 2011; by Fluffdaddy

Maybe this is the wrong thread but does anyone else feel like docker is the javascript of systems? So tired of my coworkers begging to use it for no reason other than hype. I see literally 0 benefit because 0 of its ideas are original

xtal fucked around with this message at 21:53 on Nov 21, 2016

# ? Nov 21, 2016 21:41

xarph: Jun 18, 2001

xtal posted:

Maybe this is the wrong thread but does anyone else feel like docker is the javascript of systems? So tired of my coworkers begging to use it for no reason other than hype. I see literally 0 benefit because 0 of its ideas are original

It has a shiny UI and lets engineers get around VM allocation quotas by requesting a VM, installing docker on it, and then blowing up the filesystem by running device-mapper in loopback and running the root filesystem out of space.

So, yes.

# ? Nov 21, 2016 22:20

theperminator: Sep 16, 2009; by Smythe; Fun Shoe

devmd01 posted:

Boo. I use the web client for day to day but for initial host setup the fat client is way faster. I'd script them out with PowerCLI but I don't have homogeneous hosts.

I found that the HTML5 interface for host configuration is actually quite fast and responsive, not sure if it has 100% of the fat client feature set though

xtal posted:

Maybe this is the wrong thread but does anyone else feel like docker is the javascript of systems? So tired of my coworkers begging to use it for no reason other than hype. I see literally 0 benefit because 0 of its ideas are original

You're not the only one. I've noticed that there's a lot of "let's do X because it's cool!" with no mention of how it's helpful to the business; it seems like people think work is like adult daycare now.

I think what they focus on when they think Docker though is the API etc rather than the actual virtualization aspect

There are some benefits in that it simplifies deployment & scaling of your application, and allows you to push the container through a pipeline, i.e. dev > test > prod, knowing that it should work the same way in all of the stages.
If you have a microservices architecture or running web services that you want to be able to easily scale horizontally it can be quite useful I think.
But I believe that the biggest problems we found were that the security isn't good enough for what we do, the networking aspect is garbage & our applications aren't really conducive to horizontal scaling yet.

This is why I'm kinda excited about vSphere Integrated Containers: everything runs as its own VM and the networking is the same as any other VM rather than a bridge, which means we could use NSX to control access between each container.

theperminator fucked around with this message at 22:37 on Nov 21, 2016

# ? Nov 21, 2016 22:34

Pile Of Garbage: May 28, 2007

ratbert90 posted:

If I used docker I wouldn't turn off selinux because if you do you are a dumpster fire trash of admin and shouldn't touch Linux ever.

# ? Nov 21, 2016 23:47

xtal: Jan 9, 2011; by Fluffdaddy

Secure linux is an oxymoron

# ? Nov 21, 2016 23:53

Thanks Ants: May 21, 2004; #essereFerrari

Shaggar?

# ? Nov 22, 2016 00:05

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

VMware people: is there some secret way to make ESXi not freak out if you receive 802.1q encapsulated packets with the 802.1p priority field set, or are you basically just hosed if you run Arista?

# ? Nov 22, 2016 00:30

1000101: May 14, 2003; BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

Distributed vSwitch can honor/set 802.1p values. Turn on network IO control I believe and you should see it in the dvSwitch settings somewhere.

What's the Arista sending you the traffic with? How is the machine freaking out?

# ? Nov 22, 2016 09:32

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

1000101 posted:

Distributed vSwitch can honor/set 802.1p values. Turn on network IO control I believe and you should see it in the dvSwitch settings somewhere.

What's the Arista sending you the traffic with? How is the machine freaking out?

We're on standard vSwitches, not dvSwitches, and we're unlikely to pay a five-figure license uplift just to make basic networking function without static ARP. The Arista sends L3 gateway replies with the priority field set and the ESXi host just gives up and drops the traffic. Effectively, we're able to communicate with other things on the subnet, but not the default gateway.
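
For reference, the 802.1p priority lives in the top 3 bits of the 802.1Q tag's 16-bit TCI field, next to the 1-bit DEI and the 12-bit VLAN ID. A minimal Python sketch of pulling those fields out of a captured tag, in case it helps when staring at a packet capture; the function name and the example tag value are invented for illustration, not taken from the actual Arista traffic.

```python
import struct


def decode_vlan_tag(tag_bytes):
    """Split a 4-byte 802.1Q tag (TPID + TCI) into its fields."""
    tpid, tci = struct.unpack("!HH", tag_bytes)  # two big-endian 16-bit words
    return {
        "tpid": tpid,              # 0x8100 marks an 802.1Q tag
        "pcp": tci >> 13,          # 3-bit 802.1p priority
        "dei": (tci >> 12) & 0x1,  # 1-bit drop eligible indicator
        "vid": tci & 0x0FFF,       # 12-bit VLAN ID
    }


if __name__ == "__main__":
    # Hypothetical tag: priority 6 on VLAN 20 (values invented for the example).
    tag = struct.pack("!HH", 0x8100, (6 << 13) | 20)
    print(decode_vlan_tag(tag))
```

A nonzero pcp on the gateway's replies and a zero one on everything else would at least confirm that the priority bits are what differs.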

# ? Nov 22, 2016 14:27

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

unknown posted:

(incomplete) means that the system didn't get (or parse) the arp reply (i.e. the arp request was created and sent, but nothing has come back yet, so that's its way of saying "in limbo").

Edit: That's a weird expiry timer. Try deleting the arp entries in case they're corrupted entries?

I'm guessing it's seeing the NetApps doing gratuitous arps and deciding it's a malicious device on the network and blocking it. Either way I'm mucking back in to support hell.

# ? Nov 22, 2016 14:52

unknown: Nov 16, 2002; Ain't got no stinking title yet!

BangersInMyKnickers posted:

I'm guessing it's seeing the NetApps doing gratuitous arps and deciding it's a malicious device on the network and blocking it. Either way I'm mucking back in to support hell.

Or your switch is blocking broadcast packets from your servers, so the netapp never gets the arp request in the first place.

Simple test of that is to ping from the netapp to your servers (reverses the arp process). Your servers should then automatically learn the mac address from the netapp.

Other test would be to hardcode the arp on the server temporarily. If it works, you've got broadcast filtering in place.

# ? Nov 22, 2016 16:40

Docjowles: Apr 9, 2009

He said it worked fine before upgrading vSphere and changing nothing else, though.

However, the first rule of troubleshooting is "I didn't change anything else" means "I definitely changed something else".

# ? Nov 22, 2016 17:02

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

I have two other hosts on 6.0 running the same vdSwitch, same switch stack uplink between them and NetApp, still working fine. It's something with 6.5 but I can't tell what at this point besides arp being shot.

# ? Nov 22, 2016 17:21

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Okay, I don't have a reason why any of this happened at this point, but when I did the upgrade to 6.5 it decided to load both the latest 1.4.28 Intel NIC vib with 5.x/6.x compatibility and an old as dirt 1.1.0 one with only 6.5.0 compatibility, and for some reason it decided it should be using the old lovely one that is broken. So yeah, check your vibs after upgrading if you're running Intel NICs.

e: It's an issue with the Dell customized ISO. They loaded the latest driver in but didn't remove the default VMware one and it makes a goofy conflict. Be warned.
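
A rough Python sketch of that "check your vibs" step, assuming you have saved the text of esxcli software vib list from the host and that it is a whitespace-separated table with the name and version in the first two columns. The sample rows, VIB names and version strings below are made up for illustration and are not the real Dell ISO contents; matching_vibs is a hypothetical helper name.

```python
# Made-up sample in the assumed column layout; not real host output.
SAMPLE = """\
Name      Version         Vendor  Acceptance Level  Install Date
--------  --------------  ------  ----------------  ------------
net-i40e  1.4.28-example  INT     VMwareCertified   2016-11-23
i40en     1.1.0-example   VMW     VMwareCertified   2016-11-23
esx-base  6.5.0-example   VMware  VMwareCertified   2016-11-23
"""


def matching_vibs(vib_list_text, keyword):
    """Return (name, version) for every VIB whose name contains keyword."""
    matches = []
    for line in vib_list_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 2 or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        name, version = fields[0], fields[1]
        if keyword.lower() in name.lower():
            matches.append((name, version))
    return matches


if __name__ == "__main__":
    found = matching_vibs(SAMPLE, "i40e")
    if len(found) > 1:
        print("more than one driver VIB matches, check which one is loaded:")
    for name, version in found:
        print(f"  {name} {version}")
```

More than one hit for the same NIC family is the cue to go look at which driver the host actually loaded.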

BangersInMyKnickers fucked around with this message at 17:58 on Nov 23, 2016

# ? Nov 23, 2016 17:36

devmd01: Mar 7, 2006; Elektronik
Supersonik

That is bizarre as hell, I hate figuring out issues like that. While I don't have the same setup you've described, definitely something to watch for if I stick around here long enough to upgrade us from 6.0u2 to 6.5.

# ? Nov 24, 2016 00:00

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

To update VirtualBox on Windows (10) do I just run the new installer? Do I need to uninstall or anything first?

# ? Nov 27, 2016 19:25
