Virtualization Megathread V2: VMs inside VMs

vty: Nov 8, 2007; oh dott, oh dott!

1000101 posted:

It's always been based on allocated RAM and features you provide for said VMs.

What sort of app is the VM running if you don't mind my asking?

It's a DC with a large file share (I know, I know).

It should be VERY low use after hours.

# ? Dec 12, 2012 07:07

evil_bunnY: Apr 2, 2003

vty posted:

It's a DC with a large file share (I know, I know).

NO EXCUSES! :mad:

# ? Dec 12, 2012 10:43

Erwin: Feb 17, 2006

vty posted:

It's a DC with a large file share (I know, I know).

Come on man :smith:

Maybe attach a new VMDK on the target storage to the VM, robocopy the share, repoint the share, delete the old share and VMDK, then svMotion the system drive.

I mean, your share's not on the system drive at least, right? Right?
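The copy step could be scripted roughly like this. It is only a minimal sketch, not a tested migration script: the drive letters and log path are hypothetical, and the switches (/E, /COPYALL, /R, /W, /LOG) should be checked against `robocopy /?` on your own system before trusting it with a production share.

```python
import subprocess
import sys


def build_robocopy_command(source, destination, log_path):
    """Build a robocopy argument list that copies a share along with its ACLs.

    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "robocopy",
        source,
        destination,
        "/E",        # copy subdirectories, including empty ones
        "/COPYALL",  # copy data, attributes, timestamps, ACLs, owner, auditing info
        "/R:2",      # retry a failed file twice
        "/W:5",      # wait 5 seconds between retries
        f"/LOG:{log_path}",
    ]


def run_robocopy(command):
    """Run robocopy; exit codes below 8 mean no files failed to copy."""
    result = subprocess.run(command)
    return result.returncode < 8


if __name__ == "__main__":
    # Hypothetical layout: E: is the old VMDK, F: is the new VMDK on the target storage.
    cmd = build_robocopy_command(r"E:\Share", r"F:\Share", r"C:\Temp\share-copy.log")
    print(" ".join(cmd))
    if sys.platform == "win32" and "--run" in sys.argv:
        print("copy ok" if run_robocopy(cmd) else "copy reported failures")
```

Running it without `--run` only prints the command, which is a cheap way to review it before repointing the share.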

Erwin fucked around with this message at 14:41 on Dec 12, 2012

# ? Dec 12, 2012 14:21

Kachunkachunk: Jun 6, 2011

Corvettefisher posted:

I wouldn't hope for much of an update to the client. I have a STRONG feeling most (if not all) of the administration will be done via the web console in 6. I just really wish they didn't have it in Flash.

Really my point was that the C# client install in this case was probably bugged/broken. My GA 5.1 clients in numerous installs/labs showed the hardware version 9 option without any weirdness.

But yes, it's all going into the browser in some form later.

# ? Dec 12, 2012 17:24

vty: Nov 8, 2007; oh dott, oh dott!

Erwin posted:

Come on man

Maybe attach a new VMDK on the target storage to the VM, robocopy the share, repoint the share, delete the old share and VMDK, then svMotion the system drive.

I mean, your share's not on the system drive at least, right? Right?

This is basically where I'm at now since my svMotion won't work. Robocopying everything over.

C'est la vie.

# ? Dec 12, 2012 20:46

Moey: Oct 22, 2010; I LIKE TO MOVE IT

Few quick easy questions if anyone cares to help me out before I begin testing. Trust me this may seem pretty stupid, but the orders came from above so I figure I will just do it on my way out the door here (last day in less than 2 weeks).

We have a 10 bay NAS unit that I have been tasked to set up as our "oh poo poo" box. My boss wants us to be able to dump whatever files we care to from the network onto this thing, as well as have it house some replicated VMs (this would be like a 3rd backup of VMs). In the event of our building being evacuated, this would allow someone to grab this thing on the way out and bring it to our DR site. From there, they want to be able to access any files dumped to it, as well as power on any VMs that have been replicated.

Am I right in thinking that setting this thing up as a big NFS share would work? One folder set up for replicated VMs and one folder set up for the rest of the random crap.

# ? Dec 12, 2012 20:58

Kachunkachunk: Jun 6, 2011

Each folder on the NAS can be a share (export) with different settings, including ACLs, approving specific IPs or ranges.

For ESXi you would want one export or share, and you can define another for other use, especially if you have different security concerns between the two.
The two exports can certainly reside on the same volume or root storage.

Or if you want, ESXi can leverage NFS, while you define another folder to be exported as CIFS/SMB. Or both.
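As a rough illustration of the two-export layout, here is a sketch that prints Linux-style /etc/exports entries. It assumes the NAS actually exposes an exports file (many appliances only offer a web UI for the same settings), and the paths and subnets are hypothetical. ESXi mounts NFS datastores as root, so that export normally needs no_root_squash.

```python
def export_line(path, clients, options):
    """Format one /etc/exports entry: '<path> <client>(<opts>) ...'."""
    opts = ",".join(options)
    return path + " " + " ".join(f"{client}({opts})" for client in clients)


# Hypothetical layout: one export for ESXi, one for general file dumps.
exports = [
    # ESXi mounts NFS as root, so this export usually needs no_root_squash.
    export_line("/volume1/vm-replicas", ["10.0.10.0/24"], ["rw", "sync", "no_root_squash"]),
    # Ordinary clients keep root squashing.
    export_line("/volume1/file-dump", ["10.0.20.0/24"], ["rw", "sync", "root_squash"]),
]

print("\n".join(exports))
```

Keeping the ESXi export restricted to the storage subnet is the main point; the other export can have looser client rules without exposing the VM files.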

# ? Dec 12, 2012 21:24

evil_bunnY: Apr 2, 2003

Moey posted:

this would allow someone to grab this thing on the way out

I'm pretty sure that's against the law.

Moey posted:

and bring it to our DR site. From there, they want to be able to access any files dumped to it, as well as power on any VMs that have been replicated.

Why are you not setting up this NAS *at* the DR site?

# ? Dec 12, 2012 21:34

Moey: Oct 22, 2010; I LIKE TO MOVE IT

Thanks, that is what I was imagining, just I don't really deal with NFS ever (all iSCSI shop).

# ? Dec 12, 2012 21:48

Moey: Oct 22, 2010; I LIKE TO MOVE IT

evil_bunnY posted:

I'm pretty sure that's against the law.

Why are you not setting up this NAS *at* the DR site?

We currently do have shared storage out at the DR site, but the current link between production and DR is too slow to have any worthwhile replication going. Currently backups are run on site, then replicated to an external drive which is shipped off site every day. In the event of a DR, recovering from those backups would be fine, just time consuming. This is my boss's little scheme to have some "cold" VMs that are relatively up to date that could be brought online pretty instantly.

There is some logic to the idea, but it has too many holes to really ever be relied on. If the building is on fire, do I really want to lug out a 10 bay NAS with me?

# ? Dec 12, 2012 21:51

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

Your oshit device sounds more like a backup device than anything.

What is your RTO, and how much data loss is acceptable: 1 hr, 4 hrs, a day?

Dilbert As FUCK fucked around with this message at 22:10 on Dec 12, 2012

# ? Dec 12, 2012 22:06

Erwin: Feb 17, 2006

Moey posted:

We currently do have shared storage out at the DR site, but the current link between production and DR is too slow to have any worthwhile replication going. Currently backups are run on site, then replicated to an external drive which is shipped off site every day. In the event of a DR, recovering from those backups would be fine, just time consuming. This is my boss's little scheme to have some "cold" VMs that are relatively up to date that could be brought online pretty instantly.

How slow is slow? Products like Veeam can do a lot with a couple of Mb/s. And if the building is on fire at 2am, your NAS will burn with it while you're asleep.

# ? Dec 12, 2012 22:53

Moey: Oct 22, 2010; I LIKE TO MOVE IT

Corvettefisher posted:

Your oshit device sounds more like a backup device than anything.

What is your RTO, and how much data loss is acceptable: 1 hr, 4 hrs, a day?

I believe we are stating our RTO is 8 hours from when a disaster is declared and our RPO is 24 hours (since we are only getting nightly backups copied offsite).

Erwin posted:

How slow is slow? Products like Veeam can do a lot with a couple of Mb/s. And if the building is on fire at 2am, your NAS will burn with it while you're asleep.

I think we are currently on a 5 or a 10 meg line out there. Have not really had a chance to do any replication over that link, but our daily changes are pretty large on some of these VMs, so it will need a beefier connection.

As for the NAS and the whole main building burning in a fire overnight, we do have off site backup that goes out daily. This is just an additional line of cover your rear end that my boss wants.

# ? Dec 12, 2012 22:57

Kachunkachunk: Jun 6, 2011

You might be able to assess the replication cost by seeing how fast the VMs accumulate delta over your intended recovery objective times.

E.g. if you are doing 200GB/hr in writes, but the amount of disk delta ended up being only 1GB, that means only 1GB of unique blocks actually changed over that period of time. Setting up a long recovery time objective is probably quite doable on your link, in that case.

Replication might really be in the picture, but it also depends on what else that link is used for, and whether your changes are over wider ranges of blocks or not, I would guess.
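To put rough numbers on that, here is a back-of-the-envelope sketch. The 5 and 10 Mb/s figures come from the thread; the 80% usable-link efficiency is purely an assumption, and the delta size is whatever you measure from your own snapshots.

```python
def transfer_hours(delta_gb, link_mbps, efficiency=0.8):
    """Hours needed to push delta_gb (decimal GB) over a link_mbps link.

    efficiency is an assumed fraction of the link usable for replication.
    """
    megabits = delta_gb * 1000 * 8
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def max_delta_gb(window_hours, link_mbps, efficiency=0.8):
    """Largest delta (decimal GB) that fits through the link in window_hours."""
    megabits = link_mbps * efficiency * window_hours * 3600
    return megabits / 8 / 1000


if __name__ == "__main__":
    for mbps in (5, 10):
        print(f"{mbps} Mb/s: 1 GB takes {transfer_hours(1, mbps):.2f} h, "
              f"a 24 h window fits {max_delta_gb(24, mbps):.1f} GB")
```

If the measured nightly delta is larger than what `max_delta_gb` returns for your RPO window, replication over that link cannot keep up no matter which product does it.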

# ? Dec 12, 2012 23:05

Harik: Sep 9, 2001; From the hard streets of Moscow
First dog to touch the stars; Plaster Town Cop

Corvettefisher posted:

Is AMD-V virtualization enabled in the BIOS? What stepping is the CPU?

Here is something that might help you
https://www.virtualbox.org/ticket/1933

You might want to update that CPU for VM work, but you probably already know that.

Ok, I won't be able to use this particular box, it's F2 stepping with the known virtualization bug, and no BIOS update to fix it.

Thanks! That was exactly what I needed to know.

# ? Dec 12, 2012 23:09

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

Moey posted:

I believe we are stating our RTO is 8 hours from when a disaster is declared and our RPO is 24 hours (since we are only getting nightly backups copied offsite).

Honestly, this sounds like it has the best intentions but will have the worst outcome when poo poo hits the fan, and it will. I would tell them to get two DD-160s spec'ed to handle your infrastructure, seed them locally, ship one off to the secondary site, and run replication. I have yet to find a company that could actively predict when a disaster was going to happen at a site (excluding hurricanes). Just put it to them as "If it was 4 a.m. and the roof caved in, letting rain onto all the servers and frying them, how would a locally hosted box fix this?"

http://www.emc.com/backup-and-recovery/data-domain/data-domain.htm

They are really cost effective. If they go "WHOA, nearly 25k for all this?", ask them what a manual re-entry of records from paper (if they have those) would cost, how much damage a fire would do, etc. Quickly that 35k isn't such a big number to swallow.

# ? Dec 13, 2012 00:00

Docjowles: Apr 9, 2009

Who the hell is going to wait 5 minutes for the NAS to properly power off and spin down all the drives, unrack it and lug it down the block when the goddamn building is on fire? At best they probably just yank the power cable, drop it halfway out the door and trash the filesystem.

# ? Dec 13, 2012 00:55

Moey: Oct 22, 2010; I LIKE TO MOVE IT

Once again this is a third backup of VMs and other stuff that is already stored off site. First copy stays in the server room on dedicated storage for this and the second copy gets written to a few disks for offsite storage with an archive company. This isn't the only backup set or the primary backup set. It is literally a 3rd copy of stuff. I don't see it as being important, but my current boss does.

Also the unit isn't a rack mount and will be housed in the IT room right near the door. Carrying this thing isn't a simple task though, especially when it is filled with 10 drives.

# ? Dec 13, 2012 01:18

evil_bunnY: Apr 2, 2003

Moey posted:

Once again this is a third backup of VMs and other stuff that is already stored off site.

Your boss is retarded, move on.

Look at your daily deltas and your connectivity options, see from there.

# ? Dec 13, 2012 01:33

Moey: Oct 22, 2010; I LIKE TO MOVE IT

evil_bunnY posted:

Your boss is retarded, move on.

Look at your daily deltas and your connectivity options, see from there.

I am aware of this. Our deltas every night exceed the size that could be pushed through our current connection to the DR site/Colo.

My last day there is in less than two weeks, so I will not be around once a real (MetroE) connection exists between production and the DR/Colo site. I have put in reports showing that we need one, and it is expected sometime within the next few months.

I am doing this mostly to keep my boss happy. I am sure it will be scrapped sooner rather than later.

# ? Dec 13, 2012 02:12

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

Then I would not touch it more than you have to so when poo poo hits the fan they aren't calling you or some poo poo.

# ? Dec 13, 2012 02:16

evil_bunnY: Apr 2, 2003

Sorry boss, got docs to write!

# ? Dec 13, 2012 02:24

Bitch Stewie: Dec 17, 2011

Moey posted:

There is some logic to the idea, but it has too many holes to really ever be relied on. If the building is on fire, do I really want to lug out a 10 bay NAS with me?

Seriously, this is your boss's idea of a loving backup plan: "Grab the NAS Moey, grab the NAS"?

# ? Dec 13, 2012 11:12

KennyG: Oct 22, 2002; Here to blow my own horn.

Should I be concerned if my integrator/rep still thinks that I need vSphere Ent+ since I wanted to have 192GB of RAM per host?

I freely admit that I know next to nothing about vSphere or any virtualization issues, but I don't want to blindly follow some idiot who doesn't know what he's doing.

It's Dell, btw. (Fed Gov, before someone starts trying to poach business.)

Second question, can you virtualize your vCenter Server within your cluster? Is that a good idea?

# ? Dec 13, 2012 20:14

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

KennyG posted:

Should I be concerned if my integrator/rep still thinks that I need vSphere Ent+ since I wanted to have 192GB of RAM per host?

I freely admit that I know next to nothing about vSphere or any virtualization issues, but I don't want to blindly follow some idiot who doesn't know what he's doing.

It's Dell, btw. (Fed Gov, before someone starts trying to poach business.)

Second question, can you virtualize your vCenter Server within your cluster? Is that a good idea?

Well, there are no more vRAM limits, so he needs to get updated. However, Enterprise Plus does have many nice benefits.
http://www.vmware.com/products/datacenter-virtualization/vsphere/compare-editions.html
^ Matrix comparison for reference.

Yes, you can, and it is a very common practice. Just make sure you get HA set up so that in a host outage you aren't manually starting the vCenter server via PowerCLI.

# ? Dec 13, 2012 20:19

DarkLotus: Sep 30, 2001; Lithium Hosting
Personal, Reseller & VPS Hosting
30-day no risk Free Trial &
90-days Money Back Guarantee!

If you ever have a need to migrate from Xen to KVM, take a look at virt-v2v.

I have an old Xen PV server that I need to decommission, so I set up a new CentOS 6.3 + KVM server and wanted to migrate the old VMs instead of building new. So far it is working really well. It's pulling the VMs straight from the old Xen server to the new KVM server and creating the LVMs.

virt-v2v - Documentation
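For anyone curious what a pull like that looks like, here is a hedged sketch that only builds and prints the command line. The host, guest, pool, and bridge names are all hypothetical, and although -ic, -os, and -b are options from the virt-v2v man page, check the man page for the version you have installed since the tool has been rewritten since the CentOS 6 days.

```python
import shlex


def build_virt_v2v_command(xen_host, guest_name, storage_pool, bridge):
    """Build a virt-v2v argument list for pulling one guest from a Xen host over SSH."""
    return [
        "virt-v2v",
        "-ic", f"xen+ssh://root@{xen_host}",  # libvirt URI of the source Xen host
        "-os", storage_pool,                  # libvirt storage pool on the KVM host
        "-b", bridge,                         # bridge to attach the guest's NICs to
        guest_name,
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute your own host, pool, bridge, and guest.
    cmd = build_virt_v2v_command("xen01.example.com", "legacy-web", "vg_guests", "br0")
    print(shlex.join(cmd))
```

Printing the command first and running it by hand for one low-risk guest is a reasonable way to confirm the storage pool and bridge mapping before converting the rest.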

# ? Dec 13, 2012 20:41

KennyG: Oct 22, 2002; Here to blow my own horn.

I guess virtual/HA/affinity for vCenter is better than buying a whole host and dedicating it. 4 host/~40 VM setup. Should run fine in virtual.

# ? Dec 13, 2012 20:49

1000101: May 14, 2003; BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

KennyG posted:

Should I be concerned if my integrator/rep still thinks that I need vSphere Ent+ since I wanted to have 192GB of RAM per host?

I freely admit that I know next to nothing about vSphere or any virtualization issues, but I don't want to blindly follow some idiot who doesn't know what he's doing.

It's Dell, btw. (Fed Gov, before someone starts trying to poach business.)

Second question, can you virtualize your vCenter Server within your cluster? Is that a good idea?

Be mildly concerned they aren't up to speed on the new licensing terms from VMware.

Regarding your second question:

Yes, that's fine. I recommend you create a DRS host affinity rule to pin vCenter to a couple of specific hosts so you know where to go when things go south. Also, if you're connecting vCenter to a vDS, make sure you use ephemeral binding for the vCenter server, its database server, and potentially at least 1 AD server.

# ? Dec 13, 2012 21:08

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

KennyG posted:

I guess virtual/HA/affinity for vCenter is better than buying a whole host and dedicating it. 4 host/~40 VM setup. Should run fine in virtual.

Well, that depends. There really isn't a host:VM ratio; look at what the VMs need and address the hosts accordingly.

# ? Dec 13, 2012 21:12

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Over the last 2-3 months we have been seeing an increasing number of VMs failing to complete the reboot after patching, instead getting dumped into the recovery console. Looking at the VM logs, I can see that it has lost heartbeat with the VM for 2 minutes and forces a hard reset at that point. I dug up the screenshots it takes when it does the reset and they are consistently in that "Applying Registry Updates" or whatever stage of the patching process that now happens during the startup cycle of 2008/2008R2. Since the VMware Tools aren't loaded at that point, it's assuming the system is hung and forces it down. Windows sees that a startup attempt failed and goes into the recovery console.

Our NetApp is pretty old at this point and I'm sure the combination of its not-so-great performance, overnight backup jobs, and VMs patching all hammering on the environment are making those reboots take just long enough in some cases that they exceed that 2 minute window.

So at this point, I've manually extended the VM heartbeat detection window to 3 minutes, which should hopefully be enough to get through patching without hitting this problem again. I'm also working on spreading out the patching window for my VMs a bit more, but there is a limit to how much I can do there. Is there anything else I should consider doing here? Is there a way in Windows I can change the behavior of my VMs so they don't go to the recovery console on the first failed boot attempt? Is there anything in the 5.x releases (like maybe the virtual UEFI version 8 machines) that helps minimize these false positives during a slow startup? Extending that window works, but seems pretty kludgey and is going to increase the amount of time a system hangs on an actual bluescreen before it cycles.

# ? Dec 13, 2012 21:26

Dilbert As FUCK: Sep 8, 2007; by Cowcaster; Pillbug

You should be able to disable the recovery environment with this command: \Windows\System32\REAgentC.exe /disable

You can disable the monitoring of VMs and only have HA do host failure monitoring; however, BSODs and lockups within the guest won't be force restarted.
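If you need to push the REAgentC change to a pile of guests, a wrapper along these lines might be a starting point. It is a sketch only: the C: drive in the path is an assumption, `/info` is included just to confirm the state afterwards, and nothing actually executes unless you are on Windows and pass the made-up `--run` flag from an elevated prompt.

```python
import subprocess
import sys

# Assumed install location; the thread only gives \Windows\System32\REAgentC.exe.
REAGENTC = r"C:\Windows\System32\REAgentC.exe"


def reagentc_command(action):
    """Return the argument list for a REAgentC action ('info', 'disable', or 'enable')."""
    if action not in ("info", "disable", "enable"):
        raise ValueError(f"unsupported action: {action}")
    return [REAGENTC, f"/{action}"]


if __name__ == "__main__":
    for action in ("disable", "info"):
        cmd = reagentc_command(action)
        print(" ".join(cmd))
        # Only execute on Windows, from an elevated prompt, when explicitly requested.
        if sys.platform == "win32" and "--run" in sys.argv:
            subprocess.run(cmd, check=True)
```

The `enable` action is there so the change is easy to reverse once the storage or patch-window problem is actually fixed.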

# ? Dec 13, 2012 21:38

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Thanks for that. I know it's only treating the symptom, but it sure beats people complaining because something rebooted overnight and didn't make it back up.

# ? Dec 13, 2012 21:55

evil_bunnY: Apr 2, 2003

BangersInMyKnickers posted:

Our NetApp is pretty old at this point and I'm sure the combination of its not-so-great performance, overnight backup jobs, and VMs patching all hammering on the environment are making those reboots take just long enough in some cases that they exceed that 2 minute window.

This is easy enough to verify, even after the fact if you just log your datastore latencies.

# ? Dec 13, 2012 22:04

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

evil_bunnY posted:

This is easy enough to verify, even after the fact if you just log your datastore latencies.

We moved everything over to NFS because it's an old NetApp that doesn't do VAAI and was making LUN management a pain in the rear end. I've lost most of that visibility at the VMware level, but there's a server dedicated to doing performance monitoring so I can get stats from over there. While all this was happening, storage latency was sitting around 30ms, which is about as high as I am comfortable with while running under a heavy load like this. That's about on par with what I have historically seen for an overnight peak load period considering our hardware and the number of VMs we are running off of it. Not exactly a great thing, but they're not giving me the money to upgrade so it is what it is.

We've added a few more VMs to service in the last few months and Microsoft has been putting out patching cycles that do that registry changes at startup thing with an increasing regularity so I don't think there is any one thing to blame here. Just multiple smaller factors all compounding to push me over that 2min threshold on a more regular basis.

e: And I can't upgrade to 5.x yet because my licensing has been completely screwed up and missing by VMware for roughly a year and they keep promising to fix it but welp.

BangersInMyKnickers fucked around with this message at 22:30 on Dec 13, 2012

# ? Dec 13, 2012 22:21

Pantology: Jan 16, 2006; Dinosaur Gum

1000101 posted:

Also, if you're connecting vCenter to a vDS, make sure you use ephemeral binding for the vCenter server, its database server, and potentially at least 1 AD server.

I still do this out of superstition, but was under the impression that so long as you were using Static Binding you were typically okay--VMs on a port group set to static could start without vCenter being available, you just couldn't make changes to networking until vCenter was back online. Is that wrong?

# ? Dec 13, 2012 22:54

BangersInMyKnickers: Nov 3, 2004; I have a thing for courageous dongles

Pantology posted:

I still do this out of superstition, but was under the impression that so long as you were using Static Binding you were typically okay--VMs on a port group set to static could start without vCenter being available, you just couldn't make changes to networking until vCenter was back online. Is that wrong?

That is correct, at least from the coursework I was trained on. Each host knows and stores its vDS configuration independently of the management server and will carry on operating if vCenter is down. Like you said, you lose the ability to manage or change anything, and if you aren't careful you could create a situation where you lock yourself out of a vCenter VM, but it's pretty difficult to pull that off once the initial setup is done. But yeah, ephemeral binding gives you a safer recovery route for one of those Oh poo poo moments.

I was doing a test cluster setup a year or two ago where I screwed something up and moved my management port binding over to a vDS that wasn't configured quite right which caused the management interfaces to become isolated. You can get it back by jumping around your hosts in a console session and manually forcing it to dump the vDS configuration but it is a painful experience.

# ? Dec 13, 2012 23:20

BnT: Mar 10, 2006

Goons, I'm tasked with a vSphere 4.1 to 5.x upgrade (a dozen blades, 300 VMs). Were you to be tasked with this, would you (a) go to 5.0.x or (b) go to 5.1.x? I guess another question is now or later, as in go to a stable 5.0 version now or a 5.1 version as soon as they clean it up. I saw some 5.1 hate in the last couple of pages, which is making me nervous.

# ? Dec 14, 2012 17:35

adorai: Nov 2, 2002; 10/27/04 Never forget; Grimey Drawer

Based on what I've read, we are waiting for the next release to upgrade from 5.0.

# ? Dec 14, 2012 18:01

KennyG: Oct 22, 2002; Here to blow my own horn.

Corvettefisher posted:

Well, that depends. There really isn't a host:VM ratio; look at what the VMs need and address the hosts accordingly.

CF strikes again.

I realize you are normally helpful, but you have a reputation for machine gunning half-rear end'd answers too.

I was not asking about a host:VM ratio. I was talking about virtualizing vCenter for 4 hosts running about 40 VMs. If it's ok to run 400+ hosts and potentially tens of thousands of guests on virtualized vCenter, then it should work fine for 4/40.

# ? Dec 14, 2012 18:10

Nitr0: Aug 17, 2005; IT'S FREE REAL ESTATE

5.1. Make sure you set up SSO properly. Make sure your host profiles are up to date. Use Update Manager.

We just did a 10 host upgrade (700 VMs) from 5 to 5.1 and it went fine. I dunno what everyone else is complaining about.

# ? Dec 14, 2012 18:17
