CtrlMagicDel
Nov 11, 2011
Just wanted to say thanks to everyone who has been contributing to this thread, which I have been powering through over the last 2 months. It has been super helpful to get some idea of what other people's environments look like and the troubles they have run into. Our VMware guy jumped ship 2 weeks ago, so I was "promoted" to being in charge of 50+ hosts, including a half-setup Auto Deploy environment. It's cool, I took the ICM class so I guess I am an expert, right :suicide:

In all honesty though I'm super thrilled by the technology and the learning opportunities and look forward to contributing here/posting stupid questions that I will doubtlessly be ridiculed for asking.

CtrlMagicDel
Nov 11, 2011
I'm starting to look at building some sort of home server so that I can play around with ESXi and Hyper-V at home, and am trying to figure out the best approach. What are the benefits of purchasing a specific "server" model from Newegg vs. basically just building a PC and jamming 32 gigs of RAM into it? Would I be bottlenecking myself somehow with the PC approach? Obviously fancy high-availability options such as redundant NICs and so on aren't necessary for this purpose. Just looking for some general guidance and wanting to make sure I'm not going down the wrong path.

CtrlMagicDel
Nov 11, 2011

Daylen Drazzi posted:

Whitebox build is interesting to say the least. I went ahead and built my own whitebox (i7-4770, 32GB RAM, Raid Controller + 2x300GB WD Blues in a RAID 1, 2x2TB Seagate NAS drives in a RAID 0 and set up as RDM dedicated to my Linux VM, and 2 NIC cards, giving me a total of 3 NICs). It's not the flashiest ESXi box on the block, but considering I only run a firewall, Linux box and Windows box, it is plenty powerful for what I need. In the next few days I'm going to be building out a nested ESXi setup for labbing for my VCP5 exam, but it should still be able to handle everything and still have resources left over (although it might get a little tight).

I've actually been having quite a bit of fun with it, and of course work provides me a ton of opportunity to get additional practical experience in a huge virtual environment.

Something like that seems more than good enough. Since I'm not going to be doing anything that complicated right away, I'm guessing I could skimp on the RAID controller and additional NICs, at least initially, right? Otherwise, any suggestions for a cheap RAID controller?


evol262 posted:

Basically, ESXi in particular has really iffy support for consumer chipsets (especially newfangled stuff, where some USB3.0 chipsets still hang the boot process for 10 minutes then never work) and you're basically required to buy Intel NICs. Hyper-V takes (more or less) Windows driver support, so you're safer there (along with Xen[Server], KVM, et al), but ESXi whiteboxes are "server hardware, whitebox HCL wiki, or gamble on support".

Yeah, I've heard that in various places, which is disappointing given that everything at work is ESXi and I'm looking for experience installing it from scratch. Getting some Hyper-V experience might be nice as well, though, since I've suggested it might be worth doing a cost-savings analysis down the road on moving less critical servers to Hyper-V, or at least bringing it into our lab environment to compare against ESXi.

adorai posted:

I've never had an issue with a whitebox build running esxi. Just have an intel PCI express NIC on hand. I don't see the value in spending extra for a "server" for a home build anyway. Once it's running esxi, you are not going to mess with it outside of upgrades anyway.

Cool, that seems like a cheap and easy enough solution.

CtrlMagicDel
Nov 11, 2011

evol262 posted:

I'm not saying it's worth it to spend extra for a "server" for a home lab, but it probably is worth it to check the whitebox HCL, because boot "hang" issues aren't exactly uncommon.

For the caveat, ESXi also runs on cheapass ASrock E-350 boards. You're probably fine. Just google "your motherboard+ESXi" before you buy one.

I've never understood the "installing it from scratch" argument for anything. A secretary could do it.

Good advice, I'll try to find a compatible motherboard right off the bat.

I suppose the installation process itself isn't particularly valuable, but everything else that comes with it is. I also feel more comfortable with a technology when I've walked through its installation and configuration myself, as minimal as that might be.

CtrlMagicDel
Nov 11, 2011
So I'm working on configuring my first new host at work and am running into an issue with the DNS configuration setting in ESXi. The setting on all of my other hosts is set to automatic and they resolve to the proper hostname, but whenever I reboot this host the setting reverts to manual and the hostname changes back to localhost, which prevents it from joining the domain, applying a bunch of its host profile settings, and various other things. All of my DHCP and DNS settings seem to be fine, and I can't find anything particularly weird with this host profile, so I'm trying to figure out where things are going wrong. Anyone seen this before or have any suggestions about what to check?

CtrlMagicDel
Nov 11, 2011

evol262 posted:

Wait, what?

Are you handing out hostnames from AD over DHCP?
What does reverse DNS say for the ESXi host's IP address?
What does `vicfg-authconfig --authscheme AD --currentdomain` say?

I have what appear to be standard DHCP reservations and DNS entries for this host (compared to all of the other hosts anyway). Reverse DNS seems to have the correct IP address.

I'm not really familiar with ESXCLI and I can't find it installed on any of our servers; the previous guy used a lot of PowerCLI, so is there an equivalent command there? The host doesn't rejoin AD after I reboot it until I change the hostname in ESXi and rejoin it in vCenter, if that's the info this command is supposed to yield.
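
(For what it's worth, the closest PowerCLI equivalent appears to be Get-VMHostAuthentication, which reports a host's domain and membership status; this is a rough, untested sketch, and the host name pattern is made up:)

# Hedged sketch: check AD membership per host via PowerCLI
Get-VMHost "BlahBlah*" | Get-VMHostAuthentication |
    Select-Object VmHost, Domain, DomainMembershipStatus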

CtrlMagicDel
Nov 11, 2011

evol262 posted:

Not how reverse DNS works.

`nslookup ip.address.of.esxi` and it should return the hostname. This is how the host itself should be discovering whether or not it has a hostname it should be taking from DNS -- it resolves whatever name maps to that IP (canonically; not every operating system works this way, but RDNS is frequently broken and the cause of a lot of pain).

Oh, ok. nslookup resolves correctly. That is what the Reverse Lookup Zone entry in Windows DNS Manager references, correct?

evol262 posted:

It's installed on the ESXi server. Log into it, whether over some kind of OOB management, SSH, or a physical console. It'll tell you whether or not the ESXi server thinks it's joined to a domain or not.

You're going to need to get onto the actual ESXi servers on a CLI to check the hostname and other info if you want to get anywhere other than guessing at DNS.

I can't seem to get the syntax right to get this command to work on the ESXi server via PuTTY; is it under some specific namespace? I need to preface it with esxcli (namespace) command, right?


evol262 posted:

What do you mean when you say "rejoin it"? Does it create a new computer object? Does the SID change? Does it just update the hostname? What changes in AD when you rejoin it?

By rejoining it I mean in vCenter via "Authentication Services -> Properties -> Change to Active Directory". That at least gets the "host is not joined in any domain currently" message to go away under profile compliance. The SID does not change.

CtrlMagicDel
Nov 11, 2011

evol262 posted:

I have no idea what Windows' tools reference, honestly.

No, it doesn't need to be prefaced. Are you familiar with UNIX-like operating systems? (This isn't asked to be judgmental, just trying to gauge how comfortable you are)

Are you able to look at AD itself to see what changes in LDAP?

Not taking anything judgmentally at all, I appreciate any help. I'm no master, but I can get around in Unix; I technically owned a couple of Solaris servers in my last position but honestly didn't have to do much with them.

Here's what I get when I try that command after SSH'ing to the host:

~ # vicfg-authconfig --authscheme AD --currentdomain
-sh: vicfg-authconfig: not found

I'm able to see that the SID on the computer object in AD has not changed, though whenever I rejoin, the object's modified date updates. Beyond that my Active Directory troubleshooting skills are pretty limited, though I got my home lab domain controller up yesterday! So that's hopefully a start down the right path.
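
(Side note for anyone following along: vicfg-authconfig turned out to be part of the vSphere CLI/vMA package rather than something installed on the host itself, which would explain the "not found". From the ESXi shell, roughly equivalent checks would be the following, with the likewise path being an assumption on my part:)

~ # esxcli system hostname get
# shows the host name, domain name, and FQDN as ESXi itself sees them
~ # /usr/lib/vmware/likewise/bin/domainjoin-cli query
# (assumed path/tool) should report the AD domain the host thinks it is joined to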

CtrlMagicDel
Nov 11, 2011
Alright, finally got to the bottom of the problem. The issue had to do with enabling the coredump partition in the host profile before successfully booting up with the rest of the profile (which I guess creates all the proper partitions on the local disk? I'm still unclear on how exactly this part works). We disabled the coredump partition, successfully applied the host profile, joined to the domain, then re-enabled the partition and were able to reboot successfully into a new, healthy-looking ESXi host. Now I'm working on cleaning up my documentation so hopefully I can run through this more easily next time and won't be chasing DNS ghosts.
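
(If it helps anyone hitting the same thing, the coredump partition state can also be checked and toggled from the ESXi shell; this is a hedged sketch with the commands and flags as I understand them:)

~ # esxcli system coredump partition get
# shows the configured and active coredump partition, if any
~ # esxcli system coredump partition set --enable true --smart
# lets ESXi pick a suitable local partition on its own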

CtrlMagicDel
Nov 11, 2011
What Statistics level do people tend to keep their Statistics Intervals in vCenter Server set to?

We are currently set up as such:

Interval Duration / Save For / Statistics Level
5 Minutes / 1 Day / Level 3
30 Minutes / 1 Week / Level 3
2 Hours / 1 Month / Level 3
1 Day / 1 Year / Level 3

Previously we had everything set to Level 1, but shortly before I took over vSphere we bumped the statistics level up to 3 across the board in order to capture information like NIC utilization. The vSphere Client estimated this would use somewhere in the neighborhood of 47GB. In the 3 months since this change our vCenter DB has ballooned from 5GB to 80GB with no end to the growth in sight, and VMware support was of no help in explaining why it was so much larger than the estimate, other than repeatedly informing me it was "just an estimate" (thanks...).

Does anyone else run at this high of a logging level, and if so, have you seen larger-than-expected growth, or are we just dumb for logging at this level?
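
(For anyone wanting to double-check their own settings, the intervals and their levels can be pulled with PowerCLI; a rough, untested sketch, assuming the PerformanceManager view exposes them the way I think it does:)

# Hedged sketch: list vCenter historical statistics intervals and their levels
$si = Get-View ServiceInstance
$perfMgr = Get-View $si.Content.PerfManager
$perfMgr.HistoricalInterval |
    Select-Object Name, SamplingPeriod, Length, Level, Enabled |
    Format-Table -AutoSize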

CtrlMagicDel
Nov 11, 2011
So I have a 2-node cluster running some Red Hat Linux WebSphere stuff that needed firmware patching, so I figured it would be no big deal to migrate all the VMs from one host over to the other since there was plenty of memory. I moved all but 2 of them before I got the "admission check" error message indicating they couldn't be migrated because memory reservations were too high. Wanting to explain this to the team that owns this particular cluster and configures the reservations, I ran a couple of PowerCLI commands to sum up the memory reservations of all VMs on the cluster and was surprised to find the number lower than what I think the available memory should be. These are HP blades with 512GB of RAM:

get-vm | where {$_.VMHost -like "BlahBlah*"} | Get-VMResourceConfiguration | select VM,MemReservationMB | Measure-Object MemReservationMB -Sum

Count : 28
Average :
Sum : 509952
Maximum :
Minimum :
Property : MemReservationMB

I also gathered the VM Memory overhead:

get-vm | where {$_.VMHost -like "BlahBlah*"} | select Name,@{N="MemoryOverhead";E={$_.ExtensionData.Runtime.MemoryOverhead/1MB}} | Measure-Object MemoryOverhead -Sum

Count : 28
Average :
Sum : 4736.25390625
Maximum :
Minimum :
Property : MemoryOverhead

The physical memory total according to vCenter is 524276.3 MB; minus 383.3 MB of system overhead, that leaves 523893.0 MB for the VMs. Doing the math, I have

523893.0
- 509952
- 4736
-----------
9205MB

So by my calculations I should have 8.989258 GB beyond the memory that is reserved. Is there some other sort of overhead beyond the System and individual VM overhead that I'm not accounting for?
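
(In case it helps sanity-check the math, here's a hedged PowerCLI sketch of pulling the host-side numbers in one go; the ExtensionData property path for the host's own system reservation is my best guess, and "BlahBlah01" is a placeholder host name like the pattern above:)

# Untested sketch: host memory vs. summed VM reservations + overhead
$esx = Get-VMHost "BlahBlah01"
$hostTotalMB = [math]::Round($esx.ExtensionData.Summary.Hardware.MemorySize / 1MB, 1)
# memory reserved for the host's own system resource pool (hostd, agents, etc.); assumed property path
$systemResMB = $esx.ExtensionData.SystemResources.Config.MemoryAllocation.Reservation
$vmReservedMB = (Get-VM -Location $esx | Get-VMResourceConfiguration |
    Measure-Object MemReservationMB -Sum).Sum
$overheadMB = (Get-VM -Location $esx |
    Select-Object @{N="MemOverheadMB";E={$_.ExtensionData.Runtime.MemoryOverhead / 1MB}} |
    Measure-Object MemOverheadMB -Sum).Sum
"{0:N1} MB apparently left for reservations" -f ($hostTotalMB - $systemResMB - $vmReservedMB - $overheadMB)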

CtrlMagicDel
Nov 11, 2011

Dilbert As gently caress posted:

Is ESXi set to reserve anything obscene?

There are some 100GB reservations on a couple of VMs, if that is what you mean.

CtrlMagicDel
Nov 11, 2011

Dilbert As gently caress posted:

Oh, you already accounted for the system overhead of ESXi.

Silly question, but if it is a 2-node cluster and you are taking one offline, is admission control enabled? Admission control, when enabled, will reserve some of the host resources, either by percentage or slot size.


Good thought, but admission control is off on this cluster. Most of the VMs on this cluster are either non-production or at least load balanced, so it's not the end of the world if one or two don't start right away in a host-failure scenario; I'm mostly trying to figure it out from an understanding perspective.

CtrlMagicDel
Nov 11, 2011
Has anyone tried upgrading to 5.1U2 yet? I've tried a couple of times from 5.1 in our dev environment and it is hosing up Update Manager/Web Client/Auto Deploy. Support had us go back and reset the admin password, as it looks like it had expired, but that really didn't help much. My inclination is that SSO is getting hosed, because after just upgrading SSO the Web Client comes back with "Signature Validation Failed" when logging in with either a domain ID or the System-Domain admin ID.

CtrlMagicDel
Nov 11, 2011
So, FYI: if you are planning on updating to vSphere 5.1U1 or U2 from 5.0, don't update Auto Deploy; it breaks itself and takes Update Manager down with it. I went back and updated everything except Auto Deploy, and everything works like a charm except the Web Client, which for some reason needed to be uninstalled and reinstalled before magically working again.

CtrlMagicDel
Nov 11, 2011
I'm really struggling with the performance charts in vCenter vs. what the guest OS is reporting for memory usage. I have an 8GB Windows VM whose vCenter Memory usage metric fluctuates between 15-25%. However, the guest OS reports it as about 80% utilized, and I can see three 1GB processes and a 2GB process running, along with all of the usual Windows processes, so it definitely appears to be using well more than 15-25% of its memory. When I took the Optimize and Scale class, the instructor really tried to drive home "Don't trust the guest OS, it is lying to you, only trust vCenter!" but I'm currently staring at a VM where these metrics don't seem to agree at all. Is there a better metric than Usage I should be looking at in vCenter? Nothing is swapped out according to vCenter.

I'd also be curious to know how other people judge when a VM needs more memory.

CtrlMagicDel
Nov 11, 2011

Erwin posted:

Are you just looking at the overview section of the performance tab of the VM? That shows active memory, which is estimated by the hypervisor and is almost always lower than what's actually consumed. Use an advanced chart and look at the consumed metric. It'll be closer to what's granted to the VM, depending on transparent page sharing and the like. The guest OS's metrics don't actively reflect the physical memory used on the hypervisor (8GB does not necessarily mean it's using 8 of the 64GB or whatever on the host) but its shown percentage in the OS is the percentage it is using of what's been made available to it.

If the VM is paging to its disk, it's time to give it more. You may not want to give it more at 80%, though, if it's not paging. As it uses memory, the hypervisor will allocate physical memory to it, but it doesn't know when the guest has freed memory, so it can't actually free up the physical memory. Let's say it's using 6 of the 8GB it has. You give it 12GB, and through starting and stopping applications and whatever it does, it ends up writing to (and then freeing) most of that 12GB. It could be back down to 6GB used, but the hypervisor has it assigned to, say, 10GB of host memory. If it's out of physical memory, the hypervisor will inflate the balloon driver, and wherever it sees that appear in physical memory, it knows that the OS is no longer using that and can overwrite it with pages from another VM.

Check out this document: http://www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf

Man that is some dense material to read on a Friday.

So one thing to mention is that the hosts in our environment are not overcommitted on memory and aren't really anywhere close to 100% memory utilization. Looking at the Consumed memory metric under the advanced charts, it is basically always a flat line roughly equal to the amount of memory the VM is allocated, since no memory reclamation is taking place except maybe a small amount of transparent page sharing? The Active chart fluctuates pretty wildly, while the memory usage shown in the guest OS's Task Manager is generally a lot more stable. In this sort of environment, does it just make more sense to rely on the guest OS measurements, since they're probably pretty accurate? The Consumed and Active memory measurements really don't seem to provide much value.
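
(For the record, here's roughly how I've been pulling the two counters side by side to compare them; an untested sketch, with the VM name made up, and the units as I understand them, since I believe Get-Stat returns these values in KB:)

# Hedged sketch: compare active vs. consumed memory for one VM over the last day
$vm = Get-VM "SomeWindowsVM"
Get-Stat -Entity $vm -Stat "mem.active.average","mem.consumed.average" `
    -Start (Get-Date).AddDays(-1) -IntervalMins 30 |
    Sort-Object Timestamp |
    Select-Object Timestamp, MetricId, @{N="GB";E={[math]::Round($_.Value / 1MB, 2)}}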

CtrlMagicDel
Nov 11, 2011

KennyG posted:

Can anyone help me with:


I have 6 hosts in a cluster and 2 of the 6 display the message. The other 4 don't. I didn't do anything fancy to configure them. Can anyone explain what is going on here or how to fix it? I don't think the Google diagnosis is what I should do, as it's a rather involved SSH/CLI solution that sends me down a rabbit hole I didn't have to go down for the other 4.

Did you configure any sort of network core dumping?
Do those 2 hosts not have internal hard drives while the other ones do?

CtrlMagicDel
Nov 11, 2011
When you put an ESXi host into maintenance mode and reboot it via Auto Deploy, is it supposed to come back into vCenter in maintenance mode or not? Our hosts always came back in maintenance mode previously, but after upgrading to 5.1U2 they seem to be booting back up active and immediately having VMs moved onto them via DRS. Different VMware support people have told me that one or the other is the expected behavior, including one support guy who linked me to some documentation and quoted it basically verbatim, except for the portion stating that it was supposed to come up in maintenance mode, where he had literally changed two words in his quote to indicate it was NOT supposed to come up in maintenance mode :wtc: It is almost funny except for how infuriating it is.

Edit: We apply a host profile with an answer file that joins the host to an AD domain if that makes a difference.

CtrlMagicDel fucked around with this message at 00:20 on Jun 8, 2014

CtrlMagicDel
Nov 11, 2011

Daylen Drazzi posted:

As I understand it, so long as the host profile does not require the user to specify any information then the host will not come up in maintenance mode. If it requires something, such as a static IP, then it will.

Here's the link to the VMware document on the Auto Deploy Boot Process. Look at item 7 under the First Boot Overview for where I found the answer.

Yeah, that seems reasonable. It looks like Hersey has a blog post confirming that behavior is normal: http://www.vhersey.com/2012/04/vsphere-auto-deploy-host-profiles-and-answer-files/

I haven't patched ESXi before, and all I'd seen was the opposite behavior, so when I saw it behaving differently I assumed it was a bug. I guess I've been living with the bug all along.

CtrlMagicDel
Nov 11, 2011

complex posted:

Shows how well understood Auto Deploy & Host Profiles are, even inside VMware. Good luck getting proper support for these features. (We've tried.)

The host comes out of maintenance mode if and only if the host profile is successfully and completely applied. Look at the F11 console while booting to see what the host profile is doing, e.g. whether it is just spinning on enumerating every disk. If this takes longer than the timeout, your host profile is considered "failed". The number of disks it takes for this to happen is surprisingly low.

I feel your pain. I've opened several cases about Auto Deploy/Host Profiles and gotten pretty poor responses from support. I guess that makes me not feel so bad about having my sales people escalate this support case I opened because my bug was fixed :ughh:

CtrlMagicDel
Nov 11, 2011
Aaaaaand now support just called me back saying it should stay in maintenance mode again. I'm certainly glad everyone at VMware support is on the same page.

CtrlMagicDel
Nov 11, 2011

parid posted:

June was a bad month for my VMware clusters. We had a series (at least 3) of network outages that prevented many of my hosts from talking to their storage (all NFS). Every time this happens I have to take time to get vCenter back up before we can dig into fixing the rest of the VMs. In one case, this was an extra hour of delay.

I'd love to fix the root of the problem (the network) but it's out of my hands with another team. What I do have control over is how we have vCenter implemented. Right now the vCenter/SSO server is physical. I had plans to virtualize it very soon. The SQL server for it is already a VM.

Making vCenter a VM is fine for many failure scenarios, but the kind of disruptive network failure we're seeing more of would be challenging to deal with. I would like to find a way to make it multi-site, or at least have some kind of cold standby if the app in the primary datacenter fails.

What is everyone doing to ensure the availability of their vCenter? It looks like VMware is dropping Heartbeat soon. I see vCenter supports SQL clustering and Microsoft Cluster Services now. Is this additional headache worth considering?

I'm kind of in the same boat as you, with a physical vCenter and no HA/DR solution beyond "rebuild vCenter". I was actually arguing for us to buy Heartbeat until I found out at the VMUG that it was going end of sale.

We have two data centers that share networks across a MAN, each of which is the failover DR site for the other, and I'd like to solve both the HA and DR problems. My current plan is to virtualize the existing vCenter next time we upgrade, purchase a second vCenter to run at the other data center, run them in linked mode, and migrate all the hosts at the secondary site to the local vCenter. The only downsides I can find are that we then can't vMotion between data centers (not really a big deal) and that I'm guessing we'd have to take some downtime to migrate the VMs?

CtrlMagicDel
Nov 11, 2011

parid posted:

Two vCenters isn't a terrible idea. Have you joined the 6 beta yet? This idea might be even better in the future.

Yeah, I joined the beta right after posting that, so obviously I'm now thrown for another loop and will have to do some testing of that functionality.

Dilbert As gently caress posted:

That feature won't work with a MAN, unless you are talking about something else.

Maybe MAN is the wrong terminology; they are different physical locations but share the same network (my hosts at both data centers are on the same subnet range, for example). I'm kind of assuming this is something weird my company does that no one else is doing.


Hahaha

Dilbert As gently caress posted:

SRM has a 15-minute RTO; is that window too short? I'll give you that DRS and vMotion are nice, but is 15 minutes that critical?

They quoted us the prices for SRM this year and it basically got laughed out of the budget. I cobbled together a PowerCLI script that re-adds all of our VMs to inventory and powers them back on in the event of a DR scenario.
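
(The script is nothing fancy; a stripped-down, hedged sketch of the approach, where the CSV of .vmx paths and the DR host name are placeholders I maintain by hand:)

# Untested sketch: re-register replicated VMs at the DR site and power them on
$targetHost = Get-VMHost "dr-esxi-01.example.com"
Import-Csv "C:\Scripts\dr-vms.csv" | ForEach-Object {
    # each CSV row has a VmxPath column, e.g. "[ReplicaDS] web01/web01.vmx"
    $vm = New-VM -VMFilePath $_.VmxPath -VMHost $targetHost
    # (in practice the "moved or copied" question may need answering as well)
    Start-VM -VM $vm -Confirm:$false
}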

CtrlMagicDel
Nov 11, 2011

parid posted:

Your environment sounds very similar to mine. The only difference so far is that we have synchronous storage in both sites. Unfortunately we don't have a 3rd-site quorum, so the switchover is manual. Once we throw the switch on the storage, we can let HA do the rest.

Keep in touch on what you decide to do :). Right now we're going to virtualize the vCenter and fence it tightly with DRS group host affinity. I might have to fix things manually, but at least I'll have the best shot possible to do so.

In version 6, two vCenters sounds like a neat solution.

Our hosts don't share the same storage between data centers, so I'm guessing we are probably pretty similar. We replicate our datastores between the two data centers, and our recovery is currently pretty manual for DR.

I'm still not sure exactly what the implications of multiple vCenters are going to be, but I haven't done a ton of reading of the beta documentation. I'll look forward to discussing it more after I've read up and after it can be discussed publicly. On that note, is anyone going to VMworld? We got a free ticket from one of our vendors, so it sounds like I'll be attending and would love to meet some wise virtualization goons.

CtrlMagicDel
Nov 11, 2011

Dilbert As gently caress posted:

Well, shoot, if SRM or the like is out of the question, that limits you quite a bit. Are you able to use vSphere Replication at least?

We have array-based replication, but it isn't really automated at the moment or integrated with vCenter. I'm planning on arguing that we need another vCenter up and running at our other site for DR purposes, as I think that will be reasonably enough priced that I should be able to get it ordered. If we virtualized vCenter and all the DBs it would be a step in the right direction, but it would still require some recovery time I'd rather not have to spend.

Edit: fix stupid autocorrect from my phone because our new firewall blocks SA because it is "questionable".

CtrlMagicDel fucked around with this message at 22:15 on Jul 7, 2014

CtrlMagicDel
Nov 11, 2011
I'm at VMworld for the first time; holy crap, you can be drunk here 24/7 for free. Shoot me your contact info at my username dot gmail if you want to meet up, I'd love to meet another goon!

CtrlMagicDel
Nov 11, 2011

TeMpLaR posted:

Anyone install vSphere 6 on anything? Debating trying it in dev or just waiting for some more patches to come out first.

Got it running in my home lab on VMware Workstation. The install (Windows vCenter) seemed a lot easier, but I haven't played much with the more advanced features or changing certs around, which was one of the big pain points in 5.1, at least, which is what work is on.
