|
Where is the setting to specify a target IP to send heartbeats to in the high availability configuration on your ESXi hosts? I think I may have messed around with it on a host in the cluster a year or two ago and it has been causing problems with that one getting knocked into isolation mode, and now I can't find the drat thing. e: N/M found it, das.isolationaddress under the HA advanced config. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006421 BangersInMyKnickers fucked around with this message at 15:24 on Oct 17, 2012 |
# ¿ Oct 17, 2012 15:19 |
|
|
|
iSCSI traffic will try going out any kernel interface that is within the target's subnet or (god help you) it will route to get there. So assuming your management network is something like 192.168.1.x/24 and the storage is 192.168.2.x/24 but they're all on the same physical network and the gateway can route the packets between the two, then it is entirely possible that you're bouncing your iSCSI traffic through a route hop. That's a bad thing and you don't want to do it.
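A quick way to sanity-check this — a sketch of the same on-link-or-route logic using Python's stdlib `ipaddress`, not how ESXi actually picks interfaces (interface names here are made up):

```python
import ipaddress

def iscsi_path(vmk_interfaces, target_ip):
    """Return (interface, delivery) for a target: the first vmkernel whose
    subnet contains the target gets direct L2 delivery; otherwise the
    stack has to route through a gateway."""
    target = ipaddress.ip_address(target_ip)
    for name, cidr in vmk_interfaces:
        if target in ipaddress.ip_interface(cidr).network:
            return (name, "on-link")
    return ("default gateway", "routed")

# Management-only host from the example above: the storage subnet is
# reachable, but every packet bounces through a route hop.
print(iscsi_path([("vmk0", "192.168.1.10/24")], "192.168.2.50"))
# -> ('default gateway', 'routed')
```

Adding a vmkernel interface inside the storage subnet is what flips the answer back to on-link.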
|
# ¿ Oct 18, 2012 19:06 |
|
If that secondary route is available then it should get discovered during the discovery scan of the software iSCSI initiator (I doubt dynamic discovery will work since it is in a different subnet, so you'll be manually adding the target IP(s)). At that point, the multiple paths will be listed under the initiator properties and it is up to you to configure the LUN multipathing options to define which would be the primary path and how it handles failover, or if you're just going to run it in round-robin. But again, routing iSCSI traffic is almost always going to be a bad idea and should be avoided.
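For the round-robin case the policy itself is trivial — successive I/Os just alternate over the discovered paths. A toy model, not the NMP implementation (path names made up for illustration):

```python
from itertools import cycle

# Two paths discovered to the same LUN, round-robin policy selected
paths = ["vmhba33:C0:T0:L0", "vmhba33:C1:T0:L0"]
rr = cycle(paths)

# Each I/O takes the next path in turn
schedule = [next(rr) for _ in range(4)]
print(schedule)
# -> ['vmhba33:C0:T0:L0', 'vmhba33:C1:T0:L0', 'vmhba33:C0:T0:L0', 'vmhba33:C1:T0:L0']
```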
|
# ¿ Oct 18, 2012 20:42 |
|
I will say that we just moved off iSCSI for NFS in our environment and it is fantastic to not be beholden to SCSI lock contention and all this volume/LUN/VMFS sizing nonsense. It simplified the heck out of our deduplication and disaster recovery volume replication schedules, and we're seeing 80+% deduplication rates on the NFS volume dedicated to OS partitions and it has been only going up as we add more VMs.
|
# ¿ Oct 18, 2012 20:51 |
|
skipdogg posted:OK guys, need a little help. Our main VMWare guy is leaving in a month, I have a solid understanding on how Virtualization works, can do basic administrative tasks in VCenter, etc, but I need to get into the nuts and bolts of this poo poo in case something goes wrong. We have Premium support or whatever for all our production stuff, and believe it or not there's money for training. I'm a MS guy though, so my VMWare experience is limited. Was your VMware guy also managing the network configuration and/or storage systems? You're going to need to bone up on those since more likely than not those will be the things that bite you in the butt, especially if they were done wrong and you need to fix it.
|
# ¿ Oct 18, 2012 20:53 |
|
Corvettefisher posted:I am a bit confused here what is your setup look like? If it is 26k files that is a ton of I/O requests, you may be maxing out your IOPS. Basically DOS'ing the storage, causing the VM's to lose access to disk and funky stuff happens when datastores are DOS'd. Yeah but if that was the case it should be throwing a bunch of alarms about storage latency thresholds getting exceeded.
|
# ¿ Oct 18, 2012 21:12 |
|
skipdogg posted:The EMC SAN is FC and managed by my boss. Shouldn't be on my radar for a while I hope. Well at least you don't need to worry about that end of it. But the networking in front of the cluster can be equally important, because if you don't have proper redundancy set up on the switches with the management interfaces for your hosts you can easily hit a situation where hosts think they are isolated and start shutting down VMs out of nowhere. If you do need to manage the networking, make sure you are familiar with link aggregation and spanning tree.
|
# ¿ Oct 18, 2012 21:19 |
|
Mierdaan posted:If you're fast with labs, you can be done by 4PM every day. I guess it depends on how fast your instructor lectures too, though I have no idea how the online stuff works. I've done training courses where they were a combination of in-person and online, so you had the instructor in front of the class with a camera that fed to the online people. It was a pain in the rear end from the start because everything ran through the JRE and the first 2 hours were just trying to get the online people up and working. And then after every section there was a long pause while he waited for the online people to type questions, and answering them was painful because of that disconnect when you're trying to explain new concepts to people who aren't physically in front of you. And then the online people were constantly ducking out with phone calls from work or kid stuff because they were working from home, and the instructor was trying to get them caught up. It was annoying to say the least. But in both cases where the course was structured like that, all the labs and materials were laid open from the start so you could work ahead or poke through the optional labs and materials.
|
# ¿ Oct 19, 2012 14:45 |
|
Rhymenoserous posted:This sounds weird to me, I've grown iSCSI virtual filesystems fairly often without a hitch. Hell I just bumped one of my datastores a few hours ago. What problems do you guys have? It's a little annoying because there are multiple steps (grow volume, grow LUN, grow VMFS), but the biggest issue is SCSI lock contention. Thin provisioned VMDK files grow in 1-8MB chunks depending on how you set up the VMFS. Because SCSI was designed to really be a single host protocol, VMFS has to overcome that by being multi-host aware. Every time one of those blocks needs to grow it issues a SCSI lock command which can hold up any other VMDK trying to grow at that same time. This is the VMFS locking issue that you may have heard about, and why they recommend limiting the number of VMs on an iSCSI datastore to somewhere in the 10-15 range. That means you have to manage a bunch of different LUNs, which means more wasted space as you try to maintain adequate freespace on each of them. And more LUNs means it is harder to keep your OS VMDK files separate from data ones to minimize the time it takes to run deduplication cycles. And, at least on our NetApp, we could only grow LUNs to be 10x their initial size, which caused some headaches when we were getting started and doing the initial P2V conversion. And if you are growing out LUNs you have to force a rescan on the iSCSI initiator, which can be a slow, painful experience if you are in an environment with a lot of LUN mappings. NFS just doesn't have any of these weird limitations because it is multi-host aware and way more flexible.
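The contention scales with how many thin VMDKs want to allocate at once, which is the intuition behind the 10-15 VM guidance. A toy model — the per-reservation cost below is an assumed number, not a measurement:

```python
def worst_case_wait_ms(concurrent_growers, reservation_ms=20):
    """If every grower hits its 1-8MB allocation at the same instant, the
    datastore-wide SCSI reservations serialize, so the last VM in line
    waits for everyone ahead of it (reservation_ms is an assumption)."""
    return (concurrent_growers - 1) * reservation_ms

for vms in (5, 15, 40):
    print(f"{vms:>2} VMs allocating at once -> up to {worst_case_wait_ms(vms)} ms stalled")
```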
|
# ¿ Oct 19, 2012 17:00 |
|
Corvettefisher posted:Isn't that VMFS3 only? or am I thinking of something different. I'm still on 4.2 with VMFS3 datastores so I guess in theory the problem could be minimized in the newer versions, but SCSI locking on new block allocation is very fundamental to how iSCSI LUNs operate and I don't see how it could be eliminated.
|
# ¿ Oct 19, 2012 17:29 |
|
Ha, my NetApp is 8 years old and couldn't run VAAI if its life depended on it. Oh well.
|
# ¿ Oct 19, 2012 17:57 |
|
DevNull posted:So is anyone using or thinking of using the 3D acceleration feature released with ESX5.1? I would be interested in hearing about people's experience with it so far. That kind of functionality has been a big sticking point with desktop virtualization for our CAD/Revit folks. Right now Hyper-V is the only game in town for that kind of thing, but I'm curious to see what VMware can come up with since there shouldn't really be anything preventing them from carving up a beefy Quadro like you would any other physical resource.
|
# ¿ Oct 23, 2012 23:02 |
|
What kind of OEM hardware is available to support that kind of config? Last I checked, the rack-mount Precision workstations came the closest and the biggest problem was most normal servers couldn't push enough power to the PCI-E bus to run the cards. Has that changed at all?
|
# ¿ Oct 26, 2012 16:51 |
|
DevNull posted:What the gently caress? That is probably left over from the days of it being really lovely and slow. I guess we need to find someone in charge of the docs and tell them to change that. If you're working in a shop with a bunch of different admins it can be a pain because people start fighting over the console session. I've had to beat it into the heads of the other guys here that they only use RDP to manage servers, and if something is screwed up to the point that it needs console access I should be working on it anyway.
|
# ¿ Oct 31, 2012 18:04 |
|
I delegate out console/power access to responsible parties so they can force down one of their servers if it misbehaves during off-hours, since none of my staff (myself included) want to be taking that call. Every time there was a major, minor, or update build change to vCenter, the thick client had to be reinstalled on dozens of workstations, which means packaging and re-deploying. It was a big pain in my rear end and I will be doing cartwheels once I do the upgrade to 5.x and don't have to do that crap any more.
|
# ¿ Oct 31, 2012 20:34 |
|
Noghri_ViR posted:Is anyone running Exchange over NFS? I know it's not supported but I've seen reports of people doing it and it running quite well. I was told that if you ever need support on it you just vmotion over to an iSCSI datastore and then call up support. Yeah, we do it and it runs exactly the same as iSCSI did. There's no technical difference to how Exchange operates, it's all abstracted and not visible at the OS level, so it's in my opinion more a CYA thing from Microsoft. There is slightly more overhead, but we're not hitting IOPS contention thresholds so who cares. If you're doing something unsupported it is good to keep the flexibility to throw up an iSCSI volume to test against the supported config, but in this particular case it seems extremely unlikely that an iSCSI vs NFS problem would manifest itself in a way that would affect just Exchange instead of having larger OS level implications. Microsoft also says you should be thick provisioning your Exchange data volumes, and my assumption has been that because with NFS you're almost always thin provisioning automatically, they are trying to avoid a situation where an overcommitted volume fills and can't grow the VMDK.
|
# ¿ Nov 5, 2012 17:39 |
|
wolrah posted:I was reading about the new ARM A15s having virtualization extensions and was wondering if anyone here had heard anything about this and if there were any known plans to make use of these features in a user-facing way. IIRC KVM and Xen both support it, but the ARM server market is still pretty small so I can't say that really catches my interest yet. I think the mobile stuff is why they are doing it now, but it will filter into the ARM server market fairly quickly. Frankly the whole idea seems terrible to me. VMware has been making pushes in that direction for a while because they see the dollar signs. More than likely this will target Android and BlackBerry 10. It might work okay on BlackBerry where you have a locked down and well maintained host layer and then jump between the isolated sessions for work and personal (but considering their track record for the last 5 years I am doubtful). On Android you have a massive clusterfuck of unmaintained handsets because the vendors don't give a singular poo poo. Throwing in a virtualization layer is going to complicate things even more as it will be sold as a security feature that actually poses an additional security risk to the device. I do not see that working out well.
|
# ¿ Nov 22, 2012 02:10 |
|
We're setting up storage replication between our primary and secondary NetApp units for a DR plan. Everything is in NFS volumes, so the plan is to replicate the changes nightly when activity is low. If the building burns, we mount up the volumes on the backup hosts, import the VMs, and get back online in a couple hours. The question I have is should I be concerned with trying to quiesce traffic before replication kicks off? The NetApp unit generates a volume delta while the replication is happening so you're moving stable data. My assumption is that the state of the VMDKs as I bring them up (in the hopefully non-existent occasion that I actually have to do this) is that they will just think they had a hard crash at the time of replication, and everything we run including databases seems pretty resilient to hard crashes these days. Sure, in the case of databases there is going to be a little data loss because the log marker hasn't incremented after a little bit of data was written out. But that's maybe a few seconds worth of data and we're going to be doing 12 hour replication schedules, which means we're going to be losing on average 6 hours worth of stateful data anyhow. Is my gut right on this or do I have my head up my own rear end and really need to get the traffic quiesced with the NetApp VMware plugins?
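The data-loss arithmetic here is just the usual uniform-failure assumption, sketched out:

```python
def expected_loss_hours(replication_interval_hours):
    """If a disaster lands uniformly at random inside the replication
    interval, the last good copy is on average half an interval old."""
    return replication_interval_hours / 2

print(expected_loss_hours(12))  # -> 6.0, the "average 6 hours" figure
```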
|
# ¿ Dec 5, 2012 22:38 |
|
Moey posted:Have you though about segregating that traffic with VLANs? Yeah, good lord. Tag your traffic down to the host and just set up different virtual networks for each traffic tag.
|
# ¿ Dec 5, 2012 22:41 |
|
evil_bunnY posted:If it's all VM's why not SRM? Unless something changed recently, it is way outside our budget. e: There are also a few legacy non-VM iSCSI luns and CIFS volumes hanging around that need to be replicated. I'm not sure if I will be able to ever fully get rid of them so if I can do all my replication at the storage appliance level that seems easier. We've already paid for the licensing there. BangersInMyKnickers fucked around with this message at 00:31 on Dec 6, 2012 |
# ¿ Dec 6, 2012 00:29 |
|
Misogynist posted:You've clearly never worked with either Oracle or XFS. Ha, yeah. We have a couple Oracle middleware servers that like to die for absolutely no goddamn reason but I'll make the DBA clean that up.
|
# ¿ Dec 6, 2012 15:09 |
|
luminalflux posted:Got any numbers? Right now I kinda like the fact that it's easy to grow the vmdk to add add storage space instead of screwing around with the LeftHand. in fact, our LeftHand is only used for vmware storage VMDK encapsulation doesn't really pose a lot of overhead. In my case, the Oracle DBs are pretty small and low load so I'm okay with throwing in that layer of abstraction, but that's going to depend heavily on the particulars of your situation.
|
# ¿ Dec 6, 2012 15:12 |
|
Didn't they drop support for Win2000 P2V so you have to do it with an old version of converter?
|
# ¿ Dec 7, 2012 16:46 |
|
Over the last 2-3 months we have been seeing an increasing number of VMs failing to complete the reboot after patching, instead getting dumped into the recovery console. Looking at the VM logs, I can see that it loses heartbeat with the VM for 2 minutes and forces a hard reset at that point. I dug up the screenshots it takes when it does the reset and they are consistently in that "Applying Registry Updates" or whatever stage of the patching process that now happens during the startup cycle of 2008/2008R2. Since the VMware Tools aren't loaded at that point it's assuming the system is hung and forces it down, and Windows sees that it failed a startup attempt and goes into the recovery console. Our NetApp is pretty old at this point and I'm sure the combination of its not-so-great performance, overnight backup jobs, and VMs patching all hammering on the environment are making those reboots take just long enough in some cases that they exceed that 2 minute window. So at this point, I've manually extended the VM heartbeat detection window to 3 minutes which should hopefully be enough to get through patching without hitting this problem again. I'm also working on spreading out the patching window for my VMs a bit more, but there is a limit to how much I can do there. Is there anything else I should consider doing here? Is there a way in Windows I can change the behavior of my VMs so they don't go to the recovery console on the first failed boot attempt? Is there anything in the 5.x releases (like maybe the virtual UEFI version 8 machines) that helps minimize these false positives during a slow startup? Extending that window works, but seems pretty kludgey and is going to increase the amount of time a system hangs on an actual bluescreen before it cycles.
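The back-of-envelope behind extending the window — timings here are assumptions, and the heartbeat mechanism is simplified to a single threshold:

```python
def gets_hard_reset(boot_to_tools_sec, heartbeat_window_sec):
    """VM monitoring fires if VMware Tools hasn't heartbeated within the
    window, so a slow patch-cycle boot reads as a hang."""
    return boot_to_tools_sec > heartbeat_window_sec

print(gets_hard_reset(150, 120))  # True: 2.5 min boot vs 2 min window -> reset
print(gets_hard_reset(150, 180))  # False: the extended 3 min window rides it out
```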
|
# ¿ Dec 13, 2012 21:26 |
|
Thanks for that. I know it's only treating the symptom, but it sure beats people complaining because something rebooted overnight and didn't make it back up.
|
# ¿ Dec 13, 2012 21:55 |
|
evil_bunnY posted:This is easy enough to verify, even after the fact if you just log your datastore latencies. We moved everything over to NFS because it's an old NetApp that doesn't do VAAI and was making LUN management a pain in the rear end. I've lost most of that visibility at the VMware level, but there's a server dedicated to doing performance monitoring so I can get stats from over there. While all this was happening, storage latency was sitting around 30ms, which is about as high as I am comfortable with while running under a heavy load like this. That's about on par with what I have historically seen for an overnight peak load period considering our hardware and the number of VMs we are running off of it. Not exactly a great thing, but they're not giving me the money to upgrade so it is what it is. We've added a few more VMs to service in the last few months and Microsoft has been putting out patching cycles that do that registry-changes-at-startup thing with an increasing regularity, so I don't think there is any one thing to blame here. Just multiple smaller factors all compounding to push me over that 2min threshold on a more regular basis. e: And I can't upgrade to 5.x yet because my licensing has been completely screwed up and missing by VMware for roughly a year and they keep promising to fix it but welp. BangersInMyKnickers fucked around with this message at 22:30 on Dec 13, 2012 |
# ¿ Dec 13, 2012 22:21 |
|
Pantology posted:I still do this out of superstition, but was under the impression that so long as you were using Static Binding you were typically okay--VMs on a port group set to static could start without vCenter being available, you just couldn't make changes to networking until vCenter was back online. Is that wrong? That is correct, at least from the coursework I was trained on. Each host knows and stores its vDS configuration independently of the management server and will carry on operating if vCenter is down. Like you said, you lose the ability to manage or change anything, and if you aren't careful you could create a situation where you lock yourself out of a vCenter VM, but it's pretty difficult to pull that off once the initial setup is done. But yeah, ephemeral binding gives you a safer recovery route for one of those Oh poo poo moments. I was doing a test cluster setup a year or two ago where I screwed something up and moved my management port binding over to a vDS that wasn't configured quite right, which caused the management interfaces to become isolated. You can get it back by jumping around your hosts in a console session and manually forcing it to dump the vDS configuration but it is a painful experience.
|
# ¿ Dec 13, 2012 23:20 |
|
We started running de-dupes on the OS partition VM volume on our NetApp about a month ago. The initial pass of 1.8TB took about 11 days and brought it down to about 650GB actual usage, so a pretty good dedupe ratio. Since then, he's been trying to run dedupes off and on as the change delta percentage starts creeping up but when they fire off they take another 10-11 days to complete which seems way too long. Other volumes containing CIFS shares and upwards of a TB take about 30 minutes or so (but I suspect the block fingerprinting has very little matching, requiring less of the block by block inspection pass, so a very different beast). Both the vswaps and pagefile (inside the boot volumes) reside there as well and he is under the impression that this would be destroying performance. I'm not that convinced since the vswap should be full of zeros since they've never been used and the pagefiles aren't being encrypted or dumped at reboot so that data should be relatively stable. Ideally I would like to move all the pagefiles from SATA to FC and possibly the vswap while I am at it, but we don't have the capacity to handle it right now until some more budget frees up and frankly I'm not convinced this is the source of our problem. Any thoughts?
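For what it's worth, the numbers above work out like this (quick check, assuming binary units):

```python
def dedupe_savings(logical_gb, physical_gb):
    """Fraction of the logical data eliminated by deduplication."""
    return 1 - physical_gb / logical_gb

# 1.8TB of OS partitions down to ~650GB actual usage
print(f"{dedupe_savings(1.8 * 1024, 650):.0%}")  # -> 65%
```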
|
# ¿ Jan 15, 2013 23:25 |
|
Corvettefisher posted:Yeah kinda figured this was one of the general knowledge things, but if you don't this is a great reason to. Wouldn't just bundling them in to vApps give you the same results without selecting specific hardware and making the load-balancing more difficult and manual? movax posted:Ooh, those are cheap, something to consider. What's the OS support like? I guess I'd have to buy a SFF<->4x SATA cable for those. Yes, absolutely. VMXnet3 every single time unless something is so horribly broken that you can't do it. As for the overhead, there are some VMware whitepapers regarding the VMFS overhead compared to direct block-level access and it was maybe 1-3% when properly configured with a paravirtual controller on absolutely insane configurations. Unless you are doing something that is entirely I/O bottlenecked (which I doubt since you are dedicating single spindles to VMs instead of pools) then the overhead likely won't even be detectable in your case. You'll just need to make sure you get the shares set up right so a single VM doesn't crowd out others with IOP requests since you no longer have that hard partitioning between spindles/VMs. BangersInMyKnickers fucked around with this message at 01:17 on Jan 16, 2013 |
# ¿ Jan 16, 2013 01:11 |
|
madsushi posted:I have no idea why your dedupes would be taking that long. I've done a 10TB dedupe job in 24 hours before on SATA disk. It's an old 3020c stuck on 7.something unfortunately and got cut out of NetApp's support, the aggregate is backed with 3 whole shelves of SATA disk (combination 320/500GB disk) and other volumes on that exact same aggregate take a fraction of the time. I have no loving clue here and we have a 3rd party company doing the OnTap support for us now but they're just stabbing in the dark from what I can see.
|
# ¿ Jan 16, 2013 06:23 |
|
NEVER defragment a VMDK as a general rule, and if you're virtualizing Windows 7 then make sure you disable the background defrag task. The only time I would consider this would be if it was a thick provisioned VMDK and the datastore was either a singular disk or a JBOD array where there is actual sequential addressing for it to optimize for. It might also work against a cheap RAID array but that is a bit of a jump. More expensive RAID controllers are going to know how to optimize themselves for the most part. Thin provisioned VMDKs abstract out the disk geometry so a defrag there is going to jumble things up worse than what you had before.
|
# ¿ Jan 16, 2013 19:56 |
|
I'd pull them out and save them for something else you might actually use them in. They're only going to waste power and possibly fail.
|
# ¿ Jan 24, 2013 06:17 |
|
The logs don't take up much space, why not just point them at a SAN datastore so they're actually accessible in the event of a failure? How does it handle things with the vswap running on local disk? I assumed that would cause a vmotion boundary.
|
# ¿ Jan 24, 2013 16:45 |
|
bull3964 posted:Windows 2012 DC is going to be purchased regardless. DC license grants unlimited virtualization rights per physical host (up to 2 procs) so it's way more economical than buying licenses piecemeal. At first I thought you were talking out of your rear end, but I ended up checking and sure enough each Datacenter 2012 license now covers 2 CPU packages per host instead of one. That's going to save me some money. As for cost savings with VMware, they can be there with very large installations because you're going to get better consolidation ratios, which is going to reduce rack space, hardware, and power/cooling load. For a small/medium shop where a handful of hosts can handle everything, that probably won't be significant compared to the savings of just virtualizing on anything to begin with. BangersInMyKnickers fucked around with this message at 23:54 on Jan 24, 2013 |
# ¿ Jan 24, 2013 23:51 |
|
ragzilla posted:Surprise, the 2012 DC licenses cost twice as much as 2008 ones, and were traded on a 2:1 ratio when upping from 2008->2012 on SA. My per-license cost didn't change but that's academic prices for you.
|
# ¿ Jan 25, 2013 15:01 |
|
IOwnCalculus posted:Out of curiosity - what's the reasoning for this? I'm using a network configuration that looks effectively identical and I've never enabled promiscuous mode, yet it has no problems that I've encountered. vSwitches (and regular switches), by default, will not forward a frame to a virtual switch port unless the destination MAC address matches the vNIC on that port. If you don't allow promiscuous mode and are doing passive, not-inline packet inspection, you'll only end up receiving broadcast frames or those specifically addressed to the MAC address on the vNIC for the VM. Promiscuous mode basically turns your vSwitch into a hub and allows all frames passing through it to be received by all vNICs attached to the vSwitch. Then it is a matter of putting your NIC into promiscuous mode inside the OS if it isn't already, otherwise frames not addressed to that receiving MAC will get dropped there as well. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002934 madsushi: What you are describing is the MAC Address Changes policy, not Promiscuous Mode.
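The forwarding decision described above, as a toy model (MAC addresses made up; a real vSwitch also pins MACs to ports, which this ignores):

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def ports_receiving(dst_mac, port_macs, promiscuous=False):
    """Normal mode: deliver only to the matching vNIC (plus broadcasts).
    Promiscuous mode: hub behavior, every port sees every frame."""
    if promiscuous or dst_mac == BROADCAST:
        return list(port_macs)
    return [mac for mac in port_macs if mac == dst_mac]

ports = ["00:50:56:aa:aa:aa", "00:50:56:bb:bb:bb"]
print(ports_receiving("00:50:56:aa:aa:aa", ports))        # unicast: one port
print(ports_receiving("00:50:56:aa:aa:aa", ports, True))  # hub mode: all ports
```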
|
# ¿ Jan 30, 2013 18:09 |
|
If your pfsense box is a routable gateway then no, there is no need to have it running in promiscuous mode because it is funneling all your traffic regardless. If you were doing packet inspection then I would, because you could theoretically have some kind of worm that knew to only go host to host and stay inside the subnet while avoiding gateway devices where the inspection typically happens. But that is a massive if and I doubt you need to be concerned about it.
|
# ¿ Jan 30, 2013 19:17 |
|
Erwin posted:Anybody have issues with Windows templates on vSphere 5.1? I had an existing 2008 R2 template that I brought over, and when I deploy a VM from it, it won't join the domain, and the local admin password is not correct (obviously the latter probably causes the former). I rebuilt it from scratch, but it does the same thing. Is there something new with 5.1 that I'm not seeing? You're letting it sit for a while to do the configuration and automatic reboot right? For whatever reason my templates are really slow so after the initial boot they will sit at a logon prompt for about 5 minutes before executing the automated sysprep and reboot. One time I foolishly logged in before that reboot happened which disrupted the whole process and left the VM in a state similar to what you are describing.
|
# ¿ Jan 30, 2013 22:18 |
|
What IP ranges and subnet masks are you using here? I'm assuming this is iSCSI and not NFS, correct?
|
# ¿ Feb 1, 2013 01:19 |
|
|
|
Moey posted:It's iSCSI going between an ESXi host and a Nimble SAN. Entire iSCSI network is a single class C network 192.168.X.X/24 The /24 subnet mask is your problem. The iSCSI initiator wants to see every vmkernel reaching every accessible iSCSI target serving your LUNs. In your case, a vmkernel on 192.168.1.100 (for example) will only be able to hit targets in the 192.168.1.x subnet; 192.168.[2-254].x is completely inaccessible because the connection thinks it needs to route out through a gateway that doesn't exist. Change your subnet mask to /16 and it should work fine.
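You can see the on-link check driving this with the stdlib `ipaddress` module — a sketch of the reachability logic, not of the initiator itself:

```python
import ipaddress

def on_link(vmk_cidr, target_ip):
    """True if the target falls inside the vmkernel's subnet, i.e. the
    initiator can reach it without a route hop."""
    return ipaddress.ip_address(target_ip) in ipaddress.ip_interface(vmk_cidr).network

print(on_link("192.168.1.100/24", "192.168.2.50"))  # False: would need a gateway
print(on_link("192.168.1.100/16", "192.168.2.50"))  # True: /16 puts it on-link
```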
|
# ¿ Feb 1, 2013 06:34 |