YOLOsubmarine

cheese-cube posted:

We run NetApp VSC in our environment so please report-back when you have a chance (We are only running 5.1 at the moment and an upgrade to 5.5 is a long way off).

You can play around with the VSC web client in the VMware Hands-on Lab HOL-PRT-1301. It's very different, as the functionality is largely integrated into the normal views rather than being segregated into NetApp-specific pages, so it can take a bit of time to figure out things like modifying backup schedules or listing available snapshots for restore. It should, however, be faster than the older VSC plug-in, which was painfully, abysmally slow if you were backing up a large number of VMs and datastores. VSC 5 is still only an incremental improvement, though. The big changes are coming with VSC 6.

YOLOsubmarine

If your KAVG is low then it's not going to be a queue depth issue, at least not on the host side. You could hit QFULL conditions on the array side, but those would still show up as device latency, not kernel latency. To properly tune queue depth you'll need to talk to your SAN admins anyway to determine the fan-in ratio and how LUNs are distributed across ports.

I would do as DAF suggested and make sure you have the latest HBA firmware, but it's pretty likely that the problem is on the fabric or the array, not the host.
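
If you want to sanity-check the host side before chasing the SAN team, the queue settings and queueing counters are easy to look at. A minimal sketch, with the device ID as a placeholder:

code:
# Show the device's configured max queue depth and outstanding-IO settings
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep -i queue

# In esxtop, press 'u' for the disk device view: DQLEN, ACTV and QUED sit next to
# DAVG/KAVG/GAVG, and KAVG climbing while QUED is non-zero is what a host-side
# queueing bottleneck actually looks like
esxtop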

YOLOsubmarine

Dilbert As gently caress posted:

Is it a bad design decision to separate NFS and ISCSI?

I mean I know you can use NIOC and some things with VMkernels but I thought it was always best to separate storage protocols whenever possible.

The reasons for separating them are generally ease of management or flexibility rather than performance. Different VLANs let you apply different jumbo frame policies depending on the type of storage traffic, and the CoS treatment 1000101 mentioned is another example. Sometimes it also makes sense to segregate not just by protocol but more granularly, by purpose or management domain. For instance, we have different VLANs for client NFS, client iSCSI, VMware NFS, and VMware iSCSI, because the people assigning IPs on the client side generally aren't the same people assigning them on the ESX side, and they're much more likely to use an IP that's already on the network and cause issues (IP conflicts that take datastores offline suck a lot).

I consider it a best practice because VLANs are free and private IP spaces are free and it makes things a lot cleaner, so why not do it? But it's not a requirement and it won't cause performance problems or anything if you don't do it.
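
As a rough sketch of what that separation looks like on a standard vSwitch (the vSwitch name, VLAN IDs, and addresses below are just examples):

code:
# NFS vmkernel port on its own VLAN
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=NFS-vmk
esxcli network vswitch standard portgroup set --portgroup-name=NFS-vmk --vlan-id=20
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=NFS-vmk
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=10.20.0.11 --netmask=255.255.255.0 --type=static

# iSCSI vmkernel port on a different VLAN
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-vmk
esxcli network vswitch standard portgroup set --portgroup-name=iSCSI-vmk --vlan-id=30
esxcli network ip interface add --interface-name=vmk3 --portgroup-name=iSCSI-vmk
esxcli network ip interface ipv4 set --interface-name=vmk3 --ipv4=10.30.0.11 --netmask=255.255.255.0 --type=static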

YOLOsubmarine

sanchez posted:

Something I've been pondering recently along those lines is what to do with 10GbE. Traditionally with gigabit I've always used separate physical ports for iscsi and had those switch ports on their own vlan to keep the storage traffic isolated from everything else. With 10x the bandwidth available though, is it wise to run LAN/Storage traffic together on the same interface? Still separated with VLANS, but sharing physical ports.

The idea makes me uneasy, but I can't think of a good reason why it wouldn't work as long as the link wasn't saturated.

The likelihood that you have a single host pushing the 1.25 GB/s of throughput required to saturate a single 10GbE link is basically non-existent. With bundled links and iSCSI load balancing your SAN will likely tap out before your network links.

Consolidate as much as possible (lights-out management will still require 1GbE, but everything else can be consolidated, for the most part) so you can spend less on ports, on rack space to house switches, on cabling, on management overhead from maintaining a billion different connections, and on unplanned downtime due to unnecessary complexity.

YOLOsubmarine

Dilbert As gently caress posted:

Ehh vMotion can chew up some poo poo when you throw it into maintenance mode, especially if it's running view

Yeah, but as you said, you can control that with NIOC, so you have ways of keeping vMotion from eating up all of your bandwidth. A single vMotion only requires 250 Mbps, or about 1/40th the bandwidth of a 10GbE link, and even then it only needs it for a very brief period. You can also cap the number of concurrent vMotions so that they won't come anywhere near saturating the link. If your hosts take somewhat longer to go into maintenance mode it isn't really the end of the world, especially given that even limiting vMotion traffic to a quarter of a 10GbE link and leaving the rest for storage still provides more bandwidth than a 1GbE link.

We have about 600 ESX hosts and all of them run everything other than ILO over dual 10GbE links to Nexus 7k and 5k switches in a vPC. Never had any issues with link saturation.

YOLOsubmarine

If your vCenter VM is performing badly it's because you don't have enough host or storage resources, not because vCenter runs badly as a VM. Capacity planning is important!

YOLOsubmarine

KS posted:

There's no space reclamation. If that D: drive was ever more full and files were deleted, VMware doesn't know that and can't recover the space at the VMDK level. Would recommend making a new virtual disk for a new D: drive and copying files over within the guest, then deleting the old VMDK if the space is important to you.

Some storage vendors support tools that run within the guest to reclaim space by looking at the block allocation table within the guest, zeroing out the unused blocks, and then hole punching.
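
The generic version of that workflow, without naming a specific vendor tool, looks roughly like this: zero the freed blocks from inside the guest, then punch the zeroed blocks out of the thin VMDK from the host. The path is just an example, and the punchzero step needs the VM powered off:

code:
# Inside the Windows guest: zero the free space on D: with Sysinternals SDelete
sdelete.exe -z D:

# On the ESXi host, with the VM powered off, deallocate the zeroed blocks
# from the thin-provisioned VMDK
vmkfstools --punchzero /vmfs/volumes/datastore1/vm01/vm01_1.vmdk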

YOLOsubmarine

Number19 posted:

I'm pretty sure that if you svMotion it to a thin provisioned disk and then back to thick (if you care) it will reclaim all that space in the process.

Sorry for the double post, but svMotion won't help. The blocks are marked as unused in the guest's block table, but the hypervisor still considers them in use because it doesn't have access to the guest block map. They need to be zeroed to be reclaimed during an svMotion, and the guest won't zero them when it releases them.

YOLOsubmarine

Cidrick posted:

You can do this from the host as well, if you have the right VAAI extensions

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014849

I've done it, once. It was pretty painless actually. Well, the actual SCSI UNMAP was painless, anyway. The part that sucked was when all my VMs on said datastore paused, because the array ran out of space even though the datastore still had over a terabyte of free space. Whoops!

This will reclaim blocks that VMFS has freed on a LUN but that have not been returned to the storage pool because the array does not know the blocks are no longer in use. But it still won't free up space released by the guest, because VMFS doesn't know that those blocks are free (unless a guest-aware tool tells it).
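
For reference, on 5.5 and later the manual reclaim is a single esxcli command; the datastore label and reclaim unit here are examples (if I remember right, 5.0/5.1 used the older vmkfstools -y approach instead):

code:
# Ask the array to reclaim dead space on the VMFS volume, 200 blocks at a time
esxcli storage vmfs unmap --volume-label=datastore1 --reclaim-unit=200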

YOLOsubmarine

Dr. Arbitrary posted:

Is this normal VMware stuff that I really ought to know (as a VCP), or is it more storage stuff that I'd need to know if I was working on the EMCISA?

There is some stuff on the 5.5 VCP exam about thin provisioning and space utilization as viewed at the array versus the datastore level, but it doesn't get terribly in depth. And there are no questions about VAAI primitives like UNMAP, which is used by VMFS to provide free-block information back to the array. So it's not something you need to know to pass the VCP, but it is good to know anyway, since you need to understand how the multiple layers of storage virtualization interact so you can address capacity issues.

YOLOsubmarine

Dilbert As gently caress posted:

I'm not sure why but this is still a common thing for people to do, defrag their templates/golden images, then SvMotion them.

I am not sure why people do this.

The defrag improves the file layout in the guest by resequentializing it, but if it's a thin VMDK it will still be fragmented at the VMFS level, so they svMotion it to sequentialize the VMDK layout.

Of course, most modern arrays will place blocks wherever they drat well please, especially when thin provisioning is used, so in most cases it has little effect on the actual on-disk layout and is just a waste of time.

YOLOsubmarine

Varkk posted:

We don't use Snapshots for our Hyper-V VMs. For backing up we just use the Windows Server Backup tool built in to Server 2012. Seems to do the job without disrupting the VMs etc. Just have it backup the VMs daily to a removable drive or a simple backup scheme.

This has to be an incredibly tiny environment if that is viewed as a sufficient backup scheme.

The ability to leverage snapshots is one of the great things about virtualization. If you aren't doing so then you're probably doing things less efficiently than you otherwise could. They aren't a complete backup solution, but they should be a part of most solutions.

YOLOsubmarine

Moey posted:

Can you explain this more?

Let's say I have a VM named VM-1 and its folder is VM-1 and VM-1.vmx, along with everything else related. If I change that VM name to VM-2, then vMotion it, will it fix all those file names?

If you svMotion it, the file names will get updated when it gets copied to the new datastore.

YOLOsubmarine

Why would you ever want a VM name that is different from the guest's hostname? What makes more sense than the server's actual name? A mismatch is the sort of thing that gets VMs destroyed accidentally during housecleaning.

YOLOsubmarine

Martytoof posted:

If my DAVG/rd and DAVG/cmd is through the roof, what are the chances that it could be anything but hardware related? I'm talking in the neighborhood of 400-1500 here. This is a new-ish DL380 G7, P410i with 8 256GB SSDs in a RAID-10 so this thing should be blazing fast, but my VM is choking waiting for I/O and esxtop shows DAVG issues.

All firmware up to date, latest HP ESXi drivers in place for HBA, etc. HP swears up and down that it's not their hardware now and I'm going to have to buy a per-incident support pack from VMware to help prove them wrong which isn't the worst use of $300, but I just know it's going to end up with HP pointing the finger at VMware and vice versa while I twiddle my thumbs.

What kind of workload is running on the disks?

YOLOsubmarine

Martytoof posted:

Here's a (not so) small excerpt from my vmkernel.log. Definitely seems like there's something going on. Still working on interpreting it.

http://pastebin.com/5910mKX2

All of the errors in there are related to the SCSI Log Sense, ATA Passthrough, and Mode Sense commands, not to any actual data manipulation. That would indicate to me that your RAID controller does not support those SCSI commands, or some portion of them, as the sense data from the target indicates that it considers the opcode or one of the fields in the CDB to be invalid. This could indicate a driver issue with the RAID controller, or a problem with the way the device is configured to use SATP. What does the "esxcli storage nmp device list -d" command show for that device?
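
Roughly the set of commands I'd start with, with the naa ID below just a placeholder for the P410i logical volume:

code:
esxcli storage nmp device list -d naa.xxxxxxxxxxxxxxxx    # which SATP/PSP claimed the device and how
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx   # queue depth, VAAI status, device flags
esxcli storage core adapter list                          # confirms which HBA driver is actually in use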

YOLOsubmarine

PCjr sidecar posted:

I'm fairly confident that the P410i is choking on the IO that you're throwing at it. Entry-level embedded RAID that would be fine with HDDs falls over badly at SSD IO rates. Consider using straight SAS controllers and add RAID in the VM (if possible) or add more controllers.

This is possible, but without knowing how many IOPS it's dealing with you can't say. If it's really only 600 then it's very likely not simply the card bottlenecking, particularly on a RAID 10. There's also the fact that he said write latencies are lower than read latencies, which is the opposite of what you would generally expect, unless the card has battery-backed write cache, which is an option.

I'd like to see some read and write IOPS and latency metrics before making any judgements.
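
The easiest way to get those is esxtop: interactively, press 'u' for the disk device view and watch READS/s, WRITES/s and the DAVG/KAVG columns, or capture it in batch mode so you can chart it afterwards. The interval and sample count here are just examples:

code:
# Capture 60 samples at 10-second intervals to a CSV for later analysis
esxtop -b -d 10 -n 60 > /tmp/esxtop-disk.csv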

YOLOsubmarine

Maneki Neko posted:

This may be a dumb question, but is there a preferred "go to" USB drive for diskless ESXi machines, or should I just be buying a case of 16GB+ flash drives and swapping them out when they die?

Your ESXi boot partition is rarely written and rarely read so you're unlikely to wear out your flash drive any time soon.

YOLOsubmarine

Cidrick posted:

Do any of you guys use Nimble Storage? We're looking to move off of using Hitachi SAN and getting a dedicated storage array solely for VMs, and Nimble seems like a pretty attractive option, but I'm wondering if there's any horror stories out there since it's still a relatively young technology.

You should post this in the enterprise storage thread, but the general opinion on Nimble is very positive. I've had a couple of horror stories, but every vendor has those, so it's not a big deal. If you're fine with iSCSI and limited app integration (which you probably are if you're using Hitachi) then you'll probably like it.

YOLOsubmarine

Mulloy posted:

How well does VMware work with RTP? I'm assuming the issues I see with high co-stop values / jitter have to do with optimization or configuration, but I thought I'd ask whether it's something that can work well or if RTP is best handled bare metal.

Cisco deploys voice on UCS, so I'd say it works fine.

YOLOsubmarine

The VI client in 5.5U2 allows you to edit the hardware version 8 (and below) features of version 10 VMs.

http://wahlnetwork.com/2014/09/15/restricted-edit/

YOLOsubmarine

Virtual switches behave (mostly) like physical switches, so if both guests are on the same host the vSwitch will check its MAC address table to figure out where to forward the packet, and if it determines that the destination is attached to one of its virtual ports it will forward it out to that port. Traffic only crosses the uplink if it needs to leave the vSwitch to get where it's going.

Your second question is basically the same as your first. The guest to guest communication never leaves the host, and the storage traffic would be local to the host as well if passthrough is used.

YOLOsubmarine

Nukelear v.2 posted:

EVO, as I understand it, is hyperconvergence and would be comparable to the new Dell XC series, which is their Nutanix platform.
Each blade has its own storage and tries to keep its running VMs on that local storage for vastly improved IO because of data locality. Basically a not lovely version of VSAN.

EVO just uses vanilla VSAN, which does not enforce node locality for data. Nutanix does attempt to keep data local to the node that owns the VM, but I'm not really sure that's necessary, as the latency penalties for cross-node access are pretty low.

YOLOsubmarine

Nukelear v.2 posted:

It basically takes you back to SAN level performance, which isn't terrible, but obviously local is better and that's a large selling point. Would say it's also critical to being able to build converged platforms at large scale, something everyone is working to get better at. I have no doubt that vsan will eventually get all the features nutanix has, they just aren't there yet.

The extra 50 microseconds of latency per transaction isn't going to be noticeable when disk latencies are going to be an order of magnitude larger, even with SSD or PCI flash. There is basically no performance differential between locally and remotely accessed data when the cache being used is flash. For RAM the difference is substantial, because the latencies involved are in the nanosecond range, so network latency is predominant. Local caching also creates bin-packing problems, since cached workloads are pinned to hosts and can't be distributed throughout the cluster to balance resources.

There's no uniform thing that could be described as "SAN level performance," because that's entirely dependent on a lot of factors, and you can easily get a pretty modest array these days that will give you tens of thousands of IOPS at a reasonable block size and sub-millisecond latencies. But SAN latency is generally limited by the media servicing the request (PCI flash, SSD, spinning disk, RAM) and not by the method of connecting the SAN to the host. Latencies over FC or modern Ethernet switches are much lower than latencies for any persistent media we have available. And even when you enforce data locality like Nutanix does, that still only affects reads, as writes must be written to other nodes and acknowledged before they can be acknowledged back to the client. So you're basically saying that a few microseconds of latency on certain reads are the difference between good and bad performance, in a world where anything less than a couple of milliseconds is considered very good. Whatever you gain on the front end from having data local to the node on Nutanix (which is only true if the VM hasn't moved recently), you probably lose by having to pass the IO through a VM anyway, since they don't hook into the kernel and instead run their storage in user space.

The problems with VSAN are that it's immature and has no *good* integrated solutions for backup and replication. Leveraging VMware snapshots is no good because VMware snapshots suck. Storage level snapshots and replication are a much better proposition and that's where Simplivity and Nutanix are still ahead of the game.

YOLOsubmarine

What lovely SAN takes more than 30 minutes to upgrade?

YOLOsubmarine

Dilbert As gently caress posted:

Equallogic with 8 enclosures

You guys should invest in something not horrible so that upgrading your storage isn't worthy of note.

YOLOsubmarine

Dilbert As gently caress posted:

Wow that is great, I love it, man I wish I worked at a place that gave IT as much budget per the storage capacity it needed!

There are some good options out there now that are as cheap as EQL. It was fine for its time, but it's basically dead as a storage platform moving forward and it's really showing its age. But even if it weren't old, it's still inexcusable that upgrades are that onerous. Literally everything I've ever worked with takes maybe a half hour and doesn't involve sweaty palms wondering what will break.

YOLOsubmarine

Dilbert As gently caress posted:

Yes, I am well aware of those, but when you are told "move poo poo to AWS" and "no budget till 2015 Q2" you make do. Honestly I don't understand why SA has such a hard time understanding budget constraints; yes, everything could be fixed by money, but you engineer a solution around not having it for that project.

I understand budgets, I just don't understand why a storage firmware upgrade is worth bragging about. I'd put that somewhere around patching an ESX cluster on the difficulty scale.

If it's hard enough to merit bragging about then it's bad enough that you should be able to make a case for replacement.

YOLOsubmarine

EoRaptor posted:

Having dealt with Equallogic before, DaF is probably not doing anything wrong here. EQL boxes that are grouped together form a storage grid, with each box acting independently to serve its own data, and sharing management info with other boxes to direct requests for data that it doesn't have to the correct box. It means you can add more i/o by adding more boxes in a linear fashion, but the downside is you need to upgrade each box one at a time and in the proper firmware sequence, or the inter-box communication will break and everything will poo poo the bed.

The upgrade process is mostly reliably boring, but it can take a long, long time to get it done without missing any steps and without rushing any of it.

I understand how EQL works and don't think DAF is doing anything wrong. I just think EQL is bad. You can do scale out, even tightly coupled scale out, without making upgrades long and painful.

YOLOsubmarine

Tab8715 posted:

I don't know, I'm storage illiterate. You tell me.

No, it's not odd. Storage upgrades don't get harder when the data they hold is worth more, and they don't generally get harder as the amount of data held increases, other than taking more time as you add controllers. It's about as "difficult" as upgrading an ESX cluster or something similar. You load the new software/firmware, fail over in some capacity, and reboot, then repeat on all controllers. It can be pucker inducing, especially the first time or two, but it's not really tricky.

Internet Explorer posted:

I have had EQL, EMC VNX, and HP LeftHand firmware upgrades all go south and cause unexpected downtime.

Sure, things can go wrong. I'd understand a post about that. When things break it's interesting, but when something that works right most of the time works right again it's not really that interesting or instructive. If it's your first time doing it then I can understand wanting to make a little happy, self-congratulatory post, but DAF is past that point in his career.

As far as why you'd be cautioned not to upgrade if you don't have a reason to...well, because you don't have a reason to. Upgrades themselves are often trivial, but bugs and regressions are common. If it ain't broke, don't fix it is generally a good rule for any critical infrastructure. Since storage isn't generally as exposed to security flaws and new features aren't added at the rate you get with something like ESX, the reasons to upgrade are usually limited to mitigating bugs or performance issues.

YOLOsubmarine

adorai posted:

Gotta love silos. I can assure you, it's always the storage.

Nonsense, it's always the network. Unless you're using FC, then it's the queue-depth.

YOLOsubmarine

Maneki Neko posted:

Not sure if this is a better storage or virtualization thread question, but anyone running Simplivity that would be willing to share their thoughts?

Probably a better question for the storage thread. What are your questions? It's hyperconverged, with the usual pros and cons of hyperconverged. It doesn't have a large presence, so there's not a ton of information out there. We're looking at partnering with them, but haven't gotten any demo equipment yet, unfortunately. Our Nutanix demo didn't go very well.

YOLOsubmarine

Moey posted:

Care to expand upon this?

It was before my time, but the guys here doing our testing saw really poor performance in benchmarks: an order of magnitude lower than expected, and lower than other similar hybrid products. Apparently they spent a lot of time working with Nutanix support trying to figure out why it was doing so poorly, but could never get it to perform as expected. It was a poor enough experience that we don't partner with them.

YOLOsubmarine

TeMpLaR posted:

Anyone ever heard of a place that uses NFS for all VM Guest OS's but uses software iscsi inside the guest for all additional drives?

I've seen that done for Exchange before, but not for everything. Any ideas why anywhere would do this and not convert everything over to NFS that isn't a cluster?

I am finding lots of examples of how to stop doing it, saying that it is antiquated. Maybe this is just a status quo kind of thing.

http://www.virtuallanger.com/2014/03/04/converting-in-guest-iscsi-volumes-to-native-vmdks/

Lots of reasons, most having to do with using array snapshots for application-level backup and cloning. If I have a particular set of data that I want to clone (test/dev data refresh, file-level restore from snapshot), then it's easier to deal with that data if it lives on a LUN attached to a host than if it's in a VMDK in a datastore.

Cidrick posted:

I recently started in a shop and inherited this kind of mess. Most of our Microsoft infrastructure is running on NFS-backed datastores, but our company DFS infrastructure for home drives and shared drives and whatnot are all connected to the same NetApp via iSCSI. I believe it was done for control reasons - really, political reasons - because the team that owned the Microsoft stuff at the time wanted to be able to control their own vfiler, so they were given their own aggr and vfiler on the netapp and given the keys to do whatever they needed to with it, and the storage team got to be hands-off on supporting that slice of the NetApp.

There's not really any technical reason I can think of why you'd want to do that, though. Not that I can think of, anyway.

The technical reason for doing it would be to streamline the restore workflow if you're using storage snapshots to recover that data. If you're attempting to restore data from the LUN (say, a user's home directory deleted accidentally), you clone the LUN, map it to the host, rescan disks, and mount it on a mountpoint; then you can copy off whatever you want. SnapDrive can do all of this for you from a simple GUI. If the data is in a VMDK, you have to clone the datastore, mount the cloned datastore to the ESX host/cluster, go to the ESX CLI and use vmkfstools to modify the UUID of the cloned disk so it doesn't conflict with the active disk, select attach disk in the guest settings, browse to the cloned VMDK and select it, then go to the guest, rescan disks, and mount the disk. The cleanup process for both has a similar number of steps. The process when the data is on a LUN is cleaner and simpler, as well as being easier to script. It also doesn't require logging in to vCenter and modifying guest properties, so it doesn't require administrative rights to vCenter, which is useful when you want to allow the data owner to perform restores without giving them vCenter access and the ability to edit settings on their VMs.
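
For anyone curious, the VMDK-side dance from the ESX CLI looks roughly like this. The labels and paths are placeholders, and whether you mount or resignature the clone depends on how you want to handle the duplicate VMFS signature:

code:
# The cloned volume shows up as an unresolved VMFS snapshot
esxcli storage vmfs snapshot list
esxcli storage vmfs snapshot mount --volume-label=datastore1   # or 'snapshot resignature' instead

# Give the cloned virtual disk a new UUID so it doesn't collide with the live one
vmkfstools -J setuuid /vmfs/volumes/snap-datastore1/vm01/vm01_1.vmdk

# Then attach the cloned VMDK to the guest in vCenter and rescan disks inside the OS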

YOLOsubmarine

syg posted:

OK, here's a question.

SQL-based application that has one SQL enterprise server with 6 app servers, all windows.

I'm reconfiguring the volumes/datastores for SRM and trying to decide between putting all 7 servers on one datastore for SRM simplicity or splitting them up to DATA and APP datastores. SRM seems to imply that the best practice is grouping app-related guests into a datastore together, but I have heard that there can be contention for access to a datastore between guests when load is heavy. Any idea which way is best practice? The app servers interface with the clients via the internet and pull data from the SQL server.

Whether or not there are problems with contention depends on your storage. From an IO perspective, different datastores may all be containers virtualized on the same set of spindles, so splitting them won't lessen spindle contention. If you're using block storage there can also be lock contention, but that is much improved with the VAAI ATS primitive, if your storage and vSphere version support it.

So really you need to give more detail. From an SRM perspective keeping everything that is part of the application "bundle" together makes things easier.
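
If you want to check whether the ATS part of that actually applies to you, the per-device VAAI status is easy to query (the device ID is a placeholder):

code:
esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx
# "ATS Status: supported" means hardware-assisted locking is available on that LUN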

YOLOsubmarine

evol262 posted:

Hardware assisted locking and copy offload both sometimes misbehave.

Broadly, some unnamed vendors are really good at pushing APIs that other companies have a lot of trouble implementing, especially in the software defined networking and storage api spaces.

I think SCSI UNMAP is really the poster child for this. It absolutely destroyed array performance when first introduced, to the point where VMware disabled the functionality and turned it into a manual, CLI only option.

YOLOsubmarine

BangersInMyKnickers posted:

My big warning with host caching tech is that it's only going to help with "normal" workloads.

It's not just normal workloads, it's random read workloads specifically. Most host-based flash will only cache reads, and only random reads. Some, like Pernix, can cache writes as well, but write caching comes with some tradeoffs and is generally less beneficial than read caching, because the data must eventually destage to HDD, so it doesn't remove load from the backing storage, it just smooths it out some.

It's still useful for writes in the sense that the disk IO that would normally go to servicing random reads can now be used to process write IO, but it's still important to know what your workloads are to determine what sort of benefit you will see.

Stuff like VDI can be a lot more write intensive than people think, so it's important to make sure that you've got enough IO in your stable storage to support that write activity, irrespective of what you're doing with host caching.

YOLOsubmarine

Zero VGS posted:

I'm highly discouraged from using VMware under any circumstance (my company competes with one of their products, rather antagonistically), so I'm gonna have to figure out how to do it all on Hyper-V.


First, I did buy a couple spares of every component so it won't be a scramble if anything goes.

Second, the phones and service is on a support contract. The phone management/voicemail/UC server that we have is a Dell Celeron with a single HDD, modifying it voids warranty, and they have no option for anything nicer. They said I could give them a VM to migrate to and keep the application support while giving up the hardware support on what was a time bomb anyway. Considering the circumstances I found that more prudent since the servers I make can run other stuff too.

I swear I'm not as kamikaze as I sound and I'm pretty drat resourceful in practice. I don't want to rely on these support contracts and SLAs because so far Microsoft, HP, Shoretel, and our ISP have all treated them like toilet paper on multiple occasions.

Still, sorry for the ranting, I expected a heap of criticism and naturally I've got a lot more reading to do before I finalize the design of all this. I appreciate the inputs and I at least have a few months more lead time to play around with all this stuff and get some sanity checks before I flip the switch.

Buy a storage array with redundancy and non-disruptive failover if 100% uptime is a requirement. You don't have any other option unless you can write your own high performance shared-nothing clustered filesystem and run it on the underpowered hardware you're purchasing.

YOLOsubmarine

Misogynist posted:

Why would you have to write your own when Ceph and GlusterFS work fine? Not that I'd recommend them for anyone who doesn't know what they're getting into.

Most software-based scale-out architectures are tuned for the sort of access patterns driven by big data, not the low-latency, small-IO random workloads a general-purpose VMware cluster is going to run. You could certainly make them work for that purpose if you were willing to put enough hardware in place, but then why not just buy a small storage array to run two hosts' worth of VMs?

Nobody who is talking about putting together a two-node cluster out of spare eBay parts is in the market for Ceph or Gluster, hence the tongue-in-cheek comment that he'd need to write his own software to do what he wants: give him really cheap, highly available shared storage.

YOLOsubmarine

Dr. Arbitrary posted:

Anyone have an opinion on Tintri? I got a call from one of their salespeople, but I figured I'd ask around before hearing their pitch to make sure I'm not wasting my time.

I like their stuff. We run our lab and training on it. It's pretty fast, easy to use, and integrates well with VMware. It is virtual storage only, though, and the company is still fairly young, so who knows if they will still exist in five years.
