greatapoc
Apr 4, 2005
We've recently bought a bunch of new Dell R640 servers to replace our aging HP c7000 blade chassis hosting our Hyper-V infrastructure. We're seeing poor network performance on the new servers though and I'm trying to track down the source of it. Our original environment was 2012 R2 hosts. We had to in-place upgrade these to 2016 to raise the cluster functional level before adding the new 2019 Dell hosts to the cluster, then live migrated all the guests over to the new hardware. We've still got some VMs stuck on one of the old hosts but we have a plan around that. Anyway, this gives us something to compare against for the poor network performance we're seeing.

Differences I can see between the old host and the new: jumbo frames disabled on the new, receive and transmit buffers set to 512 on the new and auto on the old. The old host has three 1Gb NICs in a LACP team to two Cisco switches; the new host has two 10Gb NICs in a LACP team to the same switches. An iperf test from a VM on the old host to my PC saturates the 1Gbps link into my PC, but from the new host it only pushes about 500Mbps. Transfer between VMs on the same host is around 2Gbps; transfer between VMs on different hosts is around 600Mbps. VMQ is enabled, and the NICs are Intel X710s on the Dells.
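For reference, this is roughly what I've been using to line the settings up side by side on each host. It's only a sketch and the adapter names ("NIC1"/"NIC2") are placeholders for ours:

# Advanced properties that differ between the old and new hosts
Get-NetAdapterAdvancedProperty -Name "NIC1","NIC2" |
    Where-Object { $_.DisplayName -in "Jumbo Packet","Receive Buffers","Transmit Buffers" } |
    Format-Table Name, DisplayName, DisplayValue

# Teaming mode and members of the LACP team
Get-NetLbfoTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm
Get-NetLbfoTeamMember | Format-Table Name, Team

# VMQ state on the physical NICs
Get-NetAdapterVmq | Format-Table Name, Enabled, BaseProcessorNumber, MaxProcessors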

Not sure what else to mention. I was going to try changing the NIC team from LACP to Switch Embedded Teaming tonight in an outage window, as well as enabling jumbo frames, to see if it makes a difference; a rough sketch of that change is below. Does anyone have ideas on what else to look at?
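For what it's worth, the SET change I'm planning looks roughly like this. The switch and adapter names are placeholders, and since SET only supports switch-independent teaming the Cisco side would also need the LACP port-channel unbundled back to plain trunk ports first:

# Replace the LBFO/LACP-based vSwitch with Switch Embedded Teaming (SET)
Remove-VMSwitch -Name "vSwitch" -Force
New-VMSwitch -Name "vSwitch" -NetAdapterName "NIC1","NIC2" `
    -EnableEmbeddedTeaming $true -AllowManagementOS $true

# Enable jumbo frames on both physical NICs
# (the exact DisplayValue string depends on the driver, e.g. "9014" or "9014 Bytes")
Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" `
    -DisplayName "Jumbo Packet" -DisplayValue "9014"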


greatapoc
Apr 4, 2005

BangersInMyKnickers posted:

The x710's are unstable trashfires, especially on bonded links.

Well that's just bloody great; we only took delivery of the things two weeks ago. Looks like we might have to go back to Dell and ask for something else. The thing is, we've never had any problem with them dropping the connection: the port-channels are rock solid and haven't missed a beat. The performance is just terrible.

greatapoc
Apr 4, 2005
Touch wood it looks like I may have fixed it but I'm not sure exactly which part did it.

Removed the team and recreated it (still using LACP)
Enabled jumbo frames on both NICs
Increased receive and transmit buffers to 4096
Added reg key HKLM\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters\TenGigVmqEnabled=1 (VMQ was already enabled on the VMs)
Rebooted host

iperf and file transfers are now flying like they should, but failover cluster manager is throwing up its hands so I need to do more with that.
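For anyone following along, the steps above in rough PowerShell form. Team, switch and NIC names are placeholders, the vSwitch bound to the team has to be detached and re-attached around the team recreation, and the jumbo value string depends on the driver:

# Recreate the LACP team
Remove-NetLbfoTeam -Name "Team1" -Confirm:$false
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic

# Jumbo frames and bigger buffers on both physical NICs
Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" -DisplayName "Jumbo Packet" -DisplayValue "9014"
Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" -DisplayName "Receive Buffers" -DisplayValue "4096"
Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" -DisplayName "Transmit Buffers" -DisplayValue "4096"

# The VMSMP TenGigVmqEnabled registry value
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters" `
    -Name TenGigVmqEnabled -Value 1 -PropertyType DWord -Force

Restart-Computer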

Edit: Here's an iperf capture where I have it running on one of the Dells and then live migrate the VM to the one I've just (hopefully) fixed.

[ 4] 2.00-3.00 sec 54.5 MBytes 456 Mbits/sec
[ 4] 3.00-4.00 sec 25.9 MBytes 217 Mbits/sec
[ 4] 4.00-5.00 sec 49.8 MBytes 419 Mbits/sec
[ 4] 5.00-6.00 sec 43.4 MBytes 364 Mbits/sec
[ 4] 6.00-7.00 sec 48.2 MBytes 405 Mbits/sec
[ 4] 7.00-8.00 sec 49.4 MBytes 414 Mbits/sec
[ 4] 8.00-9.00 sec 39.8 MBytes 334 Mbits/sec
[ 4] 9.00-12.49 sec 7.50 MBytes 18.1 Mbits/sec
[ 4] 12.49-12.49 sec 0.00 Bytes 0.00 bits/sec
[ 4] 12.49-12.49 sec 0.00 Bytes 0.00 bits/sec
[ 4] 12.49-13.00 sec 2.62 MBytes 43.1 Mbits/sec
[ 4] 13.00-14.00 sec 111 MBytes 929 Mbits/sec
[ 4] 14.00-15.00 sec 112 MBytes 936 Mbits/sec
[ 4] 15.00-16.00 sec 110 MBytes 919 Mbits/sec
[ 4] 16.00-17.00 sec 110 MBytes 922 Mbits/sec
[ 4] 17.00-18.00 sec 105 MBytes 880 Mbits/sec
[ 4] 18.00-18.58 sec 56.5 MBytes 813 Mbits/sec

greatapoc fucked around with this message at 01:28 on May 7, 2020

greatapoc
Apr 4, 2005

greatapoc posted:

Touch wood it looks like I may have fixed it but I'm not sure exactly which part did it.

Removed the team and recreated it (still using LACP)
Enabled jumbo frames on both NICs
Increased receive and transmit buffers to 4096
Added reg key HKLM\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters\TenGigVmqEnabled=1 (VMQ was already enabled on the VMs)
Rebooted host

iperf and file transfers are now flying like they should, but failover cluster manager is throwing up its hands so I need to do more with that.

So it looks like I spoke too soon on this one. Although iperf and file transfers were a lot better, once we moved SQL over some applications couldn't connect to it and others were showing very slow queries. It appears we've fixed it by disabling RSC (Receive Segment Coalescing) on the virtual switch.
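In case anyone hits the same thing, this is roughly the knob we flipped. The switch and NIC names are placeholders, and I'd check Get-VMSwitch | Format-List *Rsc* first to confirm the property name on your build:

# Disable software RSC on the Hyper-V vSwitch (on by default in Server 2019)
Set-VMSwitch -Name "vSwitch" -EnableSoftwareRsc $false

# RSC on the physical NICs is a separate setting, if that also needs turning off
Disable-NetAdapterRsc -Name "NIC1","NIC2"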
