|
Hughlander posted:I'm trying to get nested virtualization (Proxmox=>Win10=>Virtualbox=>Linux) working on a Ryzen 3900. I had to add vmx,rdtscp to the cpu line to get docker working on fedora kvm =>OSX=> Docker.
|
# ? Mar 26, 2020 16:55 |
|
Perplx posted:I had to add vmx,rdtscp to the cpu line to get docker working on fedora kvm =>OSX=> Docker. Thanks. I’m trying something a bit different now and am running into crappy x11 driver issues. Trying to do a kvm => Linux => virtualbox. I’ll look into those two flags however.
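For the record, on the Proxmox side the flag change above can be sketched with qm; the vmid (100) is a placeholder, and whether you need the raw -cpu args or a cpu: host line depends on your Proxmox version, so treat this as a sketch:

```shell
# Pass raw qemu cpu flags so the nested guest sees VT-x and rdtscp (vmid 100 is hypothetical)
qm set 100 --args '-cpu host,+vmx,+rdtscp'
```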
|
# ? Mar 26, 2020 18:58 |
|
Bob Morales posted:I have a Linux VM that will not stay on the network. It drops off, the MAC disappears from the switches. I can't talk to it until I start a ping from that virtual machine, using the console. We had this with RHEL vms, they would randomly drop off the network, but if you vmotion the vm to another node it comes back. Starting a ping from the guest vm didn't fix it though. Turned out to be an esx driver issue. If you're running HPE servers/qlogic nics it might be this. https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-a00076313en_us
|
# ? Mar 28, 2020 02:34 |
|
Has anyone else been experiencing system stalls and NMI softlocks on Epyc 1 servers? We've been chasing this problem for going on a year now through RHEL 7.6-7.8 and it's been driving us crazy. It seems to be IOMMU related, and we've only now started to see some relief from it by forcing the AMD IOMMU driver into fullflush mode via the kernel parameters.
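For anyone else chasing this, the workaround is a kernel command-line change; a sketch of the RHEL 7 version (the rest of the existing cmdline contents are elided):

```shell
# /etc/default/grub — append amd_iommu=fullflush to the kernel cmdline
GRUB_CMDLINE_LINUX="... amd_iommu=fullflush"
# then rebuild the grub config and reboot:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```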
|
# ? Mar 28, 2020 16:19 |
|
I was fighting with GPU passthrough again with kvm/qemu, figured I'd share in case anyone else runs into issues.

Setup:
OS: Proxmox 6.1 (Debian)
CPU: Ryzen 2700X
Motherboard: Asus X470-Pro Prime, 1.0.0.4 patch B firmware
Host GPU: Nvidia Quadro K600
Guest GPU: Nvidia GTX 1080
Kernel cli params, vfio, and lspci stuff: https://pastebin.com/fgXuk1X1

Everything works flawlessly when I don't have a display attached to the guest GPU while the host is booting. Attaching a monitor afterwards doesn't mess it up and the guest VM is as happy as can be, but this manual intervention isn't desired. Booting the host with a monitor attached to the guest GPU will prevent passthrough, as the framebuffer claims some memory even with "video=vesafb:off,efifb:off" kernel parameters supposedly disabling them. Without the parameters there is a lot more log output, with the parameters there's only a few lines; either way the GPU cannot be passed through. I was able to solve this issue previously by selecting which GPU the motherboard uses and eventually hands over to the OS, but the Asus board doesn't have that feature. Trying to start the guest VM will output an error in dmesg.
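For reference, the usual way to keep the host from ever initializing the guest GPU is to bind it to vfio-pci before the display drivers load; a sketch, assuming the common GTX 1080 PCI ids (confirm yours with lspci -nn):

```shell
# /etc/modprobe.d/vfio.conf — claim the GTX 1080 (GPU + its HDMI audio function) for vfio-pci
options vfio-pci ids=10de:1b80,10de:10f0
softdep nvidia pre: vfio-pci
# then rebuild the initramfs so vfio-pci wins the race at boot:
#   update-initramfs -u -k all
```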
|
# ? Apr 8, 2020 13:20 |
|
Actuarial Fables posted:I was fighting with GPU passthrough again with kvm/qemu, figured I'd share in case anyone else runs into issues. Weird, I never had to do this. I do have some initramfs conf which controls what gets loaded first. VFIO before NVidia etc.
|
# ? Apr 9, 2020 07:20 |
|
I am running Proxmox on a Dell R610 with an H700 RAID controller in RAID 50. The issue I'm having is that the Windows 10 VM I have installed has terrible write performance, and I want to know what I did wrong, because the read performance is great. I do have writeback cache enabled.
|
# ? Apr 15, 2020 23:31 |
|
wargames posted:I am running proxmox on a dell r610 with a h700 raid controller in raid 50. the issue that i am having is that the windows 10 vm i have installed has terrible write performance, and I want to know what i did wrong because the read performance is great. I do have writeback cache enabled.

Define terrible. What drives are you using, and how many drives are in your RAID?
|
# ? Apr 15, 2020 23:34 |
|
Bob Morales posted:Define terrible

6 drives are in my RAID, PNY CS900s. I really do not understand the write latency. Did I build the hard disk wrong by making it a SATA device in qcow2 format?
# ? Apr 15, 2020 23:45 |
|
Now just what in the devil is happening with those 4K blocks? What's piquing my interest is the fact that the 512 blocks are doing fine.
|
# ? Apr 16, 2020 00:21 |
|
Your controller might be disabling the cache onboard the SSD, which is probably important for a low-end TLC drive like that to get good write performance. I'd also try whatever permutations of write-back/write-through and drive cache enabled/disabled to see what works best for your workload. What stripe size did you end up using?

e: Also try running the bench with a much smaller test file size; if it's dramatically faster then we're likely hitting cache exhaustion and the limits of what those TLC chips can do in a raid5.
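If you want to poke at those controller-side settings from the OS: the H700 is an LSI MegaRAID rebrand, so MegaCli generally works against it (assumption: MegaCli is installed; adjust the -L/-a selectors to your array):

```shell
MegaCli -LDGetProp -Cache -LAll -aAll        # show current cache policy per logical disk
MegaCli -LDSetProp -EnDskCache -LAll -aAll   # enable the drives' onboard cache
MegaCli -LDSetProp WT -LAll -aAll            # write-through, to compare against WB
```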
|
# ? Apr 16, 2020 00:29 |
|
Potato Salad posted:Now just what in the devil is happening with those 4K blocks

512 writes still seem very slow.

BangersInMyKnickers posted:Your controller might be disabling the cache onboard the ssd which is probably important for a low-end TLC drive like that to get good write performance. I'd also try whatever permutations of write-back/write-through and drive cache enabled/disabled to see what works best for your workload. What stripe size did you end up using?

This I do not know, as I didn't mess with the advanced options in the H700 settings. Also it's RAID 50, so I think it's better for writes than plain RAID 5.

edit: ran some tests on proxmox itself to see if the write speeds are also terrible, and they are!

code:
root@proxmox:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 80.8637 s, 13.3 MB/s
root@proxmox:~# dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 7.85938 s, 65.1 kB/s

edit2: going to RAID 0 these disks and just see how fast the RAID can go.
# ? Apr 16, 2020 00:29 |
|
Flash that bitch to IT mode, run JBOD and do the RAID with Linux software RAID
|
# ? Apr 16, 2020 02:51 |
|
Bob Morales posted:Flash that bitch to IT mode, run JBOD and do the RAID with Linux software RAID

The H700 can't be flashed to IT mode.
|
# ? Apr 16, 2020 02:55 |
|
RAID 50 was the issue for writes, even if it has insanely good reads. Went RAID 0.
|
# ? Apr 16, 2020 04:07 |
|
wargames posted:root@proxmox:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

These tests aren't good. In general dd is not a great way to test real-world disk performance, but these especially are poor indicators. The first one asks for a single synchronous 1GB write, which doesn't reflect anything in the real world and is probably choking on any number of things, including the 512MB write-back cache in your raid controller. The other test is doing tiny, tiny writes in a single serial stream, which is probably exposing the limits of dd's ability to dispatch writes quickly enough as soon as an acknowledged write comes in, more than anything else. Filesystems and storage devices do all sorts of tricks to batch and coalesce writes to make things perform better, and you're disabling all of them with this type of job.

This is that same test on a GCP instance with locally attached SSD:

quote:[root@instance-2 ~]# dd if=/dev/zero of=/dev/sdb1 count=1000 bs=512 oflag=dsync

It's quite slow, which it shouldn't be, because I know GCP publishes performance characteristics for their local SSDs that are well above this. If we look at blktrace we see:

quote:[root@instance-2 ~]# btt -i bp.bin

The rows that matter here are Q2Q, D2C and Q2C. The D2C row shows how long it is taking the actual disk to respond once the IO has passed through the queue and been issued to it. We are waiting an average of 120us for the disk and a maximum of 1.4ms, so our disk is responding about as quickly as we'd expect a SAS SSD to respond. The Q2C row shows how long the total process takes, from the IO request being put on the queue by the application (dd in this case) to when it is completed (an acknowledgement that the data was written is given to the application). We're averaging about 126us total, so we aren't spending much time in the filesystem layer at all, only about 6us.

The Q2Q row tells us that the vast majority of our waiting time, 7.8ms on average, is spent waiting for IO to come in from the application. So dd is taking about 7.8ms between every IO it issues, despite the fact that IOs are being completed in well under 1ms on average. This is basically what happens when you have no concurrency: a single thread issues IOs serially and then waits for each to complete.

This is a test of the same disk with fio, a workload generation tool which is much better for benchmarking since it supports multiple simultaneous jobs for concurrency, and also uses a sane block size (I've cut out most of the details but left the bottom-line numbers):

quote:root@instance-2 ~]# fio --filename=/root/test/testfile --size=10GB --direct=1 --rw=randwrite --bs=32k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting

My advice would be to test using fio or some other actual benchmarking tool, with sane block sizes (somewhere between 4k and 64k generally), preferably with fairly long-running jobs (at least a few minutes; low-end SSDs have issues with sustained performance that you want to suss out), and on a single disk using multiple concurrent jobs. See where that gets you, and then start building up from there to test multiple disks in a raid configuration and then virtual disks running on the hypervisor. There are so many things that could be at issue:
- Inexpensive SSDs that are likely to fall over under sustained write workloads
- H700 raid controller caching policy
- H700 interaction with drive firmware and drive firmware settings (i.e. disk cache enabled/disabled... these drives don't have a cache, so who knows whether it matters or not)
- H700 stripe size
- Filesystem parameters like block size or cluster size (what filesystem is Proxmox using?)
- Kernel parameters controlling IO
- Virtual machine OS and filesystem parameters
- Poorly constructed benchmarks
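The quoted fio invocation can equally be kept as a job file, which makes it easy to vary block size and concurrency between runs; the path and size are the same placeholders as in the quote:

```shell
# Same workload as the quoted fio command line, written out as a job file
cat > randwrite.fio <<'EOF'
[randwrite-32k]
filename=/root/test/testfile
size=10GB
direct=1
rw=randwrite
bs=32k
ioengine=libaio
iodepth=64
runtime=120
numjobs=4
time_based
group_reporting
EOF
# then: fio randwrite.fio
```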
|
# ? Apr 16, 2020 05:17 |
|
In addition to all that, check out what IO scheduler is being used for the block device. cfq is probably the default which is essentially making the OS attempt to re-create functionality that already exists on your raid controller. noop is probably the best choice for raid groups like you are using.
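Checking and switching the scheduler is a sysfs read/write; a small sketch (note that on newer blk-mq kernels noop is named none):

```shell
# show_schedulers: print each block device's available IO schedulers;
# the active one is shown in [brackets]
show_schedulers() {
  for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] && printf '%s: %s\n' "${f#/sys/block/}" "$(cat "$f")"
  done
}
# To switch a device (as root; sdX is a placeholder):
#   echo noop > /sys/block/sdX/queue/scheduler
```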
|
# ? Apr 16, 2020 14:48 |
|
You using ZFS on Proxmox? Turn on async for a laugh.
|
# ? Apr 16, 2020 15:03 |
|
How are you guys configuring your containers in proxmox? Are you using mostly VMs and ignoring the LXC parts? It seems like getting a new container as-code deployed/configured requires a bunch of things that get 80% of the way there and then shoehorning in other stuff around it. I'm about to give up and just go with ubuntu server instead. (My main problem after getting through other issues is now that I can't properly update the container.conf for a new container with bind mounts to pass /dev/dri for quicksync stuff, which was my whole main reason for wanting to do this in the first place)
|
# ? Apr 16, 2020 17:08 |
|
Mr Shiny Pants posted:You using ZFS on Proxmox? Turn-on async for a laugh.

?

Tatsuta Age posted:How are you guys configuring your containers in proxmox? Are you using mostly VMs and ignoring the LXC parts? It seems like getting a new container as-code deployed/configured requires a bunch of things that get 80% of the way there and then shoehorning in other stuff around it. I'm about to give up and just go with ubuntu server instead.

Note this is home use: I have two nodes. One is the NAS, which has a couple of LXCs for keeping the system as pure as possible:
- File services doing NFS and smbd
- Print service doing cups and airprint/google cloud print (still need to replace that)
- Plex media server

The second one is much bigger but only has 2 M.2s mirrored, so storage is on the main system. It has a mixture of LXCs and VMs, and is also running docker raw on proxmox for the zfs filesystems (this part is a mistake and will likely go away soon).
LXCs:
- Ansible development
- Soon-to-be LPMUD with VPN to Digital Ocean
- Minecraft Bedrock servers
VMs:
- Windows 10 Enterprise - PCI passthrough of video card
- Ubuntu Desktop
- 4-node kubernetes cluster
- 3 Android-x86 instances I'm experimenting with
Docker:
- 53 containers across 3-4 compose files; I'm trying to switch to being portainer-managed but not there yet.

What's your problem with the container? I pass usb through to the print server just fine, here's the config:

quote:lxc.apparmor.profile: unconfined
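For the /dev/dri (quicksync) case specifically, the passthrough usually comes down to a couple of lines in /etc/pve/lxc/<vmid>.conf; a sketch for Proxmox 6's cgroup v1 layout, with 226 being the DRM character-device major:

```shell
# extra lines in /etc/pve/lxc/<vmid>.conf (vmid is a placeholder)
lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
```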
|
# ? Apr 16, 2020 19:32 |
|
How are you setting up that configuration file? I've tried a couple different things for "configure deploy container on proxmox side, then configure from inside the container itself", and every way to do it had a bunch of issues, but using ansible with the proxmoxer module got me MOST of the way there... https://docs.ansible.com/ansible/latest/modules/proxmox_module.html So I can't configure additional lines in the conf for mounting, it just does the basics like cores, memory, etc etc. So I figured I would just directly call the proxmox host and edit the conf file directly after creation to add a couple lines, and I got weird permissions errors I couldn't seem to resolve, and it was pretty kludgy that way anyway. I can edit manually on the host but I don't really want to get in the habit of doing things manually, since half the reason I'm going this route is to learn a new thing and get more comfortable with using ansible. I just didn't know if people were doing their configs using ansible/proxmoxer as well, or there was a better tool for it.
|
# ? Apr 16, 2020 19:42 |
|
Tatsuta Age posted:How are you setting up that configuration file? I've tried a couple different things for "configure deploy container on proxmox side, then configure from inside the container itself", and every way to do it had a bunch of issues, but using ansible with the proxmoxer module got me MOST of the way there...

Ok, this predates when I was using Ansible. Basically proxmoxer is just a layer over the API, and the API https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/lxc/{vmid}/config says that you can use a parameter called 'lxc'. But I'd just get the drat thing working first, since it's a one-off, then go and get it repeatable. Particularly since I assume you're only passing this through to one container, right?
|
# ? Apr 16, 2020 19:56 |
|
Hughlander posted:Ok, this predates when I was using Ansible. Basically proxmoxer is just a layer over the API, and the API https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/lxc/{vmid}/config says that you can use called 'lxc' But I'd just get the drat thing working first since it's a 1 off. Then go and get it repeatable. Particularly since I assume you're only passing this through to one container right? Yeah, it's just for this one container, but I am still trying to get it all immutably configured if possible. Because why not? And I learned that the configuration stuff I need is just a single file in /etc/pve/lxc/container_id.conf, so it's easy enough to just do a file interaction on it with ansible before starting up, and it appears to be working mostly fine. So, yay!
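That file interaction can be sketched as a small idempotent append; the vmid and the exact lines are placeholders:

```shell
# append_conf <conf> <line...>: add each line to an LXC conf file if it's not already there
# (on the Proxmox node the conf would be /etc/pve/lxc/<vmid>.conf)
append_conf() {
  conf="$1"; shift
  for line in "$@"; do
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
  done
}
# e.g. append_conf /etc/pve/lxc/101.conf \
#   "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir"
```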
|
# ? Apr 17, 2020 01:48 |
|
It was for wargames. If you are running ZFS on Proxmox you might try zfs sync disabled and check the performance. It will probably be magnitudes faster. Mine went from 150 to over 700 MB/sec. As I favour data integrity above speed, I enabled sync again, but it was nice seeing what it was capable of.
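A sketch of the toggle, with a placeholder dataset name; sync=disabled acknowledges writes before they reach stable storage, so use it for benchmarking only:

```shell
zfs get sync rpool/data            # check the current setting
zfs set sync=disabled rpool/data   # benchmark with sync off
# ...run the benchmark, then put it back:
zfs set sync=standard rpool/data
```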
|
# ? Apr 17, 2020 06:56 |
|
poo poo pissing me off. I finally got our Veeam backups running again. We had one job with 57 VMs which would run fine, but the resulting backup file was too big to run in one batch, especially to the cloud. I split it up into a couple smaller backups, finally got the Synology to work faster than 20-30MB/s, and then I started having this error:

We have a Barracuda Email Archiver, and a 1.8TB drive on our SAN was mapped to our Veeam server, and then the Archiver was using that as a 'RAID' disk as a mirror of the internal drive. wtf

Another drive on the datastore was thin provisioned and it ran out of space, causing that error. After allocating some more room on the SAN I got the backups running again. The plan is to remove that mapping and just back up the Barracuda over SMB through Veeam; no need to make a datastore for it.

Then I had a notification that "Virtual machine disks consolidation is needed". Alright, started that, and the host in my cluster that the Veeam VM is on ended up going 'Not Responding' for a while, and then the consolidation task said it's unable to reach the host. It got to 20% after about 3 hours of running... It appears that the task is still running, and I can still use the VMs on the host just fine. Do I just let this puppy run for another 12-15 hours and see if the host pops back online?
|
# ? Apr 17, 2020 16:48 |
|
Do you have a bunch of orphaned snapshots on that VM, or an old snapshot on a machine that does a lot of writes? Snapshot consolidation can get very taxing when they've had a lot of writes since their inception. In Veeam, there is an option to split each VM into its own backup file, even in the same job. I much preferred that setup. https://helpcenter.veeam.com/docs/backup/vsphere/per_vm_backup_files.html?ver=100
|
# ? Apr 17, 2020 16:56 |
|
'Decide what you're doing after you've done the operation you took the snapshot for' was something that got drummed into me ever since I started using VMware, and not to just treat them like recovery points to leave hanging around for ages.
|
# ? Apr 17, 2020 17:08 |
|
Internet Explorer posted:Do you have a bunch of orphaned snapshots on that VM, or a snapshot that is old on a machine that does a lot of writes? Snapshot consolidation can get very taxing when they've had a lot of writes since their inception. code:
But that thing just gets an email journal written to it every day. all day.
|
# ? Apr 17, 2020 17:12 |
|
Thanks Ants posted:'Decide what you're doing after you've done the operation you took the snapshot for' was something that got drummed into me ever since I started using VMware, and not to just treat them like recovery points to leave hanging around for ages. heh. Every one of our VM's has a 'fresh install' snapshot. WHAT GOOD IS THAT (I've been deleting those as I find them)
|
# ? Apr 17, 2020 17:15 |
|
Bob Morales posted:heh. Every one of our VM's has a 'fresh install' snapshot. Yeah, that sesparse file is a snapshot. Be very, very careful consolidating these snapshots. They can easily make your guest unresponsive for days at a time and there's no great way to pause them. I'd do some research and come up with a plan.
|
# ? Apr 17, 2020 17:18 |
|
Internet Explorer posted:Yeah, that sesparse file is a snapshot. The other day that file was...20GB when this first happened and it was running out of space. 400GB is roughly the size of the data on the drive. I shut the VM down for now. If the last 75% takes as long as the first 25% it'll be done in 6-9 hours.
|
# ? Apr 17, 2020 17:31 |
|
Bob Morales posted:The other day that file was...20GB when this first happened and it was running out of space. 400GB is roughly the size of the data on the drive. Came up after 11 hours. Whew. Oh wait, it got in a fight with another VM over a locked file. How the gently caress. (Our Aruba Clearpass VM of all things) Disconnected that HD from the VM and now I can get them both to boot simultaneously.
|
# ? Apr 18, 2020 01:17 |
|
Just realized that there was a virtualization thread after posting this in the NAS/Homelab thread. Someone mentioned building a dedicated computer to physically attach to two sets of KVM that virtualizes but still connects directly, but my original inquiry was to do GPU pass throughs without connecting directly to the server for video/etc. Would love some input on this, if I’m in the right thread, if not tell me to go away! Thanks! TraderStav posted:Hey all, figured this belonged here since this is the undercover SA Homelab thread.
|
# ? Apr 27, 2020 20:03 |
|
TraderStav posted:Just realized that there was a virtualization thread after posting this in the NAS/Homelab thread. Someone mentioned building a dedicated computer to physically attach to two sets of KVM that virtualizes but still connects directly, but my original inquiry was to do GPU pass throughs without connecting directly to the server for video/etc.

This is the right thread, as much as there is a right thread. You're trying to build a gaming VDI service on a five (?) year old server with a mishmash of old, slow GPUs. Trying might be fun if you like months of extremely fiddly challenges and want to learn a lot about linux virtualization, pcie, and memory mapping, and spend a lot of time with your children troubleshooting, but you are not going to get this stable by this fall.
|
# ? Apr 27, 2020 20:45 |
|
PCjr sidecar posted:This is the right thread as much as there is a right thread. Thanks for the fair and candid feedback, this may not be the right application to learn this. I was considering picking up some newer video cards if needed but if it’s still going to be a PITA I may go ahead and build them their own and fiddle with it for myself. I am interested in learning about this as one day I may go big and do one of those fun 4 gamers on one monster PC deals, but not currently interested in that. I also imagine that the machines doing that aren’t necessarily serving primary duty for NAS/plex/etc which was my primary purpose of the server, with the remaining to tinker and play. Thanks!
|
# ? Apr 27, 2020 21:00 |
|
Stav, This isn't the answer to the question you asked, but get the kids some off lease used business class laptops. You can grab a refurb/off lease Dell 5000 e series or a thinkpad 470 for less than 400 bucks off ebay that will handle school work and play. My kids are on Dell E5450 laptops and they handle all the school work and Roblox they want to play. They come in handy as well as they can take the laptops with them on trips, to friends houses, etc. I'd buy these 2 laptops and call it a day honestly. I have no idea if the Intel 5500 graphics will run minecraft though. I like business class laptops because they're built more solid and parts are usually easy to find and/or upgrade. My kids haven't managed to break them yet. https://www.ebay.com/itm/DELL-LATIT...mCondition=2500
|
# ? Apr 27, 2020 21:14 |
|
skipdogg posted:Stav, quote:Minecraft is among the top titles on popular PC games charts for years. On the reviewed GPU, it runs very smooth on “Fancy” (high) settings, with fps values gravitating toward 35 fps. If you set Minecraft graphics to “Fast” (low), frame rates will reach above 40 fps. https://laptoping.com/gpus/product/intel-hd-5500-graphics-reviews-and-specs/ That sounds like it really could fit the bill. Outfit them with a cheap but decent monitor and USB headset and be good to go. Appreciate that input, I may go that route.
|
# ? Apr 27, 2020 21:21 |
|
They're eight years old, get them a Chromebook each for school stuff, and a Pi 4 each to satisfy the need to mess around with computers.
|
# ? Apr 27, 2020 21:26 |
|
Thanks Ants posted:They're eight years old, get them a Chromebook each for school stuff, and a Pi 4 each to satisfy the need to mess around with computers. I looked into the Chromebook, and unfortunately they didn't cut it when it came to Roblox gaming. No idea about minecraft though. TraderStav posted:https://laptoping.com/gpus/product/intel-hd-5500-graphics-reviews-and-specs/ You can adjust the price point of course and get something newer if you like. A 7th gen intel processor will have something like the HD 630 graphics which should be plenty for minecraft.
|
# ? Apr 27, 2020 21:37 |
|
I have somewhat stupidly volunteered myself for a VMware upgrade Project of our aged vCenter 6.0 installation. The advisor recommendations are saying we should install the 6.5.0 GA version of vCenter, but I don't see any mention of vCenter 6.7. We do have some older hosts that can only go to 6.0.0 U2 version of VMware, however these should be compatible with vCenter 6.7 according to the VMware docs. Am I missing anything super obvious as to why 6.7 wouldn't be showing as a recommended upgrade for us? I do have a VMW support ticket created as well, just figured SA may have a quicker turnaround than VMware support nowadays...
|
# ? May 5, 2020 01:29 |