|
Hughlander posted:I'm trying to get nested virtualization (Proxmox=>Win10=>Virtualbox=>Linux) working on a Ryzen 3900. I had to add vmx,rdtscp to the cpu line to get docker working on fedora kvm =>OSX=> Docker.
|
# ? Mar 26, 2020 16:55 |
|
Perplx posted:I had to add vmx,rdtscp to the cpu line to get docker working on fedora kvm =>OSX=> Docker. Thanks. I’m trying something a bit different now and am running into crappy x11 driver issues. Trying to do a kvm => Linux => virtualbox. I’ll look into those two flags however.
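For the record, on the Proxmox side the flag change above can be sketched with qm; the vmid (100) is a placeholder, and whether you need the raw -cpu args or a cpu: host line depends on your Proxmox version, so treat this as a sketch:

```shell
# Pass raw qemu cpu flags so the nested guest sees VT-x and rdtscp (vmid 100 is hypothetical)
qm set 100 --args '-cpu host,+vmx,+rdtscp'
```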
|
# ? Mar 26, 2020 18:58 |
|
Bob Morales posted:I have a Linux VM that will not stay on the network. It drops off, the MAC disappears from the switches. I can't talk to it until I start a ping from that virtual machine, using the console. We had this with RHEL vms, they would randomly drop off the network, but if you vmotion the vm to another node it comes back. Starting a ping from the guest vm didn't fix it though. Turned out to be an esx driver issue. If you're running HPE servers/qlogic nics it might be this. https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-a00076313en_us
|
# ? Mar 28, 2020 02:34 |
|
Has anyone else been experiencing system stalls and NMI softlocks on Epyc 1 servers? We've been chasing this problem for going on a year now through RHEL 7.6-7.8 and it's been driving us crazy. It seems to be IOMMU related, and we've only now started to see some relief from it by forcing the AMD IOMMU driver into fullflush mode via the kernel parameters.
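For anyone else chasing this, the workaround is a kernel command-line change; a sketch of the RHEL 7 version (the rest of the existing cmdline contents are elided):

```shell
# /etc/default/grub — append amd_iommu=fullflush to the kernel cmdline
GRUB_CMDLINE_LINUX="... amd_iommu=fullflush"
# then rebuild the grub config and reboot:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```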
|
# ? Mar 28, 2020 16:19 |
|
I was fighting with GPU passthrough again with kvm/qemu, figured I'd share in case anyone else runs into issues.

Setup:
OS: Proxmox 6.1 (Debian)
CPU: Ryzen 2700X
Motherboard: Asus X470-Pro Prime, 1.0.0.4 patch B firmware
Host GPU: Nvidia Quadro K600
Guest GPU: Nvidia GTX 1080
Kernel cli params, vfio, and lspci stuff: https://pastebin.com/fgXuk1X1

Everything works flawlessly when I don't have a display attached to the guest GPU while the host is booting. Attaching a monitor afterwards doesn't mess it up and the guest VM is as happy as can be, but this manual intervention isn't desired. Booting the host with a monitor attached to the guest GPU will prevent passthrough, as the framebuffer claims some memory even with "video=vesafb:off,efifb:off" kernel parameters supposedly disabling them. Without the parameters there is a lot more log output, with the parameters there's only a few lines; either way the GPU cannot be passed through. I was able to solve this issue previously by selecting which GPU the motherboard uses and eventually hands over to the OS, but the Asus board doesn't have that feature. Trying to start the guest VM will output an error in dmesg.
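For reference, the usual way to keep the host from ever initializing the guest GPU is to bind it to vfio-pci before the display drivers load; a sketch, assuming the common GTX 1080 PCI ids (confirm yours with lspci -nn):

```shell
# /etc/modprobe.d/vfio.conf — claim the GTX 1080 (GPU + its HDMI audio function) for vfio-pci
options vfio-pci ids=10de:1b80,10de:10f0
softdep nvidia pre: vfio-pci
# then rebuild the initramfs so vfio-pci wins the race at boot:
#   update-initramfs -u -k all
```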
|
# ? Apr 8, 2020 13:20 |
|
Actuarial Fables posted:I was fighting with GPU passthrough again with kvm/qemu, figured I'd share in case anyone else runs into issues. Weird, I never had to do this. I do have some initramfs conf which controls what gets loaded first. VFIO before NVidia etc.
|
# ? Apr 9, 2020 07:20 |
|
I am running Proxmox on a Dell R610 with an H700 RAID controller in RAID 50. The issue I'm having is that the Windows 10 VM I have installed has terrible write performance, and I want to know what I did wrong, because the read performance is great. I do have writeback cache enabled.
|
# ? Apr 15, 2020 23:31 |
|
wargames posted:I am running proxmox on a dell r610 with a h700 raid controller in raid 50. the issue that i am having is that the windows 10 vm i have installed has terrible write performance, and I want to know what i did wrong because the read performance is great. I do have writeback cache enabled.

Define terrible. What drives are you using, and how many drives are in your RAID?
|
# ? Apr 15, 2020 23:34 |
|
Bob Morales posted:Define terrible

6 drives are in my RAID, PNY CS900s. I really do not understand the write latency. Did I build the hard disk wrong by making it a SATA device in qcow2 format?
# ? Apr 15, 2020 23:45 |
|
Now just what in the devil is happening with those 4K blocks? What's piquing my interest is the fact that the 512 blocks are doing fine.
|
# ? Apr 16, 2020 00:21 |
|
Your controller might be disabling the cache onboard the SSD, which is probably important for a low-end TLC drive like that to get good write performance. I'd also try whatever permutations of write-back/write-through and drive cache enabled/disabled to see what works best for your workload. What stripe size did you end up using?

e: Also try running the bench with a much smaller test file size; if it's dramatically faster then we're likely hitting cache exhaustion and the limits of what those TLC chips can do in a raid5.
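If you want to poke at those controller-side settings from the OS: the H700 is an LSI MegaRAID rebrand, so MegaCli generally works against it (assumption: MegaCli is installed; adjust the -L/-a selectors to your array):

```shell
MegaCli -LDGetProp -Cache -LAll -aAll        # show current cache policy per logical disk
MegaCli -LDSetProp -EnDskCache -LAll -aAll   # enable the drives' onboard cache
MegaCli -LDSetProp WT -LAll -aAll            # write-through, to compare against WB
```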
|
# ? Apr 16, 2020 00:29 |
|
Potato Salad posted:Now just what in the devil is happening with those 4K blocks

512 writes still seem very slow.

BangersInMyKnickers posted:Your controller might be disabling the cache onboard the ssd which is probably important for a low-end TLC drive like that to get good write performance. I'd also try whatever permutations of write-back/write-through and drive cache enabled/disabled to see what works best for your workload. What stripe size did you end up using?

This I do not know, as I didn't mess with the advanced options in the H700 settings. Also it's RAID 50, so I think it's better for writes than plain RAID 5.

edit: ran some tests on proxmox itself to see if the write speeds are also terrible, and they are!

code:
root@proxmox:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 80.8637 s, 13.3 MB/s
root@proxmox:~# dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 7.85938 s, 65.1 kB/s

edit2: going to RAID 0 these disks and just see how fast the RAID can go.
# ? Apr 16, 2020 00:29 |
|
Flash that bitch to IT mode, run JBOD and do the RAID with Linux software RAID
|
# ? Apr 16, 2020 02:51 |
|
Bob Morales posted:Flash that bitch to IT mode, run JBOD and do the RAID with Linux software RAID

The H700 can't be flashed to IT mode.
|
# ? Apr 16, 2020 02:55 |
|
RAID 50 was the issue for writes, even if it has insanely good reads. Went RAID 0.
|
# ? Apr 16, 2020 04:07 |
|
wargames posted:root@proxmox:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

These tests aren't good. In general dd is not a great way to test real-world disk performance, but these especially are poor indicators. The first one asks for a single synchronous 1GB write, which doesn't reflect anything in the real world and is probably choking on any number of things, including the 512MB write-back cache in your raid controller. The other test is doing tiny, tiny writes in a single serial stream, which is probably exposing the limits of dd's ability to dispatch writes quickly enough as soon as an acknowledged write comes in, more than anything else. Filesystems and storage devices do all sorts of tricks to batch and coalesce writes to make things perform better, and you're disabling all of them with this type of job.

This is that same test on a GCP instance with locally attached SSD:

quote:[root@instance-2 ~]# dd if=/dev/zero of=/dev/sdb1 count=1000 bs=512 oflag=dsync

It's quite slow, which it shouldn't be, because I know GCP publishes performance characteristics for their local SSDs that are well above this. If we look at blktrace we see:

quote:[root@instance-2 ~]# btt -i bp.bin

The rows that matter here are Q2Q, D2C and Q2C. The D2C row shows how long it is taking the actual disk to respond once the IO has passed through the queue and been issued to it. We are waiting an average of 120us for the disk and a maximum of 1.4ms, so our disk is responding about as quickly as we'd expect a SAS SSD to respond. The Q2C row shows how long the total process takes, from the IO request being put on the queue by the application (dd in this case) to when it is completed (an acknowledgement that the data was written is given to the application). We're averaging about 126us total, so we aren't spending much time in the filesystem layer at all, only about 6us.

The Q2Q row tells us that the vast majority of our waiting time, 7.8ms on average, is spent waiting for IO to come in from the application. So dd is taking about 7.8ms between every IO it issues, despite the fact that IOs are being completed in well under 1ms on average. This is basically what happens when you have no concurrency: a single thread issues IOs serially and then waits for each to complete.

This is a test of the same disk with fio, a workload generation tool which is much better for benchmarking since it supports multiple simultaneous jobs for concurrency, and also uses a sane block size (I've cut out most of the details but left the bottom-line numbers):

quote:root@instance-2 ~]# fio --filename=/root/test/testfile --size=10GB --direct=1 --rw=randwrite --bs=32k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting

My advice would be to test using fio or some other actual benchmarking tool, with sane block sizes (somewhere between 4k and 64k generally), preferably with fairly long-running jobs (at least a few minutes; low-end SSDs have issues with sustained performance that you want to suss out), and on a single disk using multiple concurrent jobs. See where that gets you, and then start building up from there to test multiple disks in a raid configuration and then virtual disks running on the hypervisor. There are so many things that could be at issue:
- Inexpensive SSDs that are likely to fall over under sustained write workloads
- H700 raid controller caching policy
- H700 interaction with drive firmware and drive firmware settings (i.e. disk cache enabled/disabled... these drives don't have a cache, so who knows whether it matters or not)
- H700 stripe size
- Filesystem parameters like block size or cluster size (what filesystem is Proxmox using?)
- Kernel parameters controlling IO
- Virtual machine OS and filesystem parameters
- Poorly constructed benchmarks
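The quoted fio invocation can equally be kept as a job file, which makes it easy to vary block size and concurrency between runs; the path and size are the same placeholders as in the quote:

```shell
# Same workload as the quoted fio command line, written out as a job file
cat > randwrite.fio <<'EOF'
[randwrite-32k]
filename=/root/test/testfile
size=10GB
direct=1
rw=randwrite
bs=32k
ioengine=libaio
iodepth=64
runtime=120
numjobs=4
time_based
group_reporting
EOF
# then: fio randwrite.fio
```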
|
# ? Apr 16, 2020 05:17 |
|
In addition to all that, check out what IO scheduler is being used for the block device. cfq is probably the default which is essentially making the OS attempt to re-create functionality that already exists on your raid controller. noop is probably the best choice for raid groups like you are using.
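Checking and switching the scheduler is a sysfs read/write; a small sketch (note that on newer blk-mq kernels noop is named none):

```shell
# show_schedulers: print each block device's available IO schedulers;
# the active one is shown in [brackets]
show_schedulers() {
  for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] && printf '%s: %s\n' "${f#/sys/block/}" "$(cat "$f")"
  done
}
# To switch a device (as root; sdX is a placeholder):
#   echo noop > /sys/block/sdX/queue/scheduler
```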
|
# ? Apr 16, 2020 14:48 |
|
You using ZFS on Proxmox? Turn on async for a laugh.
|
# ? Apr 16, 2020 15:03 |
|
How are you guys configuring your containers in proxmox? Are you using mostly VMs and ignoring the LXC parts? It seems like getting a new container as-code deployed/configured requires a bunch of things that get 80% of the way there and then shoehorning in other stuff around it. I'm about to give up and just go with ubuntu server instead. (My main problem after getting through other issues is now that I can't properly update the container.conf for a new container with bind mounts to pass /dev/dri for quicksync stuff, which was my whole main reason for wanting to do this in the first place)
|
# ? Apr 16, 2020 17:08 |
|
Mr Shiny Pants posted:You using ZFS on Proxmox? Turn-on async for a laugh.

?

Tatsuta Age posted:How are you guys configuring your containers in proxmox? Are you using mostly VMs and ignoring the LXC parts? It seems like getting a new container as-code deployed/configured requires a bunch of things that get 80% of the way there and then shoehorning in other stuff around it. I'm about to give up and just go with ubuntu server instead.

Note this is home use: I have two nodes. One is the NAS, which has a couple of LXCs for keeping the system as pure as possible:
- File services doing NFS and smbd
- Print service doing cups and airprint/google cloud print (still need to replace that)
- Plex media server

The second one is much bigger but only has 2 M.2s mirrored, so storage is on the main system. It has a mixture of LXCs and VMs, and is also running docker raw on proxmox for the zfs filesystems (this part is a mistake and will likely go away soon).
LXCs:
- Ansible development
- Soon-to-be LPMUD with VPN to Digital Ocean
- Minecraft Bedrock servers
VMs:
- Windows 10 Enterprise - PCI passthrough of video card
- Ubuntu Desktop
- 4-node kubernetes cluster
- 3 Android-x86 instances I'm experimenting with
Docker:
- 53 containers across 3-4 compose files; I'm trying to switch to being portainer-managed but not there yet.

What's your problem with the container? I pass usb through to the print server just fine, here's the config:

quote:lxc.apparmor.profile: unconfined
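For the /dev/dri (quicksync) case specifically, the passthrough usually comes down to a couple of lines in /etc/pve/lxc/<vmid>.conf; a sketch for Proxmox 6's cgroup v1 layout, with 226 being the DRM character-device major:

```shell
# extra lines in /etc/pve/lxc/<vmid>.conf (vmid is a placeholder)
lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
```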
|
# ? Apr 16, 2020 19:32 |
|
How are you setting up that configuration file? I've tried a couple different things for "configure deploy container on proxmox side, then configure from inside the container itself", and every way to do it had a bunch of issues, but using ansible with the proxmoxer module got me MOST of the way there... https://docs.ansible.com/ansible/latest/modules/proxmox_module.html So I can't configure additional lines in the conf for mounting, it just does the basics like cores, memory, etc etc. So I figured I would just directly call the proxmox host and edit the conf file directly after creation to add a couple lines, and I got weird permissions errors I couldn't seem to resolve, and it was pretty kludgy that way anyway. I can edit manually on the host but I don't really want to get in the habit of doing things manually, since half the reason I'm going this route is to learn a new thing and get more comfortable with using ansible. I just didn't know if people were doing their configs using ansible/proxmoxer as well, or there was a better tool for it.
|
# ? Apr 16, 2020 19:42 |
|
Tatsuta Age posted:How are you setting up that configuration file? I've tried a couple different things for "configure deploy container on proxmox side, then configure from inside the container itself", and every way to do it had a bunch of issues, but using ansible with the proxmoxer module got me MOST of the way there...

Ok, this predates when I was using Ansible. Basically proxmoxer is just a layer over the API, and the API https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/lxc/{vmid}/config says that you can use a parameter called 'lxc'. But I'd just get the drat thing working first, since it's a one-off, then go and get it repeatable. Particularly since I assume you're only passing this through to one container, right?
|
# ? Apr 16, 2020 19:56 |
|
Hughlander posted:Ok, this predates when I was using Ansible. Basically proxmoxer is just a layer over the API, and the API https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/lxc/{vmid}/config says that you can use called 'lxc' But I'd just get the drat thing working first since it's a 1 off. Then go and get it repeatable. Particularly since I assume you're only passing this through to one container right? Yeah, it's just for this one container, but I am still trying to get it all immutably configured if possible. Because why not? And I learned that the configuration stuff I need is just a single file in /etc/pve/lxc/container_id.conf, so it's easy enough to just do a file interaction on it with ansible before starting up, and it appears to be working mostly fine. So, yay!
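That file interaction can be sketched as a small idempotent append; the vmid and the exact lines are placeholders:

```shell
# append_conf <conf> <line...>: add each line to an LXC conf file if it's not already there
# (on the Proxmox node the conf would be /etc/pve/lxc/<vmid>.conf)
append_conf() {
  conf="$1"; shift
  for line in "$@"; do
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
  done
}
# e.g. append_conf /etc/pve/lxc/101.conf \
#   "lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir"
```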
|
# ? Apr 17, 2020 01:48 |
|
It was for wargames. If you are running ZFS on Proxmox you might try zfs sync disabled and check the performance. It will probably be magnitudes faster. Mine went from 150 to over 700 MB/sec. As I favour data integrity above speed, I enabled sync again, but it was nice seeing what it was capable of.
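A sketch of the toggle, with a placeholder dataset name; sync=disabled acknowledges writes before they reach stable storage, so use it for benchmarking only:

```shell
zfs get sync rpool/data            # check the current setting
zfs set sync=disabled rpool/data   # benchmark with sync off
# ...run the benchmark, then put it back:
zfs set sync=standard rpool/data
```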
|
# ? Apr 17, 2020 06:56 |
|
poo poo pissing me off. I finally got our Veeam backups running again. We had one job with 57 VMs which would run fine, but the resulting backup file was too big to run in one batch, especially to the cloud. I split it up into a couple smaller backups, finally got the Synology to work faster than 20-30MB/s, and then I started having this error:

We have a Barracuda Email Archiver, and a 1.8TB drive on our SAN was mapped to our Veeam server, and then the Archiver was using that as a 'RAID' disk as a mirror of the internal drive. wtf

Another drive on the datastore was thin provisioned and it ran out of space, causing that error. After allocating some more room on the SAN I got the backups running again. The plan is to remove that mapping and just back up the Barracuda over SMB through Veeam; no need to make a datastore for it.

Then I had a notification that "Virtual machine disks consolidation is needed". Alright, started that, and the host in my cluster that the Veeam VM is on ended up going 'Not Responding' for a while, and then the consolidation task said it's unable to reach the host. It got to 20% after about 3 hours of running... It appears that the task is still running, and I can still use the VMs on the host just fine. Do I just let this puppy run for another 12-15 hours and see if the host pops back online?
|
# ? Apr 17, 2020 16:48 |
|
Do you have a bunch of orphaned snapshots on that VM, or an old snapshot on a machine that does a lot of writes? Snapshot consolidation can get very taxing when they've had a lot of writes since their inception. In Veeam, there is an option to split each VM into its own backup file, even in the same job. I much preferred that setup. https://helpcenter.veeam.com/docs/backup/vsphere/per_vm_backup_files.html?ver=100
|
# ? Apr 17, 2020 16:56 |
|
'Decide what you're doing after you've done the operation you took the snapshot for' was something that got drummed into me ever since I started using VMware, and not to just treat them like recovery points to leave hanging around for ages.
|
# ? Apr 17, 2020 17:08 |
|
Internet Explorer posted:Do you have a bunch of orphaned snapshots on that VM, or a snapshot that is old on a machine that does a lot of writes? Snapshot consolidation can get very taxing when they've had a lot of writes since their inception. code:
But that thing just gets an email journal written to it every day. all day.
|
# ? Apr 17, 2020 17:12 |
|
Thanks Ants posted:'Decide what you're doing after you've done the operation you took the snapshot for' was something that got drummed into me ever since I started using VMware, and not to just treat them like recovery points to leave hanging around for ages. heh. Every one of our VM's has a 'fresh install' snapshot. WHAT GOOD IS THAT (I've been deleting those as I find them)
|
# ? Apr 17, 2020 17:15 |
|
Bob Morales posted:heh. Every one of our VM's has a 'fresh install' snapshot. Yeah, that sesparse file is a snapshot. Be very, very careful consolidating these snapshots. They can easily make your guest unresponsive for days at a time and there's no great way to pause them. I'd do some research and come up with a plan.
|
# ? Apr 17, 2020 17:18 |
|
Internet Explorer posted:Yeah, that sesparse file is a snapshot. The other day that file was...20GB when this first happened and it was running out of space. 400GB is roughly the size of the data on the drive. I shut the VM down for now. If the last 75% takes as long as the first 25% it'll be done in 6-9 hours.
|
# ? Apr 17, 2020 17:31 |
|
Bob Morales posted:The other day that file was...20GB when this first happened and it was running out of space. 400GB is roughly the size of the data on the drive. Came up after 11 hours. Whew. Oh wait, it got in a fight with another VM over a locked file. How the gently caress. (Our Aruba Clearpass VM of all things) Disconnected that HD from the VM and now I can get them both to boot simultaneously.
|
# ? Apr 18, 2020 01:17 |
|
Just realized that there was a virtualization thread after posting this in the NAS/Homelab thread. Someone mentioned building a dedicated computer to physically attach to two sets of KVM that virtualizes but still connects directly, but my original inquiry was to do GPU pass throughs without connecting directly to the server for video/etc. Would love some input on this, if I’m in the right thread, if not tell me to go away! Thanks! TraderStav posted:Hey all, figured this belonged here since this is the undercover SA Homelab thread.
|
# ? Apr 27, 2020 20:03 |
|
TraderStav posted:Just realized that there was a virtualization thread after posting this in the NAS/Homelab thread. Someone mentioned building a dedicated computer to physically attach to two sets of KVM that virtualizes but still connects directly, but my original inquiry was to do GPU pass throughs without connecting directly to the server for video/etc.

This is the right thread, as much as there is a right thread. You're trying to build a gaming VDI service on a five (?) year old server with a mishmash of old, slow GPUs. Trying might be fun if you like months of extremely fiddly challenges and want to learn a lot about linux virtualization, pcie, and memory mapping, and spend a lot of time with your children troubleshooting, but you are not going to get this stable by this fall.
|
# ? Apr 27, 2020 20:45 |
|
PCjr sidecar posted:This is the right thread as much as there is a right thread. Thanks for the fair and candid feedback, this may not be the right application to learn this. I was considering picking up some newer video cards if needed but if it’s still going to be a PITA I may go ahead and build them their own and fiddle with it for myself. I am interested in learning about this as one day I may go big and do one of those fun 4 gamers on one monster PC deals, but not currently interested in that. I also imagine that the machines doing that aren’t necessarily serving primary duty for NAS/plex/etc which was my primary purpose of the server, with the remaining to tinker and play. Thanks!
|
# ? Apr 27, 2020 21:00 |
|
Stav, This isn't the answer to the question you asked, but get the kids some off lease used business class laptops. You can grab a refurb/off lease Dell 5000 e series or a thinkpad 470 for less than 400 bucks off ebay that will handle school work and play. My kids are on Dell E5450 laptops and they handle all the school work and Roblox they want to play. They come in handy as well as they can take the laptops with them on trips, to friends houses, etc. I'd buy these 2 laptops and call it a day honestly. I have no idea if the Intel 5500 graphics will run minecraft though. I like business class laptops because they're built more solid and parts are usually easy to find and/or upgrade. My kids haven't managed to break them yet. https://www.ebay.com/itm/DELL-LATIT...mCondition=2500
|
# ? Apr 27, 2020 21:14 |
|
skipdogg posted:Stav, quote:Minecraft is among the top titles on popular PC games charts for years. On the reviewed GPU, it runs very smooth on “Fancy” (high) settings, with fps values gravitating toward 35 fps. If you set Minecraft graphics to “Fast” (low), frame rates will reach above 40 fps. https://laptoping.com/gpus/product/intel-hd-5500-graphics-reviews-and-specs/ That sounds like it really could fit the bill. Outfit them with a cheap but decent monitor and USB headset and be good to go. Appreciate that input, I may go that route.
|
# ? Apr 27, 2020 21:21 |
|
They're eight years old, get them a Chromebook each for school stuff, and a Pi 4 each to satisfy the need to mess around with computers.
|
# ? Apr 27, 2020 21:26 |
|
Thanks Ants posted:They're eight years old, get them a Chromebook each for school stuff, and a Pi 4 each to satisfy the need to mess around with computers. I looked into the Chromebook, and unfortunately they didn't cut it when it came to Roblox gaming. No idea about minecraft though. TraderStav posted:https://laptoping.com/gpus/product/intel-hd-5500-graphics-reviews-and-specs/ You can adjust the price point of course and get something newer if you like. A 7th gen intel processor will have something like the HD 630 graphics which should be plenty for minecraft.
|
# ? Apr 27, 2020 21:37 |
|
I have somewhat stupidly volunteered myself for a VMware upgrade Project of our aged vCenter 6.0 installation. The advisor recommendations are saying we should install the 6.5.0 GA version of vCenter, but I don't see any mention of vCenter 6.7. We do have some older hosts that can only go to 6.0.0 U2 version of VMware, however these should be compatible with vCenter 6.7 according to the VMware docs. Am I missing anything super obvious as to why 6.7 wouldn't be showing as a recommended upgrade for us? I do have a VMW support ticket created as well, just figured SA may have a quicker turnaround than VMware support nowadays...
|
# ? May 5, 2020 01:29 |