
gpu-passthrough-tutorial's Introduction


Introduction

In this post, I will be giving detailed instructions on how to run a KVM setup with GPU passthrough. This setup uses a Linux host installed with Pop!_OS 20.10 (kernel v5.8.0) and a guest VM running Windows 10.

Considerations

The main reason I wanted to get this setup working was that I found myself tired of using a dual-boot setup. I wanted to launch a Windows VM specifically for gaming while still being able to use my Linux host for development work (simultaneously).

At this point, you might be wondering... Why not just game on Linux? This is definitely an option for many people, but not one that suited my particular needs. Gaming on Linux requires the use of tools like Wine, which act as a compatibility layer for translating Windows system calls to Linux system calls. A GPU passthrough setup, on the other hand, uses KVM as a hypervisor to launch individual VMs with specific hardware attached to them. Performance-wise, there are pros and cons to each approach.1

In this tutorial, I will create a GPU passthrough setup. Specifically, I will be passing through an NVIDIA GPU to my guest VM while using an AMD GPU for my host. You could easily substitute an iGPU for the host but I chose to use a dGPU for performance reasons.2

Hardware Requirements

You're going to need the following to achieve a high-performance VM:

If you haven't built a PC yet but want it to be KVM/VFIO-focused, check out this list of parts suggested by The Passthrough Post.

Hardware Setup

  • CPU:
    • AMD Ryzen 9 3900X
  • Motherboard:
    • Gigabyte X570 Aorus Pro Wifi
  • GPUs:
    • NVIDIA RTX 3080
    • AMD RX 5700
  • Memory:
    • Corsair Vengeance LPX DDR4 3200 MHz 32GB (2x16)
  • Disk:
    • Samsung 970 EVO Plus SSD 500GB - M.2 NVMe (host)
    • Samsung 970 EVO Plus SSD 1TB - M.2 NVMe (guest)

Tutorial

Part 1: Prerequisites

Before we begin, let's install some necessary packages:

$ sudo apt install libvirt-daemon-system libvirt-clients qemu-kvm qemu-utils virt-manager ovmf
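
Depending on your distribution, you may also want to add your user to the libvirt group so that virt-manager can manage VMs without root privileges (optional; libvirt is the group name created by the Debian/Ubuntu packages):

$ sudo usermod -aG libvirt $USER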

Restart your machine and boot into the BIOS. Enable IOMMU support: on Intel platforms look for a setting called VT-d; on AMD it may be called AMD-Vi or simply IOMMU. You'll also need to enable CPU virtualization: VT-x on Intel, or SVM on AMD (my Gigabyte board exposes it as SVM Mode). Save any changes and restart the machine.

Once you've booted into the host, make sure that IOMMU is enabled: $ dmesg | grep IOMMU

Also check that the IOMMU hardware was detected by the kernel:

       For Intel: $ dmesg | grep -i -e DMAR -e VT-d
       For AMD: $ dmesg | grep -i AMD-Vi
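
To confirm that CPU virtualization itself is enabled, you can also check for the vmx (Intel) or svm (AMD) CPU flags; a non-zero count means the extensions are exposed to the OS:

       $ grep -Ec '(vmx|svm)' /proc/cpuinfo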

Now you're going to need to pass the hardware-enabled IOMMU functionality into the kernel as a kernel parameter. For our purposes, it makes the most sense to enable this feature at boot-time. Depending on your boot-loader (e.g. GRUB, systemd-boot, rEFInd), you'll have to modify a specific configuration file. Since my machine uses systemd-boot and these configuration files are often overwritten on updates, I will be using a tool called kernelstub:

       For Intel: $ sudo kernelstub --add-options "intel_iommu=on"
       For AMD: $ sudo kernelstub --add-options "amd_iommu=on"

Similarly, if your system is configured with GRUB2, you can achieve the same result by editing the /etc/default/grub file with sudo permissions and including the kernel parameter as follows:

       For Intel: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
       For AMD: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on"
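
If you go the GRUB route, remember to regenerate the GRUB configuration afterwards so the new parameter takes effect (on Debian-based systems):

       $ sudo update-grub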

When planning my GPU passthrough setup, I discovered that many tutorials at this point will have you blacklist the nvidia/amd drivers. The logic is that if the native drivers can't attach to the GPU at boot-time, the GPU is freed up and available to bind to the vfio drivers instead. Most tutorials will have you add a kernel parameter called pci-stub with the PCI bus ID of your GPU to achieve this. I found that this solution wasn't suitable for me. I prefer to dynamically unbind the nvidia/amd drivers and bind the vfio drivers right before the VM starts, and then reverse these actions when the VM stops (see Part 2). That way, whenever the VM isn't in use, the GPU is available to the host machine to do work on its native drivers.4
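
For reference, the static approach on modern kernels usually takes the form of a kernel parameter that binds the listed vendor:device ID pairs to vfio-pci at boot. The IDs below are only an example taken from the NVIDIA GPU/audio pair in the lspci output further down; substitute your own:

       vfio-pci.ids=10de:2206,10de:1aef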

Next, we need to determine the IOMMU groups of the graphics card we want to pass through to the VM. For those of you who don't already know, the IOMMU is the chipset component that maps virtual addresses to physical addresses for your I/O devices (e.g. GPU, disk, etc.). Its function is analogous to the memory management unit (MMU) that maps virtual addresses to physical addresses for your CPU.

We want to make sure that our system has an appropriate IOMMU grouping scheme. Essentially, we need to remember that devices residing within the same IOMMU group need to be passed through to the VM (they can't be separated). To determine your IOMMU grouping, use the following script:

iommu.sh:

#!/bin/bash
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
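
Make the script executable and run it:

$ chmod +x iommu.sh && ./iommu.sh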

For Intel systems, here's some sample output:

...
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 07)
...
IOMMU Group 30 0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1)
IOMMU Group 30 0d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
IOMMU Group 30 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU Group 31 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
...

Here we see that both the NVIDIA and AMD GPUs reside in IOMMU group 30. This presents a problem. If you want to use the AMD GPU for the host machine while passing through the NVIDIA GPU to the guest VM, you need to figure out a way to separate their IOMMU groups.

  1. One possible solution is to switch the PCI slot to which the AMD graphics card is attached. This may or may not produce the desired result.
  2. An alternative solution is something called the ACS Override Patch. For an in-depth discussion, it's definitely worth checking out this post from Alex Williamson. Make sure to consider the risks.5

For my system, I was lucky6 because the NVIDIA and AMD GPUs resided in different IOMMU groups:

...
IOMMU Group 30 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU Group 31 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU Group 32 0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1)
IOMMU Group 32 0d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
...

If your setup is like mine7 and your GPUs already reside in isolated IOMMU groups, feel free to skip the following section. Otherwise, please continue reading...

ACS Override Patch (Optional):

For most linux distributions, the ACS Override Patch requires you to download the kernel source code, manually insert the ACS patch, compile + install the kernel, and then boot directly from the newly patched kernel.8

Since I'm running a Debian-based distribution, I can use one of the pre-compiled kernels with the ACS patch already applied. After extracting the package contents, install the kernel and headers:

$ sudo dpkg -i linux-headers-5.3.0-acso_5.3.0-acso-1_amd64.deb
$ sudo dpkg -i linux-image-5.3.0-acso_5.3.0-acso-1_amd64.deb
$ sudo dpkg -i linux-libc-dev_5.3.0-acso-1_amd64.deb

Navigate to /boot and verify that you see the new initrd.img and vmlinuz:

$ ls
config-5.3.0-7625-generic    initrd.img-5.3.0-7625-generic  vmlinuz
config-5.3.0-acso            initrd.img-5.3.0-acso          vmlinuz-5.3.0-7625-generic
efi                          initrd.img.old                 vmlinuz-5.3.0-acso
initrd.img                   System.map-5.3.0-7625-generic  vmlinuz.old
initrd.img-5.3.0-24-generic  System.map-5.3.0-acso

We still have to copy the current kernel and initramfs image onto the ESP so that they are automatically loaded by EFI. We check the current configuration with kernelstub:

$ sudo kernelstub --print-config
kernelstub.Config    : INFO     Looking for configuration...
kernelstub           : INFO     System information:

    OS:..................Pop!_OS 19.10
    Root partition:....../dev/dm-1
    Root FS UUID:........2105a9ac-da30-41ba-87a9-75437bae74c6
    ESP Path:............/boot/efi
    ESP Partition:......./dev/nvme0n1p1
    ESP Partition #:.....1
    NVRAM entry #:.......-1
    Boot Variable #:.....0000
    Kernel Boot Options:.quiet loglevel=0 systemd.show_status=false splash amd_iommu=on
    Kernel Image Path:.../boot/vmlinuz
    Initrd Image Path:.../boot/initrd.img
    Force-overwrite:.....False

kernelstub           : INFO     Configuration details:

   ESP Location:................../boot/efi
   Management Mode:...............True
   Install Loader configuration:..True
   Configuration version:.........3

You can see that the "Kernel Image Path" and the "Initrd Image Path" are symbolic links that point to the old kernel and initrd.

$ ls -l /boot
total 235488
-rw-r--r-- 1 root root   235833 Dec 19 11:56 config-5.3.0-7625-generic
-rw-r--r-- 1 root root   234967 Sep 16 04:31 config-5.3.0-acso
drwx------ 6 root root     4096 Dec 31  1969 efi
lrwxrwxrwx 1 root root       29 Dec 20 11:28 initrd.img -> initrd.img-5.3.0-7625-generic
-rw-r--r-- 1 root root 21197115 Dec 20 11:54 initrd.img-5.3.0-24-generic
-rw-r--r-- 1 root root 95775016 Jan 17 00:33 initrd.img-5.3.0-7625-generic
-rw-r--r-- 1 root root 94051072 Jan 18 19:57 initrd.img-5.3.0-acso
lrwxrwxrwx 1 root root       29 Dec 20 11:28 initrd.img.old -> initrd.img-5.3.0-7625-generic
-rw------- 1 root root  4707483 Dec 19 11:56 System.map-5.3.0-7625-generic
-rw-r--r-- 1 root root  4458808 Sep 16 04:31 System.map-5.3.0-acso
lrwxrwxrwx 1 root root       26 Dec 20 11:28 vmlinuz -> vmlinuz-5.3.0-7625-generic
-rw------- 1 root root 11398016 Dec 19 11:56 vmlinuz-5.3.0-7625-generic
-rw-r--r-- 1 root root  9054592 Sep 16 04:31 vmlinuz-5.3.0-acso
lrwxrwxrwx 1 root root       26 Dec 20 11:28 vmlinuz.old -> vmlinuz-5.3.0-7625-generic

Let's change that:

$ sudo rm /boot/vmlinuz
$ sudo ln -s /boot/vmlinuz-5.3.0-acso /boot/vmlinuz
$ sudo rm /boot/initrd.img
$ sudo ln -s /boot/initrd.img-5.3.0-acso /boot/initrd.img

Verify that the symbolic links now point to the correct kernel and initrd images:

$ ls -l /boot
total 235488
-rw-r--r-- 1 root root   235833 Dec 19 11:56 config-5.3.0-7625-generic
-rw-r--r-- 1 root root   234967 Sep 16 04:31 config-5.3.0-acso
drwx------ 6 root root     4096 Dec 31  1969 efi
lrwxrwxrwx 1 root root       27 Jan 18 20:02 initrd.img -> /boot/initrd.img-5.3.0-acso
-rw-r--r-- 1 root root 21197115 Dec 20 11:54 initrd.img-5.3.0-24-generic
-rw-r--r-- 1 root root 95775016 Jan 17 00:33 initrd.img-5.3.0-7625-generic
-rw-r--r-- 1 root root 94051072 Jan 18 19:57 initrd.img-5.3.0-acso
lrwxrwxrwx 1 root root       29 Dec 20 11:28 initrd.img.old -> initrd.img-5.3.0-7625-generic
-rw------- 1 root root  4707483 Dec 19 11:56 System.map-5.3.0-7625-generic
-rw-r--r-- 1 root root  4458808 Sep 16 04:31 System.map-5.3.0-acso
lrwxrwxrwx 1 root root       24 Jan 18 20:02 vmlinuz -> /boot/vmlinuz-5.3.0-acso
-rw------- 1 root root 11398016 Dec 19 11:56 vmlinuz-5.3.0-7625-generic
-rw-r--r-- 1 root root  9054592 Sep 16 04:31 vmlinuz-5.3.0-acso
lrwxrwxrwx 1 root root       26 Dec 20 11:28 vmlinuz.old -> vmlinuz-5.3.0-7625-generic

Finally, add the ACS Override Patch to your list of kernel parameter options:

$ sudo kernelstub --add-options "pcie_acs_override=downstream"

Reboot and verify that the IOMMU groups for your graphics cards are different:

...
IOMMU Group 30 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU Group 31 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU Group 32 0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2206] (rev a1)
IOMMU Group 32 0d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
...
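
You can also double-check that you actually booted into the patched kernel; uname -r should report the acso kernel (e.g. 5.3.0-acso):

$ uname -r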

Download ISO files (Mandatory):

Since we're building a Windows VM, we're going to need to download and use the virtIO drivers. virtIO is a virtualization standard for network and disk device drivers. Adding the virtIO drivers can be done by attaching the relevant ISO to the Windows VM during creation. Fedora provides the virtIO drivers for direct download.

Since I am passing through an entire NVMe SSD (1TB), I won't need to install any 3rd party drivers on top of the virtIO driver. Passing through the SSD as a PCI device lets Windows deal with it as a native NVMe device and therefore should offer better performance. If you choose to use a raw disk image instead, things are going to be a little different... Make sure to follow the instructions in this guide. The guide will show you how to add 3rd party drivers on top of the existing virtIO drivers by rebuilding the ISO.

For the final step, we're going to need to download the Windows 10 ISO from Microsoft which you can find here.

Part 2: VM Logistics

As mentioned earlier, we are going to dynamically bind the vfio drivers before the VM starts and unbind these drivers after the VM terminates. To achieve this, we're going to use libvirt hooks. Libvirt has a hook system that allows you to run commands on startup or shutdown of a VM. All relevant scripts are located within the following directory: /etc/libvirt/hooks. If the directory doesn't exist, go ahead and create it. Lucky for us, The Passthrough POST has a hook helper tool to make our lives easier. Run the following commands to install the hook manager and make it executable:

$ sudo wget 'https://raw.githubusercontent.com/PassthroughPOST/VFIO-Tools/master/libvirt_hooks/qemu' \
     -O /etc/libvirt/hooks/qemu
$ sudo chmod +x /etc/libvirt/hooks/qemu

Go ahead and restart libvirt to use the newly installed hook helper:

$ sudo service libvirtd restart

Let's look at the most important hooks:

# Before a VM is started, before resources are allocated:
/etc/libvirt/hooks/qemu.d/$vmname/prepare/begin/*

# Before a VM is started, after resources are allocated:
/etc/libvirt/hooks/qemu.d/$vmname/start/begin/*

# After a VM has started up:
/etc/libvirt/hooks/qemu.d/$vmname/started/begin/*

# After a VM has shut down, before releasing its resources:
/etc/libvirt/hooks/qemu.d/$vmname/stopped/end/*

# After a VM has shut down, after resources are released:
/etc/libvirt/hooks/qemu.d/$vmname/release/end/*

If we place an executable script in one of these directories, the hook manager will take care of everything else. I've chosen to name my VM "win10" so I set up my directory structure like this:

$ tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        └── release
            └── end
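
If you'd rather not create these directories by hand one at a time, something like the following should reproduce this structure (assuming you also name your VM win10):

$ sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/prepare/begin
$ sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/release/end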

It's time to get our hands dirty... Create a file named kvm.conf and place it under /etc/libvirt/hooks/. Add the following entries to the file:

## Virsh devices
VIRSH_GPU_VIDEO=pci_0000_0a_00_0
VIRSH_GPU_AUDIO=pci_0000_0a_00_1
VIRSH_GPU_USB=pci_0000_0a_00_2
VIRSH_GPU_SERIAL=pci_0000_0a_00_3
VIRSH_NVME_SSD=pci_0000_04_00_0
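
If you're unsure what libvirt calls your devices, you can list every PCI node device it knows about and pick out your GPU and SSD from there (the names follow the same pci_0000_XX_XX_X pattern used above):

$ virsh nodedev-list --cap pci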

Make sure to substitute the correct bus addresses for the devices you'd like to pass through to your VM (in my case a GPU and SSD). Just in case it's still unclear, you get the virsh PCI device IDs from the iommu.sh script's output. Translate the address for each device as follows: IOMMU Group 1 01:00.0 ... --> VIRSH_...=pci_0000_01_00_0. Now create two bash scripts:

bind_vfio.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO
virsh nodedev-detach $VIRSH_GPU_AUDIO
virsh nodedev-detach $VIRSH_GPU_USB
virsh nodedev-detach $VIRSH_GPU_SERIAL
## Unbind ssd from nvme and bind to vfio
virsh nodedev-detach $VIRSH_NVME_SSD

unbind_vfio.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach $VIRSH_GPU_VIDEO
virsh nodedev-reattach $VIRSH_GPU_AUDIO
virsh nodedev-reattach $VIRSH_GPU_USB
virsh nodedev-reattach $VIRSH_GPU_SERIAL
## Unbind ssd from vfio and bind to nvme
virsh nodedev-reattach $VIRSH_NVME_SSD

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

Don't forget to make these scripts executable with chmod +x <script_name>. Then place these scripts so that your directory structure looks like this:

$ tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh

We've successfully created libvirt hook scripts to dynamically bind the vfio drivers before the VM starts and unbind these drivers after the VM terminates. At the moment, we're done messing around with libvirt hooks. We'll revisit this topic later on when we make performance tweaks to our VM (see Part 4).

Part 3: Creating the VM

We're ready to begin creating our VM. There are basically two options for how to achieve this: (1) If you prefer a GUI approach, then follow the rest of this tutorial. (2) If you prefer bash scripts, take a look at YuriAlek's series of GPU passthrough scripts and customize them to fit your needs. The main difference between these two methods is that the scripting approach uses bare QEMU commands9, while the GUI approach uses virt-manager. Virt-manager essentially builds on top of the QEMU base layer and adds other features/complexity.10

Go ahead and start virt-manager from your list of applications. Select the button on the top left of the GUI to create a new VM:


Select the "Local install media" option. My ISOs are stored in my home directory /home/user/.iso, so I'll create a new pool and select the Windows 10 ISO from there:


Configure some custom RAM and CPU settings for your VM:


Next, the GUI asks us whether we want to enable storage for the VM. As already mentioned, my setup will be using SSD passthrough so I chose not to enable virtual storage. However, you still have the option to enable storage and create a RAW disk image which will be stored under the default path of /var/lib/libvirt/images:


On the last step, review your settings and select a name for your VM. Make sure to select the checkbox "Customize configuration before installation" and click Finish:


A new window should appear with more advanced configuration options. You can alter these options through the GUI or the associated libvirt XML settings. Make sure that on the Overview page under Firmware you select UEFI x86_64: /usr/share/OVMF/OVMF_CODE.fd:


Go to the CPUs page, remove the check next to Copy host CPU configuration, and under Model type host-passthrough. Also make sure to check the option Enable available CPU security flaw mitigations to protect against Spectre/Meltdown vulnerabilities.


I've chosen to remove several of the menu options that won't be useful to my setup (feel free to keep them if you'd like):


Let's add the virtIO drivers. Click 'Add Hardware' and under 'Storage', create a custom storage device of type CDROM. Make sure to locate the ISO image for the virtIO drivers from earlier:


Under the NIC menu, change the device model to virtIO for improved networking performance:


Now it's time to configure our passthrough devices! Click 'Add Hardware' and under 'PCI Host Device', select the Bus IDs corresponding to your GPU.


Make sure to repeat this step for all the devices associated with your GPU in the same IOMMU group (usually VGA, audio controller, etc.):


Since I'm passing through an entire disk to my VM, I selected the Bus ID corresponding to the 1TB Samsung NVMe SSD which has Windows 10 (and my games) installed on it.


Then under the 'Boot Options' menu, I added a check next to Enable boot menu and reorganized the devices so that I could boot directly from the 1TB SSD:


You can now go ahead and select the USB Host Devices you'd like to pass through to your guest VM (usually a keyboard, mouse, etc.). Please note that these devices will be held by the guest VM from the moment it's created until it's stopped and will be unavailable to the host.11


Unfortunately, not everything we need can be accomplished within the virt-manager GUI. For the rest of this section, we'll have to do some fine-tuning by directly editing the XML (make sure to "Enable XML settings" under Edit -> Preferences -> General or use $ sudo virsh edit win10 for a command-line approach):


If you're like me and you're passing through an NVIDIA GPU to your VM, then you might run into the following common roadblock. Error 43 occurs because NVIDIA intentionally disables virtualization features on its GeForce line of cards. The way to deal with this is to have the hypervisor hide its existence. Inside the hyperv section, add a tag for vendor_id such that state="on" and value is any string up to 12 characters long:

<features>
    ...
    <hyperv>
        <relaxed state="on"/>
        <vapic state="on"/>
        <spinlocks state="on" retries="8191"/>
        <vendor_id state="on" value="kvm hyperv"/>
    </hyperv>
    ...
</features>

In addition, instruct the kvm to hide its state by adding the following code directly below the hyperv section:

<features>
    ...
    <hyperv>
        ...
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    ...
</features>

Finally, if you're using QEMU 4.0 with the q35 chipset you also need to add the following code at the end of <features>:

<features>
    ...
    <ioapic driver="kvm"/>
</features>

Now you should have no issues with regards to the NVIDIA Error 43. Later on, we will be making more changes to the XML to achieve better performance (see Part 4). At this point however, you can apply the changes and select "Begin Installation" at the top left of the GUI. Please be aware that this may take several minutes to complete.

Part 4: Improving VM Performance

None of the following performance optimizations are necessary to get a working GPU passthrough system. However, these tweaks will make a difference if you're at all concerned about reaching buttery-smooth gaming performance. Though some of these changes are more difficult than others, I highly advise you to at least consider them.

Hugepages

Memory (RAM) is divided up into basic segments called pages. By default, the x86 architecture has a page size of 4KB. CPUs utilize pages within the built-in memory management unit (MMU). Although the standard page size is suitable for many tasks, hugepages are a mechanism that allows the Linux kernel to take advantage of large amounts of memory with reduced overhead. Hugepages can vary in size anywhere from 2MB to 1GB. Hugepages are enabled by default but if they aren't, make sure to install the package: $ sudo apt install libhugetlbfs-bin.12

Go back to your VM's XML settings by either using the virt-manager GUI or the command: $ sudo virsh edit {vm-name}. Insert the memoryBacking lines so that your configuration looks like this:

<memory unit="KiB">16777216</memory>
<currentMemory unit="KiB">16777216</currentMemory>
<memoryBacking>
    <hugepages/>
</memoryBacking>
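
As a quick sanity check, you can see the hugepage size your system uses and how many pages are currently allocated with:

$ grep Huge /proc/meminfo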

Many tutorials will have you reserve hugepages for your guest VM at host boot-time. There's a significant downside to this approach: a portion of RAM will be unavailable to your host even when the VM is inactive. In my setup, I've chosen to allocate hugepages right before the VM starts and deallocate those pages on VM shutdown through the use of two additional executable scripts13 inside libvirt hooks (see Part 2):

$ tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       ├── ...
        │       └── alloc_hugepages.sh
        └── release
            └── end
                ├── ...
                └── dealloc_hugepages.sh

alloc_hugepages.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Calculate the number of hugepages to allocate from $MEMORY (the VM's RAM in MB, defined in kvm.conf)
HUGEPAGES="$(($MEMORY/$(($(grep Hugepagesize /proc/meminfo | awk '{print $2}')/1024))))"

echo "Allocating hugepages..."
echo $HUGEPAGES > /proc/sys/vm/nr_hugepages
ALLOC_PAGES=$(cat /proc/sys/vm/nr_hugepages)

TRIES=0
while (( $ALLOC_PAGES != $HUGEPAGES && $TRIES < 1000 ))
do
    echo 1 > /proc/sys/vm/compact_memory            ## defrag ram
    echo $HUGEPAGES > /proc/sys/vm/nr_hugepages
    ALLOC_PAGES=$(cat /proc/sys/vm/nr_hugepages)
    echo "Succesfully allocated $ALLOC_PAGES / $HUGEPAGES"
    let TRIES+=1
done

if [ "$ALLOC_PAGES" -ne "$HUGEPAGES" ]
then
    echo "Not able to allocate all hugepages. Reverting..."
    echo 0 > /proc/sys/vm/nr_hugepages
    exit 1
fi
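
Note that $MEMORY is not defined anywhere by default: the script expects it in /etc/libvirt/hooks/kvm.conf. Add a line there with your VM's RAM in MB, matching the memory you assigned in the XML (16777216 KiB = 16384 MB in my case):

MEMORY=16384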

dealloc_hugepages.sh

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

echo 0 > /proc/sys/vm/nr_hugepages

CPU Governor

This performance tweak14 takes advantage of the CPU frequency scaling governor in Linux. It's often overlooked in passthrough tutorials, but switching the governor to performance mode while the VM is running keeps the host CPU at its maximum clock speed instead of letting it scale down. Once again, we'll be utilizing libvirt's hook system (see Part 2):

$ tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       ├── ...
        │       └── cpu_mode_performance.sh
        └── release
            └── end
                ├── ...
                └── cpu_mode_ondemand.sh

cpu_mode_performance.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Enable CPU governor performance mode
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

cpu_mode_ondemand.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Enable CPU governor on-demand mode
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "ondemand" > $file; done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
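
If you're unsure which governors your CPU frequency driver actually supports, you can list them first (at least on systems using the acpi-cpufreq driver; other drivers may expose a different set):

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors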

CPU Pinning

This performance tweak applies only to those of you whose processors are multithreaded. My setup has an AMD Ryzen 9 3900X which has 12 physical cores and 24 threads (i.e. logical cores).

VMs are unable to distinguish between these physical and logical cores. From the guest's perspective, virt-manager sees 24 virtual CPUs (vCPUs) available. From the host's perspective however, two logical cores (threads) map to a single physical core on the CPU die.

It's very important that when we pass through a core, we include its sibling. To get a sense of your CPU topology, use the command $ lscpu -e. A matching core id (i.e. "CORE" column) means that the associated threads (i.e. "CPU" column) run on the same physical core.15

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3800.0000 2200.0000
  1    0      0    1 1:1:1:0          yes 3800.0000 2200.0000
  2    0      0    2 2:2:2:0          yes 3800.0000 2200.0000
  3    0      0    3 3:3:3:1          yes 3800.0000 2200.0000
  4    0      0    4 4:4:4:1          yes 3800.0000 2200.0000
  5    0      0    5 5:5:5:1          yes 3800.0000 2200.0000
  6    0      0    6 6:6:6:2          yes 3800.0000 2200.0000
  7    0      0    7 7:7:7:2          yes 3800.0000 2200.0000
  8    0      0    8 8:8:8:2          yes 3800.0000 2200.0000
  9    0      0    9 9:9:9:3          yes 3800.0000 2200.0000
 10    0      0   10 10:10:10:3       yes 3800.0000 2200.0000
 11    0      0   11 11:11:11:3       yes 3800.0000 2200.0000
 12    0      0    0 0:0:0:0          yes 3800.0000 2200.0000
 13    0      0    1 1:1:1:0          yes 3800.0000 2200.0000
 14    0      0    2 2:2:2:0          yes 3800.0000 2200.0000
 15    0      0    3 3:3:3:1          yes 3800.0000 2200.0000
 16    0      0    4 4:4:4:1          yes 3800.0000 2200.0000
 17    0      0    5 5:5:5:1          yes 3800.0000 2200.0000
 18    0      0    6 6:6:6:2          yes 3800.0000 2200.0000
 19    0      0    7 7:7:7:2          yes 3800.0000 2200.0000
 20    0      0    8 8:8:8:2          yes 3800.0000 2200.0000
 21    0      0    9 9:9:9:3          yes 3800.0000 2200.0000
 22    0      0   10 10:10:10:3       yes 3800.0000 2200.0000
 23    0      0   11 11:11:11:3       yes 3800.0000 2200.0000

If you're more of a visual learner, perhaps a diagram of your CPU architecture will help you visualize what's going on. Download the hwloc package with $ sudo apt install hwloc. Then simply type the command $ lstopo:


It's time to edit the XML configuration of our VM. I've added the following lines of code to pass physical cores #6-11 to the guest and leave physical cores #0-5 with the host (customize for your processor):

<vcpu placement="static">12</vcpu>
<cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="18"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="19"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="20"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="21"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="22"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="23"/>
    <emulatorpin cpuset="0-3"/>
    <iothreadpin iothread='1' cpuset='4-5,12-17'/>
</cputune>

If you're wondering why I tuned my CPU configuration this way, I'll refer you to this section of the Libvirt domain XML format.16 More specifically, consider the cputune element and its underlying vcpupin, emulatorpin, and iothreadpin elements. The Arch Wiki recommends pinning the emulator and iothreads to host cores (if available) rather than the vCPUs assigned to the guest. In the example above, 12 of my 24 threads are assigned as vCPUs to the guest; of the 12 threads remaining on the host, 4 are assigned to the emulator and 8 are assigned to an iothread (see below).

Go ahead and edit <cpu> to formally define the CPU topology of your VM. In my case, I'm allocating 1 socket with 6 physical cores and 2 threads per core:

<cpu mode="host-passthrough" check="none">
  <topology sockets="1" cores="6" threads="2"/>
  <cache mode='passthrough'/>
  <feature policy='require' name='topoext'/>
</cpu>

Disk Tuning

As you may or may not remember, my setup passes control of an SSD device controller to the VM. This bypasses any need or concern I'd have with improving virtualized disk performance (I/O reads + writes). If this is not the case for your setup, then you probably allocated a virtual storage disk on your host device. For the rest of this section, let's assume my setup uses a RAW virtual disk image stored at /var/lib/libvirt/images/pool/win10.img on which I'd like to improve I/O performance.

KVM and QEMU provide two paravirtualized storage backends: the older virtio-blk (default) and the more modern virtio-scsi. Although it's beyond the scope of this tutorial to discuss their differences, this post highlights the main architectural difference between the two:

virtio-blk:

guest: app -> Block Layer -> virtio-blk
host: QEMU -> Block Layer -> Block Device Driver -> Hardware

virtio-scsi:

guest: app -> Block Layer -> SCSI Layer -> scsi_mod
host: QEMU -> Block Layer -> SCSI Layer -> Block Device Driver -> Hardware

In essence, virtio-scsi adds an additional complexity layer that provides it with more features and flexibility than virtio-blk.17 Whichever paravirtualized storage type you decide to go with is entirely up to you; I suggest you run performance tests on both. Make sure that in your CPU configuration, you've assigned an IOThread:

<vcpu placement="static">12</vcpu>
<iothreads>1</iothreads>
<cputune>
    ...
    <emulatorpin cpuset="0-3"/>
    <iothreadpin iothread='1' cpuset='4-5,12-17'/>
</cputune>

Here you can see that I've included an iothreads element with a value of 1. I've also included the iothreadpin element to define the number of CPU pins applied to the single iothread. I highly recommend reviewing this section of the Arch Wiki to decide on your CPU pinning strategy. Ultimately, it's up to you on how you want to divide the CPU pins among the emulator and iothreads.

The final step is to either: (1) create the virtio-scsi controller and attach our disk or (2) make sure our disk is defined correctly for virtio-blk (default). Note that you can only have one iothread per disk controller.

virtio-scsi:

<domain type="kvm">
    ...
    <devices>
        ...
        <disk type='file' device='disk'>
            <driver name='qemu' type='raw' cache='none' io='threads' discard='unmap' queues='8'/>
            <source file='/var/lib/libvirt/images/pool/win10.img'/>
            <target dev='sdc' bus='scsi'/>
            <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>
        ...
        <controller type='scsi' index='0' model='virtio-scsi'>
            <driver iothread='1' queues='8'/>
            <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
        </controller>       
        ...
      </devices>
</domain>

virtio-blk:

<domain type="kvm">
    ...
    <devices>
        ...
        <disk type='file' device='disk'>
            <driver name='qemu' type='raw' cache='none' io='native' discard='unmap' iothread='1' queues='8'/>
            <source file='/var/lib/libvirt/images/pool/win10.img'/>
            <target dev='vdc' bus='virtio'/>
        </disk>
        ...
    </devices>
    ...
</domain>

The final thing to remember is that during the Windows installation on your virtual disk, you need to attach the virtIO driver ISO as a second CDROM so the installer can load the drivers (we've already completed this in a previous section).

Hyper-V Enlightenments

Hyper-V enlightenments help the guest VM handle virtualization tasks. Libvirt has a detailed breakdown of these features. I've chosen to go with the set of features recommended in this tutorial due to hardware similarities:

<features>
    ...
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="kvm hyperv"/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <frequencies state='on'/>
    </hyperv>
    ...
</features>

Part 5: Benchmarks

Congrats! You've finished setting up your Windows gaming VM! But now comes the most important part... Let's compare the bare-metal performance of Windows against our KVM. If everything goes according to plan, we can expect somewhat close to native performance on the VM. In order to test this theory, I used the following benchmark software: UserBenchmark. Check out the results18 for yourself:

Hopefully your results are as good as mine, if not better!

Part 6: Software Licensing Considerations

When running in the qemu environment, as described above, unique system identifiers are set by the virtual environment. These identifiers are often used to tie a software license to a physical machine. Because the virtual machine is merely duplicating the physical machine, one can copy the physical system's identifiers into the virtual machine. If one is also using a dedicated physical device for the virtual machine storage, this allows booting the Windows installation either as a virtual machine or natively with dual-boot.

To do this, one needs to modify the XML of the virtual machine to replicate their system. An example with some valid values is below:

<sysinfo type="smbios">
    <bios>
      <entry name="vendor">American Megatrends, Inc.</entry>
      <entry name="version">0812</entry>
      <entry name="date">02/24/2023</entry>
      <entry name="release">8.12</entry>
    </bios>
    <system>
      <entry name="manufacturer">ASUS</entry>
      <entry name="product">System Product Name</entry>
      <entry name="version">System Version</entry>
      <entry name="serial">System Serial Number</entry>
      <entry name="uuid">UNIQUE_UUID</entry>
      <entry name="sku">SKU</entry>
      <entry name="family">To be filled by O.E.M.</entry>
    </system>
    <baseBoard>
      <entry name="manufacturer">ASUSTeK COMPUTER INC.</entry>
      <entry name="product">PRIME Z790-P WIFI</entry>
      <entry name="version">Rev 1.xx</entry>
      <entry name="serial">UNIQUE_SERIAL_NUMBER</entry>
      <entry name="asset">Default string</entry>
    </baseBoard>
  </sysinfo>
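
For libvirt to actually present this SMBIOS data to the guest, the <os> element of the same domain XML also needs to reference it (adjust your existing <os> block rather than replacing it):

<os>
    ...
    <smbios mode="sysinfo"/>
</os>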

Acquiring the system values involves using dmidecode. Root privileges are required. An example invocation is dmidecode -s bios-vendor; the full translation to the XML above is:

<sysinfo type="smbios">
    <bios>
      <entry name="vendor">dmidecode -s bios-vendor</entry>
      <entry name="version">dmidecode -s bios-vendor</entry>
      <entry name="date">dmidecode -s bios-release-date</entry>
      <entry name="release">dmidecode -s bios-version</entry>
    </bios>
    <system>
      <entry name="manufacturer">dmidecode -s system-manufacturer</entry>
      <entry name="product">dmidecode -s system-product-name</entry>
      <entry name="version">dmidecode -s system-version</entry>
      <entry name="serial">dmidecode -s system-serial-number</entry>
      <entry name="uuid">dmidecode -s system-uuid</entry>
      <entry name="sku">dmidecode -s system-sku-number</entry>
      <entry name="family">dmidecode -s system-family</entry>
    </system>
    <baseBoard>
      <entry name="manufacturer">dmidecode -s baseboard-manufacturer</entry>
      <entry name="product">dmidecode -s baseboard-product-name</entry>
      <entry name="version">dmidecode -s baseboard-version</entry>
      <entry name="serial">dmidecode -s baseboard-serial-number</entry>
      <entry name="asset">dmidecode -s baseboard-asset-tag</entry>
    </baseBoard>
  </sysinfo>

Lastly, by default Linux systems store the physical hardware clock as UTC. When dual-booting, this conflicts with the localtime clock used by default in Windows and in the virtual environment setup. This clock delta can cause the system time to jump in unhealthy ways during reboots, invalidating certificates and causing havoc with software licensing tools. To rectify this, modify the clock section of the virtual machine XML to use UTC instead of localtime.

<clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>

In addition, within the virtual machine edit the registry to add a new dword("RealTimeIsUniversal"=dword:00000001) to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation.
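
If you prefer the command line inside the guest, the same value can be set from an elevated Command Prompt (an equivalent of the registry edit described above):

reg add "HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation" /v RealTimeIsUniversal /t REG_DWORD /d 1 /f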

Power-off and power-on the VM to verify the virtual machine is reporting the correct time. At this point one could natively boot into the operating system and use many hardware-locked license protected software offerings.

Credits & Resources

Footnotes

  1. Check out this thread from Hacker News for more information.
  2. I'll be using the term iGPU to refer to Intel's line of integrated GPUs that usually come built into their processors, and the term dGPU to refer to dedicated GPUs which are much better performance-wise and meant for gaming or video editing (NVIDIA/AMD).
  3. Make sure that the monitor input used for your gaming VM supports FreeSync/G-Sync technology. In my case, I reserved the DisplayPort 1.2 input for my gaming VM since G-Sync is not supported over HDMI (which was instead used for host graphics).
  4. I specifically wanted my Linux host to be able to perform CUDA work on the attached NVIDIA GPU. Just because my graphics card wasn't attached to a display didn't stop me from wanting to use cuDNN for ML/AI applications.
  5. Applying the ACS Override Patch may compromise system security. Check out this post to see why the ACS patch will probably never make its way upstream to the mainline kernel.
  6. I'm actually being a bit disingenuous here... I deliberately purchased hardware that I knew would provide ACS implementation (and hence good IOMMU isolation). After flashing the most recent version of my motherboard's BIOS, I made sure to enable the following features under the "AMD CBS" menu: ACS Enable, AER CAP, ARI Support.
  7. AMD CPUs/motherboards/chipsets tend to provide better ACS support than their Intel counterparts. The Intel Xeon family of processors is a notable exception: Xeons are mainly targeted at non-consumer workstations and are thus an excellent choice for PCI/VGA passthrough. Be aware that they do demand a hefty price tag.
  8. Credit to the solution presented in this post.
  9. If you decide to use bash scripts to launch your VM, I've included a file in the repository called qemu.sh. Make sure to fill out the #TODO section of the code with your custom version of the command qemu-system-x86-64.
  10. See this link for more details and a comparison between QEMU and virt-manager.
  11. See this link and this for software/hardware solutions that share your keyboard and mouse across your host and guest.
  12. For more information on hugepages, refer to this link.
  13. Credit to the comment from /u/tholin in this post.
  14. Credit to Mathias Hueber in this post.
  15. See a similar discussion here from Rokas Kupstys in this post.
  16. If you're curious about the best CPU pinning strategy for optimizing the latency vs. performance tradeoff, I recommend you check out this discussion.
  17. Although the overall performance between virtio-blk and virtio-scsi is similar, passing a single virtio-scsi controller can handle a multitude of PCI devices, whereas virtio-blk exposes one PCI device per controller. This comment on Reddit from a RedHat employee provides some good context and resources.
  18. For the sake of fairness, I chose to pass through all 12 cores/24 threads to the KVM. That way, the bare-metal installation won't have an unfair advantage over the KVM when it comes to multi-core processes. Unfortunately, I couldn't pass through all 32GB of RAM to the KVM since the host naturally reserves some of its own. In order to mitigate this as much as possible, I passed the remaining 29GB of RAM to the KVM. Due to its nature, a surplus of RAM doesn't really improve performance so much as it prevents bottlenecking.

gpu-passthrough-tutorial's People

Contributors

azuline, bctcvai, bryansteiner, kushalkolar


gpu-passthrough-tutorial's Issues

alloc_hugepages.sh error

Hi, when I run alloc_hugepages.sh:

./alloc_hugepages.sh: line 7: /2: syntax error: operand expected (error token is "/2")
Allocating hugepages...
./alloc_hugepages.sh: line 10: echo: write error: Invalid argument
./alloc_hugepages.sh: line 14: ((: 0 != && 0 < 1000 : syntax error: operand expected (error token is "&& 0 < 1000 ")

unable to map backing store for guest RAM: Cannot allocate memory

Hello and first and foremost, thank you for this page, it really helps!

When trying to use the hugepages, despite having done everything you did and having the same amount of RAM (32GB), I'm getting the below error message when trying to start the VM:

unable to map backing store for guest RAM: Cannot allocate memory

Any idea what could go wrong?

Ryzen Host-Passthrough

Hello @bryansteiner ,

Congratulations on a very well written and detailed guide for Linux KVM Passthrough. I happen to have a configuration very similar to yours bar the CPU being a 3700X and have been scouring the net to find solution to my problem. I was hoping you could be of assistance.

When I configure the CPU mode to host-passthrough, the VM has a significant performance penalty, ranging from long boot and login times to freezes at boot/login/after login. In fact, I only managed to log in once with host-passthrough out of all attempts, and saw in Task Manager the full CPU as I configured it, with Virtualization: Capable.

The only way I can get it to boot is by setting it to custom with <model fallback='allow'>EPYC-IBPB</model> where performance is close to native but it is Virtualization: Not capable.

From what I gather from your post it seems like you have no such issue. Could you please tell me whether you ran into it and worked around it, or whether host-passthrough just works for you?

alloc_hugepages.sh $MEMORY is not defined

This command does not work because $MEMORY is not defined:

HUGEPAGES="$(($MEMORY/$(($(grep Hugepagesize /proc/meminfo | awk '{print $2}')/1024))))"

I assume you meant to add this as a configuration parameter in kvm.conf?

nodedev-detach hangs

(creating a new issue with the same body as my reply to the old one)
Originally posted by @Moonlight63 in #16 (comment)

Hello, Thank you for the guide. I have been running passthrough for a while, but just did a fresh install of Pop and thought it might be nice to be able to use my second gpu when not running VMs. I am having the same issues as others here.

Running
virsh nodedev-detach $VIRSH_GPU_VIDEO
where, for me

VIRSH_GPU_VIDEO=pci_0000_02_00_0
VIRSH_GPU_AUDIO=pci_0000_02_00_1

causes a hang.

My GPUs are in their own IOMMU groups:

IOMMU Group 34 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
IOMMU Group 34 02:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
IOMMU Group 35 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
IOMMU Group 35 01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)

I have modified my xorg.conf so that it only uses the 1080 for host, and I have disabled AutoAddGPU, and I have a 3 monitor setup with all 3 plugged into the 3 displayports on the 1080, my full config is this:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 460.73.01

Section "ServerFlags"
	Option "AutoAddGPU" "off"
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "SAC DP"
    HorizSync       30.0 - 222.0
    VertRefresh     30.0 - 144.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080"
    BusID          "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-6"
    Option         "metamodes" "DP-4: 2560x1440_144 +2560+0, DP-0: 2560x1440_144 +0+0, DP-2: 2560x1440_144 +5120+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

And finally I have verified that the 1070 is not being used by anything with nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P0    46W / 210W |    429MiB /  8116MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   35C    P8    11W / 230W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4462      G   /usr/lib/xorg/Xorg                347MiB |
|    0   N/A  N/A      4957      G   /usr/bin/gnome-shell               78MiB |
+-----------------------------------------------------------------------------+

As a side note, my CPU doesn't list the virtualization option as VT-d, but rather calls it by its full name in dmesg:

[    0.302075] DMAR: IOMMU enabled
...
[    0.543400] DMAR-IR: IOAPIC id 8 under DRHD base  0xfbffc000 IOMMU 1
[    0.543401] DMAR-IR: IOAPIC id 9 under DRHD base  0xfbffc000 IOMMU 1
[    0.543402] DMAR-IR: HPET id 0 under DRHD base 0xfbffc000
[    0.543403] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.543404] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.544009] DMAR-IR: Enabled IRQ remapping in xapic mode
[    5.119927] DMAR: [Firmware Bug]: RMRR entry for device 06:00.0 is broken - applying workaround
[    5.119931] DMAR: dmar0: Using Queued invalidation
[    5.119937] DMAR: dmar1: Using Queued invalidation
[    5.129946] DMAR: Intel(R) Virtualization Technology for Directed I/O

Possibly because it's a xeon? Just thought I would mention it for others who come here.

Anyway, as far as I can tell, I've done everything mentioned and I can't find anything else that would be stopping the unload. Any ideas? Yes my scripts are executable, and I've been trying to just run the commands one by one in terminal to see if I can find an error exit, but since virsh nodedev-detach never completes and just hangs, no error is reported. Any help is greatly appreciated.

virt-manager cannot find PCIe devices from name

My file structure -

.
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       ├── alloc_hugepages.sh
        │       ├── bind_vfio.sh
        │       └── cpu_mode_performance.sh
        └── release
            └── end
                ├── cpu_mode_ondemand.sh
                ├── dealloc_hugepages.sh
                └── unbind_vfio.sh

All files can be executed -

root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x kvm.conf
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/prepare/begin/*
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/prepare/begin/alloc_hugepages.sh 
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/prepare/begin/bind_vfio.sh
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/prepare/begin/cpu_mode_performance.sh
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/release/end/*
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/release/end/cpu_mode_ondemand.sh
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/release/end/dealloc_hugepages.sh
root@test_multi_gpu_system:/etc/libvirt/hooks# chmod +x /etc/libvirt/hooks/qemu.d/win10/release/end/unbind_vfio.sh
root@test_multi_gpu_system:/etc/libvirt/hooks# 

My IOMMU groups -

IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec2] (rev 07)
IOMMU Group 10 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]
IOMMU Group 11 04:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [8086:1905] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
IOMMU Group 1 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
IOMMU Group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Desktop) [8086:3e92]
IOMMU Group 3 00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU Group 4 00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
IOMMU Group 4 00:14.2 Signal processing controller [1180]: Intel Corporation 200 Series PCH Thermal Subsystem [8086:a2b1]
IOMMU Group 5 00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
IOMMU Group 6 00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
IOMMU Group 7 00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #1 [8086:a290] (rev f0)
IOMMU Group 8 00:1c.4 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #5 [8086:a294] (rev f0)
IOMMU Group 9 00:1f.0 ISA bridge [0601]: Intel Corporation Z370 Chipset LPC/eSPI Controller [8086:a2c9]
IOMMU Group 9 00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
IOMMU Group 9 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
IOMMU Group 9 00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]

I am trying to pass both 1080 Tis to the VM, so my kvm.conf, bind_vfio.sh, and unbind_vfio.sh are a little different. I also do not know whether the VM will automatically detect the added keyboard and mouse -

(base) root@test_multi_gpu_system:/etc/libvirt/hooks# cat kvm.conf
VIRSH_GPU_VIDEO_A=pci_0000_01_00_0
VIRSH_GPU_VIDEO_B=pci_0000_02_00_0
VIRSH_GPU_AUDIO_A=pci_0000_01_00_1 
VIRSH_GPU_AUDIO_B=pci_0000_02_00_2
VIRSH_DEFAULT_AUDIO=pci_0000_00_1f_3
VIRSH_GPU_USB_A=pci_0000_00_14_0
VIRSH_GPU_USB_B=pci_0000_04_00_0
VIRSH_GPU_SERIAL_A=pci_0000_01_00_3
VIRSH_GPU_SERIAL_B=pci_0000_02_00_3
(base) root@test_multi_gpu_system:/etc/libvirt/hooks# cat qemu.d/win10/prepare/begin/bind_vfio.sh
#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach VIRSH_GPU_VIDEO_A
virsh nodedev-detach VIRSH_GPU_VIDEO_B
virsh nodedev-detach VIRSH_GPU_AUDIO_A
virsh nodedev-detach VIRSH_GPU_AUDIO_B
virsh nodedev-detach VIRSH_DEFAULT_AUDIO
virsh nodedev-detach VIRSH_GPU_USB
virsh nodedev-detach VIRSH_GPU_SERIAL_A
virsh nodedev-detach VIRSH_GPU_SERIAL_B
(base) root@test_multi_gpu_system:/etc/libvirt/hooks# cat qemu.d/win10/release/end/unbind_vfio.sh
#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach VIRSH_GPU_VIDEO_A
virsh nodedev-reattach VIRSH_GPU_VIDEO_B
virsh nodedev-reattach VIRSH_GPU_AUDIO_A
virsh nodedev-reattach VIRSH_GPU_AUDIO_B
virsh nodedev-reattach VIRSH_DEFAULT_AUDIO
virsh nodedev-reattach VIRSH_GPU_USB_A
virsh nodedev-reattach VIRSH_GPU_USB_B
virsh nodedev-reattach VIRSH_GPU_SERIAL_A
virsh nodedev-reattach VIRSH_GPU_SERIAL_B

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
(base) root@test_multi_gpu_system:/etc/libvirt/hooks# 

Here, after beginning the installation, virt-manager says that the names do not exist -

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/createvm.py", line 2089, in _do_async_install
    guest.installer_instance.start_install(guest, meter=meter)
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 542, in start_install
    domain = self._create_guest(
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 491, in _create_guest
    domain = self.conn.createXML(install_xml or final_xml, 0)
  File "/usr/lib/python3/dist-packages/libvirt.py", line 4034, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirt.libvirtError: Hook script execution failed: internal error: Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin /etc/libvirt/hooks/qemu win10 prepare begin -) unexpected exit status 1: error: Could not find matching device 'VIRSH_GPU_VIDEO_A'
error: Node device not found: no node device with matching name 'VIRSH_GPU_VIDEO_A'
error: Could not find matching device 'VIRSH_GPU_VIDEO_B'
error: Node device not found: no node device with matching name 'VIRSH_GPU_VIDEO_B'
error: Could not find matching device 'VIRSH_GPU_AUDIO_A'
error: Node device not found: no node device with matching name 'VIRSH_GPU_AUDIO_A'
error: Could not find matching device 'VIRSH_GPU_AUDIO_B'
error: Node device not found: no node device with matching name 'VIRSH_GPU_AUDIO_B'
error: Could not find matching device 'VIRSH_DEFAULT_AUDIO'
error: Node device not found: no node device with matching name 'VIRSH_DEFAULT_AUDIO'
error: Could not find matching device 'VIRSH_GPU_USB'
error: Node device not found: no node device with matching name ''

Any help is appreciated.
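(Comparing with the tutorial's original bind_vfio.sh, the virsh calls there expand the variables with a $ prefix, so presumably my detach lines would need to read something like this sketch:)

virsh nodedev-detach $VIRSH_GPU_VIDEO_A
virsh nodedev-detach $VIRSH_GPU_VIDEO_B
virsh nodedev-detach $VIRSH_GPU_AUDIO_A
virsh nodedev-detach $VIRSH_GPU_AUDIO_B
virsh nodedev-detach $VIRSH_DEFAULT_AUDIO
virsh nodedev-detach $VIRSH_GPU_USB_A
virsh nodedev-detach $VIRSH_GPU_USB_B
virsh nodedev-detach $VIRSH_GPU_SERIAL_A
virsh nodedev-detach $VIRSH_GPU_SERIAL_B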

How to debug crashes when unbinding

Thanks for the great walkthrough. I am experiencing hard crashes of the host computer when shutting down the VM about 1 in 20 times. I suspect it happens during unbinding from vfio and rebinding onto nouveau. When I reboot and open the syslog, it shows a line full of @ symbols. Any way to pinpoint the cause?

MEMORY value in the hugepages hook, and what to do on failure?

Hey Bryan

Absolutely terrific writeup. Very well done.

I have a couple of questions:

  1. Parsing through your steps, you have these 2 lines in alloc_hugepages.sh:
## Calculate number of hugepages to allocate from memory (in MB)
HUGEPAGES="$(($MEMORY/$(($(grep Hugepagesize /proc/meminfo | awk '{print $2}')/1024))))"

Where does the $MEMORY value come from? Is that an environment variable that is available to the qemu/libvirt/whatever process that is running the hook script, or is that something that should be defined by the user in kvm.conf? (See the sketch after these questions.)

  2. What should a user do if the host fails to allocate the hugepages on VM start? I was experimenting with this script by itself by assigning a value to MEMORY just prior to the HUGEPAGES=... assignment, and found that if I set MEMORY too high, the hugepages couldn't get allocated (presumably within 1,000 tries):
10:38:01 root /etc/libvirt/hooks/qemu.d/Win10Full% prepare/begin/alloc_hugepages.sh
Allocating hugepages...
$HUGEPAGES == 8192
Succesfully allocated 4067 / 8192
Succesfully allocated 4096 / 8192
Succesfully allocated 4100 / 8192
...
Succesfully allocated 6479 / 8192
Succesfully allocated 6479 / 8192
Succesfully allocated 6481 / 8192
Succesfully allocated 6481 / 8192
Not able to allocate all hugepages. Reverting...

(The extra '$HUGEPAGES == 8192' there is a debug string I added for testing)
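For what it's worth, here is a minimal worked sketch of the calculation from question 1, assuming (hypothetically) that MEMORY is defined in MB alongside the other variables in kvm.conf:

## Hypothetical kvm.conf entry (not from the tutorial): guest RAM in MB
MEMORY=16384

## With the usual 2048 kB hugepage size, the quoted formula gives
## 16384 / (2048 / 1024) = 8192 hugepages
HUGEPAGES="$(($MEMORY/$(($(grep Hugepagesize /proc/meminfo | awk '{print $2}')/1024))))"
echo "$HUGEPAGES"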

qemu.d folder is not created.

Hi I am using Archlinux.

I have downloaded the hooks helper and made it executable, however when I restart libvirtd, nothing happens.
My mechanism for doing this is to run sudo systemctl restart libvirtd, as the command in the guide does not work.
Is there something painfully obvious I'm missing, or is it simply not compatible with systemd?
In this case am I able to manually put scripts here rather than using a tool?
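(If manual creation is acceptable, a sketch of what I'd presumably run, mirroring the directory layout shown elsewhere in the guide; "win10" here is hypothetical and would have to match the VM's name:)

sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/prepare/begin
sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/release/end
## then place bind_vfio.sh / unbind_vfio.sh in begin/ and end/ and chmod +x them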

nodedev-reattach hangs

I can start my VM, but when I stop it my GPU is never rebound to the amdgpu driver.

I can run /etc/libvirt/hooks/qemu.d/win11/prepare/begin/bind_vfio.sh without any problem, but when running /etc/libvirt/hooks/qemu.d/win11/release/end/unbind_vfio.sh nodedev-reattach hangs indefinitely.

Has anyone encountered a similar issue?

$MEMORY not assigned in kvm.conf

The alloc_hugepages.sh script makes use of a $MEMORY variable to determine the memory allocated to the VM, but this variable doesn't appear in the kvm.conf file that is sourced.

Otherwise, a joy to read your tutorial. Great work!

Unable to complete install: 'Hook script execution failed: internal error: Child process

Hi there,

Great tutorial, but I'm running into some issues. I have 2x 1080 Tis and am getting this error:

Unable to complete install: 'Hook script execution failed: internal error: Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /etc/libvirt/hooks/qemu win10 prepare begin -) unexpected exit status 1: error: unknown command: 'nodedev-detach=pci_0000_04:00.0'
error: Could not find matching device 'pci_0000_04:00.1'
error: Node device not found: no node device with matching name 'pci_0000_04:00.1'
error: Could not find matching device 'pci_0000_00:14.0'
error: Node device not found: no node device with matching name 'pci_0000_00:14.0'
error: Could not find matching device 'pci_0000_00:03.2'
error: Node device not found: no node device with matching name 'pci_0000_00:03.2'
error: Could not find matching device 'pci_0000_05:00.0'
error: Node device not found: no node device with matching name 'pci_0000_05:00.0'
'

Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/createvm.py", line 2089, in _do_async_install
guest.installer_instance.start_install(guest, meter=meter)
File "/usr/share/virt-manager/virtinst/install/installer.py", line 542, in start_install
domain = self._create_guest(
File "/usr/share/virt-manager/virtinst/install/installer.py", line 491, in _create_guest
domain = self.conn.createXML(install_xml or final_xml, 0)
File "/usr/lib/python3/dist-packages/libvirt.py", line 4034, in createXML
if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirt.libvirtError: Hook script execution failed: internal error: Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /etc/libvirt/hooks/qemu win10 prepare begin -) unexpected exit status 1: error: unknown command: 'nodedev-detach=pci_0000_04:00.0'
error: Could not find matching device 'pci_0000_04:00.1'
error: Node device not found: no node device with matching name 'pci_0000_04:00.1'
error: Could not find matching device 'pci_0000_00:14.0'
error: Node device not found: no node device with matching name 'pci_0000_00:14.0'
error: Could not find matching device 'pci_0000_00:03.2'
error: Node device not found: no node device with matching name 'pci_0000_00:03.2'
error: Could not find matching device 'pci_0000_05:00.0'
error: Node device not found: no node device with matching name 'pci_0000_05:00.0'

Here is how I have the three files you specified configured. Am I doing something wrong?

kvm.conf:

## Virsh devices

VIRSH_GPU_VIDEO=pci_04:00.0
VIRSH_GPU_AUDIO=pci_04:00.1
VIRSH_GPU_USB=pci_00:14.0
VIRSH_GPU_SERIAL=pci_00:03.2
VIRSH_NVME_SSD=pci_05:00.0

bind_vfio.sh:

#!/bin/bash

## Load the config file

source "/etc/libvirt/hooks/kvm.conf"

## Load vfio

modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio

virsh nodedev-detach=pci_0000_04:00.0
virsh nodedev-detach=pci_0000_04:00.1
virsh nodedev-detach=pci_0000_00:14.0
virsh nodedev-detach=pci_0000_00:03.2

## Unbind ssd from nvme and bind to vfio

virsh nodedev-detach=pci_0000_05:00.0

unbind_vfio.sh:

#!/bin/bash

## Load the config file

source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia

virsh nodedev-reattach=pci_0000_04:00.0
virsh nodedev-reattach=pci_0000_04:00.1
virsh nodedev-reattach=pci_0000_00:14.0
virsh nodedev-reattach=pci_0000_00:03.2

## Unbind ssd from vfio and bind to nvme

virsh nodedev-reattach=pci_0000_05:00.0

## Unload vfio

modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

For reference these are the devices I'm trying to passthrough:

GPU with HDMI Audio:

04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. GP102 [GeForce GTX 1080 Ti]
Physical Slot: 4-1
Flags: bus master, fast devsel, latency 0, IRQ 85, NUMA node 0, IOMMU group 35
Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
Memory at 383fc0000000 (64-bit, prefetchable) [size=256M]
Memory at 383fd0000000 (64-bit, prefetchable) [size=32M]
I/O ports at d000 [size=128]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
Subsystem: eVga.com. Corp. GP102 HDMI Audio Controller
Physical Slot: 4-1
Flags: bus master, fast devsel, latency 0, IRQ 73, NUMA node 0, IOMMU group 35
Memory at f9080000 (32-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

USB:

00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05) (prog-if 30 [XHCI])
Subsystem: Micro-Star International Co., Ltd. [MSI] C610/X99 series chipset USB xHCI Host Controller
Flags: bus master, medium devsel, latency 0, IRQ 19, NUMA node 0, IOMMU group 22
Memory at 383ffff00000 (64-bit, non-prefetchable) [size=64K]
Capabilities:
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

Serial:

00:03.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0, IOMMU group 16
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 0000d000-0000dfff [size=4K]
Memory behind bridge: f8000000-f90fffff [size=17M]
Prefetchable memory behind bridge: 0000383fc0000000-0000383fd1ffffff [size=288M]
Capabilities:
Kernel driver in use: pcieport

NVME:

05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0, IOMMU group 36
Memory at fb800000 (64-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: nvme
Kernel modules: nvme

I know it must be some small silly mistake I'm making or maybe I'm just not understanding how this works. I'm new to VMs on Linux and passthrough and I am dying to play Tarkov (as it is the only game that I play on Windows). Thanks for your help!

Edit: If you need to see everything on my system I'm willing to dump it here too.

Edit 2: Used the correct addresses and still no dice.
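(For my own notes: the tutorial's scripts call virsh as plain "virsh nodedev-detach <name>" without an '=' sign, and the node device names use underscores throughout, e.g. pci_0000_04_00_0, so a corrected sketch of my detach lines would presumably be:)

virsh nodedev-detach pci_0000_04_00_0
virsh nodedev-detach pci_0000_04_00_1
virsh nodedev-detach pci_0000_00_14_0
virsh nodedev-detach pci_0000_00_03_2
virsh nodedev-detach pci_0000_05_00_0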

Hanging on Creating Domain

I have a ThinkPad X1 Extreme Gen 2. I am following your guide exactly, except for passing through the NVMe SSD. I also modified the bind and unbind scripts as well as kvm.conf to omit the NVMe, and also the GPU serial/USB, since there aren't any on a laptop's dGPU.

When I begin the installation of the VM it hangs on Creating Domain.

I end up having to xkill virt-manager, and when I check sudo service libvirtd status, the last log entry has something to do with the GPU.

Any ideas how I can further diagnose this problem?

I have a few guesses.

  1. Maybe it has something to do with UEFI, as if I just create a VM with BIOS there are no issues, but unfortunately no passthrough.

  2. The hook helper tool is somehow failing to bind the GPU.

  3. Maybe it has something to do with my not having multiple monitors as this is a laptop

thank you for your help

Add <iothreads> line in CPU Pinning section

In the CPU Pinning section of the performance-related guide, the XML settings are missing the <iothreads>1</iothreads> line.

This is what is currently indicated:

<vcpu placement="static">12</vcpu>
<cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="18"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="19"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="20"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="21"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="22"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="23"/>
    <emulatorpin cpuset="0-3"/>
    <iothreadpin iothread='1' cpuset='4-5,12-17'/>
</cputune>

This is how it should be per your win10.xml config:

<vcpu placement="static">12</vcpu>
<iothreads>1</iothreads>
<cputune>
    <vcpupin vcpu="0" cpuset="6"/>
    <vcpupin vcpu="1" cpuset="18"/>
    <vcpupin vcpu="2" cpuset="7"/>
    <vcpupin vcpu="3" cpuset="19"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="20"/>
    <vcpupin vcpu="6" cpuset="9"/>
    <vcpupin vcpu="7" cpuset="21"/>
    <vcpupin vcpu="8" cpuset="10"/>
    <vcpupin vcpu="9" cpuset="22"/>
    <vcpupin vcpu="10" cpuset="11"/>
    <vcpupin vcpu="11" cpuset="23"/>
    <emulatorpin cpuset="0-3"/>
    <iothreadpin iothread='1' cpuset='4-5,12-17'/>
</cputune>

So close, but not quite working...?

TLDR: when I run the VM, nothing seems to happen (it doesn't appear to run, and it doesn't take over my video, keyboard, or mouse). But if I log out I may lose keyboard/mouse, and when I did log back in, one time it showed the VM video (the very early boot phase with the TianoCore logo) and I had no keyboard/mouse.

Also, THANK YOU for the guide! Every other guide on the internet wants you to lock down the dGPU so only the guest can use it which seems insane and is also 30x more complicated than the (more useful) way that you do it.


SYSTEM:

  • Mobo: asus z690 extreme
  • CPU: intel 13900k
  • GPU: rtx 4090
  • Displays:
    • one connected to the dGPU via DP
    • one connected to the iGPU via HDMI
  • OS: vanilla Arch, linux-zen, x11, GNOME
    • installed via archinstall so everything just works™

A couple notes:

  • VM is using windows 11 and is therefore named win11
  • The packages I install to get virt-manager set up:
sudo pacman -S qemu virt-manager virt-viewer dnsmasq vde2 bridge-utils libguestfs dmidecode spice-vdagent
# choose qemu-desktop instead of qemu-base or qemu-full when asked which repository extras I'd like
  • UEFI x86_64: /usr/share/OVMF/OVMF_CODE.fd is not an option, instead there is UEFI x86_64: /usr/share/edk2/x64/OVMF_CODE.fd
    • however, I've tried with that and with the regular UEFI and I still get the same issue
  • The 4090 only has the GPU and audio to pass through
  • I've tried with and without doing the CPU host-passthrough step
  • In an effort to make this first setup as non-complex as possible, I'm going with standard image storage and no cpu pinning/hugepages

This runs with no issue:

$ sudo mkdir -p /etc/libvirt/hooks
$ sudo wget 'https://raw.githubusercontent.com/PassthroughPOST/VFIO-Tools/master/libvirt_hooks/qemu' \
     -O /etc/libvirt/hooks/qemu
$ sudo chmod +x /etc/libvirt/hooks/qemu

This command

sudo service libvirtd restart

is this command in arch

sudo systemctl restart libvirtd

but it works with no issues.

The hooks:

$ tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win11
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh
$ cat /etc/libvirt/hooks/kvm.conf
## Virsh devices
VIRSH_GPU_VIDEO=pci_0000_01_00_0
VIRSH_GPU_AUDIO=pci_0000_01_00_1
$ cat /etc/libvirt/hooks/qemu.d/win11/prepare/begin/bind_vfio.sh
#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO
virsh nodedev-detach $VIRSH_GPU_AUDIO
$ cat /etc/libvirt/hooks/qemu.d/win11/release/end/unbind_vfio.sh
#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach $VIRSH_GPU_VIDEO
virsh nodedev-reattach $VIRSH_GPU_AUDIO

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

The rest of the steps are followed precisely, as they aren't unique to my machine. I don't have issues creating VMs generally.

Any ideas what's going on here?

Virsh detach hangs

Hi, I'm following your tutorial and had it working properly, but I think I did something that ended up breaking it (I was messing with having NVIDIA PRIME on while the guest was offline). Now when I boot the machine, it gets stuck and I have no screen. I have SSH'd into my machine and run the launch script manually with set -x, and it gives me this:

+ modprobe vfio
+ modprobe vfio_iommu_type1
+ modprobe vfio_pci
+ virsh nodedev-detach pci_0000_01_00_0
error: Failed to detach device pci_0000_01_00_0
error: argument unsupported: VFIO device assignment is currently not supported on this system

+ virsh nodedev-detach pci_0000_01_00_1
error: Failed to detach device pci_0000_01_00_1
error: argument unsupported: VFIO device assignment is currently not supported on this system

+ virsh nodedev-detach pci_0000_01_00_2
error: Failed to detach device pci_0000_01_00_2
error: argument unsupported: VFIO device assignment is currently not supported on this system

+ virsh nodedev-detach pci_0000_01_00_3
error: Failed to detach device pci_0000_01_00_3
error: argument unsupported: VFIO device assignment is currently not supported on this system

I know the system can support VFIO, because it really was working before with the exact same setup I have now (to my knowledge), except that the PCI ID of the GPU was 0000:02.00.0, and I have no idea how or why the ID changed. I'm not sure what I did that could have made it no longer supported.

Thank you for posting such a great resource! Any help would be appreciated, even just pointers to how I can debug this myself.

CPU pinning logical/physical indices

I played around with the CPU pinning configuration. Here's the lstopo for Ryzen 9 3950X:

[lstopo output screenshot]

I followed your suggestion of mapping CPU indices such that a core and its siblings are passed together, using physical indices. I.e.:

  <cputune>                        
    <vcpupin vcpu='0' cpuset='4'/> 
    <vcpupin vcpu='1' cpuset='20'/> 
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='21'/>
    <!-- etc -->

Pass Core L#4 as vcpus 0,1, Core L#5 as vcpus 2,3, etc.

However, when running Geekbench on the VM, the multi-core performance was below expectation.
For comparison, here is vcpupin set based on logical indices:

  <cputune>                        
    <vcpupin vcpu='0' cpuset='8'/> 
    <vcpupin vcpu='1' cpuset='9'/> 
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <!-- etc -->

[Geekbench comparison screenshot]

That's a big multi-core performance hit.
I believe that the CPU-pinning should be done with the cpuset value being the logical index instead.
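For anyone checking their own topology without lstopo, a quick sketch using standard Linux tooling (CPU number 4 here is just an example):

## Logical CPU -> physical core mapping
lscpu -e=CPU,CORE
## Logical CPUs sharing a physical core with CPU 4 (on this 3950X layout: 4,20)
cat /sys/devices/system/cpu/cpu4/topology/thread_siblings_list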

Hanging on Creating Domain #12

Hi,

I'm having the same issue here with virt-manager hanging on creating domain.

dmesg is reporting: NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!

I have changed the graphics mode to compute as recommended in issue #12. nvidia-smi indicates that there are no processes running on the Nvidia GPU, but still this issue with a non-zero usage count.

The IOMMU information for the GPU is: IOMMU Group 2 01:00.0 3D controller [0302]: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] [10de:1f95] (rev a1)

I have added the scripts bind_vfio.sh & unbind_vfio.sh, and made them executable.

The contents of bind_vfio.sh are:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO

The contents of unbind_vfio.sh are:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach $VIRSH_GPU_VIDEO

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

kvm.conf contents are simply:

VIRSH_GPU_VIDEO=pci_0000_01_00_0

Where to go from here?

Thanks

P.S. I saw that newer drivers could cause issues, so I downgraded to nvidia-470 and tried again. Still hanging at the same point. dmesg:

NVRM: GPU at PCI:0000:01:00: GPU-832b7f23-880e-d656-20c0-331ca6c8873a
[  109.630956] NVRM: Xid (PCI:0000:01:00): 79, pid=3465, GPU has fallen off the bus.
[  109.630958] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[  243.446680] VFIO - User Level meta-driver version: 0.3
[  243.464970] NVRM: Attempting to remove minor device 0 with non-zero usage count!

P.P.S. I have further information: by adding set -x at the beginning of bind_vfio.sh, we get further logs in dmesg:

[   26.485239] nvidia 0000:01:00.0: not ready 8191ms after resume; waiting
[   34.933241] nvidia 0000:01:00.0: not ready 16383ms after resume; waiting
[   52.597123] nvidia 0000:01:00.0: not ready 32767ms after resume; waiting
[   87.412701] nvidia 0000:01:00.0: not ready 65535ms after resume; giving up
[   87.412759] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473074] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473112] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.473413] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473638] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473669] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.473846] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473949] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473980] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474025] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.474119] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474148] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474309] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.474426] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474457] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474603] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474632] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474912] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474937] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.475108] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.475137] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  145.364413] nvidia 0000:01:00.0: not ready 1023ms after resume; waiting
[  146.420419] nvidia 0000:01:00.0: not ready 2047ms after resume; waiting
[  148.596426] nvidia 0000:01:00.0: not ready 4095ms after resume; waiting
[  152.948405] nvidia 0000:01:00.0: not ready 8191ms after resume; waiting
[  161.396346] nvidia 0000:01:00.0: not ready 16383ms after resume; waiting
[  179.572264] nvidia 0000:01:00.0: not ready 32767ms after resume; waiting
[  214.387968] nvidia 0000:01:00.0: not ready 65535ms after resume; giving up
[  214.388023] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.410562] VFIO - User Level meta-driver version: 0.3
[  214.429107] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429128] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429312] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429326] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429365] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.552809] audit: type=1400 audit(1706607632.349:51): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4628 comm="apparmor_parser"
[  214.681154] audit: type=1400 audit(1706607632.477:52): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4631 comm="apparmor_parser"
[  214.819618] audit: type=1400 audit(1706607632.613:53): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4640 comm="apparmor_parser"
[  216.980015] vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
[  218.035803] vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
[  220.275818] vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
[  224.627829] vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
[  233.075664] vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
[  251.251461] vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
[  286.067103] vfio-pci 0000:01:00.0: not ready 65535ms after resume; giving up
[  286.067158] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.067542] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137034] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137047] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137052] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137195] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137281] NVRM: This is a 64-bit BAR mapped above 4GB by the system
               NVRM: BIOS or the Linux kernel, but the PCI bridge
               NVRM: immediately upstream of this GPU does not define
               NVRM: a matching prefetchable memory window.
[  286.137283] NVRM: This may be due to a known Linux kernel bug.  Please
               NVRM: see the README section on 64-bit BARs for additional
               NVRM: information.
[  286.137284] nvidia: probe of 0000:01:00.0 failed with error -1

Virt manager now exits cleanly with an error, without hanging:

Unable to complete install: 'internal error: Unknown PCI header type '127' for device '0000:01:00.0''

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 72, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/createvm.py", line 2008, in _do_async_install
    installer.start_install(guest, meter=meter)
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 695, in start_install
    domain = self._create_guest(
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 637, in _create_guest
    domain = self.conn.createXML(initial_xml or final_xml, 0)
  File "/usr/lib/python3/dist-packages/libvirt.py", line 4400, in createXML
    raise libvirtError('virDomainCreateXML() failed')
libvirt.libvirtError: internal error: Unknown PCI header type '127' for device '0000:01:00.0'

Query about host GPU

Hi,

Thanks for writing up this tutorial. I am currently in the market for building a new PC, and I intend to do windows 10 virtualization and gpu passthrough on the new build.

I just wanted to know whether a host GPU is necessary. Would it be possible to run this setup with a single GPU shared between host and guest?

Alternatively, could I do it with an integrated graphics processor (e.g. Ryzen 5 5600G) for the host? Would the setup be much different for this?

Thanks for your time.

iothreads not being used?

Came across this tutorial and noticed you're not using any of the iothreads (or I'm missing something). Not an issue for your setup, as you're passing through the NVMe controller. It just might be worth mentioning that to make use of them you need to either create a SCSI controller and attach the disks to that, or use virtio-blk.

scsi:

<controller type='scsi' index='0' model='virtio-scsi'>
  <driver queues='4' iothread='1'/>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</controller>

virtio-blk:

<domain>
  ....
  <devices>
    .....
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' discard='unmap' iothread='1' queues='8'/>
      <source dev='/var/lib/libvirt/images/pool/win10.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    .....
  </devices>
  ....
</domain>

AMD GPU audio won't bind to vfio

See #32 (comment) for fix

I am on Debian Testing with dual AMD discrete GPUs. I attempted to install Windows 10 in a virtual machine with GPU passthrough and CPU settings similar to your tutorial (of course, modified accordingly for my 6-core, 12-thread CPU). The virtual machine would hang on Creating Domain like #12. I had a similar kvm.conf and similar bind/unbind script modifications to the user in the aforementioned issue. I tried several diagnostics, none to any avail.
Thus, with the same settings otherwise, I removed the PCIe passthrough of the GPU and just went on installing Windows, planning to fix the GPU issues later. I thought it might be that the display manager was holding that GPU hostage. However, I found that after trying to start the VM, the graphics card I was trying to pass through no longer appeared as it had before when running xrandr --listproviders.
Attempting further testing, I rebooted the PC and connected a single monitor to the GPU I intended to pass through. The desktop appeared on both displays. When I started the VM, the display with the passthrough GPU connected turned off, suggesting that the bind script had, in fact, successfully executed. When running lsmod, I found that all kernel modules started by the script were running. However, my 10 cents are still on the idea that it's a kernel module or permissions issue.
I am still lost and trying to figure out the issue. I have bind_vfio.sh, alloc_hugepages.sh, and cpu_mode_performance.sh in my prepare/begin directory along with their corresponding scripts in release/end, all executable. I tested removing all but bind_vfio.sh, but to no avail.

Here is my current tree (after testing the removal):

/etc/libvirt/hooks
├── kvm.conf
├── qemu
└── qemu.d
    └── Windows10
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh

My system configuration is as follows:
CPU: AMD Ryzen 5 2600
Motherboard: Gigabyte Aorus AX370 Gaming 5
Kernel: liquorix 5.18-17.1~bookworm (which has ACS patches)
Kernel Parameters (defined in /etc/default/grub): quiet splash acpi_enforce_resources=lax pcie_acs_override=downstream,multifunction amd_iommu=on
Main GPU: PowerColor AMD Radeon RX 550 2GB (https://www.amazon.com/PowerColor-Radeon-550-Profile-Graphics/dp/B09V2GYKPJ/ref=sr_1_1?crid=AEZNE0MZSFYJ&keywords=powercolor+radeon+550&qid=1659337204&sprefix=powercolor+radeon+550%2Caps%2C150&sr=8-1)
Passthrough GPU: XFX Radeon RX 580 8GB
RAM: 32GB
Bash Version: 5.1.16
Distribution: Debian bookworm
Desktop: xfce4

I can confirm that IOMMU and AMD-V are enabled, as I turned them on, and the iommu shell script outputs devices. The Windows 10 VM boots when passthrough is disabled, and at a reasonable speed, meaning that virtualization is (probably) working. Before I attempt to start the VM, appending DRI_PRIME=1 to the front of a command allows me to offload rendering to the passthrough GPU; after attempting to start it, this no longer works and rendering reverts to the RX 550. (Almost the exact desired behavior, except that, one, I want the VM running, and two, even when using "source" in the shell as the root user and attempting to execute the unbind script directly, the GPU is not returned and remains inaccessible for the rest of the session.)
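(My usual sanity check for the offload path, assuming glxinfo from mesa-utils is installed, is something along the lines of:)

DRI_PRIME=1 glxinfo | grep "OpenGL renderer"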

Here's the XML file for my VM
Here's my bind_vfio.sh
Here's my unbind_vfio.sh

Thank you so much for creating the tutorial, and thank you for your time. I hope I haven't given you too much (or even worse, too little) information. Have a wonderful day.

memfd hugepages apparmor question

Hi, thanks for the excellent guide. I like your elegant libvirt hook solutions and it's all super clear and easy to follow.

I'm having one small problem though and hope you might give me a clue. I can't get hugepages working. It appears to conflict with apparmor.

Host is kubuntu 21.10
Using library: libvirt 7.6.0
Using API: QEMU 7.6.0
Running hypervisor: QEMU 6.0.0
I have to run qemu as my login (aka user 1000) in qemu.conf in order to get qemu audio working, apparently due to other similar apparmor conflicts; this may be part of the issue.

When I add <hugepages/> to the VM's <memoryBacking> XML section:
<memoryBacking> <source type="memfd"/> <access mode="shared"/> </memoryBacking>
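(i.e., my reading of the libvirt schema is that the section ends up looking something like this; the placement of <hugepages/> is my assumption, not something from the guide:)

<memoryBacking>
  <hugepages/>
  <source type="memfd"/>
  <access mode="shared"/>
</memoryBacking>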

Then when I start the VM, the memfd-backed hugepages generate a Permission Denied error from apparmor that looks like this:

11 22:05:25 hexy audit[14630]: AVC apparmor="DENIED" operation="open" profile="libvirt-15de3735-d0a2-482b-95ce-42ab775ceaa7" name="/proc/sys/dev/i915/perf_stream_paranoid" pid=14630 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0

Nov 11 22:05:25 hexy kernel: audit: type=1400 audit(1636697125.209:59): apparmor="DENIED" operation="open" profile="libvirt-15de3735-d0a2-482b-95ce-42ab775ceaa7" name="/proc/sys/dev/i915/perf_stream_paranoid" pid=14630 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0

Nov 11 22:05:25 hexy audit[14630]: AVC apparmor="DENIED" operation="open" profile="libvirt-15de3735-d0a2-482b-95ce-42ab775ceaa7" name="/etc/pulse/client.conf.d/" pid=14630 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0

Nov 11 22:05:25 hexy audit[14630]: AVC apparmor="DENIED" operation="truncate" profile="libvirt-15de3735-d0a2-482b-95ce-42ab775ceaa7" name="/" pid=14630 comm="qemu-system-x86" requested_mask="w" denied_mask="w" fsuid=1000 ouid=1000

Nov 11 22:05:25 hexy kernel: audit: type=1400 audit(1636697125.213:60): apparmor="DENIED" operation="open" profile="libvirt-15de3735-d0a2-482b-95ce-42ab775ceaa7" name="/etc/pulse/client.conf.d/" pid=14630 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0

[...]

Nov 11 22:05:25 hexy libvirtd[2866]: internal error: qemu unexpectedly closed the monitor: 2021-11-12T06:05:25.217933Z qemu-system-x86_64: failed to resize memfd to 4294967296: Permission denied

Nov 11 22:05:25 hexy systemd[1]: machine-qemu\x2d3\x2dwork.scope: Deactivated successfully.

Do you have any ideas or suggestions to try?

Bios version

Thanks for the tutorial. What is your mobo's BIOS version?

kvm.conf creation

The tutorial doesn't say how to find your GPU PCI addresses. Would this work for my configuration? I used "lspci -nn" to show the PCI devices. My VM does not have NVMe passthrough or USB/GPU serial passthrough. Should I comment out those lines like this?

## Virsh devices

VIRSH_GPU_VIDEO=pci_0000_00_00_0
VIRSH_GPU_AUDIO=pci_0000_00_00_1
#VIRSH_GPU_USB=pci_0000_0a_00_2
#VIRSH_GPU_SERIAL=pci_0000_0a_00_3
#VIRSH_NVME_SSD=pci_0000_04_00_0
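For reference, a sketch of one way to go from an lspci address to the virsh node device name (the 0a:00.0 address below is hypothetical; the convention of prepending the 0000 PCI domain and swapping ':' and '.' for '_' matches the names used throughout this tutorial):

ADDR="0a:00.0"                                  # hypothetical bus:device.function from lspci -nn
echo "pci_0000_$(echo "$ADDR" | tr ':.' '__')"  # -> pci_0000_0a_00_0
## Cross-check against the devices libvirt knows about:
virsh nodedev-list --cap pci | grep 0a_00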

Issue with hook script execution.

Good morning!

I have been running through your tutorial on Pop_OS with the following configuration:
Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (Using integrated graphics)
NVIDIA Corporation GM200 [GeForce GTX 980 Ti]

I followed the instructions exactly with the exception of the last optimization step for multithreaded CPUs. I appear to be running into an issue with the hook scripts, though. Upon hitting "Begin Installation" I get the following error:

Unable to complete install: 'Hook script execution failed: internal error: Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /etc/libvirt/hooks/qemu win10 prepare begin -) unexpected exit status 127: /etc/libvirt/hooks/qemu: line 27: : command not found
'

Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
 callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/createvm.py", line 2089, in _do_async_install
 guest.installer_instance.start_install(guest, meter=meter)
File "/usr/share/virt-manager/virtinst/install/installer.py", line 542, in start_install
 domain = self._create_guest(
File "/usr/share/virt-manager/virtinst/install/installer.py", line 491, in _create_guest
 domain = self.conn.createXML(install_xml or final_xml, 0)
File "/usr/lib/python3/dist-packages/libvirt.py", line 4034, in createXML
 if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirt.libvirtError: Hook script execution failed: internal error: Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /etc/libvirt/hooks/qemu win10 prepare begin -) unexpected exit status 127: /etc/libvirt/hooks/qemu: line 27: : command not found

I'm unsure if I did something wrong, as I copied and pasted the scripts from the guide. Is there some consideration I am missing for integrated graphics? They are in separate IOMMU groups:

IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] [10de:17c8] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GM200 High Definition Audio [10de:0fb0] (rev a1)
IOMMU Group 2 00:02.0 Display controller [0380]: Intel Corporation HD Graphics 530 [8086:1912] (rev 06)
IOMMU Group 3 00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
IOMMU Group 3 00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
IOMMU Group 4 00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
IOMMU Group 5 00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
IOMMU Group 6 00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
IOMMU Group 7 00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
IOMMU Group 7 00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
IOMMU Group 7 00:1c.5 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #6 [8086:a115] (rev f1)
IOMMU Group 7 03:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
IOMMU Group 7 04:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
IOMMU Group 7 04:01.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
IOMMU Group 7 04:02.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
IOMMU Group 7 04:04.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
IOMMU Group 7 07:00.0 USB controller [0c03]: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge] [8086:15b6]
IOMMU Group 7 09:00.0 Ethernet controller [0200]: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller [1969:e091] (rev 10)
IOMMU Group 7 0a:00.0 Network controller [0280]: Intel Corporation Wireless 8260 [8086:24f3] (rev 3a)
IOMMU Group 8 00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
IOMMU Group 8 00:1d.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #13 [8086:a11c] (rev f1)
IOMMU Group 8 0b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 [144d:a802] (rev 01)
IOMMU Group 9 00:1f.0 ISA bridge [0601]: Intel Corporation Z170 Chipset LPC/eSPI Controller [8086:a145] (rev 31)
IOMMU Group 9 00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
IOMMU Group 9 00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
IOMMU Group 9 00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)

Binding at startup; can't unbind correctly

Using Garuda Linux on a laptop with NVIDIA dual graphics. It seems there's something in Garuda's configuration that holds onto the GPU and hangs while detaching.

The vfio-pci approach also wasn't working.

The solution is to add these to /etc/mkinitcpio.conf before the list of video drivers, to load vfio-pci early.

MODULES=(crc32c-intel intel_agp i915 vfio_pci vfio vfio_iommu_type1 vfio_virqfd amdgpu radeon nouveau)

then generate configs with

mkinitcpio -P

Now I boot Linux with an unbound GPU, great!

From here, your bind and unbind scripts run successfully, but I remain unable to use the GPU in Linux.

I run unbind_vfio.sh and then

> nvidia-xconfig --query-gpu-info

Number of GPUs: 1

GPU #0:
Name      : NVIDIA GeForce RTX 2060
UUID      : GPU-5b17f5ee-5f05-94a6-949f-d6a2c92e22d4
PCI BusID : PCI:1:0:0

Number of Display Devices: 1

Display Device 0 (TV-2):
EDID Name             : SAMSUNG
Minimum HorizSync     : 15.000 kHz
Maximum HorizSync     : 81.000 kHz
Minimum VertRefresh   : 24 Hz
Maximum VertRefresh   : 75 Hz
Maximum PixelClock    : 230.000 MHz
Maximum Width         : 1920 pixels
Maximum Height        : 1080 pixels
Preferred Width       : 1920 pixels
Preferred Height      : 1080 pixels
Preferred VertRefresh : 60 Hz
Physical Width        : 1020 mm
Physical Height       : 570 mm

Linux detects the GPU. It even detects the TV plugged in via HDMI. For some reason I'm unable to use it. When I plug in the HDMI cable, the "switch display" bar appears, but I'm unable to switch output. The TV doesn't appear in Display Configuration, and I'm unable to start Optimus Manager.

Any idea how to get the GPU in a functional state from here?

By editing /etc/mkinitcpio.conf as above, everything works normally.

Editing /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0 systemd.unified_cgroup_hierarchy=1 loglevel=3 intel_iommu=on vfio-pci.ids=10de:1f15,10de:10f9"

Rebuilding config

sudo grub-mkconfig -o /boot/grub/grub.cfg

Rebooting.

HDMI output doesn't work (good!).

Running unbind_vfio.sh. Success.

I remain with the behavior described above. GPU doesn't work.

Let's try the VM. From a bound state, the VM runs without error. It only displays "Connecting to graphical console for guest" and fails to take the HDMI output.

Removed the display device, now I get a console with

BdsDxe: loading Boot0003 "Windows Boot Manager" from HD(1,GPT,D0C359A4-6867-44B3-B3E7-D2F1F3644DE7,0x800,0x32000)/\EFI\Microsoft\Boot\bootmgfw.efi
BdsDxe: starting Boot0003 "Windows Boot Manager" from HD(1,GPT,D0C359A4-6867-44B3-B3E7-D2F1F3644DE7,0x800,0x32000)/\EFI\Microsoft\Boot\bootmgfw.efi

qemu.d not showing

So I tried following the tutorial but after I did "sudo service libvirtd restart", the qemu.d folder isn't showing up.

Can only bind graphics card through kernel parameters, not with hooks

Hey there, I was wondering if you could help me at all!

I'll just get the boring details out of the way:

Distro: Arch
CPU: Ryzen 5 3600
Primary GPU: Radeon RX 560
Secondary GPU for pass through: Geforce RTX 2070 Super
RAM: 16gb Corsair Vengeance 3000-cl15
Nvidia 440 drivers

#  lspci -nnk -d 10de:1e84 :-
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84] (rev a1)
		Subsystem: Gigabyte Technology Co., Ltd TU104 [GeForce RTX 2070 SUPER] [1458:3ffc]
		Kernel driver in use: nvidia
		Kernel modules: nouveau, nvidia_drm, nvidia

So over the last few days I have gotten the passthrough working perfectly by reserving the 2070 Super for vfio, passing in the bus IDs as kernel parameters in grub and in the modules part of mkinitcpio.

This works fine but has the obvious drawback that I am unable to use my RTX card on the host OS. I came across your guide as you have quite a similar setup to mine (yours obviously a bit better :^)), and you mention using hooks to bind and unbind the graphics cards. Unfortunately, this doesn't seem to work for me, and I have a few questions regarding it...

Any attempt I make at dynamically unbinding/binding the GPU seems to result in the nvidia driver being unbound (lspci -nnk -d 10de:1e84 shows no driver in use), but the scripts seem to hang and never finish (I assume meaning the vfio driver fails to bind the card). Do you have any idea what could be the issue? Any 'gotchas'? I have nothing related to nvidia or vfio in mkinitcpio or the grub config.

The nvidia driver is bound to the card on boot... I think perhaps this is where I'm going wrong. For example, you mention you use Bumblebee to access your secondary card on the host OS. To use my 2070 Super I use prime-run and NVIDIA render offloading. Is there something that Bumblebee does (like blacklisting a driver) that would make your VM hooks work and mine not? I have read somewhere that it blacklists nvidia_drm, which I am currently using for the render offload. I'm wondering if you know whether I should be blacklisting the nvidia driver on boot and then using something like Bumblebee to activate it when I want to play games on the card?

Sorry for so many questions; I'm trying to direct my questioning a bit to take the burden off you when replying! Ha, I genuinely have no idea what to try to get this working, and there seems to be no log or error that could show me what is failing in the background for me to troubleshoot...

Regards,
Steve
(Thanks for the guide btw)

EDIT

I'm now convinced it's because I was loading the nvidia driver on boot... I'm going to give bumblebee/nvidia x-run a go and have a play around with that :)

Problems since new Kernels

Hello,

Ubuntu 20.04 and QEMU/virt-manager - nested virtualization suddenly doesn't work.

Since yesterday, when I want to start my Windows 10 VM on Ubuntu 20.04 with QEMU 4.2.1 (I also tried 5.2, 6.2, and the latest), I get this error:

Code:

qemu-system-x86_64: error: failed to set MSR 0x48f to 0xffffff00036dfb
qemu-system-x86_64: /home/user/qemu-4.2.1/target/i386/kvm.c:2691: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

I have kvm ignore_msrs=1 and so on; I tried it ON and OFF in the XML, like before when it worked.

Code:

Can anyone help? It worked for years, but not anymore since an apt update and apt upgrade. Do I need to downgrade the Ubuntu Linux kernel?

I need CPU host-passthrough, otherwise Windows 10 doesn't perform well, as all Windows 10 gaming VM GPU passthrough tutorials say and as I have tested myself.

I hope anyone can help. Here is also my GRUB configuration:

Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on kvm.ignore_msrs=1 video=efifb:off vfio-pci.ids=10de:2486,10de:228b,10de:2503,10de:228e isolcpus=3-13,17-27 nohz_full=3-13,17-27 rcu_nocbs=3-13,17-27 msr.allow_writes=on kvm.intel_nested=1"

And the Ubuntu Linux kernel: 5.11.0-43-generic

And when I start a macOS VM, also with CPU host-passthrough, it works without an error, hmm.

Thank ya
Kind Regards

I can't get this working on the Ubuntu 20.04 stock kernel

I am trying to pass through my Quadro K5200, but creating the VM hangs. Also,
if I try this: sudo virsh nodedev-detach pci_0000_81_00_0, nothing happens and the system becomes unstable.

sudo lspci | grep -i NVIDIA
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
81:00.0 VGA compatible controller: NVIDIA Corporation GK110GL [Quadro K5200] (rev a1)
81:00.1 Audio device: NVIDIA Corporation GK110 High Definition Audio Controller (rev a1)

dmesg | grep IOMMU
[ 0.166485] DMAR: IOMMU enabled
[ 0.322075] DMAR-IR: IOAPIC id 3 under DRHD base 0xfbffe000 IOMMU 0
[ 0.322077] DMAR-IR: IOAPIC id 0 under DRHD base 0xc3ffc000 IOMMU 1
[ 0.322079] DMAR-IR: IOAPIC id 2 under DRHD base 0xc3ffc000 IOMMU 1

uname -r = 5.4.0-37-generic

grub settings = GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"

./iommu.sh
..
IOMMU Group 31 80:05.4 PIC [0800]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 IOAPIC [8086:0e2c] (rev 04)
IOMMU Group 32 81:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK110GL [Quadro K5200] [10de:103c] (rev a1)
IOMMU Group 32 81:00.1 Audio device [0403]: NVIDIA Corporation GK110 High Definition Audio Controller [10de:0e1a] (rev a1)
IOMMU Group 33 ff:08.0 System peripheral [0880]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link 0 [8086:0e80] (rev 04)
IOMMU Group 34 ff:09.0 System peripheral [0880]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 QPI Link 1 [8086:0e90] (rev 04)
..

tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── WIN10HD
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh

Maybe this doesn't work because vfio is now built into the kernel?
cat /etc/libvirt/hooks/qemu.d/WIN10HD/prepare/begin/bind_vfio.sh
#!/bin/bash

## Load the config file

source "/etc/libvirt/hooks/kvm.conf"

## Load vfio

modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio

virsh nodedev-detach $VIRSH_GPU_VIDEO
virsh nodedev-detach $VIRSH_GPU_AUDIO

## Unbind ssd from nvme and bind to vfio

#virsh nodedev-detach $VIRSH_NVME_SSD

Frozen bug (winter is coming)

First of all, a big bravo for your work.
I don't think it is because of your code, but when I launch my VM, it crashes my computer.
I'm trying on this configuration (please do not laugh, it's a test):
-i7-4770S
-ATI Radeon 5550
-16GiB
-Pop OS! 20.04 LTS

IOMMU enabled

dmesg | grep IOMMU
[    0.037190] DMAR: IOMMU enabled
[    0.094272] DMAR-IR: IOAPIC id 8 under DRHD base  0xfed91000 IOMMU 1
root@pop-os:~# 

Virtualization OK

root@pop-os:~# dmesg | grep VT-d
[    1.048981] i915 0000:00:02.0: [drm] VT-d active for gfx access

My GPU is isolated

IOMMU Group 12 00:1f.2 RAID bus controller [0104]: Intel Corporation SATA Controller [RAID mode] [8086:2822] (rev 04)
IOMMU Group 12 00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 04)
IOMMU Group 13 02:00.0 PCI bridge [0604]: Texas Instruments XIO2001 PCI Express-to-PCI Bridge [104c:8240]
IOMMU Group 14 04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Redwood PRO [Radeon HD 5550/5570/5630/6510/6610/7570] [1002:68d9]
IOMMU Group 14 04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Redwood HDMI Audio [Radeon HD 5000 Series] [1002:aa60]
IOMMU Group 1 00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor 

Hook download ok

sudo wget 'https://raw.githubusercontent.com/PassthroughPOST/VFIO-Tools/master/libvirt_hooks/qemu' \
>      -O /etc/libvirt/hooks/qemu
--2020-12-15 21:20:00--  https://raw.githubusercontent.com/PassthroughPOST/VFIO-Tools/master/libvirt_hooks/qemu
Resolving raw.githubusercontent.com (raw.githubusercontent.com)… 64:ff9b::9765:7885, 151.101.120.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|64:ff9b::9765:7885|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 1010 [text/plain]
Saving to: '/etc/libvirt/hooks/qemu'
/etc/libvirt/hooks/ 100%[===================>]    1010  --.-KB/s    in 0.01s   
2020-12-15 21:20:01 (100 KB/s) - '/etc/libvirt/hooks/qemu' saved [1010/1010]

Execute:

sudo chmod +x /etc/libvirt/hooks/qemu

No output.

Files and directories created

root@pop-os:~# tree /etc/libvirt/hooks/
/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh

source "/etc/libvirt/hooks/kvm.conf"

##Virsh devices
VIRSH_GPU_VIDEO=pci_0000_04_00_0
VIRSH_GPU_AUDIO=pci_0000_04_00_1

I followed the rest until part 4, but when I run my VM... it crashes. I have to hard reset my computer.
I don't know where to look.
Can you help me please?

Hugepages was renamed to libhugetlbfs-bin

Hello! I have been following your guide and sudo apt install hugepages did not work for me. According to the following issue, it was renamed to libhugetlbfs-bin.

Link to the GitHub issue where I found this information.
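In other words, on newer releases the equivalent install command would presumably be:

sudo apt install libhugetlbfs-bin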

System crashed during VM startup

Hi,

I tried this because I had a dream of being able to do KVM gaming. I have an RX 580, an AMD Ryzen 1600, a Gigabyte GA-A320M-S2H, and 16 GB of RAM. I am using Fedora KDE Plasma if that also helps.

I followed the guide except for the USB and Serial Bus controllers because iommu.sh did not show anything related to those controllers, so I passed through only the Video and Audio controllers from my RX 580, because that's all that has been shown.

Here's the output from iommu.sh:

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 10 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 11 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 11 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 11 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 11 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 11 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 11 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 11 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 11 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 12 01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43bc] (rev 02)
IOMMU Group 12 01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b8] (rev 02)
IOMMU Group 12 01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b3] (rev 02)
IOMMU Group 12 02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 12 02:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 12 02:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 12 02:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 12 05:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
IOMMU Group 12 06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
IOMMU Group 13 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
IOMMU Group 13 07:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
IOMMU Group 14 08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU Group 15 08:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU Group 16 08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
IOMMU Group 17 09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU Group 18 09:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 19 09:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
IOMMU Group 1 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 3 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 5 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 6 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 8 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 9 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]

I also will provide the tree for the libvirt hooks:

/etc/libvirt/hooks/
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        ├── release
        │   └── end
        │       └── unbind_vfio.sh
        ├── start
        ├── started
        └── stopped

If it helps, I had to create these directories and scripts manually after restarting libvirt (see the sketch below).
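
A minimal sketch of creating the prepare/release parts of that layout by hand, assuming the paths in the tree above (the script contents follow later in this report):

$ sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/prepare/begin
$ sudo mkdir -p /etc/libvirt/hooks/qemu.d/win10/release/end
$ sudo touch /etc/libvirt/hooks/qemu.d/win10/prepare/begin/bind_vfio.sh
$ sudo touch /etc/libvirt/hooks/qemu.d/win10/release/end/unbind_vfio.sh
$ sudo chmod +x /etc/libvirt/hooks/qemu.d/win10/prepare/begin/bind_vfio.sh \
                /etc/libvirt/hooks/qemu.d/win10/release/end/unbind_vfio.sh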

I passed through my GPU along with my keyboard, mouse, and headset. When I pressed Begin Installation, my screen went blank for 10 minutes (I counted) and I had to hard reset my computer. When I logged back in, I got these system errors:
[screenshot of the system error notifications]

If I had to guess, Fedora got confused about whether it or the VM owned my card. I'll also provide the scripts I used (essentially passing through my VGA and audio buses).

kvm.conf:

## Virsh devices
VIRSH_GPU_VIDEO=pci_0000_07_00_0
VIRSH_GPU_AUDIO=pci_0000_07_00_1

bind_vfio.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind the GPU from the host driver and bind it to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO
virsh nodedev-detach $VIRSH_GPU_AUDIO

unbind_vfio.sh:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind the GPU from vfio and rebind it to the host driver
virsh nodedev-reattach $VIRSH_GPU_VIDEO
virsh nodedev-reattach $VIRSH_GPU_AUDIO

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
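
One way to sanity-check that the hook really handed the card over (a suggested check, not something from the original guide) is to look at which kernel driver owns the GPU while the VM is running:

$ lspci -nnk -s 07:00.0 | grep 'Kernel driver in use'
	Kernel driver in use: vfio-pci

If this still reports amdgpu, the detach in bind_vfio.sh never took effect, and a host lock-up when the guest grabs the card becomes much more likely.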

I tried a similar method to no avail. I have IOMMU and AMD-V enabled. I hope my hardware is capable of doing this; however, I've heard that 1st-gen Ryzen 5s like mine have trouble with virtualization.

Any help would be appreciated. Thanks!
