Hi, this is Wayne again with a topic “VFIO is Back! Back AGAIN! Our Killer VFIO Build + Setup Guide”.
So I'm back, and this is VFIO, or pass-through. I've written some new guides and I'm also putting some information together, but I'm also kind of helping AMD understand this use case a little bit better. We're entering an age where it really does make sense to be able to run multiple operating systems on a single GPU, the way that a computer CPU can run virtual machines, which has been hugely handy and beneficial.
The enterprise has been able to run virtual jobs on GPUs from different customers for years, but it's never really quite materialized in a desktop computer context. Okay, it actually has for machine learning and some other special applications like that, but not for exactly what we're doing here. I love this.
I've been running this way on my personal Threadripper system, gen Threadripper, and it's been running flawlessly. Because it's got its warts, I tend not to want to touch it, and there have definitely been some warts, especially around GPU reset. You can't just buy arbitrary hardware, and even this hardware is not perfect, but let's take a closer look and dive in. It's going to be based around the 7950X 16-core CPU from AMD. Yes, this is a sponsored build brought to you by AMD and Corsair, although I'm actually doing the video for free, because I want to explore VFIO. There are dozens of us, literally dozens.
There have been some reports about problems with 7000 series GPUs for GPU pass-through use cases, so I'm going to take care of my Linux audience, and you can win my computer. There's a forum thread that has the full how-to to walk you through how you can set up GPU pass-through with the 7000 series GPUs. There are a lot more details that you have to take into account, because this kind of thing is really meant for enterprise technology, and that we can experience it with our desktop 7950 means that we need to tweak some options in BIOS, tweak some software options, and really thread the needle here. But first, let's do our build with all of our Corsair gear. For the GPUs, don't worry, I've got quite the selection of GPUs. We've got a Merc 319 Radeon 6950 XT.
Wait, 6950? You can still buy these. You can mix 6000 and 7000 series. Maybe you want to run dual 7000 series.
I've got the PowerColor Hellhound 7900 XTX. I've got the AMD reference 7900 XTX, the 7900 XT, the Taichi White from ASRock. We've got the OC Formula from ASRock and even more GPUs that we're testing, because I want to get to the bottom of it. For the build itself, the case is a Corsair 700D, because it's got the airflow. It's a full tower case. It's a two-person lift, and Corsair has really done nothing to make this case lighter, literally nothing. It weighs a ton and it's empty. I'm gonna have to double box this when I mail it. We'll just set it there for now. 64 gigabytes of Vengeance RGB DDR5 and three M.2 drives: 500 gigs for our host Linux operating system and two 2-terabyte NVMe drives for our pass-through operating systems.
Yes, two VMs running on this single machine. Why? Because we can do it. Corsair also sent about a thousand dollars of fans and a 1200-watt power supply. Thanks, Corsair and AMD, for making this possible. Let's get started. Now, for the motherboard, we're going to be using the ASRock Taichi X670E, because this board is particularly well suited for what we're trying to do. ASRock historically has taken pretty good care of Linux users, and whenever they make a mistake, you can usually email them and get support to fix it.
Then they send you a BIOS and everything's good. I personally use the X670E for this build already, and it's worked out pretty well. Now, if you need something with a little more horsepower, I do have a Threadripper version of this build in another video. You can check that out. It's pretty awesome, but it's really impressive you're able to do this with a 7950 and have a reasonable experience. Now, by way of a full disclaimer, what we're doing here with the build is for very advanced users. Even though I make it seem plug and play, it's not really plug and play, and there are some caveats, some land mines you can trip over. Updating your system and changing the software, even Windows Update or driver updates, can introduce instability into the system. It's really cool when you get it working, and I really do think this is the future of computing, because you can take all the cruft of the past and stuff it into a nice little box, and then all of the awful cruft can live in the box while you embrace new technology and new techniques and new security for the new thing. I don't trust any of this stuff, so it's about being able to securely say, okay, all my games live here with the Denuvo and the DRM and all the other stuff, and then I can do other, more important things over here and not be worried about, you know, some company trying to monetize what's on my computer to make a few extra bucks. I'm guaranteed that's not happening because of how this works architecturally. It's just cool AF to be able to run a gaming virtual machine, pass through real GPU hardware and get reasonable performance.
It's the future, I think, but just be aware of the potential complications. It's not the land of milk and honey, necessarily. I mean, there's milk and honey, but there's also ground-up insect parts in the honey.
Sometimes it's not super amazing, but most of the time it is pretty amazing. All the SSDs that we're using for this are Corsair MP600 Pro NH. They're up to 6.6 gigabytes per second. They are all PCI Express Gen 4. Every one of them is very, very fast: 500 gigs for our main boot operating system, and then two 2-terabyte drives, one for each Windows virtual machine. I'm gonna put one of the 2-terabyte drives in the slot that is closest to the CPU and connected to the CPU, because it's going to be really high performance.
The other 2-terabyte SSD I'm going to put at the front edge of the motherboard. This layout is pretty much the perfect layout, I think. Linux doesn't care. Linux will be perfectly happy running through the chipset.
It's just a question of whether your M.2 is connected to the CPU or through the chipset. A bunch of things can connect through the chipset, so that's kind of shared bandwidth, but for this build it's not really going to matter too much. Now, normally it's a good idea to go ahead and install your memory and your CPU cooling and all of that, but I'm anxious to actually get this mounted in a case. With CPU cooling and memory, even before I add the GPUs, I can go ahead and set up Linux and sort of do some of the pre-configuration, but I'm gonna need a little more room. All right.
The first thing to understand if you're going to embark on this journey is that the BIOS version matters a lot, partly because of the vendor, but also because of the AGESA version from AMD. Different AGESA versions can cause different behaviors, so we're on 1.24, this version on screen, on our ASRock X670E. Now, the other thing that you're going to want to do is enable IOMMU and some other options that have to do with pass-through.
The Linux kernel is really cool: it'll let you override some of this in software, but the software override is never as good as the hardware, and trying to do a software override when your hardware is not, you know, supporting all the options can actually lead to some of the reset issues. It's like, oh, this isn't showing up the way that I need it to with IOMMU group separation, I'll just force an override. That can actually create more problems. So if you can do it in hardware, you should always do it in hardware. We are going to enable Resizable BAR. You can disable it if you're troubleshooting, because Resizable BAR doesn't necessarily work out of the box. There's an entry for that in the how-to. We'll talk a little bit more about that. Well, at least in the how-to we will.
I'm also setting the decoding limit to one terabyte, which might not actually be necessary. Because we've got two GPUs installed, the motherboard is going to want to try to use the top GPU by default, so we also want to change the GPU priority. We actually want to use the built-in GPU as the priority. Not every motherboard has options like this, but this is what we need to use for this setup. The IOMMU option must explicitly be set to Enabled. A lot of the time this is on Auto, and if you read the text, it says Auto is enabled, but that's not actually true. IOMMU Auto is a partial enablement on a lot of motherboards. All right, on to our bone-stock Ubuntu installation. Yeah, Ubuntu is kind of easy mode.
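Once the host is booted, the IOMMU group separation talked about above can be checked from Linux itself. Here is a minimal sketch, not taken from the forum guide, that assumes the standard sysfs layout where each group appears as `/sys/kernel/iommu_groups/<number>/devices/<pci-address>`. The function names `list_iommu_groups` and `group_mates` are made up for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as read from sysfs.

    An empty result usually means the IOMMU is not enabled (or this is
    not a Linux host), so there is nothing to pass through yet.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def group_mates(groups, address):
    """Return the other devices that share an IOMMU group with `address`."""
    for devices in groups.values():
        if address in devices:
            return [d for d in devices if d != address]
    return []


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; check that IOMMU is set to Enabled in BIOS.")
    for number, devices in found.items():
        print(f"group {number}: {', '.join(devices)}")
```

A GPU normally shares its group with its own other functions, such as its audio device. If `group_mates` for the GPU's address also returns unrelated devices, that is the situation where people get tempted to force a software override.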
We're just going to install the virtual machine manager package, which will add a lot of dependencies for us. Basically, for this part, just follow the guide on the Level1Techs forum. There are more steps here than I'm putting in the video, but it is step by step in the how-to guide on the forum. You might be thinking, because this is Linux, there's no option for an external microphone, but we can actually just pass that through as a USB device. Works totally fine. And so the Corsair software will pick up the Commander Core and Elite LCD in the virtual machine. It works fine.
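Virtual Machine Manager can add a USB device from its hardware dialog, but it can help to see what that looks like underneath. The sketch below builds the libvirt `<hostdev>` element for a USB device selected by vendor and product ID. The helper name `usb_hostdev_xml` and the IDs `0x1234` and `0xabcd` are placeholders, not the real IDs of any microphone or Corsair device; the actual values would come from `lsusb` on the host.

```python
import xml.etree.ElementTree as ET


def usb_hostdev_xml(vendor_id: int, product_id: int) -> str:
    """Build a libvirt <hostdev> element that passes one USB device to a guest."""
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "usb", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    # libvirt expects the IDs as hexadecimal strings with a 0x prefix.
    ET.SubElement(source, "vendor", {"id": f"0x{vendor_id:04x}"})
    ET.SubElement(source, "product", {"id": f"0x{product_id:04x}"})
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(usb_hostdev_xml(0x1234, 0xABCD))
```

Saved to a file, XML like this can be handed to `virsh attach-device` for a given guest, which is roughly what the GUI does for you.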
When you do tell the Linux kernel to reorganize the PCIe base address registers, it tries, but then you don't get any graphical environment, even on a GPU. I'm not really sure why this is, but this is something I'm going to work on between this part and the next part of this video. We've sort of done our pre-testing. Everything's working great with our pre-testing: 64 gigabytes of memory, our 7900 XTX as our primary virtual machine GPU, our iGPU, all of that's working.
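When troubleshooting Resizable BAR, one way to see what the kernel actually assigned is to read a device's `resource` file in sysfs, where each line holds a start address, an end address, and flags in hexadecimal. The sketch below assumes the usual layout in which the first six lines are BAR 0 through BAR 5. The function name and the sample addresses are made up for illustration.

```python
def parse_bar_sizes(resource_text):
    """Return {bar_index: size_in_bytes} for the populated BARs of one device.

    `resource_text` is the content of /sys/bus/pci/devices/<address>/resource.
    Only the first six lines (BAR 0 to BAR 5) are considered.
    """
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()[:6]):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused BAR
        sizes[index] = end - start + 1
    return sizes


if __name__ == "__main__":
    # Made-up sample: a large BAR 0 and a small BAR 2.
    sample = "\n".join([
        "0x0000007000000000 0x00000077ffffffff 0x000000000014220c",
        "0x0000000000000000 0x0000000000000000 0x0000000000000000",
        "0x0000007800000000 0x00000078001fffff 0x000000000014220c",
    ])
    for bar, size in parse_bar_sizes(sample).items():
        print(f"BAR {bar}: {size / 2**20:.0f} MiB")
```

A GPU whose largest BAR is only a few hundred MiB is probably not running with a resized BAR, whatever the BIOS setting says.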
We need to add our second GPU, but there's not enough room. Corsair fortunately has a fix for that, and that's this cable. I can probably get it to fit by turning both GPUs vertically, but for now I'm just going to use the 6800, which does have a USB port, USB-C. Sometimes your cards will have a USB port.
Sometimes not. It's for VR, where you can use USB-C, but it's also handy in this use case, because we can pass through the USB controller that is on the GPU. With the virtual machines, as for which USB port goes with which virtual machine, I just use the USB port that's physically on the same GPU. I'm obviously not quite finished with our build. I've got to put the sides back on. I've got to figure out how to juggle multiple massive GPUs.
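Going back to the USB controller that lives on the GPU: on cards that have one, it typically shows up as another PCI function at the same bus and device number as the graphics function. The sketch below works on that assumption and picks out sibling functions whose class code marks them as USB controllers (base class `0x0c`, subclass `0x03`). The function names and the sample addresses are hypothetical.

```python
import re

PCI_ADDRESS = re.compile(r"^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2})\.([0-7])$")


def slot_of(address):
    """Return the domain:bus:device part of a PCI address such as 0000:0b:00.2."""
    match = PCI_ADDRESS.match(address.lower())
    if match is None:
        raise ValueError(f"not a PCI address: {address!r}")
    return match.group(1)


def usb_controllers_on_card(gpu_address, classes):
    """Return sibling functions of `gpu_address` that are USB controllers.

    `classes` maps PCI address -> 24-bit class code, the value found in
    /sys/bus/pci/devices/<address>/class (for example 0x0c0330 for xHCI).
    """
    slot = slot_of(gpu_address)
    return sorted(
        address
        for address, class_code in classes.items()
        if slot_of(address) == slot and (class_code >> 8) == 0x0C03
    )


if __name__ == "__main__":
    # Made-up topology: one card with VGA, audio, USB and a USB-C helper function.
    sample = {
        "0000:0b:00.0": 0x030000,  # VGA controller
        "0000:0b:00.1": 0x040300,  # audio device
        "0000:0b:00.2": 0x0C0330,  # USB xHCI controller
        "0000:0b:00.3": 0x0C8000,  # other serial bus controller
        "0000:0d:00.3": 0x0C0330,  # motherboard USB controller, different slot
    }
    print(usb_controllers_on_card("0000:0b:00.0", sample))
```

Passing that controller to the same guest as the GPU means anything plugged into the card's USB-C port belongs to that virtual machine, which keeps the "which port goes with which VM" question simple.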
In this case, it is actually a little easier to stack two GPUs one on top of another rather than use our flat ribbon cable, but I think there's some combination of GPUs that I can find. Now, for this system, I'm actually going to juggle in several different sets of GPUs, not just the ones that we're going to be giving away, so the final GPUs that are in the system might change a little bit by the time that the giveaway is over.
That's in about two to three weeks, probably a little more than three weeks actually, from the time this video is launched, so we've got a little time with the system. My goal is to gather metrics and data on GPUs that have reset problems, and on whether the workarounds that I found are good for those GPUs on this platform. Now, to be sure, there are a lot of land mines and pitfalls and things that you can trip over. You can get your GPU into a state where it doesn't want to reset; it's crashed, and the arcana to get the GPU to reset is basically unknown, but restarting the system does actually clear it. I found that to be less of a problem on the 7000 series GPUs with the platform tweaks versus the 6000 series GPUs. But understand what's happening under the hood: this is not really so much that AMD has regressed.
It's that this use case is not being tested with the changes that come from supporting things like CXL, faster PCIe signaling rates, and a lot of the other new features. You see, PCIe is not really a static specification, and with the Zen 4 cores making way to Zen 5, and the enterprise support in our Zen 4 cores, there are edge cases when they bring up a server platform and then hand it to the desktop folks.
It's a whole other process, and so these are some of the land mines. We're basically an early-adopter use case for this, and the land mines that we've tripped over sort of inform what goes into the next generation. Historically, there haven't been enough of us to really get anyone's attention, but that's really been changing. It's also driven some of the complexity: this was actually easier to get working five years ago, because PCI Express was actually simpler five years ago. So if you do decide to undertake this project, understand that it's going to be a little complicated. You're going to have some i's to dot and some t's to cross, and you may get into an impossible situation, and even BIOS updates may create instability. But this kind of thing is the future. Okay, not necessarily running multiple GPUs in the same system, but being able to run multiple operating systems on a host PC is absolutely our security future, and there's no better way to sandbox games with really insane DRM, or even just spyware that they bundle in, than isolating them inside a virtual machine. So details of all of that and the step-by-step instructions to reproduce what I've done here are on the Level1 forums. It'll walk you through step by step, recreating something like this, assuming that you're using the same or similar hardware, but on your own hardware. Again, huge thanks to Corsair for providing a ton of parts and to AMD for providing the rest of the parts for this system, and to PowerColor, XFX, and ASRock for our motherboard and for the GPUs that we're testing. Got a lot of support from a lot of vendors. I know it's an edge use case, but I'm going to try to do better to take care of users doing this.
So be sure to check out our other content on this on the Level1Linux channel, and a little bit more about a call to arms. If you're in the VFIO space and you've been doing VFIO for a long time, hello and welcome. Thank you for coming along. It's good to have you, and let's organize a little bit so that we can more effectively use our time and resources to bring this thing forward, and make a little bit more noise when things like game DRM or buggy hardware negatively impact us, because the future is coming soon. I'm Wendell at Level1. I'm signing out, and you can find me in the Level1 forums.