Installing the NVIDIA software on an ESXi host and configuring for vGPU usage

I have been busy in the lab testing our VR workload within a VM and then streaming the output to a head-mounted display. Last week I received a nice, shiny new NVIDIA RTX6000 to use in my Dell Precision workstation. At first I received a passively cooled RTX8000, by mistake that is, and the workstation wouldn’t boot because it doesn’t support that card. That was great in hindsight, as the card would probably have overheated fast considering the lack of airflow in my home office. After adding the RTX6000 the machine booted, and I had to install the NVIDIA vib on ESXi and configure the host accordingly. I did this through the command line, as that was the fastest for me. I started by copying the vib file to /tmp/ on the ESXi host using scp and then did the following:

esxcli system maintenanceMode set -e true
esxcli software vib install -v /tmp/NVIDIA**.vib
esxcli system maintenanceMode set -e false
reboot

The above places the host in maintenance mode, installs the vib, takes the host out of maintenance mode, and then reboots it. The other thing I had to do, as I am planning on using vGPU technology, was to set the host's default graphics type to “Shared Direct – Vendor shared passthrough graphics”. You can also do this through the command line as follows:

esxcli graphics host set --default-type SharedPassthru

You can also set the assignment policy:

esxcli graphics host set --shared-passthru-assignment-policy <Performance | Consolidation>

I set it to “Performance”, as this is crucial for my workload; it may be different for your workload though. In order to ensure these changes are reflected in the UI you will either need to reboot the host, or you can restart Xorg the following way:

/etc/init.d/xorg stop
nv-hostengine -t
nv-hostengine -d
/etc/init.d/xorg start

That is what it took. I realized after the first reboot that I could probably have configured the host graphics settings and changed the default passthrough assignment policy first, and only then rebooted the host. That would also have avoided the need to restart Xorg, as it would be restarted with the host.
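The reordered sequence described above could be sketched as a small script. This is a sketch under assumptions, not something I ran on the host: `run`, `configure_host` and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and only the esxcli commands themselves come from this post. With `DRY_RUN=1` (the default in this sketch) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: install the NVIDIA vib and set the graphics defaults before one reboot.
# DRY_RUN, run() and configure_host() are hypothetical helpers, not part of esxcli.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_host() {
    vib_path="$1"   # full path to the vib copied to the host
    policy="$2"     # Performance or Consolidation

    case "$policy" in
        Performance|Consolidation) ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac

    run esxcli system maintenanceMode set -e true
    run esxcli software vib install -v "$vib_path"
    run esxcli graphics host set --default-type SharedPassthru
    run esxcli graphics host set --shared-passthru-assignment-policy "$policy"
    # The host stays in maintenance mode across the reboot; once it is back up,
    # take it out again with: esxcli system maintenanceMode set -e false
    run reboot
}
```

Calling `configure_host` with the path to your vib and `Performance` prints the five commands in order. Leaving the host in maintenance mode until after the reboot is a choice made for this sketch; on a host without running VMs it matters little.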

If there’s a need for it, you can also change the NVIDIA vGPU scheduler being used. There are three options available: “Best Effort”, “Equal Share”, and “Fixed Share”. Using esxcli you can configure the host to use a particular scheduler. This is also documented here. I set my host to Equal Share with a 1 millisecond time slice, which you can do as shown below.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00010001"
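The value 0x00010001 above packs two things into one number: the scheduler in the low bits and the time slice in milliseconds above it. The helper below is a minimal sketch of that encoding as I understand it from the NVIDIA vGPU documentation (0x00 for Best Effort, 0x01 for Equal Share, 0x11 for Fixed Share, and 0x00TT0001 or 0x00TT0011 with TT as the time slice in hex, 1 to 30 ms). The function name `rmpvmrl_value` and its argument names are hypothetical, so verify the result against the documentation for your driver version before applying it.

```shell
#!/bin/sh
# Sketch: build an RmPVMRL value from a scheduler name and an optional time slice.
# rmpvmrl_value is a hypothetical helper; the encoding is an assumption to verify.
rmpvmrl_value() {
    scheduler="$1"      # best-effort | equal-share | fixed-share
    slice_ms="${2:-0}"  # 0 = default time slice, otherwise 1..30 ms

    case "$scheduler" in
        best-effort) low=0 ;;    # 0x00
        equal-share) low=1 ;;    # 0x01
        fixed-share) low=17 ;;   # 0x11
        *) echo "unknown scheduler: $scheduler" >&2; return 1 ;;
    esac

    # Accept only plain decimal numbers without leading zeros.
    case "$slice_ms" in
        ''|*[!0-9]*|0?*) echo "time slice must be a whole number" >&2; return 1 ;;
    esac
    if [ "$slice_ms" -gt 30 ]; then
        echo "time slice must be 0 (default) or 1..30 ms" >&2
        return 1
    fi
    if [ "$scheduler" = "best-effort" ] && [ "$slice_ms" -ne 0 ]; then
        echo "best effort does not take a time slice" >&2
        return 1
    fi

    # Time slice goes into bits 16..23, scheduler code into the low bits.
    printf '0x%08X\n' $(( (slice_ms << 16) | low ))
}
```

With this, `rmpvmrl_value equal-share 1` reproduces the 0x00010001 used above, and the output can be substituted into the esxcli command after `RmPVMRL=`.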

And for those who care, you can see within vCenter which VM is associated with which GPU, but you can of course also check this via the command line:

esxcli graphics vm list

And the following command will list all the devices present in the host:

esxcli graphics device list

On Twitter I was just pointed to a script which lists the vGPU vib version across all hosts of your vCenter Server instances. Very useful if you have a larger environment. Thanks, Dane, for sharing.

Comments

MikeC #ITBloke says

23 January, 2020 at 13:43

Any reason to take the machine out of maint mode on the first reboot?
Since it’s rebooting anyway I would have waited till after the passthrough assignment.
Not needed in your worklab possibly but real life can be a bit more finicky.
- Duncan says
  
  23 January, 2020 at 13:59
  
  When it is a cluster running workloads I would take it out of maintenance after the reboot indeed. In this case it was an environment with no VMs running yet. Hence I didn’t care much. Valid comment though, thanks!
Vandrey Trindade says

24 January, 2020 at 16:08

Great! Thanks!
Bouke Groenescheij says

29 January, 2020 at 17:27

Great post! But the real question is: can it run Beat Saber?
- Duncan Epping says
  
  29 January, 2020 at 18:21
  
  we will find out soon 🙂
