Failed to initialize NVML: Driver/library version mismatch

The error appears after installing cuda_11.5.0_495.29.05_linux.run, obviously on a machine with a GPU 😁😁. When upgrading a system, I tend to install the whole CUDA package because I find it more reliable than installing the drivers alone. Once you accept the EULA, you are offered an installation menu like this:

CUDA Installer
- [X] Driver
     [X] 495.29.05
+ [X] CUDA Toolkit 11.5
  [X] CUDA Samples 11.5
  [X] CUDA Demo Suite 11.5
  [X] CUDA Documentation 11.5
  Options
  Install

You can select or deselect components with the space bar. I didn’t want to bloat the system disk, so I selected only the driver. After the install you get this message, which looks correct:

===========
= Summary =
===========

Driver: Installed
Toolkit: Not Selected
Samples: Not Selected

To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

But after a reboot, the drivers somehow don’t end up where they should, and the error in the title appears when nvidia-smi is called. We can check the drivers:

## > nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
## > cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.57 DATE
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
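The check done by hand above can be scripted. The sketch below is only an illustration: the function names loaded_module_version and report_versions are made up here, and it assumes the NVRM line has the format shown above. It extracts the version of the loaded kernel module and compares it with the version of the module file on disk, as reported by modinfo.

```shell
#!/bin/sh
# Read text shaped like /proc/driver/nvidia/version on stdin and print the
# version of the loaded kernel module (assumes "... Kernel Module <version> ...").
loaded_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Compare two version strings and say whether they agree.
report_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: loaded $1, on disk $2"
    fi
}

# Only query the real system when the driver's proc file is present.
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(loaded_module_version < /proc/driver/nvidia/version)
    # modinfo describes the module file on disk, not the one in memory.
    on_disk=$(modinfo -F version nvidia 2>/dev/null)
    report_versions "$loaded" "${on_disk:-unknown}"
fi
```

If the two versions differ, the running kernel is still using a module from before the upgrade.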

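How come we have the 450.57 drivers if we installed the cuda_11.5.0_495.29? /proc/driver/nvidia/version reports the kernel module that is currently loaded, so the old 450.57 module is still in use while the newly installed libraries are presumably 495.29.05, hence the mismatch. Obviously we didn’t choose the right installer option! I ran the installer again, this time selecting the Driver AND the CUDA Toolkit. The summary should look like this: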

===========
= Summary =
===========

Driver: Installed
Toolkit: Installed in /usr/local/cuda-11.5/
Samples: Not Selected

Please make sure that
- PATH includes /usr/local/cuda-11.5/bin
- LD_LIBRARY_PATH includes
/usr/local/cuda-11.5/lib64, or,
add /usr/local/cuda-11.5/lib64 to
/etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
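One possible way to follow the installer's advice about PATH and LD_LIBRARY_PATH is a few lines in a shell profile. This is a sketch, not the installer's own method: CUDA_HOME and the helper prepend_path are names chosen here for illustration, and the prefix is taken from the summary above.

```shell
# Install prefix as reported by the installer summary (adjust if yours differs).
CUDA_HOME=/usr/local/cuda-11.5

# Prepend directory $1 to the colon-separated list $2 unless already present.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

The membership check keeps the variables from growing each time the profile is sourced again.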

And this configuration survives a reboot. nvidia-smi now shows the right driver version as well. Problem solved, for the moment 😉.

Just a friendly heads up for NVIDIA users that might run into this.

Earlier today I upgraded my NVIDIA driver (which I'm early-loading) from 510.60.02 to 510.68.02. After rebooting, Gnome Shell felt sluggish, and trying to launch a Vulkan game failed. nvidia-smi failed with the message:

Failed to initialize NVML: Driver/library version mismatch

Checking the system log I found these messages from the kernel:

NVRM: API mismatch: the client has the version 510.68.02, but
NVRM: this kernel module has the version 510.60.02.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
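To pull the two versions out of the kernel log without reading it by eye, a small filter can help. This is a rough sketch that assumes the message wording shown above; api_mismatch_versions is a name invented for this example.

```shell
# Read kernel log text on stdin and print the versions named in an
# "NVRM: API mismatch" message as client=<version> and module=<version>.
api_mismatch_versions() {
    sed -n \
        -e 's/.*the client has the version \([0-9.]*[0-9]\).*/client=\1/p' \
        -e 's/.*this kernel module has the version \([0-9.]*[0-9]\).*/module=\1/p'
}

# Possible usage on a systemd machine (kernel messages of the current boot):
#   journalctl -k -b | api_mismatch_versions
```

The client version is the user-space library that was just upgraded. The module version is what the kernel actually loaded at boot.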

Solution: updating the initramfs:

# mkinitcpio -P

After that and a reboot, all is back to normal.

I checked pacman's log and it hadn't run any post-upgrade hook to update the initramfs. I had thought this was automatic, but I had probably never done an NVIDIA driver update without a kernel upgrade at the same time, and the latter does trigger that hook.
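Whether a machine is affected depends on whether the NVIDIA modules are early-loaded from the initramfs. A quick check, sketched under the assumption that /etc/mkinitcpio.conf uses the array form MODULES=(...); the function name is made up for this example:

```shell
# Succeed if the given mkinitcpio config lists an nvidia module in MODULES=(...).
nvidia_in_initramfs_config() {
    grep -Eq '^[[:space:]]*MODULES=\(.*nvidia' "$1"
}

# Possible usage:
#   if nvidia_in_initramfs_config /etc/mkinitcpio.conf; then
#       echo "nvidia is early-loaded: rebuild the initramfs after driver updates"
#   fi
```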

Blame's on me for not reading the wiki carefully:

If added to the initramfs, do not forget to run mkinitcpio every time there is a nvidia driver update

I've now added the extra pacman hook as suggested.
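For reference, a hook of this kind could look roughly like the sketch below. It is a simplified version written for illustration, not necessarily the exact hook from the wiki. The package name nvidia is an assumption (variants such as nvidia-dkms or nvidia-lts would need their own Target line), and write_nvidia_hook is a helper name invented here. The function writes the file into a directory of your choice so it can be reviewed before being copied to /etc/pacman.d/hooks/ as root.

```shell
# Write a pacman hook that rebuilds the initramfs after the NVIDIA driver
# package is installed, upgraded or removed.
write_nvidia_hook() {
    hook_dir=$1
    mkdir -p "$hook_dir" || return 1
    cat > "$hook_dir/nvidia.hook" <<'EOF'
[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Package
# Assumption: the driver is provided by the "nvidia" package.
Target = nvidia

[Action]
Description = Rebuilding initramfs after NVIDIA driver change
Depends = mkinitcpio
When = PostTransaction
Exec = /usr/bin/mkinitcpio -P
EOF
}

# Possible usage: write to a scratch directory first, then install as root.
#   write_nvidia_hook /tmp/hooks && sudo install -Dm644 /tmp/hooks/nvidia.hook /etc/pacman.d/hooks/nvidia.hook
```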

Drivers are essential for running hardware on your system. The Nvidia driver is the software that lets the system access and operate an Nvidia graphics card. This article explains what causes this error and how to fix it.

This section covers the cause of the error and its solution. The error message itself points to the problem: the Nvidia kernel module and the user-space driver library have different versions. This mismatch triggers the error whenever a program such as nvidia-smi tries to use the driver.

How Can “Failed to initialize NVML: Driver/library version mismatch” Be Resolved?

There is a simple method for fixing this error, described in the following section.

Solution: Remove the Module and Load a New One

To resolve this problem, remove the module and load a new Nvidia module. Follow the steps below closely to achieve this.

Step 1: Check the Kernel Module Version

The first step is to check the driver version. Run the command below to see which Nvidia driver version is currently running on the system (when the versions are mismatched, it prints the error instead):

$ nvidia-smi

Alternatively, the command below shows the version of the Nvidia kernel module:

$ modinfo nvidia
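The full modinfo output is long, so it can help to extract just the version field. The helper below is a small sketch (modinfo_version is a name chosen here) that reads modinfo-style text and prints the value of its version: line.

```shell
# Read `modinfo` output on stdin and print the value of the "version:" field.
# The exact match on the first column skips the similar "srcversion:" line.
modinfo_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Possible usage:
#   modinfo nvidia | modinfo_version
```

Note that modinfo describes the module file on disk, which after an upgrade can be newer than the module still loaded in the running kernel.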

Step 2: Remove the Nvidia Driver

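Once the version is known, remove the current driver from the system with the command below. The pattern is quoted so that the shell passes it to apt unchanged instead of expanding it against file names in the current directory:

$ sudo apt purge 'nvidia*'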
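Before purging, it can be useful to see which Nvidia packages are actually installed. The filter below is an illustrative sketch (installed_nvidia_packages is a made-up name) that assumes the usual column layout of dpkg -l, where fully installed packages start with ii.

```shell
# Read `dpkg -l` output on stdin and print name and version of every
# fully installed ("ii") package whose name contains "nvidia".
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2, $3 }'
}

# Possible usage:
#   dpkg -l | installed_nvidia_packages
```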

Step 3: Reinstall the Correct Driver

The final step is to reinstall the driver in the correct version, which is the one that matches the kernel module version found in Step 1. For instance, in this case, use the following command to install:

$ sudo apt install nvidia-driver-470 nvidia-settings nvidia-prime
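The package name in the command above carries the major version of the driver. Assuming that naming pattern holds for the release you need, the name can be derived from a full version string as sketched below (driver_package_for is a hypothetical helper name):

```shell
# Print the nvidia-driver-<major> package name for a version such as 470.57.02.
driver_package_for() {
    # ${1%%.*} strips everything from the first dot onward, leaving the major number.
    printf 'nvidia-driver-%s\n' "${1%%.*}"
}

# Possible usage, with the version found in Step 1:
#   sudo apt install "$(driver_package_for 470.57.02)"
```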

With this method, the error should be fixed, since the kernel module and the driver library versions now match.

Conclusion

To fix this Nvidia error, check the version of the Nvidia kernel module and make sure it matches the installed driver version. If it does not, purge the previous version from the system and install the correct one. This post has explained the reason for the error and the solution to fix it.

How to fix Failed to initialize NVML driver library version mismatch?

Rebooting the node is the easiest way to fix the issue. Rebooting the node will make sure that the drivers are properly initialized after the upgrade.
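When a reboot is inconvenient, reloading the kernel modules by hand is sometimes an alternative. The sketch below only lists the loaded modules, and the unload and reload commands are left as comments because they fail while anything (an X server, a CUDA job) still uses the GPU. The function name is invented here, and the module names in the comments are the common ones, not guaranteed for every setup.

```shell
# Read `lsmod` output on stdin and print the loaded modules whose name
# starts with "nvidia" (the first line of lsmod output is a header).
loaded_nvidia_modules() {
    awk 'NR > 1 && $1 ~ /^nvidia/ { print $1 }'
}

# Possible usage:
#   lsmod | loaded_nvidia_modules
# Then unload dependents first and load the new module:
#   sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
#   sudo modprobe nvidia
```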

Where is Nvidia SMI installed?

Normally nvidia-smi is stored in %WINDIR%\System32 and another copy in driver store (at least that's how it is with newer drivers).

How do I know what Nvidia driver I have on Ubuntu?

How to check the NVIDIA driver version on your Linux system:
NVIDIA X server settings. Let's start with the most obvious attempt to find out the NVIDIA driver version: running the NVIDIA X server settings application from your GUI menu.
System Management Interface. ...
Check Xorg X server logs. ...
Retrieve module version.

How do I find my Nvidia driver version?

A: Right-click on your desktop and select NVIDIA Control Panel. From the NVIDIA Control Panel menu, select Help > System Information. The driver version is listed at the top of the Details window. For more advanced users, you can also get the driver version number from the Windows Device Manager.