Bug 5028

Summary: Wiki Howto/CUDA updates to official CUDA repo compatibility info
Product: Infrastructure Reporter: FeRD (Frank Dana) <ferdnyc>
Component: Websites Assignee: Nicolas Chauvet <kwizart>
Status: RESOLVED FIXED    
Severity: enhancement CC: lxtnow, matthias
Priority: P1    
Version: NA   
Hardware: x86_64   
OS: GNU/Linux   
namespace:

Description FeRD (Frank Dana) 2018-09-20 03:38:54 CEST
Nvidia have gone and changed things around in their official CUDA repository, making some of https://rpmfusion.org/Howto/CUDA out of date.

They've released cuda-10-0 (the metapackage) alongside (and requiring) new 410.48 driver packages, several of which have been renamed:

xorg-x11-drv-nvidia-* => nvidia-driver-*
libXNVCtrl-*            => nvidia-libXNVCtrl-*

As a result, the suggested excludes in § "Which driver Package" (huh, odd capitalization, just noticed that) are insufficient.

This:
===============
Both "CUDA" and "RPM Fusion" repositories provide the nvidia driver packages. Usually, the package provided by RPM Fusion is higher. But in case you want to avoid the risk, add this:

#/etc/yum.repos.d/cuda.repo
[cuda]
name=cuda
...
exclude=xorg-x11-drv-nvidia*,akmod-nvidia*,kmod-nvidia*,nvidia-settings,nvidia-xconfig,nvidia-persistenced
==============

Should be changed to:
===============
Both "CUDA" and "RPM Fusion" repositories provide the nvidia driver packages. Usually, the package provided by RPM Fusion is higher. But in case you want to avoid the risk, add this:

#/etc/yum.repos.d/cuda.repo
[cuda]
name=cuda
...
exclude=nvidia-driver*,xorg-x11-drv-nvidia*,akmod-nvidia*,kmod-nvidia*,nvidia-libXNVCtrl*,nvidia-settings,nvidia-xconfig,nvidia-persistenced
==============

(The old xorg-x11-drv-nvidia* match needs to remain as well, at least for now, because the 396.xx drivers are still there under the old package names.)

If nvidia-driver* or nvidia-libXNVCtrl* isn't excluded, dnf will attempt to obsolete the corresponding rpmfusion packages with the ones from the cuda repo.
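
As a rough illustration of why the two extra patterns matter, here is a small Python sketch that checks package names against the old and new exclude lists. It assumes dnf's exclude globs behave like shell-style wildcards on the package name, which Python's fnmatch only approximates. The sample package names are illustrative, built from the renamed prefixes above, and are not a listing of the actual repo contents.

```python
from fnmatch import fnmatchcase

# Exclude lists copied from the two cuda.repo snippets in this report.
OLD_EXCLUDES = ("xorg-x11-drv-nvidia*,akmod-nvidia*,kmod-nvidia*,"
                "nvidia-settings,nvidia-xconfig,nvidia-persistenced").split(",")
NEW_EXCLUDES = ("nvidia-driver*,xorg-x11-drv-nvidia*,akmod-nvidia*,kmod-nvidia*,"
                "nvidia-libXNVCtrl*,nvidia-settings,nvidia-xconfig,"
                "nvidia-persistenced").split(",")

# Illustrative names only (assumption): old-style, renamed, and unrelated packages.
SAMPLE_PACKAGES = [
    "xorg-x11-drv-nvidia",
    "xorg-x11-drv-nvidia-libs",
    "nvidia-driver",
    "nvidia-driver-libs",
    "nvidia-libXNVCtrl",
    "nvidia-libXNVCtrl-devel",
    "cuda-runtime-10-0",
]


def is_excluded(package_name, patterns):
    """Return True if any glob pattern matches the package name."""
    return any(fnmatchcase(package_name, pattern) for pattern in patterns)


def not_excluded(patterns, packages=SAMPLE_PACKAGES):
    """Return the packages that the cuda repo would still be allowed to offer."""
    return [name for name in packages if not is_excluded(name, patterns)]


if __name__ == "__main__":
    print("old excludes let through:", not_excluded(OLD_EXCLUDES))
    print("new excludes let through:", not_excluded(NEW_EXCLUDES))
```

With the old list, the renamed nvidia-driver* and nvidia-libXNVCtrl* names slip through. With the new list, only the CUDA package itself remains visible from the cuda repo.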
Comment 1 Nicolas Chauvet 2018-09-20 10:25:17 CEST
I've managed to install the cuda-10.0 meta-package using:

dnf download cuda-drivers-410.48-1 cuda-runtime-10-0
dnf install cuda-10-0

I've reported the issue to Nvidia at:
https://devtalk.nvidia.com/default/topic/1041838/cuda-setup-and-installation/cuda-repo-issue-with-nvidia-driver-on-fedora-and-el/
Comment 2 FeRD (Frank Dana) 2018-09-20 16:15:34 CEST
Actually, it looks like you only need to force cuda-drivers-410.48-1; after that, cuda-runtime-10-0 will install normally.

$ sudo rpm -Uvh ./cuda-drivers-410.48-1.x86_64.rpm --nodeps
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:cuda-drivers-410.48-1            ################################# [100%]

$ sudo dnf install cuda-runtime-10-0
Dependencies resolved.
================================================================================
 Package                    Arch          Version             Repository   Size
================================================================================
Installing:
 cuda-runtime-10-0          x86_64        10.0.130-1          cuda        6.1 k
Installing dependencies:
 cuda-cublas-10-0           x86_64        10.0.130-1          cuda         41 M
 cuda-cudart-10-0           x86_64        10.0.130-1          cuda        148 k
 cuda-cufft-10-0            x86_64        10.0.130-1          cuda         75 M
 cuda-curand-10-0           x86_64        10.0.130-1          cuda         43 M
 cuda-cusolver-10-0         x86_64        10.0.130-1          cuda         57 M
 cuda-cusparse-10-0         x86_64        10.0.130-1          cuda         36 M
 cuda-libraries-10-0        x86_64        10.0.130-1          cuda        6.3 k
 cuda-license-10-0          x86_64        10.0.130-1          cuda         23 k
 cuda-npp-10-0              x86_64        10.0.130-1          cuda         80 M
 cuda-nvgraph-10-0          x86_64        10.0.130-1          cuda         49 M
 cuda-nvjpeg-10-0           x86_64        10.0.130-1          cuda        361 k
 cuda-nvrtc-10-0            x86_64        10.0.130-1          cuda        8.3 M

Transaction Summary
================================================================================
Install  13 Packages

Total download size: 389 M
Installed size: 649 M
Is this ok [y/N]: n
Operation aborted.


$ sudo dnf install cuda-10-0        
Dependencies resolved.
================================================================================
 Package                            Arch        Version         Repository
                                                                           Size
================================================================================
Installing:
 cuda-10-0                          x86_64      10.0.130-1      cuda      6.1 k
Installing dependencies:
 cuda-command-line-tools-10-0       x86_64      10.0.130-1      cuda      6.2 k
 cuda-compiler-10-0                 x86_64      10.0.130-1      cuda      6.1 k
 [...etc, etc...]
 cuda-tools-10-0                    x86_64      10.0.130-1      cuda      6.0 k
 cuda-visual-tools-10-0             x86_64      10.0.130-1      cuda      6.8 k

Transaction Summary
================================================================================
Install  49 Packages

Total download size: 1.7 G
Installed size: 3.0 G
Is this ok [y/N]: y
[...snip...]
Complete!
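
The sequence above could be captured in a small script, sketched here in Python. It only builds the command lines as argument lists and prints them (dry run by default). The function names and the `run` flag are hypothetical; the commands themselves are the ones quoted in this bug, and forcing with --nodeps carries the same caveats as doing it by hand.

```python
import subprocess

# Package spec taken from the comments in this bug.
DRIVER_PKG = "cuda-drivers-410.48-1"


def workaround_commands(arch="x86_64"):
    """Build the three steps: download, force-install the drivers metapackage, install CUDA."""
    rpm_file = f"./{DRIVER_PKG}.{arch}.rpm"
    return [
        ["dnf", "download", DRIVER_PKG],
        ["sudo", "rpm", "-Uvh", rpm_file, "--nodeps"],
        ["sudo", "dnf", "install", "cuda-10-0"],
    ]


def apply_workaround(run=False):
    """Print each command; execute it only when run=True."""
    for cmd in workaround_commands():
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_workaround(run=False)
```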


Of course (re: your bug report @nvidia), they're probably going to say that forcing it like this is invalid, the obsoletes are intentional, using CUDA 10.0 with the 396.54 drivers is not a supported configuration, using CUDA with the drivers from a different repo is not a supported configuration, blah blah blah. :-/

Even though, after running

$ cuda-install-samples-10.0.sh $HOME/cuda/
$ cd $HOME/cuda/NVIDIA_CUDA-10.0_Samples/
$ HOST_COMPILER=cuda-g++ make all

...it *seems* to work fine. (The compile hasn't gotten through all of the examples yet, but so far I've only gotten some code-related warnings: unused variables, etc. Nothing to do with the hardware drivers/APIs.)


That reminds me: Should we consider taking a page from negativo17, and packaging cuda-gcc(-c++) held to a version that's compatible with Nvidia's CUDA repo, since it seems Fedora's gcc is destined to be endlessly too-recent? 

Nvidia don't provide any compiler, Fedora doesn't provide a (compatible) compiler, and AFAICT rpmfusion doesn't provide a compatible compiler. I've been cherry-picking cuda-gcc and cuda-gcc-c++ from the fedora-nvidia repo negativo17 maintains, but I can't keep that enabled because it's kind of overstuffed and the packages conflict with BOTH rpmfusion and Nvidia's CUDA repo.

See: https://negativo17.org/nvidia-driver/
and particularly https://negativo17.org/nvidia-driver/#tablepress-4
Comment 3 FeRD (Frank Dana) 2018-09-20 16:34:58 CEST
(In reply to FeRD (Frank Dana) from comment #2)
>
> That reminds me: Should we consider taking a page from negativo17, and
> packaging cuda-gcc(-c++) held to a version that's compatible with Nvidia's
> CUDA repo, since it seems Fedora's gcc is destined to be endlessly
> too-recent? 
> 
> Nvidia don't provide any compiler, Fedora doesn't provide a (compatible)
> compiler, and AFAICT rpmfusion doesn't provide a compatible compiler.

Although, now that I think about it, would cuda-gcc-7.3.0 be eligible for Fedora's repo? Assuming it (and its contents) were renamed like that, and it didn't cause any conflicts with the default gcc 8.0 package(s). Maybe rpmfusion is the wrong place for that, anyway.
Comment 4 Nicolas Chauvet 2018-09-20 16:36:43 CEST
(In reply to FeRD (Frank Dana) from comment #3)
> (In reply to FeRD (Frank Dana) from comment #2)
...
> Although, now that I think about it, would cuda-gcc-7.3.0 be eligible for
> Fedora's repo? Assuming it (and its contents) were renamed like that, and it
> didn't cause any conflicts with the default gcc 8.0 package(s). Maybe
> rpmfusion is the wrong place for that, anyway.
It can also be done as a copr repo.
But from my side I have no problem using the gcc from devtoolkit-7.
Comment 5 Nicolas Chauvet 2018-09-20 16:50:00 CEST
(In reply to Nicolas Chauvet from comment #4)
> (In reply to FeRD (Frank Dana) from comment #3)
> > (In reply to FeRD (Frank Dana) from comment #2)
> ...
> > Although, now that I think about it, would cuda-gcc-7.3.0 be eligible for
> > Fedora's repo? Assuming it (and its contents) were renamed like that, and it
> > didn't cause any conflicts with the default gcc 8.0 package(s). Maybe
> > rpmfusion is the wrong place for that, anyway.
> It can also be done as a copr repo.
> But from my side I have no problem using the gcc from devtoolkit-7.

Also a better way forward may be to use modules
https://docs.fedoraproject.org/en-US/modularity/using-modules/
Comment 6 Nicolas Chauvet 2023-01-05 21:31:25 CET
Closing as fixed (the nvidia driver in the repo now uses modules).