Bug 5741

Summary: Nvidia module not found - falling back to nouveau
Product: Fedora
Reporter: Kevin <kmg952>
Component: nvidia-390xx-kmod
Assignee: Henrik Nordström <henrik>
Status: RESOLVED INVALID
Severity: major
CC: kmg952
Priority: P1
Version: f32   
Hardware: x86_64   
OS: GNU/Linux   
namespace:
Attachments: nvidia-bug-report.sh

Description Kevin 2020-09-04 05:13:32 CEST
The nvidia module is not being loaded on boot; I'm getting the message "Nvidia module not found - falling back to nouveau".

This is after a catastrophic failure that required a full rebuild from bare metal. Prior to the failure, the whole dnf update / nvidia thing worked perfectly.

Distro: Fedora 32

# uname -a
Linux <host name> 5.8.4-200.fc32.x86_64 #1 SMP Wed Aug 26 22:28:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# modprobe -v nvidia
insmod /lib/modules/5.8.4-200.fc32.x86_64/kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz
insmod /lib/modules/5.8.4-200.fc32.x86_64/kernel/drivers/char/ipmi/ipmi_devintf.ko.xz
insmod /lib/modules/5.8.4-200.fc32.x86_64/extra/nvidia-390xx/nvidia.ko

# lsmod | grep nvidia
nvidia              15835136  0
ipmi_msghandler       118784  2 ipmi_devintf,nvidia

# ls -l /lib/modules/5.8.4-200.fc32.x86_64/extra/
drwxr-xr-x   2 root root 4096 Aug 31 11:59 bbswitch
drwxr-xr-x.  9 root root 4096 Aug 30 13:36 drivers
drwxr-xr-x. 13 root root 4096 Aug 30 13:36 fs
drwxr-xr-x. 12 root root 4096 Aug 30 13:36 net
drwxr-xr-x   2 root root 4096 Sep  4 09:54 nvidia-390xx

# ls -l /lib/modules/5.8.4-200.fc32.x86_64/extra/nvidia-390xx/
-rw-r--r-- 1 root root    93176 Sep  4 09:54 nvidia-drm.ko
-rw-r--r-- 1 root root 22192400 Sep  4 09:54 nvidia.ko
-rw-r--r-- 1 root root  1407088 Sep  4 09:54 nvidia-modeset.ko

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.8.4-200.fc32.x86_64 root=/dev/mapper/fedora_lvm-root ro resume=/dev/mapper/fedora_lvm-swap rd.lvm.lv=fedora_lvm/root rd.lvm.lv=fedora_lvm/swap rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1

HELP!!! Please, help.
Kevin
Comment 1 leigh scott 2020-09-04 09:44:45 CEST
Try

sudo dracut --omit-drivers nouveau -f

reboot
Comment 2 Kevin 2020-09-04 10:39:33 CEST
Why would that work? If the nvidia module cannot be found wouldn't deleting the nouveau module prevent the system from booting (into X mode)?
Comment 3 leigh scott 2020-09-04 10:52:33 CEST
(In reply to Kevin from comment #2)
> Why would that work? If the nvidia module cannot be found wouldn't deleting
> the nouveau module prevent the system from booting (into X mode)?

Your first post indicated it was loaded

# lsmod | grep nvidia
nvidia              15835136  0
ipmi_msghandler       118784  2 ipmi_devintf,nvidia


Nouveau blocks nvidia from working.

TBH you haven't posted much useful info, run


/usr/bin/nvidia-bug-report.sh


and attach the file to your next post.
Comment 4 Kevin 2020-09-05 01:32:41 CEST
Created attachment 2223 [details]
nvidia-bug-report.sh

"Your first post indicated it was loaded" - no, that is not the case. That lsmod was after the modprobe - it was that modprobe that loaded the module. If the module was already loaded, modprobe would have done nothing.
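The distinction above (loaded at boot versus loaded by a manual modprobe) can be checked before running modprobe at all. A minimal sketch, assuming a /proc/modules-style listing; the function name `module_loaded` and its optional second argument are hypothetical and not part of any package mentioned here:

```shell
# Return 0 if the named module is listed as loaded, non-zero otherwise.
# $1: module name, e.g. nvidia
# $2: optional path to a /proc/modules-style listing (hypothetical
#     convenience for testing); defaults to /proc/modules.
module_loaded() {
    # Compare against the first field only, so that a line such as
    # "ipmi_msghandler ... ipmi_devintf,nvidia," does not count as a match.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

Run right after boot, `module_loaded nvidia && echo loaded || echo "not loaded"` would show whether the module came up on its own, which `lsmod | grep nvidia` after a modprobe cannot tell you.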

I did the dracut and rebooted as you requested. Unfortunately, no change! I have also run nvidia-bug-report.sh, but prior to doing so I did:

dnf erase xorg-x11-drv-nvidia-390xx akmod-nvidia-390xx
dnf install xorg-x11-drv-nvidia-390xx akmod-nvidia-390xx

and then rebooted (yes, I waited for the system to quiesce) just to ensure a clean environment. The log file is attached. BTW, I got the following:

Running nvidia-bug-report.sh...ls: cannot access '/proc/driver/nvidia/./gpus/': No such file or directory

but that's probably understandable under the circumstances.

Sorry for any inconvenience caused by the omission of the nvidia-bug-report.sh output.
Comment 5 leigh scott 2020-09-05 02:29:24 CEST
Intel bios is loaded


[    89.652] (--) PCI:*(0@0:2:0) 8086:0166:1043:1477 rev 9, Mem @ 0xf7400000/4194304, 0xd0000000/268435456, I/O @ 0x0000f000/64, BIOS @ 0x????????/131072
[    89.652] (--) PCI: (1@0:0:0) 10de:0de9:1043:1477 rev 161, Mem @ 0xf6000000/16777216, 0xe0000000/268435456, 0xf0000000/33554432, I/O @ 0x0000e000/128, BIOS @ 0x????????/524288


and is the active VGA


[    89.874] (II) Module glx: vendor="X.Org Foundation"
[    89.874] 	compiled for 1.20.8, module version = 1.0.0
[    89.874] 	ABI class: X.Org Server Extension, version 10.0
[    89.874] (==) Matched modesetting as autoconfigured driver 0
[    89.874] (==) Matched fbdev as autoconfigured driver 1
[    89.874] (==) Matched vesa as autoconfigured driver 2
[    89.874] (==) Assigned the driver to the xf86ConfigLayout
[    89.874] (II) LoadModule: "modesetting"


[   63.228113] bbswitch: loading out-of-tree module taints kernel.
[   63.228848] bbswitch: module verification failed: signature and/or required key missing - tainting kernel
[   63.229891] bbswitch: version 0.8
[   63.230652] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[   63.231417] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[   63.232215] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200528/nsarguments-59)
[   63.233139] bbswitch: detected an Optimus _DSM function
[   63.233974] pci 0000:01:00.0: enabling device (0000 -> 0003)
[   63.234839] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[   63.237772] bbswitch: disabling discrete graphics


bbswitch has disabled discrete graphics, probably choked on nouveau :-)
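The power state that bbswitch reports can be read back directly. A minimal sketch, assuming the bbswitch module is loaded and exposes its usual /proc/acpi/bbswitch file containing a line like `0000:01:00.0 OFF`; the function name `bbswitch_state` and its optional path argument are hypothetical:

```shell
# Print the discrete card state (ON or OFF) as reported by bbswitch.
# $1: optional path to the state file (hypothetical convenience for
#     testing); defaults to /proc/acpi/bbswitch.
bbswitch_state() {
    # The file holds "<PCI address> <state>"; print the second field only.
    awk '{ print $2; exit }' "${1:-/proc/acpi/bbswitch}"
}
```

If this prints OFF while you expect the nvidia driver to bind to the card, the card is powered down, which would be consistent with the "disabling discrete graphics" line in the log above.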


nouveau is loaded, nvidia can't load if it's present.
The blacklisting hasn't stopped nouveau loading, purging nouveau from initramfs image will fix it.

i915 2629632 25 - Live 0xffffffffc0643000
i2c_algo_bit 16384 2 nouveau,i915, Live 0xffffffffc04a2000
crct10dif_pclmul 16384 1 - Live 0xffffffffc04a8000
crc32_pclmul 16384 0 - Live 0xffffffffc04ef000
crc32c_intel 24576 6 - Live 0xffffffffc0487000
drm_kms_helper 258048 2 nouveau,i915, Live 0xffffffffc0603000
cec 61440 2 i915,drm_kms_helper, Live 0xffffffffc05f3000
ghash_clmulni_intel 16384 0 - Live 0xffffffffc0500000
uas 32768 0 - Live 0xffffffffc04f4000
serio_raw 20480 0 - Live 0xffffffffc046a000
drm 622592 22 nouveau,ttm,i915,drm_kms_helper, Live 0xffffffffc0534000
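In the listing above nouveau shows up in the "used by" column of drm, drm_kms_helper and i2c_algo_bit, which is what gives it away as loaded. A minimal sketch of that check, assuming the /proc/modules field layout (name, size, refcount, users, state, address); the function name `driver_active` and its optional path argument are hypothetical:

```shell
# Return 0 if the named driver is loaded, judged either by its own
# entry or by its appearing as a user of another module.
# $1: driver name, e.g. nouveau
# $2: optional path to a /proc/modules-style listing (hypothetical
#     convenience for testing); defaults to /proc/modules.
driver_active() {
    awk -v m="$1" '
        $1 == m { found = 1 }
        {
            # Field 4 is "-" or a comma-terminated list such as "nouveau,i915,".
            n = split($4, users, ",")
            for (i = 1; i <= n; i++)
                if (users[i] == m) found = 1
        }
        END { exit !found }
    ' "${2:-/proc/modules}"
}
```

For example, `driver_active nouveau && echo "nouveau is still loaded"` run after boot would confirm whether the blacklist options on the kernel command line took effect.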
Comment 6 Kevin 2020-09-05 02:44:59 CEST
"purging nouveau from initramfs image will fix it" - didn't the dracut do that?

$ dracut --list-modules /boot/initramfs-5.8.4-200.fc32.x86_64.img|grep nou

gives no output
Comment 7 Kevin 2020-09-05 08:40:20 CEST
I've just tried this:

# dracut --omit-drivers nouveau --add-drivers nvidia test.img
# dracut --list-modules test.img|grep nvid
# dracut --list-modules test.img|grep nou

I didn't boot with that file as I don't think it would be worth it.
Comment 8 leigh scott 2020-09-05 15:30:19 CEST
'dracut --list-modules' lists dracut modules, not kernel modules.


To view the kernel modules in the initramfs, you need to use

sudo lsinitrd |grep nouveau
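A plain grep for nouveau can also match directory names or firmware paths, so it may help to match only the kernel module file itself. A minimal sketch, assuming lsinitrd prints one path per line ending in the file name; the function name `initrd_has_driver` is hypothetical:

```shell
# Read an lsinitrd-style listing on stdin and return 0 if it contains
# a kernel module file for the named driver, e.g. nouveau.ko or
# nouveau.ko.xz.
# $1: driver name
initrd_has_driver() {
    # Match "/<name>.ko" with an optional compression suffix at end of line.
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}
```

Usage would be along the lines of `sudo lsinitrd | initrd_has_driver nouveau && echo "nouveau is in the initramfs"`. With no image argument lsinitrd inspects the default image for the running kernel; pass a path (for example test.img) to inspect another one.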
Comment 9 Kevin 2020-09-06 01:55:17 CEST
# lsmod|grep nvid
# modprobe -v nvidia
insmod /lib/modules/5.8.4-200.fc32.x86_64/kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz 
insmod /lib/modules/5.8.4-200.fc32.x86_64/kernel/drivers/char/ipmi/ipmi_devintf.ko.xz 
insmod /lib/modules/5.8.4-200.fc32.x86_64/extra/nvidia-390xx/nvidia.ko 
# lsmod|grep nvid
nvidia              15835136  0
ipmi_msghandler       118784  2 ipmi_devintf,nvidia
# dracut --omit-drivers nouveau --add-drivers nvidia test.img
# dracut --list-modules|egrep 'nou|nvid'
<nothing>
Comment 10 Kevin 2020-09-16 01:15:43 CEST
anybody?
Comment 11 Nicolas Chauvet 2020-11-30 21:52:46 CET
bbswitch might put the device into an unknown state from which it's not possible to recover. This method isn't advertised by nvidia on purpose and isn't supported by us.


Don't use bbswitch on your next device.