Bug 4413

Summary: A nvidia module isn't generated right after kernel generation.
Product: Fedora
Reporter: dekomori9854
Component: nvidia-340xx-kmod
Assignee: Przemysław Palacz <pprzemal>
Status: RESOLVED FIXED
Severity: enhancement
CC: audrey, dekomori9854, kwizart, rick.h.ns
Priority: P1
Version: f28   
Hardware: x86_64   
OS: GNU/Linux   
namespace:
Attachments:
x has not started right.
after dracut /boot/initramfs-$(uname -r).img $(uname -r) operation
lsinitrd initramfs-4.8.16-300.fc25.x86_64.img
kernel 4.9.2-200 nvidia-bug-report
initramfs-4.9.2-200.fc25.x86_64.img
/var/cache/akmods/akmods.log logfile
/var/cache/akmods/akmods.log
journalctl |grep akmod

Description dekomori9854 2017-01-12 08:50:17 CET
Created attachment 1718 [details]
x has not started right.

After a kernel update, the nvidia module built by akmod-nvidia-340xx doesn't work with the new kernel.
The system recovers after I run the following commands.

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nvidia.img

# dracut /boot/initramfs-$(uname -r).img $(uname -r)

# reboot

env:
$ rpm -qa '*nvidia*' '*kernel*' | sort; uname -r
abrt-addon-kerneloops-2.9.0-1.fc25.x86_64
akmod-nvidia-340xx-340.101-2.fc25.x86_64
kernel-4.8.14-300.fc25.x86_64
kernel-4.8.15-300.fc25.x86_64
kernel-4.8.16-300.fc25.x86_64
kernel-core-4.8.14-300.fc25.x86_64
kernel-core-4.8.15-300.fc25.x86_64
kernel-core-4.8.16-300.fc25.x86_64
kernel-devel-4.8.14-300.fc25.x86_64
kernel-devel-4.8.15-300.fc25.x86_64
kernel-devel-4.8.16-300.fc25.x86_64
kernel-headers-4.8.16-300.fc25.x86_64
kernel-modules-4.8.14-300.fc25.x86_64
kernel-modules-4.8.15-300.fc25.x86_64
kernel-modules-4.8.16-300.fc25.x86_64
kernel-modules-extra-4.8.14-300.fc25.x86_64
kernel-modules-extra-4.8.15-300.fc25.x86_64
kernel-modules-extra-4.8.16-300.fc25.x86_64
kernel-tools-libs-4.8.16-300.fc25.x86_64
kmod-nvidia-340xx-4.8.15-300.fc25.x86_64-340.101-2.fc25.x86_64
kmod-nvidia-340xx-4.8.16-300.fc25.x86_64-340.101-2.fc25.x86_64
libreport-plugin-kerneloops-2.8.0-1.fc25.x86_64
xorg-x11-drv-nvidia-340xx-340.101-1.fc25.x86_64
xorg-x11-drv-nvidia-340xx-kmodsrc-340.101-1.fc25.x86_64
xorg-x11-drv-nvidia-340xx-libs-340.101-1.fc25.x86_64
4.8.16-300.fc25.x86_64

Why?
Comment 1 Nicolas Chauvet 2017-01-12 09:18:14 CET
Can you please attach the output of nvidia-bug-report.sh?
Can you also attach the lsinitrd output for both initramfs images of a given kernel?
Comment 2 dekomori9854 2017-01-12 12:42:37 CET
Created attachment 1719 [details]
after  dracut /boot/initramfs-$(uname -r).img $(uname -r) operation

Thank you very much for your quick response.
Sorry, unfortunately I could gather only a few documents this time.

> Can you also attach the lsinitrd output for both initramfs images of a given kernel?

# lsinitrd initramfs-4.8.16-300.fc25.x86_64.img
Image: initramfs-4.8.16-300.fc25.x86_64.img: 17M
========================================================================
Early CPIO image
========================================================================
drwxr-xr-x   3 root     root            0 Nov  7 18:59 .
-rw-r--r--   1 root     root            2 Nov  7 18:59 early_cpio
drwxr-xr-x   3 root     root            0 Nov  7 18:59 kernel
drwxr-xr-x   3 root     root            0 Nov  7 18:59 kernel/x86
drwxr-xr-x   2 root     root            0 Nov  7 18:59 kernel/x86/microcode
-rw-r--r--   1 root     root        24576 Nov  7 18:59 kernel/x86/microcode/GenuineIntel.bin
========================================================================
Version: dracut-044-78.fc25

# lsinitrd initramfs-4.8.16-300.fc25.x86_64-nvidia.img
Image: initramfs-4.8.16-300.fc25.x86_64-nvidia.img: 17M
========================================================================
Early CPIO image
========================================================================
drwxr-xr-x   3 root     root            0 Nov  7 18:59 .
-rw-r--r--   1 root     root            2 Nov  7 18:59 early_cpio
drwxr-xr-x   3 root     root            0 Nov  7 18:59 kernel
drwxr-xr-x   3 root     root            0 Nov  7 18:59 kernel/x86
drwxr-xr-x   2 root     root            0 Nov  7 18:59 kernel/x86/microcode
-rw-r--r--   1 root     root        24576 Nov  7 18:59 kernel/x86/microcode/GenuineIntel.bin
========================================================================
Version: dracut-044-78.fc25
Comment 3 dekomori9854 2017-01-12 12:49:42 CET
Created attachment 1720 [details]
lsinitrd initramfs-4.8.16-300.fc25.x86_64.img
Comment 4 dekomori9854 2017-01-13 04:42:13 CET
Created attachment 1721 [details]
kernel 4.9.2-200 nvidia-bug-report

The problem persists after the kernel update.
I have also attached the lsinitrd output for initramfs-4.9.2-200.fc25.x86_64.img.
Comment 5 dekomori9854 2017-01-13 04:49:26 CET
Created attachment 1722 [details]
initramfs-4.9.2-200.fc25.x86_64.img

initramfs-4.9.2-200.fc25.x86_64-nvidia.img isn't generated when updating to kernel 4.9.2-200.
Comment 6 Nicolas Chauvet 2017-01-13 08:18:59 CET
Can you tell us which command you use to update the kernel?
Comment 7 dekomori9854 2017-01-13 08:26:00 CET
(In reply to Nicolas Chauvet from comment #6)
> Can you tell us which command you use to update the kernel?

# dnf update kernel* --enablerepo=updates-testing
Comment 8 Nicolas Chauvet 2017-01-13 09:44:00 CET
Can you attach /var/cache/akmods/akmods.log?

Can you point me to the lsinitrd output for the -nvidia (original) initramfs of the 4.8.16-300.fc25 kernel? I only need the fixed version.

Can you reproduce using an xorg.conf file?
Comment 9 leigh scott 2017-01-13 09:53:20 CET
Can you also attach the output for

journalctl |grep akmod
Comment 10 Nicolas Chauvet 2017-01-13 09:57:01 CET
I only s/need/see/ the fixed version.
Comment 11 dekomori9854 2017-01-13 12:34:42 CET
Created attachment 1723 [details]
/var/cache/akmods/akmods.log logfile

> Can you point me to the lsinitrd output for the -nvidia (original) initramfs of the 4.8.16-300.fc25 kernel? I only need the fixed version.

Sorry, it's the new kernel, because the recovery was already done with the "# dracut /boot/initramfs-$(uname -r).img $(uname -r)" operation on the 4.8.16-300.fc25 kernel. Would the nvidia-bug-report.sh output do?
Comment 12 dekomori9854 2017-01-13 12:38:42 CET
Created attachment 1724 [details]
/var/cache/akmods/akmods.log
Comment 13 dekomori9854 2017-01-13 12:43:59 CET
Created attachment 1725 [details]
journalctl |grep akmod
Comment 14 dekomori9854 2017-01-13 12:45:58 CET
(In reply to Nicolas Chauvet from comment #10)
> I only s/need/see/ the fixed version.

Sorry, is that a configuration mistake on my side?
Please tell me the workaround.
Comment 15 Nicolas Chauvet 2017-01-13 12:56:35 CET
(In reply to dekomori9854 from comment #14)

> Sorry, is that a configuration mistake on my side?
> Please tell me the workaround.

What I would expect is that nvidia.ko is (sometimes) not built at kernel post-install.

Instead, it's built on the first boot of the new kernel, and is then available and working properly on the second boot of the same kernel.
If you regenerate the dracut image in between, it has no effect
(it's not the new dracut initramfs that made the fix).


Can you try to boot a previous kernel, but with your backup initramfs (the one ending with -nvidia)?
Comment 16 Rick 2017-01-13 22:27:57 CET
(In reply to Nicolas Chauvet from comment #15)
> (In reply to dekomori9854 from comment #14)
> 
> > Sorry, is that a configuration mistake on my side?
> > Please tell me the workaround.
> 
> What I would expect is that nvidia.ko is (sometimes) not built at kernel
> post-install.
> 
> Instead, it's built on the first boot of the new kernel, and is then
> available and working properly on the second boot of the same kernel.
> If you regenerate the dracut image in between, it has no effect
> (it's not the new dracut initramfs that made the fix).
> 
> 
> Can you try to boot a previous kernel, but with your backup initramfs
> (the one ending with -nvidia)?

I experienced the same with akmod-nvidia-375 (i.e. built on first boot for the new kernel but doesn't work, then it's available and operating properly on the second boot) when upgrading from kernel-4.1.13-100.fc21.x86_64 to kernel-4.8.15-300.fc25.x86_64. But now when updating to kernel-4.8.16-300.fc25.x86_64, akmod-nvidia-375 does not appear to work at all?, although dnf reports that kmod-nvidia-4.8.16-300.fc25.x86_64-375.26-1.fc25.x86_64 is present.
Comment 17 leigh scott 2017-01-13 23:51:55 CET
(In reply to Rick from comment #16)

> all?, although dnf reports that
> kmod-nvidia-4.8.16-300.fc25.x86_64-375.26-1.fc25.x86_64 is present.

File a separate issue as this report is for 340.xx driver only.
Comment 18 dekomori9854 2017-01-14 07:32:47 CET
Hello again!
I could reproduce it by following your procedure.
The steps were as follows.

1. $ uname -r
4.8.16-300.fc25.x86_64

2. boot]$ sudo rm initramfs-4.9.2-200.fc25.x86_64-nvidia.img initramfs-4.8.16-300.fc25.x86_64-nvidia.img

3. # dnf remove *nvidia*

4. # reboot

5. # dnf install xorg-x11-drv-nvidia-340xx akmod-nvidia-340xx

6. # reboot

7. X is broken (nvidia.ko isn't present)

8. # reboot (without making a new dracut initramfs)

9. X starts normally with nvidia-340xx

Why is a second restart necessary? Is nvidia.ko generated twice?
Comment 19 Nicolas Chauvet 2017-01-14 10:40:51 CET
How much time passed between steps 5 and 6?

I don't think the module is generated twice. I think it's generated on the first boot (instead of post-installation). So this is the first bug I'm seeing.
Comment 20 dekomori9854 2017-01-14 10:54:27 CET
It's immediately from step5 to step5.

I also think nvidia.ko is generated at boot time.
Comment 21 dekomori9854 2017-01-14 10:55:27 CET
Sorry, my mistake: it's immediately from step5 to step6.
Comment 22 Nicolas Chauvet 2017-01-14 11:04:27 CET
(In reply to dekomori9854 from comment #20)
> It's immediately from step5 to step5.
In this case we need to find a way to lock any reboot until the module gets installed. You need to wait up to 5 minutes in some cases.
Comment 23 leigh scott 2017-01-14 11:20:10 CET
(In reply to Nicolas Chauvet from comment #22)
> (In reply to dekomori9854 from comment #20)
> > It's immediately from step5 to step5.
> In this case we need to find a way to lock any reboot until the module gets
> installed. 

I'm  against locking reboot to fix this issue/non-issue, users will have to learn to wait for the process to complete.

> You need to wait up to 5 minutes in some cases.
Comment 24 dekomori9854 2017-01-14 12:04:56 CET
I agree.
Thank you very much for your kind support!
Comment 25 Nicolas Chauvet 2017-01-14 13:00:33 CET
(In reply to leigh scott from comment #23)
> (In reply to Nicolas Chauvet from comment #22)
> > (In reply to dekomori9854 from comment #20)
> > > It's immediately from step5 to step5.
> > In this case we need to find a way to lock any reboot until the module gets
> > installed. 
> 
> I'm  against locking reboot to fix this issue/non-issue, users will have to
> learn to wait for the process to complete.
Well, there are already some inhibitors in place (you cannot shut down during an rpm transaction, or when root is logged in locally or remotely).

Further, it's probably possible to optimize the build of kmod to start right after the kernel-devel is available, then install right after the rpm transaction has finished... So the lock (and wait) is shorter.
Comment 26 dekomori9854 2017-01-15 07:27:02 CET
p.s.

Sir Leigh,
I waited 5 minutes after step 5 and then rebooted; nvidia-340xx worked normally.

Thanks to your explanation, I now understand how the nvidia.ko generation has changed.

Many people think nvidia.ko is built at shutdown time.

I hope that either this is explained on the Howto/nVidia page for kernel updates, or a reboot lock is added.
Comment 27 Audrey Toskin 2017-02-28 07:13:28 CET
> I'm  against locking reboot to fix this issue/non-issue, users will have to
> learn to wait for the process to complete.
> 
> > You need to wait up to 5 minutes in some cases.

I'm confused. Wait for what? Obviously I wouldn't reboot before `dnf upgrade` has finished and returned to the shell prompt.
Comment 28 leigh scott 2017-02-28 08:28:15 CET
(In reply to terrycloth from comment #27)
> > I'm  against locking reboot to fix this issue/non-issue, users will have to
> > learn to wait for the process to complete.
> > 
> > > You need to wait up to 5 minutes in some cases.
> 
> I'm confused. Wait for what? Obviously I wouldn't reboot before `dnf
> upgrade` has finished and returned to the shell prompt.

The kmod is generated and installed after the dnf transaction has returned to the shell prompt, so you need to wait an extra 5 minutes for this to complete.
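
Rather than guessing at a delay, one rough option is to poll for the module file after dnf returns. The sketch below assumes the module is installed as nvidia.ko (possibly compressed) somewhere under /lib/modules/<kernel version>/. The function names and the MODULES_ROOT override are made up for this illustration, and a file showing up is only a hint that the build has finished, not proof that the installation is complete.

```shell
# Return 0 if an nvidia.ko (optionally compressed) exists for the given kernel.
# MODULES_ROOT is a hypothetical override, useful for testing; default /lib/modules.
nvidia_module_present() {
    kver="${1:-$(uname -r)}"
    root="${MODULES_ROOT:-/lib/modules}"
    # An empty result from find means no module file has been installed yet.
    [ -n "$(find "$root/$kver" -name 'nvidia.ko*' -print 2>/dev/null | head -n 1)" ]
}

# Poll until the module appears or the timeout (in seconds) expires.
wait_for_nvidia_module() {
    kver="${1:-$(uname -r)}"
    timeout="${2:-300}"   # 300 s mirrors the "5 minutes" mentioned in this thread
    interval="${3:-5}"
    waited=0
    until nvidia_module_present "$kver"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1      # gave up: module still missing
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, wait_for_nvidia_module 4.9.2-200.fc25.x86_64 && reboot would reboot only once the module file is present, and would give up without rebooting after 300 seconds.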
Comment 29 Audrey Toskin 2017-02-28 23:58:55 CET
(In reply to leigh scott from comment #28)

Okay, that explains the comments that say something about adding a lock on the reboot process. I'm honestly baffled that there isn't such a mechanism already, and also that dnf returns before the kernel module has finished installing. As far as I know, there is no other type of package where the user has to know in advance to wait for secret invisible background processes to finish before they reboot (or just shutdown). Whenever an update involves the kernel, my thought has always been to reboot right away, so that I can start *using* the new kernel/modules.

Is there at least a process I can look up with `ps` or something, so I can *see* whether or not the kernel module has finished building?
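
A partial answer, sketched under assumptions: the build is driven by the akmods script, which as far as I know hands the actual work to akmodsbuild, so looking for processes with those names gives a rough indication. The helper below reads /proc directly. The function names are made up for this illustration, the process names may differ between akmods versions, and the absence of a matching process does not prove the build succeeded.

```shell
# Return 0 if any running process has one of the given names (Linux /proc only).
process_running() {
    for wanted in "$@"; do
        for comm_file in /proc/[0-9]*/comm; do
            # A process can exit between the glob and the read, so ignore read errors.
            if [ "$(cat "$comm_file" 2>/dev/null)" = "$wanted" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Assumed process names for an akmods build that is still in progress.
akmods_build_running() {
    process_running akmods akmodsbuild
}
```

Typical use would be akmods_build_running && echo "still building". Where procps is installed, pgrep -x akmods asks the same question for a single name.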
Comment 30 Audrey Toskin 2017-03-09 11:25:27 CET
My computer's kind of slow, and since no one has mentioned a way to definitively know whether the driver is done building, I waited *15* minutes after trying to reinstall the Nvidia akmod before rebooting. However I still ended up getting the white screen with "something has gone wrong".
Comment 31 Nicolas Chauvet 2017-12-05 17:09:20 CET
This module is missing the nvidia fallback support, which might be useful for 340xx (it's probably not worth it for 304xx).

Do you still reproduce the issue with the current distro? Can you report the output of nvidia-bug-report.sh?

One key task is probably to sync the 340xx series with the main driver. If you can help with that, it would be welcome.
Comment 32 Emmanuel Seyman 2017-12-12 13:21:38 CET
RPMFusion is no longer releasing updates for this version of Fedora. This bug
will be set to RESOLVED:EXPIRED next week to reflect this.

If the problem persists after upgrading to the latest version of Fedora, please
update the version field of this bug (and re-open it if it has been closed).
Comment 33 Audrey Toskin 2017-12-13 07:33:58 CET
Trying again Fedora 27 Workstation x86_64 right now to see the current state of things. I just tried installing package updates with dnf, including a kernel update.

* dnf upgrade && reboot 
* With the Plymouth loading screen set to the "details" (text mode) theme, I notice a message saying Nvidia failed to load, falling back to Nouveau. Boot otherwise looks normal.
* akmods --force && reboot
* Nvidia module loads like it's supposed to.

So we have the Nouveau fallback in place, so there's no more scary white screen saying "Something went wrong." But I dunno, to me, silently failing and falling back to Nouveau seems even worse, in a way. Users could install the proprietary Nvidia drivers specifically to play Steam games, and then be confused when the games still perform poorly.

Building the akmods is necessarily a part of installing a new kernel. It baffles me that this doesn't already block the dnf or packagekit update/installation process until akmods is finished.
Comment 34 Nicolas Chauvet 2017-12-13 08:27:01 CET
(In reply to Andrew Toskin from comment #33)
> Trying again Fedora 27 Workstation x86_64 right now to see the current state
> of things. I just tried installing package updates with dnf, including a
> kernel update.
> 
> * dnf upgrade && reboot 
That's the root error: one needs to wait for the akmods task to end right after the kernel update (posttrans).
I'm currently testing a patch for that.

> * With the Plymouth loading screen set to the "details" (text mode) theme, I
> notice a message saying Nvidia failed to load, falling back to Nouveau. Boot
> otherwise looks normal.
> * akmods --force && reboot
> * Nvidia module loads like it's supposed to.
This is another issue: the akmod build process ended with an error on a previous attempt, and is then not restarted. Solving the first issue might hide this one a little more. Still, this is a separate issue (patch welcomed).
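
As a stopgap for this second case, the forced rebuild from the steps quoted above can be limited to kernels that actually lack the module. This is only a sketch: rebuild_if_missing and MODULES_ROOT are hypothetical names, the nvidia.ko location under /lib/modules is assumed, and --kernels is the akmods option as I understand it from its help output, so check akmods --help before relying on it. It needs to run as root.

```shell
# Force an akmods rebuild for one kernel, but only if nvidia.ko is missing for it.
# MODULES_ROOT is a hypothetical override, useful for testing; default /lib/modules.
rebuild_if_missing() {
    kver="${1:-$(uname -r)}"
    root="${MODULES_ROOT:-/lib/modules}"
    if [ -n "$(find "$root/$kver" -name 'nvidia.ko*' -print 2>/dev/null | head -n 1)" ]; then
        echo "nvidia module already present for $kver"
        return 0
    fi
    echo "nvidia module missing for $kver, forcing a rebuild"
    # --force is the option used earlier in this thread; --kernels is assumed
    # to restrict the build to the named kernel.
    akmods --force --kernels "$kver"
}
```

Called without an argument, rebuild_if_missing checks the running kernel.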

> So we have the Nouveau fallback in place, so there's no more scary
> white screen saying "Something went wrong." But I dunno, to me, silently
> failing and falling back to Nouveau seems even worse, in a way. Users could
> install the proprietary Nvidia drivers specifically to play Steam games, and
> then be confused when the games still perform poorly.
This concern was raised when the feature was introduced. (patch welcomed).

> Building the akmods is necessarily a part of installing a new kernel. It
> baffles me that this doesn't already block the dnf or packagekit
> update/installation process until akmods is finished.
Blocking or using an inhibitor would be a good way forward (patch welcomed).
Comment 35 Nicolas Chauvet 2018-07-29 16:01:31 CEST
I'm closing this bug as this is not an issue with the nvidia-340xx driver in particular, but possibly a problem with akmods.

The inhibitor is in place in modern akmods, so if you run sudo dnf update && reboot as a console user, it should trigger the inhibitor.
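
To see whether such an inhibitor is active while the module is being built, the current inhibitor locks can be listed with systemd-inhibit --list. The filter below is a sketch: it assumes the akmods lock mentions "akmods" somewhere in its entry, which has not been verified here, and inhibitors_matching is a made-up helper name.

```shell
# Print active inhibitor locks whose entry mentions the given word.
# Exit status is 0 if at least one matching lock is listed.
inhibitors_matching() {
    # "akmods" as the default word is an assumption about how the lock is labelled.
    systemd-inhibit --list --no-pager 2>/dev/null | grep -i -- "${1:-akmods}"
}
```

Running inhibitors_matching right after dnf returns would then show whether a reboot is currently being held back.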