Bug 4538

Summary: akmod-nvidia-375.66 breaks with kernel 4.10.13-200 and 4.10.14-200
Product: Fedora
Reporter: John Gotts <jgotts>
Component: nvidia-kmod
Assignee: Nicolas Chauvet <kwizart>
Status: RESOLVED FIXED
Severity: normal
CC: leigh123linux, ulatekh
Priority: P1
Version: 25
Hardware: x86_64
OS: GNU/Linux
Attachments:
  Bug report log for kernel-4.10.14-200.fc25.x86_64
  Patch for kmod-nvidia package
  build log

Description John Gotts 2017-05-13 17:26:39 CEST
I can't use Wayland, because it can't do auto-raise properly. (There is no way to lower windows!) I've been using auto-raise for 23 years, so I'm stuck using Xorg.

When upgrading past 4.10.12-200, logging in with GNOME (Xorg) crashes back to the login prompt. dmesg indicates something about a version mismatch with 375.66. I originally thought about downgrading akmod, but that was a wild goose chase. Only downgrading the kernel works.

Hopefully everyone can reproduce this. Just click on your login and then click on the gear and select the Xorg version of GNOME. After correctly typing your password you will notice that X crashes and you are back at the login prompt.
Comment 1 leigh scott 2017-05-13 18:23:16 CEST
(In reply to John Gotts from comment #0)
> I can't use Wayland, because it can't do auto-raise properly. (There is no
> way to lower windows!) I've been using auto-raise for 23 years, so I'm stuck
> using Xorg.

You can't use Wayland with the NVIDIA driver.

> 
> When upgrading past 4.10.12-200, logging in with GNOME (Xorg) crashes back
> to the login prompt. dmesg indicates something about a version mismatch with
> 375.66. 

Which NVIDIA version were you using before 375.66?
Also post the output of:

rpm -qa \*nvidia\*



> Hopefully everyone can reproduce this. Just click on your login and then
> click on the gear and select the Xorg version of GNOME. After correctly
> typing your password you will notice that X crashes and you are back at the
> login prompt.

Post the output of:

nvidia-bug-report.sh
Comment 2 John Gotts 2017-05-13 18:50:42 CEST
Created attachment 1779 [details]
Bug report log for kernel-4.10.14-200.fc25.x86_64

I upgraded to kernel-4.10.14-200.fc25.x86_64 and this is the output from nvidia-bug-report.sh. kernel-4.10.13-200.fc25.x86_64 exhibits the same behavior. I can supply output for that kernel version on request.
Comment 3 John Gotts 2017-05-13 18:55:18 CEST
Hi, Leigh, it's nice to meet you. Thanks for the rapid response!

$ rpm -qa *\nvidia\* | sort
akmod-nvidia-375.66-1.fc25.x86_64
kmod-nvidia-4.10.12-200.fc25.x86_64-375.66-1.fc25.x86_64
kmod-nvidia-4.10.13-200.fc25.x86_64-375.66-1.fc25.x86_64
kmod-nvidia-4.10.14-200.fc25.x86_64-375.66-1.fc25.x86_64
xorg-x11-drv-nvidia-375.66-1.fc25.x86_64
xorg-x11-drv-nvidia-kmodsrc-375.66-1.fc25.x86_64
xorg-x11-drv-nvidia-libs-375.66-1.fc25.x86_64

My laptop is a 2017 Dell gaming laptop with 32 GB of memory and 8 usable cores. Why would anybody buy such a ridiculous thing? My last laptop purchase was in 2000 so I would like this one to also last me 18 years. Normally I stick with Intel graphics but this was the only laptop I could find with 32 GB support for under $2,000.
Comment 4 leigh scott 2017-05-13 19:39:33 CEST
It looks like nouveau is also loading:

i915 1449984 7 - Live 0xffffffffc02da000
mxm_wmi 16384 1 nouveau, Live 0xffffffffc0227000
i2c_algo_bit 16384 2 nouveau,i915, Live 0xffffffffc0222000
drm_kms_helper 155648 3 nvidia_drm,nouveau,i915, Live 0xffffffffc02a6000
crc32c_intel 24576 1 - Live 0xffffffffc01e6000
drm 352256 12 nvidia_drm,nouveau,ttm,i915,drm_kms_helper, Live 0xffffffffc0237000
r8169 81920 0 - Live 0xffffffffc020d000
serio_raw 16384 0 - Live 0xffffffffc01d5000
mii 16384 1 r8169, Live 0xffffffffc01df000
i2c_hid 20480 0 - Live 0xffffffffc022d000
wmi 16384 4 dell_led,dell_wmi,nouveau,mxm_wmi, Live 0xffffffffc0208000
video 40960 3 dell_wmi,nouveau,i915, Live 0xffffffffc01f8000

Try:

sudo /usr/sbin/grubby --update-kernel=ALL --remove-args='nouveau.modeset=0 rd.driver.blacklist=nouveau'


sudo /usr/sbin/grubby --update-kernel=ALL --args='rd.driver.blacklist=nouveau modprobe.blacklist=nouveau'


reboot
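
After the reboot, it can help to confirm that nouveau is really gone and that the new arguments reached the kernel command line. The following is only a minimal sketch, not part of the original advice: the helper names nouveau_loaded and cmdline_blacklists_nouveau are hypothetical, and the first one assumes input in the /proc/modules format shown above (module name in the first field, "used by" list in the fourth).

```shell
#!/bin/sh
# Hypothetical helpers for checking the result of the grubby changes above.

# Succeeds if nouveau shows up in /proc/modules-style input on stdin,
# either as a loaded module (field 1) or in the "used by" list (field 4).
nouveau_loaded() {
    awk '$1 == "nouveau" || index("," $4, ",nouveau,") { found = 1 }
         END { exit !found }'
}

# Succeeds if the kernel command line in $1 carries both blacklist
# arguments added by the second grubby command.
# Assumption: each argument appears on its own, not inside a comma list.
cmdline_blacklists_nouveau() {
    case " $1 " in
        *" rd.driver.blacklist=nouveau "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" modprobe.blacklist=nouveau "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a running system:
#   nouveau_loaded < /proc/modules && echo "nouveau is still loaded"
#   cmdline_blacklists_nouveau "$(cat /proc/cmdline)" || echo "blacklist args missing"
```

If the first check still reports nouveau, the initramfs or the boot entry in use may not have picked up the new arguments.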
Comment 5 John Gotts 2017-05-13 22:51:28 CEST
That fixed the problem somehow.

Thanks for the troubleshooting.
Comment 6 leigh scott 2017-05-15 11:06:35 CEST
Fix released
Comment 7 Steve 2017-05-20 22:56:58 CEST
What was the fix? My akmod-nvidia still won't build with kernel 4.10.15-100. I get the following error:

/tmp/akmodsbuild.ck7ghHEr/BUILD/nvidia-kmod-375.39/_kmod_build_4.10.15-100.fc24.i686+PAE/common/inc/nv-mm.h: In function 'nv_page_fault_va':
/tmp/akmodsbuild.ck7ghHEr/BUILD/nvidia-kmod-375.39/_kmod_build_4.10.15-100.fc24.i686+PAE/common/inc/nv-mm.h:142:46: error: 'struct vm_fault' has no member named 'virtual_address' 
         return (unsigned long)(uintptr_t)(vmf->virtual_address);
                                              ^~

I tried to build nvidia-kmod myself, but it only builds kmod-nvidia-375.39-2.fc24.i686.rpm, not the kmod-nvidia-4.10.15-100.fc24.i686+PAE-375.39-2.fc24.i686 that I presumably need, and I have no idea why.
Comment 8 Steve 2017-05-20 23:11:48 CEST
Created attachment 1781 [details]
Patch for kmod-nvidia package

OK, I had to change buildforkernels in nvidia-kmod.spec to "newest" and now I can build.

"virtual_address" was renamed to "address" in struct vm_fault recently.  See https://github.com/01org/linux-sgx-driver/pull/19 for another package that had to deal with this.

The enclosed patch got kmod-nvidia to build in 4.10.15, and I'm running the driver now.
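
To see which field name a given kernel's headers use before building, something like the following could be run against the installed kernel headers. This is a rough sketch under stated assumptions: vm_fault_field is a hypothetical helper, and it assumes the struct vm_fault definition in include/linux/mm.h starts with "struct vm_fault {" at the beginning of a line and ends with "};" at the beginning of a line.

```shell
#!/bin/sh
# Hypothetical helper: print which address member struct vm_fault has
# in the mm.h file given as $1. Prints "virtual_address", "address",
# or "unknown" if no struct vm_fault definition is found.
vm_fault_field() {
    awk '
        /^struct vm_fault[ \t]*[{]/      { in_struct = 1; seen = 1; next }
        in_struct && /^[}];/             { in_struct = 0 }
        in_struct && /virtual_address/   { old = 1 }
        END {
            if (!seen)    print "unknown"
            else if (old) print "virtual_address"
            else          print "address"
        }
    ' "$1"
}

# Example use (the path assumes the headers for the running kernel are installed):
#   vm_fault_field "/lib/modules/$(uname -r)/build/include/linux/mm.h"
```

A result of "address" on a driver source tree that still references vmf->virtual_address would explain the compile error quoted in comment 7.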
Comment 9 leigh scott 2017-05-21 09:29:03 CEST
Created attachment 1782 [details]
build log

(In reply to Steve from comment #8)
> Created attachment 1781 [details]
> Patch for kmod-nvidia package
 
> The enclosed patch got kmod-nvidia to build in 4.10.15, and I'm running the
> driver now.

It compiles fine here without any additional patching. I have bumped the f24 packages to version 375.66.
If that fails as well, you will need to report the issue to NVIDIA, as i686 shouldn't fail.