Bug 5153

Summary: Kernel dump: bad or missing usercopy whitelist?
Product: Fedora
Reporter: Bas Mevissen <abuse>
Component: nvidia-390xx-kmod
Assignee: Richard <hobbes1069>
Status: RESOLVED EOL
Severity: normal
CC: abuse, fedora, leigh123linux
Priority: P1
Version: f29
Hardware: x86_64
OS: GNU/Linux
namespace:
Attachments: kmem_cache_create_usercopy.patch

Description Bas Mevissen 2019-01-29 20:34:30 CET
Hi all,

For quite some time now, the nvidia kmod has been causing a kernel dump during startup:

[   11.061226] ------------[ cut here ]------------
[   11.061229] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_cache' (offset 11440, size 3)!
[   11.061246] WARNING: CPU: 0 PID: 1261 at mm/usercopy.c:83 usercopy_warn+0x7d/0xa0
[   11.061246] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables hwmon_vid squashfs zstd_decompress xxhash loop iTCO_wdt iTCO_vendor_support arc4 coretemp ath9k kvm_intel ath9k_common ath9k_hw kvm rtl8187 mac80211 irqbypass ath snd_hda_codec_hdmi uvcvideo cfg80211 i2c_i801 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev nvidia_drm(POE) nvidia_modeset(POE) joydev snd_usbmidi_lib media snd_rawmidi eeprom_93cx6 ftdi_sio nvidia(POE) lpc_ich snd_hda_codec_realtek rfkill snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq drm_kms_helper snd_seq_device snd_pcm drm snd_timer snd
[   11.061277]  i82975x_edac ipmi_devintf asus_atk0110 ipmi_msghandler soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc uas usb_storage raid0 serio_raw firewire_ohci firewire_core ata_generic pata_acpi crc_itu_t sky2 pata_jmicron i2c_dev
[   11.061288] CPU: 0 PID: 1261 Comm: Xorg Tainted: P           OE     4.20.4-200.fc29.x86_64 #1
[   11.061289] Hardware name: ASUSTEK COMPUTER INC P5W DH Deluxe/P5W DH Deluxe, BIOS 2801    07/10/2008
[   11.061291] RIP: 0010:usercopy_warn+0x7d/0xa0
[   11.061292] Code: 0e b5 41 51 4d 89 d8 48 c7 c0 8a 62 0d b5 49 89 f1 48 89 f9 48 0f 45 c2 48 c7 c7 d8 9a 0e b5 4c 89 d2 48 89 c6 e8 2d 8d e0 ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 61 27 10 b5 49 89 f1 49 89 f3 eb 96
[   11.061293] RSP: 0018:ffffb67e42437b60 EFLAGS: 00010286
[   11.061294] RAX: 0000000000000000 RBX: ffff90f3b71ddcb0 RCX: 0000000000000006
[   11.061295] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff90f3b9a168c0
[   11.061296] RBP: 0000000000000003 R08: 0000000000000098 R09: 0000000000000001
[   11.061296] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[   11.061297] R13: ffff90f3b71ddcb3 R14: 0000000000000000 R15: ffff90f3b71ddcf8
[   11.061299] FS:  00007f633716bac0(0000) GS:ffff90f3b9a00000(0000) knlGS:0000000000000000
[   11.061300] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.061300] CR2: 00007f63337d2d20 CR3: 000000019af5a000 CR4: 00000000000006f0
[   11.061301] Call Trace:
[   11.061306]  __check_object_size+0x15d/0x189
[   11.061683]  os_memcpy_to_user+0x21/0x40 [nvidia]
[   11.061845]  _nv009384rm+0xbf/0xe0 [nvidia]
[   11.062020]  ? _nv028097rm+0x79/0x90 [nvidia]
[   11.062194]  ? _nv028097rm+0x55/0x90 [nvidia]
[   11.062364]  ? _nv013699rm+0xee/0x100 [nvidia]
[   11.062534]  ? _nv015347rm+0x154/0x270 [nvidia]
[   11.062692]  ? _nv008317rm+0x134/0x1a0 [nvidia]
[   11.062850]  ? _nv008296rm+0x29c/0x2b0 [nvidia]
[   11.063008]  ? _nv001072rm+0xe/0x20 [nvidia]
[   11.063168]  ? _nv007324rm+0xd8/0x100 [nvidia]
[   11.063309]  ? _nv001171rm+0x627/0x830 [nvidia]
[   11.063447]  ? rm_ioctl+0x73/0x100 [nvidia]
[   11.063604]  ? nvidia_ioctl+0x10/0x710 [nvidia]
[   11.063758]  ? nvidia_ioctl+0x561/0x710 [nvidia]
[   11.063760]  ? kmem_cache_free+0x1b1/0x1e0
[   11.063914]  ? nvidia_frontend_unlocked_ioctl+0x3a/0x50 [nvidia]
[   11.063915]  ? do_vfs_ioctl+0xa4/0x630
[   11.063917]  ? __fput+0x151/0x220
[   11.063918]  ? ksys_ioctl+0x60/0x90
[   11.063919]  ? __x64_sys_ioctl+0x16/0x20
[   11.063921]  ? do_syscall_64+0x5b/0x160
[   11.063924]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.063925] ---[ end trace 0b6c560b3fe08462 ]---

Card info:

06:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 809f
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=128M]
	Memory at ec000000 (64-bit, prefetchable) [size=32M]
	I/O ports at cc00 [size=128]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia

06:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 809f
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at fea7c000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

The problem is well known and has been fixed in the mainline driver. However, NVIDIA has not fixed it in the 390xx branch. See for example:

https://devtalk.nvidia.com/default/topic/1031067/linux/-linux416-nvidia-390-48-nvidia_stack_cache-rip-0010-usercopy_warn-0x7e-0xa0/

https://bugzilla.redhat.com/show_bug.cgi?id=1570493

The latter link proposes a patch. It at least removes the error, but it seems to cause occasional screen tearing, so we need something better.

The patch will most likely also need to be applied to 340xx (and older?); see https://bugzilla.rpmfusion.org/show_bug.cgi?id=5086 for a report on 340xx.

Unfortunately, it doesn't seem to be a priority at NVIDIA. I hope someone here has the knowledge and time to fix this. I'm more than happy to test a patch!

Regards,

Bas.
Comment 1 leigh scott 2019-01-30 09:11:28 CET
Created attachment 2007
kmem_cache_create_usercopy.patch

(In reply to Bas Mevissen from comment #0)

> In the latter link, a proposed patch is given. That at least removes the
> error. But it seems to cause screen tearing now and then. So we need
> something better.

Would you prefer to keep the 'log error' (the driver still runs OK), or would you prefer tearing?
I can apply the attached patch and that's all I'm prepared to do!
 

> It doesn't seem to have any priority at nvidia unfortunately. I hope someone
> here has the knowledge and time to fix this. I'm more than happy to test a
> patch!

I don't have the hardware or time to fix something nvidia should have fixed a year ago. 

> 
> Regards,
> 
> Bas.
Comment 2 Wolfgang Ulbrich 2019-01-30 09:45:43 CET
I can confirm the harmless warnings on my older notebook.
But I don't see any issues when using an X11 or MATE session.

Honestly, I can live with some warnings, but more tearing is a no-go.
Comment 3 Bas Mevissen 2019-01-30 18:02:15 CET
As my Fedora is a moving target with the testing repositories enabled, I'm not sure whether the patch caused the tearing or something else did.
So I'll freeze updates for a day or two, test with and without the patch, and report back with the results.
Comment 4 leigh scott 2019-11-26 23:00:53 CET
F29 is EOL