Bug 5748

Summary: gdm will not start unless vt is switched back and forth
Product: Fedora Reporter: Julian Sikorski <belegdol>
Component: nvidia-kmod Assignee: Nicolas Chauvet <kwizart>
Status: RESOLVED EXPIRED    
Severity: enhancement CC: leigh123linux, leigh123linux
Priority: P1    
Version: f32   
Hardware: x86_64   
OS: GNU/Linux   
namespace:
Attachments: /var/log/journal
journal with nvidia-fallback.service masked
journal with lightdm

Description Julian Sikorski 2020-09-08 13:01:03 CEST
Created attachment 2224 [details]
/var/log/journal

I originally reported this at gdm's gitlab page [1] but was told it is a build dependency issue. Oddly enough it only happens on my laptop (680M) and not on my desktop (2070).
I have noticed a peculiar behaviour of gdm for some time now: gdm will not start and boot will show blinking cursor only (no plymouth bar or similar) unless I switch to vt2 and then back to vt1. Journal is attached.
In the attached journal.log I waited in vt2 until 08:29:53, after which I switched to vt1 and gdm started normally.

[1] https://gitlab.gnome.org/GNOME/gdm/-/issues/626
Comment 1 Nicolas Chauvet 2020-09-08 13:10:52 CEST
Why is nvidia-drm.modeset=0 in the cmdline?
Comment 2 Julian Sikorski 2020-09-08 13:12:32 CEST
Because I get a stack trace on this machine when setting modeset to 1:
https://forums.developer.nvidia.com/t/stack-trace-when-attempting-to-use-kms-on-fedora-31-with-geforce-680m/46353
Comment 3 Nicolas Chauvet 2020-09-08 13:15:54 CEST
There is also that:

wrz 06 08:31:35 snowball2 kernel: resource sanity check: requesting [mem 0x000e0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000e0000-0x000e3fff window]
wrz 06 08:31:35 snowball2 kernel: caller _nv030454rm+0x58/0xa0 [nvidia] mapping multiple BARs
wrz 06 08:31:35 snowball2 kernel: ACPI Warning: \_SB.PCI0.PEG0.MXM3._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200528/nsarguments-59)
wrz 06 08:31:35 snowball2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
wrz 06 08:31:35 snowball2 kernel: caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs

This seems related to resource reservation conflicts with the nvidia card.
You might look for a BIOS update from your Clevo laptop vendor.

You might also need to forward the issue to nvidia (devtalk.nvidia.com), attaching the archive produced by nvidia-bug-report.sh.
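
As an illustration of how to read the two "resource sanity check" lines quoted above, here is a minimal Python sketch. The function name `overshoot` and the regular expression are assumptions made for this example, not part of any kernel or nvidia tooling. It only measures how far the requested memory range extends outside the bus window reported on the same line.

```python
import re

# Matches the kernel's "resource sanity check" message as quoted in this comment.
PATTERN = re.compile(
    r"requesting \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\], "
    r"which spans more than .*? \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+) window\]",
    re.IGNORECASE,
)


def overshoot(line):
    """Return (bytes below window start, bytes past window end), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    req_start, req_end, win_start, win_end = (int(g, 16) for g in m.groups())
    return max(0, win_start - req_start), max(0, req_end - win_end)


line = (
    "kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], "
    "which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]"
)
print(overshoot(line))  # (65536, 180224)
```

Both logged requests reach past the end of their window, which is what the kernel is warning about. Whether that is harmful on this hardware is not something the sketch can tell.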
Comment 4 Julian Sikorski 2020-09-08 13:20:01 CEST
This laptop is about 10 years old and has not had a BIOS update in years. I do not really mind having modeset disabled; it is experimental according to the documentation anyway.
Isn't it a packaging problem that the nvidia kernel module only gets loaded after gdm has failed to start once?
Comment 5 leigh scott 2020-09-08 13:29:34 CEST
(In reply to Julian Sikorski from comment #4)

> Isn't the fact that the nvidia kernel module only gets loaded after gdm
> fails to start once not a packaging problem?

IMO it's gdm that should wait for X before trying to start.
Comment 6 Nicolas Chauvet 2020-09-08 15:36:04 CEST
(In reply to Julian Sikorski from comment #4)
> This laptop is about 10 years old and has not had a bios update in years. I
> do not really mind having modeset disabled, it is experimental according to
> the documentation anyway.
> Isn't the fact that the nvidia kernel module only gets loaded after gdm
> fails to start once not a packaging problem?

I can't reproduce this with gdm/gnome on f31 or f32...
But having nvidia-drm.modeset=0 is certainly the root cause of such a race.

Have you forwarded the nvidia-bug-report.sh archive to devtalk.nvidia.com?
Comment 7 Nicolas Chauvet 2020-09-08 15:37:52 CEST
(In reply to leigh scott from comment #5)

> IMO it's gdm that should wait for X before trying start.
Not at all (gdm starts X, not the other way around), unless gdm starts wayland...
Comment 8 Julian Sikorski 2020-09-08 15:42:10 CEST
(In reply to Nicolas Chauvet from comment #6)
> (In reply to Julian Sikorski from comment #4)
> > This laptop is about 10 years old and has not had a bios update in years. I
> > do not really mind having modeset disabled, it is experimental according to
> > the documentation anyway.
> > Isn't the fact that the nvidia kernel module only gets loaded after gdm
> > fails to start once not a packaging problem?
> 
> I'm not reproducing on gdm/gnome f31 f32...
> But having nvidia-drm.modeset=0 is certainly the root cause of such a race.
> 
> Where have you forwarded the nvidia-bug-report.sh archive to
> devtalk.nvidia.com ?

I did, years ago (see the linked forum thread, posts #1 and #4). Nothing happened unfortunately.
It appears to be a race condition indeed. I added vga=0x34d just to see what happens and gdm starts as expected.
Comment 9 Nicolas Chauvet 2020-09-08 15:44:30 CEST
(In reply to Nicolas Chauvet from comment #6)
> (In reply to Julian Sikorski from comment #4)
...
> Where have you forwarded the nvidia-bug-report.sh archive to
> devtalk.nvidia.com ?

Please provide a recent nvidia-bug-report archive to this thread:
https://forums.developer.nvidia.com/t/stack-trace-when-attempting-to-use-kms-on-fedora-31-with-geforce-680m/46353/10


Please also check any options that you can enable or disable related to display and/or PCI resources.
Comment 10 leigh scott 2020-09-08 16:23:04 CEST
I have looked at the log and I'm wondering why gdm is attempting to use wayland.
We used to disable it until gdm added their own blacklisting for nvidia.

/usr/lib/udev/rules.d/61-gdm.rules

https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/commit/?h=f32&id=6f7f9a3cbb4d2f0d957a2a8f21cb2ca50238666f

Maybe try disabling gdm wayland using the old method.


wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) NOUVEAU driver for NVIDIA chipset families :
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         RIVA TNT        (NV04)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         RIVA TNT2       (NV05)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 256     (NV10)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 2       (NV11, NV15)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 4MX     (NV17, NV18)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 3       (NV20)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 4Ti     (NV25, NV28)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce FX      (NV3x)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 6       (NV4x)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 7       (G7x)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce 8       (G8x)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce GTX 200 (NVA0)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         GeForce GTX 400 (NVC0)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) modesetting: Driver for Modesetting Kernel Drivers: kms
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) FBDEV: driver for framebuffer: fbdev
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) VESA: driver for VESA chipsets: vesa
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) [drm] Failed to open DRM device for pci:0000:01:00.0: -19
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) open /dev/dri/card0: No such file or directory
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (WW) Falling back to old probe method for modesetting
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) open /dev/dri/card0: No such file or directory
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Loading sub module "fbdevhw"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) LoadModule: "fbdevhw"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Loading /usr/lib64/xorg/modules/libfbdevhw.so
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Module fbdevhw: vendor="X.Org Foundation"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         compiled for 1.20.8, module version = 0.0.2
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         ABI class: X.Org Video Driver, version 24.1
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Unable to find a valid framebuffer device
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (WW) Falling back to old probe method for fbdev
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Loading sub module "fbdevhw"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) LoadModule: "fbdevhw"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Loading /usr/lib64/xorg/modules/libfbdevhw.so
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) Module fbdevhw: vendor="X.Org Foundation"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         compiled for 1.20.8, module version = 0.0.2
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:         ABI class: X.Org Video Driver, version 24.1
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) open /dev/fb0: No such file or directory
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: vesa: Ignoring device with a bound kernel driver
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Screen 0 deleted because of no matching config section.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) UnloadModule: "modesetting"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Screen 0 deleted because of no matching config section.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) UnloadModule: "fbdev"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) UnloadSubModule: "fbdevhw"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Screen 0 deleted because of no matching config section.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (II) UnloadModule: "vesa"
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Device(s) detected, but none match those in the config file.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: Fatal server error:
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) no screens found(EE)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: Please consult the Fedora Project support
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:          at http://wiki.x.org
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]:  for help.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE)
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) Server terminated with error (1). Closing log file.
wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1148]: Unable to run X server
wrz 06 08:28:29 snowball2 gdm-launch-environment][1144]: pam_unix(gdm-launch-environment:session): session closed for user gdm
wrz 06 08:28:29 snowball2 audit[1144]: USER_END pid=1144 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask acct="gdm" exe="/usr/libexec/gdm-session-worker" hostname=snowball2 addr=? terminal=/dev/tty1 res=success'
wrz 06 08:28:29 snowball2 audit[1144]: CRED_DISP pid=1144 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_permit acct="gdm" exe="/usr/libexec/gdm-session-worker" hostname=snowball2 addr=? terminal=/dev/tty1 res=success'
wrz 06 08:28:29 snowball2 gdm[991]: Child process -1148 was already dead.
wrz 06 08:28:29 snowball2 systemd[1]: session-c2.scope: Succeeded.
wrz 06 08:28:29 snowball2 systemd-logind[921]: Session c2 logged out. Waiting for processes to exit.
wrz 06 08:28:29 snowball2 systemd-logind[921]: Removed session c2.
wrz 06 08:28:30 snowball2 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.66  Wed Aug 12 19:42:48 UTC 2020
wrz 06 08:28:30 snowball2 systemd-udevd[667]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c 195 255'' failed with exit code 1.
wrz 06 08:28:30 snowball2 kernel: nvidia-uvm: Loaded the UVM driver, major device number 236.
wrz 06 08:28:30 snowball2 kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  450.66  Wed Aug 12 19:37:58 UTC 2020
wrz 06 08:28:30 snowball2 kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
wrz 06 08:28:30 snowball2 kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
wrz 06 08:28:30 snowball2 systemd[1]: Created slice system-systemd\x2dbacklight.slice.
wrz 06 08:28:30 snowball2 systemd[1]: Condition check resulted in Fallback to nouveau as nvidia did not load being skipped.
Comment 11 leigh scott 2020-09-08 16:29:00 CEST
(In reply to Nicolas Chauvet from comment #7)
> (In reply to leigh scott from comment #5)
> 
> > IMO it's gdm that should wait for X before trying start.
> Not at all (gdm starts X, not the other way), until gdm starts wayland...

Looking at the log, the nvidia module takes too long to load, so gdm loads nouveau, which fails; then on the second attempt it uses nvidia.

Maybe we should revert?


https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/commit/?h=f32&id=6f7f9a3cbb4d2f0d957a2a8f21cb2ca50238666f
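
The ordering described in this comment (X gives up before the nvidia module has loaded) can be checked mechanically against the journal. Below is a minimal Python sketch, assuming journal lines in the same format as the excerpt in comment 10. The helper name `first_seen` is made up for this example. It compares only the time of day, so it would be wrong for a log that crosses midnight.

```python
from datetime import datetime


def first_seen(lines, needle):
    """Return the time of day of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            # Fields are: month, day, HH:MM:SS, host, message. The month
            # abbreviation is locale-dependent ("wrz"), so only the time is parsed.
            return datetime.strptime(line.split()[2], "%H:%M:%S").time()
    return None


# Two lines copied from the journal excerpt in comment 10.
journal = [
    "wrz 06 08:28:29 snowball2 /usr/libexec/gdm-x-session[1150]: (EE) no screens found(EE)",
    "wrz 06 08:28:30 snowball2 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.66  Wed Aug 12 19:42:48 UTC 2020",
]

x_failed = first_seen(journal, "no screens found")
nvidia_loaded = first_seen(journal, "NVRM: loading NVIDIA")
print(x_failed < nvidia_loaded)  # True: X failed before the module was loaded
```

With one-second journal timestamps this shows only the order of the two events, not how large the gap between them is.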
Comment 12 Nicolas Chauvet 2020-09-08 17:05:25 CEST
(In reply to leigh scott from comment #11)
> (In reply to Nicolas Chauvet from comment #7)
> > (In reply to leigh scott from comment #5)
> > 
> > > IMO it's gdm that should wait for X before trying start.
> > Not at all (gdm starts X, not the other way), until gdm starts wayland...
> 
> Looking at the log, nvidia module takes too long to load so gdm loads
> nouveau which fails, then on the second attempt it uses nvidia.
nouveau isn't loaded by gdm or Xorg when it is blacklisted in the cmdline.
It is more likely loaded by nvidia-fallback.service.

Try to reproduce with systemctl mask nvidia-fallback.service

> Maybe we should revert?
NO !?
Comment 13 Julian Sikorski 2020-09-08 17:34:44 CEST
Created attachment 2225 [details]
journal with nvidia-fallback.service masked

nvidia-fallback.service seems to have no effect. The last entry before switching back to vt1 is at 17:34:18.
Regarding the ACPI warning, this appears to be a known, harmless problem:
https://askubuntu.com/questions/842134/acpi-warning-argument-4-type-mismatch
Comment 14 leigh scott 2020-09-14 09:33:57 CEST
Does this issue occur with another DM?
Try reproducing the issue with lightdm.
Comment 15 Julian Sikorski 2020-09-16 16:46:26 CEST
Created attachment 2227 [details]
journal with lightdm

With lightdm, X appears to start without the need for a VT switch, but lightdm itself appears not to work (only a white screen and a cursor are shown).
Comment 16 Nicolas Chauvet 2020-09-16 18:30:20 CEST
(In reply to Julian Sikorski from comment #13)
> Created attachment 2225 [details]
> journal with nvidia-fallback.service masked
> 
> nvidia-fallback.service seems to have no effect. The last entry before
What do you mean by no effect ?

Does systemctl mask nvidia-fallback prevent nouveau from loading in a "racy" condition ?
Comment 17 Nicolas Chauvet 2020-09-17 12:06:59 CEST
As a side note, I have a similar issue on my ARM devices,
I need to restart gdm before it can display anything.

It seems there is a race between the display manager and the graphics driver stack, and disabling modeset makes the error easier to trigger.
Comment 18 Nicolas Chauvet 2020-11-02 10:42:05 CET
If you remove nvidia-drm.modeset=1 from grub, can you replace it with rd.driver.pre=nvidia-drm instead?

Does that work around your problem?
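
To confirm which of these options a given boot actually used, the kernel command line can be inspected. Here is a minimal Python sketch; `parse_cmdline` is a hypothetical helper written for this example and the sample string is made up. It does not handle quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # The kernel treats '-' and '_' as equivalent in parameter names,
        # so keys are normalised to underscores before being stored.
        params[key.replace("-", "_")] = value if sep else None
    return params


# Made-up sample; on a running system the text would come from /proc/cmdline.
sample = "ro quiet rd.driver.pre=nvidia-drm nvidia-drm.modeset=0"
params = parse_cmdline(sample)
print(params.get("nvidia_drm.modeset"))  # 0
print(params.get("rd.driver.pre"))       # nvidia-drm
```

Only parameter names are normalised; values such as the module name given to rd.driver.pre are left exactly as written.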
Comment 19 Julian Sikorski 2020-11-02 17:51:44 CET
It helped once and didn't once, so I need to test more. I have since upgraded to F33, in case it matters.
Comment 20 Nicolas Chauvet 2021-03-23 09:11:37 CET
Is this still reproducible?
Was a gdm report made?
Comment 21 Julian Sikorski 2021-03-24 21:08:33 CET
I have retired the machine affected by this bug meaning I can no longer test it. Sorry.
Comment 22 Nicolas Chauvet 2021-03-24 21:43:18 CET
No problem, let's close the bug then.