Bug 4498

Summary: [RFC] Implement nvidia-fallback.service
Product: Fedora Reporter: Nicolas Chauvet <kwizart>
Component: xorg-x11-drv-nvidiaAssignee: Nicolas Chauvet <kwizart>
Status: RESOLVED FIXED    
Severity: enhancement CC: hans, isage.dna, leigh123linux, negativo17
Priority: P1    
Version: unspecified   
Hardware: x86_64   
OS: GNU/Linux   
namespace:
Attachments: add nvidia-fallback.service
Updated nvidia-fallback.service

Description Nicolas Chauvet 2017-04-03 13:25:47 CEST
Created attachment 1763 [details]
add nvidia-fallback.service

This was initially an idea from hans, I've tried to Improve a little.
Comment 1 Hans de Goede 2017-04-03 14:27:46 CEST
ConditionKernelCommandLine=rd.driver.blacklist=nouveau

This is not enough there could be a modprobe.conf file blacklisting nouveau in the initrd

ConditionPathExists=!/dev/nvidia

ConditionPathExists=!/dev/dri/card0

This will trigger on optimus machines even if the nouveau driver is not loaded as the i915 driver will be /dev/dri/card0 then

We really need something more smart here then what can be expressed purely in a .service file I'm afraid
Comment 2 Nicolas Chauvet 2017-04-03 14:34:32 CEST
(In reply to Hans de Goede from comment #1)
> ConditionKernelCommandLine=rd.driver.blacklist=nouveau
> 
> This is not enough there could be a modprobe.conf file blacklisting nouveau
> in the initrd
This blacklist has not effect on an explict modprobe. Please at least tests things.

> ConditionPathExists=!/dev/nvidia
> 
> ConditionPathExists=!/dev/dri/card0
> 
> This will trigger on optimus machines even if the nouveau driver is not
> loaded as the i915 driver will be /dev/dri/card0 then
Yes, this is on purpose. I confirm that using intel is enough to (fallback) boot on optimus and this can be corrected later by either modprobing a fixed nvidia or nouveau.
This also have a benefit to not modprobe nouveau where no nvidia card is involved at all, if not set, the fallback will be attempted with the nvidia driver is installed and not nvidia card present which is a case that is supportable.
(think about livemedia).
Comment 3 Hans de Goede 2017-04-03 15:06:27 CEST
Hi,

(In reply to Nicolas Chauvet from comment #2)
> (In reply to Hans de Goede from comment #1)
> > ConditionKernelCommandLine=rd.driver.blacklist=nouveau
> > 
> > This is not enough there could be a modprobe.conf file blacklisting nouveau
> > in the initrd
> This blacklist has not effect on an explict modprobe. Please at least tests
> things.

"This blacklist has not effect on an explict modprobe." Correct, and yes I got that part wrong in my earlier version.

But that is not my point, you're only doing the fallback if the kernel cmdline contains "rd.driver.blacklist=nouveau" but even if it does not contain that and there is a modprobe.conf blacklisting nouveau it will still not be loaded when we execute this script and if /dev/nvidia does not exist at this point we should still fallback to nouveau.


> 
> > ConditionPathExists=!/dev/nvidia
> > 
> > ConditionPathExists=!/dev/dri/card0
> > 
> > This will trigger on optimus machines even if the nouveau driver is not
> > loaded as the i915 driver will be /dev/dri/card0 then
> Yes, this is on purpose. I confirm that using intel is enough to (fallback)
> boot on optimus and this can be corrected later by either modprobing a fixed
> nvidia or nouveau.

This depends on the laptops on laptops where the lcd-screen is routed to the nvidia GPU at boot (some macbooks for example) having the i915 driver is not enough.

> This also have a benefit to not modprobe nouveau where no nvidia card is
> involved at all, if not set, the fallback will be attempted with the nvidia
> driver is installed and not nvidia card present which is a case that is
> supportable.
> (think about livemedia).

Doing  a "modprobe nouveau" on a system without an nvidia card is harmless.

I like the idea of trying to do everything with just a .service file and without a shell script and I think this might work, but we should then only have the: "ConditionPathExists=!/dev/nvidia" condition and not the other 2, then this should work I think.

Let me know if you agree with this then I will run some tests with a version of your .service file with the other 2 conditions removed.

Regards,

Hans
Comment 4 Nicolas Chauvet 2017-04-03 16:02:50 CEST
(In reply to Hans de Goede from comment #3)
> Hi,
> 
> (In reply to Nicolas Chauvet from comment #2)
> > (In reply to Hans de Goede from comment #1)
> > > ConditionKernelCommandLine=rd.driver.blacklist=nouveau
> > > 
> > > This is not enough there could be a modprobe.conf file blacklisting nouveau
> > > in the initrd
> > This blacklist has not effect on an explict modprobe. Please at least tests
> > things.
> 
> "This blacklist has not effect on an explict modprobe." Correct, and yes I
> got that part wrong in my earlier version.
> 
> But that is not my point, you're only doing the fallback if the kernel
> cmdline contains "rd.driver.blacklist=nouveau" but even if it does not
> contain that and there is a modprobe.conf blacklisting nouveau it will still
> not be loaded when we execute this script and if /dev/nvidia does not exist
> at this point we should still fallback to nouveau.

So to sum-up your point, you say that rd.driver.blacklist=nouveau is optional for the nvidia driver to boot. This is true. 

But this requires the blacklist-nouveau.conf to be bundled into the initramfs. If the blacklist is only available in the rootfs this is too late as KMS drivers are loaded early.



To me the question become: Which element should determine which driver should be attempted ? (either rd.driver.blacklist=nouveau or blacklist-nouveau.conf)

My point is that it should be rd.driver.blacklist=nouveau that implicitly say we want nvidia-drm and fallback to nouveau if something breaks.

From the opposite side, we could have rd.driver.blacklist=nvidia-drm to only load FOSS drivers while keeping the nvidia driver installed.

If that rd.driver.blacklist controls which driver is loaded from early boot to final boot, we could have a dracut script (bundled by default within dracut), that will write down a blacklist-dracut.conf appropriately according to the rd.driver.blacklist values.

Then if one wants a userspace application to switch from FOSS/Proprietary stack, it's only a matter to remove rd.driver.blacklist=nouveau and add rd.driver.blacklist=nvidia-drm (using grubby)
Everything else should be kept as implementation details.


> > > ConditionPathExists=!/dev/nvidia
> > > 
> > > ConditionPathExists=!/dev/dri/card0
> > > 
[...]
> This depends on the laptops on laptops where the lcd-screen is routed to the
> nvidia GPU at boot (some macbooks for example) having the i915 driver is not
> enough.
I don't know this case very well, but if the switch is made at boot, I don't think there is an intel adapter involved at all for such a given boot, so no /dev/dri/card0 either. (as it would be the device hold by nouveau).

> > This also have a benefit to not modprobe nouveau where no nvidia card is
> > involved at all, if not set, the fallback will be attempted with the nvidia
> > driver is installed and not nvidia card present which is a case that is
> > supportable.
> > (think about livemedia).
> 
> Doing  a "modprobe nouveau" on a system without an nvidia card is harmless.
True, but also pointless. As everything pointless, there is miss-design involved. So not very satisfying for the mind.
But keeping "-" in /sbin/modprobe nouveau would 


> I like the idea of trying to do everything with just a .service file and
> without a shell script and I think this might work, but we should then only
> have the: "ConditionPathExists=!/dev/nvidia" condition and not the other 2,
> then this should work I think.
> 
> Let me know if you agree with this then I will run some tests with a version
> of your .service file with the other 2 conditions removed.

Any tests are good, I'm not totally against removing the 2 conditions, but controlling the blacklist*.conf will be harder from a systemd service unit point of view. So that's why I think it should be better to move the issue.
Comment 5 Hans de Goede 2017-04-12 12:48:30 CEST
(In reply to Nicolas Chauvet from comment #4)
> (In reply to Hans de Goede from comment #3)

<snip>

> > But that is not my point, you're only doing the fallback if the kernel
> > cmdline contains "rd.driver.blacklist=nouveau" but even if it does not
> > contain that and there is a modprobe.conf blacklisting nouveau it will still
> > not be loaded when we execute this script and if /dev/nvidia does not exist
> > at this point we should still fallback to nouveau.
> 
> So to sum-up your point, you say that rd.driver.blacklist=nouveau is
> optional for the nvidia driver to boot. This is true. 
> 
> But this requires the blacklist-nouveau.conf to be bundled into the
> initramfs. If the blacklist is only available in the rootfs this is too late
> as KMS drivers are loaded early.

As things currently work, we install a modprobe.conf blacklisting nouveau and after the first kernel upgrade / initrd re-generation this will end up in the initrd too, so even if the user in an attempt to fix things has removed rd.driver.blacklist=nouveau from the cmdline then nouveau may still not load.

Conclusion: our fallback.service should not check for rd.driver.blacklist=nouveau because even if that is absent the nouveau driver may still not have loaded.

> To me the question become: Which element should determine which driver
> should be attempted ? (either rd.driver.blacklist=nouveau or
> blacklist-nouveau.conf)
> 
> My point is that it should be rd.driver.blacklist=nouveau that implicitly
> say we want nvidia-drm and fallback to nouveau if something breaks.
> 
> From the opposite side, we could have rd.driver.blacklist=nvidia-drm to only
> load FOSS drivers while keeping the nvidia driver installed.
> 
> If that rd.driver.blacklist controls which driver is loaded from early boot
> to final boot, we could have a dracut script (bundled by default within
> dracut), that will write down a blacklist-dracut.conf appropriately
> according to the rd.driver.blacklist values.
> 
> Then if one wants a userspace application to switch from FOSS/Proprietary
> stack, it's only a matter to remove rd.driver.blacklist=nouveau and add
> rd.driver.blacklist=nvidia-drm (using grubby)
> Everything else should be kept as implementation details.

I think that having dracut generate a dynamic modprobe.conf from the cmdline and dropping the one we install is a good idea for the long term, but in the short term I would like to focus on getting things ready so that we can ship F26 with a rpmfusion-drivers repo file present. I do want to add a switching utility later and I like your idea, but lets do that later.

> > > > ConditionPathExists=!/dev/nvidia
> > > > 
> > > > ConditionPathExists=!/dev/dri/card0
> > > > 
> [...]
> > This depends on the laptops on laptops where the lcd-screen is routed to the
> > nvidia GPU at boot (some macbooks for example) having the i915 driver is not
> > enough.
> I don't know this case very well, but if the switch is made at boot, I don't
> think there is an intel adapter involved at all for such a given boot, so no
> /dev/dri/card0 either. (as it would be the device hold by nouveau).

Some macbooks keep both GPUs active but boot with the LCD connected to the discrete GPU and in general the rule with hw is if we can think of a weird combo, some manufacturer will have thought of it too and shipped hardware using it (as the macbooks in question show), so the ConditionPathExists=!/dev/dri/card0 really needs to go.

> > > This also have a benefit to not modprobe nouveau where no nvidia card is
> > > involved at all, if not set, the fallback will be attempted with the nvidia
> > > driver is installed and not nvidia card present which is a case that is
> > > supportable.
> > > (think about livemedia).
> > 
> > Doing  a "modprobe nouveau" on a system without an nvidia card is harmless.
> True, but also pointless. As everything pointless, there is miss-design
> involved. So not very satisfying for the mind.
> But keeping "-" in /sbin/modprobe nouveau would 
> 
> 
> > I like the idea of trying to do everything with just a .service file and
> > without a shell script and I think this might work, but we should then only
> > have the: "ConditionPathExists=!/dev/nvidia" condition and not the other 2,
> > then this should work I think.
> > 
> > Let me know if you agree with this then I will run some tests with a version
> > of your .service file with the other 2 conditions removed.
> 
> Any tests are good, I'm not totally against removing the 2 conditions, but
> controlling the blacklist*.conf will be harder from a systemd service unit
> point of view. So that's why I think it should be better to move the issue.

Ok, so let me repeat that I like your idea of solving this with just a .service file, thank you for both the idea and thank you for taking the time to implement it.

I think this should work fine, but we need to remove the 2 conditions. First lets get Simone's latest work on the spec-file merged, then I will look into adding this and prepate a new branch on Leigh's github for review with this and some other fixes for auto-fallback.
Comment 6 Epifanov Ivan 2017-04-12 13:14:14 CEST
Why do you insist on falling back to nouveau?
By doing this you only hide problems. User will get working desktop, true, but he won't know that there's something wrong with nvidia drivers (until he checks logs) and you'll have to deal with bugreports against nvidia, while in fact they are nouveau bugs.

I'd, personally, rather end up with console, and definitely know that there's something wrong with nvidia drivers, than with working, but buggy desktop (and i do have a burned nvidia gpu by nouveau's awful pm).
Comment 7 Hans de Goede 2017-05-08 10:59:15 CEST
I've tested the proposed .service file with the 2 discussed Conditions removed, with that it works fine.

I've submitted a pull-req for Simone's nvidia pkg to add it:
https://github.com/negativo17/nvidia-driver/pull/20

We Should add it to the rpmfusion pkgs.
Comment 8 Hans de Goede 2017-05-12 10:52:42 CEST
Correction further testing has shown that:

ConditionPathExists=!/dev/nvidia

Is always true at the time the service runs, the /dev/nvidia* files do not get created until the X server is started, I've fixed this by updating the check to:

ConditionPathExists=!/sys/modules/nvidia
Comment 9 Hans de Goede 2017-05-12 10:53:56 CEST
Ugh, that should be:

ConditionPathExists=!/sys/module/nvidia
Comment 10 Hans de Goede 2017-05-12 10:54:34 CEST
Created attachment 1778 [details]
Updated nvidia-fallback.service
Comment 11 Nicolas Chauvet 2017-05-13 13:32:20 CEST
(In reply to Epifanov Ivan from comment #6)
> Why do you insist on falling back to nouveau?
> By doing this you only hide problems. User will get working desktop, true,
> but he won't know that there's something wrong with nvidia drivers (until he
> checks logs) and you'll have to deal with bugreports against nvidia, while

@Hans
I partly agree with what Ivan said. This fallback script should report an error somehow so at least systemctl status show a degraded state when the nvidia driver should be used because the hardware is present and the module failed for some reason.

@Ivan
Now, if you don't like it to recover for you it still possible to mask the service using:
systemctl mask nvidia-fallback.service.
Just for your record, nvidia has bricked some devices from the past (or it was because of a bad implementation from the ODM or else). It was escalated by nvidia to the developer mailing list requesting us to issue an updated driver.

@Hans
I would prefer to fix every single known situation where the driver could fail to avoid to even have a fallback script [1]. But the principle is probably a better interim solution. Couldn't you envision the fallback to be implemented in the xorg-server side instead for the long run ? (so it attempts the next driver on the list based on real hw detection).

Also have you tested using graphical.target instead of multi-user.target or is it too late for a fallback here ? (If I need to recover and boot in multi-user, I don't want nouveau to bind the hw).

Right now modprobe.blacklist=nouveau is what allows us to avoid using a blacklist script, but so far it's only documented in fedora (not any el).

Based on your last change, I still mean theses options are needed.
ConditionKernelCommandLine=rd.driver.blacklist=nouveau
ConditionPathExists=!/dev/dri/card0
Using !/dev/dri/card0 will helps in the situation where there is already a driver binded with the hw so it will avoid to consider the system in a degraded state.

Also using
ConditionKernelCommandLine=rd.driver.blacklist=nouveau will also avoid to issue a spurious error when the script is run whereas the needed driver was already loaded.


[1] One long standing issue that can probably fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=1450577
Comment 12 Hans de Goede 2017-05-13 13:51:13 CEST
Hi,

(In reply to Nicolas Chauvet from comment #11)
> (In reply to Epifanov Ivan from comment #6)
> > Why do you insist on falling back to nouveau?
> > By doing this you only hide problems. User will get working desktop, true,
> > but he won't know that there's something wrong with nvidia drivers (until he
> > checks logs) and you'll have to deal with bugreports against nvidia, while
> 
> @Hans
> I partly agree with what Ivan said. This fallback script should report an
> error somehow so at least systemctl status show a degraded state when the
> nvidia driver should be used because the hardware is present and the module
> failed for some reason.

Now that I've the ConditionPathExists right, I believe that systemd will only show "Starting Fallback to nouveau if nvidia did not load" if it is actually going to modprobe nouveau, that is the best we can do without adding a shell script or some such I'm afraid.

> @Hans
> I would prefer to fix every single known situation where the driver could
> fail to avoid to even have a fallback script [1].

That is not going to work, we can never rule out user error, such as e.g. install the latest nvidia driver on a too old nvidia GPU.

> But the principle is
> probably a better interim solution. Couldn't you envision the fallback to be
> implemented in the xorg-server side instead for the long run ? (so it
> attempts the next driver on the list based on real hw detection).

This really does not belong in Xorg, besides that we are moving to wayland.

> Also have you tested using graphical.target instead of multi-user.target or
> is it too late for a fallback here ? (If I need to recover and boot in
> multi-user, I don't want nouveau to bind the hw).

I've deliberately used multi-user.target, because some people always use that + startx. For recovery you can add a modprobe.blacklist=nouveau to the kernel cmdline.

> Right now modprobe.blacklist=nouveau is what allows us to avoid using a
> blacklist script, but so far it's only documented in fedora (not any el).

Yes I saw you started using that, I did not know about it. I plan to take a look at using this for F25+ to add a script to easily switch between the 2 drivers by just manipulating the kernel cmdline through grubby (and dropping the "blacklist nouveau" from the modprobe.d/nvidia.conf file).

> Based on your last change, I still mean theses options are needed.
> ConditionKernelCommandLine=rd.driver.blacklist=nouveau
> ConditionPathExists=!/dev/dri/card0
> Using !/dev/dri/card0 will helps in the situation where there is already a
> driver binded with the hw so it will avoid to consider the system in a
> degraded state.

We've been over this already, please read my answers to this again, nothing has changed and this still will not work in all scenarios.
Comment 13 Nicolas Chauvet 2017-05-13 17:02:41 CEST
(In reply to Hans de Goede from comment #12)
...

> > Also have you tested using graphical.target instead of multi-user.target or
> > is it too late for a fallback here ? (If I need to recover and boot in
> > multi-user, I don't want nouveau to bind the hw).
> 
> I've deliberately used multi-user.target, because some people always use
> that + startx. For recovery you can add a modprobe.blacklist=nouveau to the
> kernel cmdline.
I expect that modprobe.blacklist=nouveau would only take into action when using modprobe -b nouveau (not tested).

I think users using startx can probably recover themselves, I don't think you can break a common way to recover from graphical boot using "init 3" for startx users.
Using the fallback will be annoying for server cases as those can recover remotely without issue as they are in multi-user.target by default.
 
> > Right now modprobe.blacklist=nouveau is what allows us to avoid using a
> > blacklist script, but so far it's only documented in fedora (not any el).
> 
> Yes I saw you started using that, I did not know about it. I plan to take a
> look at using this for F25+ to add a script to easily switch between the 2
> drivers by just manipulating the kernel cmdline through grubby (and dropping
> the "blacklist nouveau" from the modprobe.d/nvidia.conf file).
Yes, this is a good step forward. Let's move on this in:
https://bugzilla.rpmfusion.org/show_bug.cgi?id=4315

> > Based on your last change, I still mean theses options are needed.
> > ConditionKernelCommandLine=rd.driver.blacklist=nouveau
> > ConditionPathExists=!/dev/dri/card0
> > Using !/dev/dri/card0 will helps in the situation where there is already a
> > driver binded with the hw so it will avoid to consider the system in a
> > degraded state.
> 
> We've been over this already, please read my answers to this again, nothing
> has changed and this still will not work in all scenarios.
quoting you, "This depends on the laptops on laptops where the lcd-screen is routed to the nvidia GPU at boot (some macbooks for example) having the i915 driver is not enough."
Sure old/pre optimus are routed to nvidia or intel at boot which means only one hw is available at the same time so when nvidia is selected, /dev/dri/card0 won't be available and the fallback will run. (at least I have one hardware at hands that behave like this).
Having /dev/dri/card0 means there is enough as a fallback (one can still have graphical boot using intel adapter on other optimus laptop).

If the service is made to fail, which is totally appropriate, there is a need to seek for a good reason to fail. So as you said, not using /dev/dri/card0 might be very complex without using a more complex shell script.
Comment 14 Hans de Goede 2017-05-13 17:10:30 CEST
Hi,

(In reply to Nicolas Chauvet from comment #13)
> (In reply to Hans de Goede from comment #12)
> ...
> 
> > > Also have you tested using graphical.target instead of multi-user.target or
> > > is it too late for a fallback here ? (If I need to recover and boot in
> > > multi-user, I don't want nouveau to bind the hw).
> > 
> > I've deliberately used multi-user.target, because some people always use
> > that + startx. For recovery you can add a modprobe.blacklist=nouveau to the
> > kernel cmdline.
> I expect that modprobe.blacklist=nouveau would only take into action when
> using modprobe -b nouveau (not tested).

Ah right, sorry.

> I think users using startx can probably recover themselves, I don't think
> you can break a common way to recover from graphical boot using "init 3" for
> startx users.
> Using the fallback will be annoying for server cases as those can recover
> remotely without issue as they are in multi-user.target by default.

I see your point I wanted to be consistent independent of users are using multi-user.target or graphical.target, but using graphical.target indeed seems better, so lets move to using that.

Regards,

Hans
Comment 15 Nicolas Chauvet 2017-05-17 14:29:32 CEST
I expect one way to avoid to run nvidia-fallback when not relevant is to active it using udev instead of having the service enabled by default.
But I don't know if service activated that way won't run too early. 
and if the conditions to start will still apply.
Comment 16 Nicolas Chauvet 2017-06-21 11:42:00 CEST
So I've tested this:

cat > /etc/udev/rules.d/10-nvidia.rules<<EOF
# RPM Fusion nvidia udev rules
SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", ATTRS{boot_vga}=="1", TAG+="systemd", ENV{SYSTEMD_WANTS}="nvidia-fallback.service"
EOF

cat > /etc/systemd/system/nvidia-fallback.service<<EOF
[Unit]
Description=Fallback to nouveau if nvidia did not load
After=akmods.service
Before=display-manager.service
ConditionKernelCommandLine=modprobe.blacklist=nouveau
ConditionPathExists=!/sys/module/nvidia


[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=-/sbin/modprobe nouveau


[Install]
WantedBy=multi-user.target
EOF

With the udev boot_vga rule, it's now certain that there is a nvidia card driving the "boot" display. This isn't the case in (modern) optimus where the boot vga property is available for the intel card. There, the fallback isn't triggeed on purpose since the nouveau module is optional to have a working display.
I'm not sure of the situation for older MAC, but I expect this can be fixed later by a more specific udev rule.

Right now the status of the service is "active" if the fallback was triggered (no nvidia.ko was loaded) and inactive if the nvidia is available or the wrong cmdline (nouveau or else should then load the normal way). 
A later version can report an error and a degraded state to the user with systemctl status. (TODO).
Then if one want to avoid the nvidia-fallback to tweak in his back, the service can be masked.

I haven't tested any situation where the nvidia-fallback.service could end in the initramfs...

Please review.
Comment 17 Hans de Goede 2017-06-21 14:39:29 CEST
(In reply to Nicolas Chauvet from comment #16)
> With the udev boot_vga rule, it's now certain that there is a nvidia card
> driving the "boot" display. This isn't the case in (modern) optimus where
> the boot vga property is available for the intel card. There, the fallback
> isn't triggeed on purpose since the nouveau module is optional to have a
> working display.

That is not true, some external connectors (vga hdmi) may be attach the nvidia GPU only, so we do need either the nvidia or the nouveau driver even if the Intel GPU is the boot GPU to make sure all connected displays will work.

Really you're making this way more complicated then necessary. In the mean time the negativo17.org packages have been using the simple version of the systemd service for a while now with 0 issues reported by users.
Comment 18 Nicolas Chauvet 2017-06-21 16:04:21 CEST
(In reply to Hans de Goede from comment #17)
...
> Really you're making this way more complicated then necessary. In the mean
> time the negativ...
I'm really annoyed by you still referencing the fallback feature to be from that repo whereas this was community elaborated and discussed here (and nowhere else).
With the initial idea from you or even others.

So to me there is a remaining (non-blocking) issue as reported in
https://bugzilla.rpmfusion.org/show_bug.cgi?id=4498#c6:
- That the fallback mode needs to be reported to the users. (systemctl status show degrated state)
- As a consequence the fallback script only needs to be run when the nvidia devices is detected.


Using PCI ATTRS{class}=="0x030000" and ATTRS{class}=="0x030200" might be other means to detect the nvidia VGA card , according to #nouveau imirkin

I'm not fundamentally opposed to modprobe nouveau even in the Optimus case.
Comment 19 Hans de Goede 2017-06-21 16:26:47 CEST
(In reply to Nicolas Chauvet from comment #18)
> (In reply to Hans de Goede from comment #17)
> ...
> > Really you're making this way more complicated then necessary. In the mean
> > time the negativ...
> I'm really annoyed by you still referencing the fallback feature to be from
> that repo whereas this was community elaborated and discussed here (and
> nowhere else).
> With the initial idea from you or even others.

I've not written that this feature comes from the negativo17 org repo, I wrote that that repository has been using (shipping packages with) the feature for a while now.

> So to me there is a remaining (non-blocking) issue as reported in
> https://bugzilla.rpmfusion.org/show_bug.cgi?id=4498#c6:
> - That the fallback mode needs to be reported to the users. (systemctl
> status show degrated state)
> - As a consequence the fallback script only needs to be run when the nvidia
> devices is detected.

So this second non-blocking issue comes solely from the first one and IMHO will make things needlessly fragile. Anyways I'm open to suggestions how to report a degraded state if we end up modprobing nouveau from the service.

As for the second blocker of then not showing degraded state when their is no nvidia GPU: NACK that is going to be fragile / we are going to get this wrong and not fallback on some systems due to misdetection causing problems for users which is exactly what we are trying to avoid.
Comment 20 Nicolas Chauvet 2017-06-21 16:36:39 CEST
(In reply to Hans de Goede from comment #19)
...
> As for the second blocker of then not showing degraded state when their is
> no nvidia GPU: NACK that is going to be fragile / we are going to get this
> wrong and not fallback on some systems due to misdetection causing problems
> for users which is exactly what we are trying to avoid.
Can you elaborate ? With the PCI class we are certain to have the right device
One is VGA compatible controller: NVIDIA Corporation
the other is 3D controller: NVIDIA Corporation (as often found is modern optimus) 

There is also the 12d2 pci vendor class used by early nvidia riva chip.
This seems solid for me.
Comment 21 Hans de Goede 2017-06-21 16:43:01 CEST
(In reply to Nicolas Chauvet from comment #20)
> (In reply to Hans de Goede from comment #19)
> ...
> > As for the second blocker of then not showing degraded state when their is
> > no nvidia GPU: NACK that is going to be fragile / we are going to get this
> > wrong and not fallback on some systems due to misdetection causing problems
> > for users which is exactly what we are trying to avoid.
> Can you elaborate ? With the PCI class we are certain to have the right
> device
> One is VGA compatible controller: NVIDIA Corporation
> the other is 3D controller: NVIDIA Corporation (as often found is modern
> optimus) 
> 
> There is also the 12d2 pci vendor class used by early nvidia riva chip.
> This seems solid for me.

Until some future card uses a different PCI class, or uses an OEM vendor-id instead nvidia for the vendor-id.

Adding complexity always has a cost and always makes code more prone to bugs. IMHO the cost/benefit balance of checking that a nvidia gpu is present, vs not falls out in favor of not doing it, because KISS.

If you're worried about the degraded state causing bug-reports on systems without a nvidia gpu, then the solution is to simply not do the report degraded thingy either. Remember this is a fallback service, normally it should not do anything, so no need for extra feedback to the user since that will normally just always be "ok". As such is it best to keep this as simple as possible.
Comment 22 Nicolas Chauvet 2017-06-21 16:47:46 CEST
(In reply to Hans de Goede from comment #21)
.. 
> Until some future card uses a different PCI class, or uses an OEM vendor-id
> instead nvidia for the vendor-id.
Is this a real case ? The OEM thing ? how would nouveau handle this ?
Comment 23 Hans de Goede 2017-06-21 16:58:50 CEST
(In reply to Nicolas Chauvet from comment #22)
> (In reply to Hans de Goede from comment #21)
> .. 
> > Until some future card uses a different PCI class, or uses an OEM vendor-id
> > instead nvidia for the vendor-id.
> Is this a real case ? The OEM thing ? how would nouveau handle this ?

I've seen it happen with plenty of USB devices, but it seems that with PCI devices OEMs or content on putting their vendor-id in the sub-vendor-id field, so it seems that no this does not happen with PCI.
Comment 24 Nicolas Chauvet 2017-08-07 08:29:02 CEST
It's implemented in f25+ currently holding in updates-testing because of the hdmi/dp audio bug.