| Summary: | wireguard causes CPU soft lockup | ||
|---|---|---|---|
| Product: | Fedora EPEL | Reporter: | Brian J. Murrell <brian> |
| Component: | wireguard-kmod | Assignee: | Nicolas Chauvet <kwizart> |
| Status: | RESOLVED WONTFIX | ||
| Severity: | enhancement | ||
| Priority: | P1 | ||
| Version: | 8 | ||
| Hardware: | x86_64 | ||
| OS: | GNU/Linux | ||
| namespace: | |||
|
Description
Brian J. Murrell
2023-01-16 23:30:04 CET
Thanks for the report: https://koji.rpmfusion.org/koji/taskinfo?taskID=580262 I've backported a fix from elrepo (not upstream, unfortunately). The package will be pushed to the testing repos in the coming days... Please report.

component created.

Thanks for the update. However, https://koji.rpmfusion.org/koji/taskinfo?taskID=580262 has the kernel module for kernel 4.18.0-425.3.1.el8.x86_64, while EL8.7 is currently on 4.18.0-425.10.1.el8_7.x86_64. While I can appreciate that there is a[n a]kmod for it, if a binary kernel module is going to be produced, shouldn't it be for the current kernel?

Oh, wait. I suppose kmod-wireguard-4.18.0-425.3.1.el8.x86_64-1.0.20220627-2.el8.x86_64 is a kABI module that will also load into the 4.18.0-425.10.1.el8_7.x86_64 kernel. If so, then the update in https://koji.rpmfusion.org/koji/taskinfo?taskID=580262 does not fix the soft lockups. So maybe this one is different from https://elrepo.org/bugs/view.php?id=1283?

I've tried another attempt: https://koji.rpmfusion.org/koji/taskinfo?taskID=580272 Basically, it's the same patch on the same kernel that elrepo has applied. The alternative would be that the newer RHEL kernel has broken kABI for this particular module, so I would have to rebuild against a newer kernel (even though the minor version is the same). If it still does not work, please see: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html In that case I would suggest considering https://copr.fedorainfracloud.org/coprs/kwizart/kernel-longterm-5.15/ which has WireGuard support... (or moving to RHEL9)... Your feedback is still appreciated, and so is a patch if you can figure one out.
rfpkg clone free/wireguard-kmod

Unfortunately that new build is still causing soft lockups:

[12897.416072] wireguard: module verification failed: signature and/or required key missing - tainting kernel
[12897.424844] wireguard: WireGuard 1.0.20220627 loaded. See www.wireguard.com for information.
[12897.429423] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[12923.108393] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:4:60411]
[12923.113325] Modules linked in: wireguard(E) ip6_udp_tunnel udp_tunnel binfmt_misc ib_core mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun nft_counter xt_conntrack ipt_REJECT xt_comment xt_owner nft_compat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat intel_rapl_msr intel_rapl_common joydev pcspkr i2c_piix4 xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core crct10dif_pclmul crc32_pclmul crc32c_intel bochs sd_mod drm_vram_helper drm_kms_helper t10_pi syscopyarea sg sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm drm ata_generic ghash_clmulni_intel ata_piix virtio_net libata serio_raw virtio_scsi net_failover failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb
[12923.113396] qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[12923.223678] Red Hat flags: eBPF/rawtrace
[12923.225862] CPU: 1 PID: 60411 Comm: kworker/1:4 Kdump: loaded Tainted: G E X --------- - - 4.18.0-425.10.1.el8_7.x86_64 #1
[12923.232543] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.4.1 12/03/2020
[12923.236979] Workqueue: events_power_efficient wg_ratelimiter_gc_entries [wireguard]
[12923.241227] RIP: 0010:native_safe_halt+0xe/0x20
[12923.243873] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d d6 96 41 00 fb f4 <e9> ed 09 21 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[12923.302473] RSP: 0018:ffffa60e45317e10 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[12923.306652] RAX: 0000000000000003 RBX: ffffffffc09b5fb8 RCX: 0000000000000008
[12923.310790] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffc09b5fb8
[12923.314931] RBP: ffff8f71bd32bcc0 R08: 0000000000000008 R09: 000000000000005c
[12923.320546] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
[12923.325500] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000080000
[12923.329398] FS: 0000000000000000(0000) GS:ffff8f71bd300000(0000) knlGS:0000000000000000
[12923.334009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12923.337117] CR2: 00005618fbfb2f44 CR3: 0000000018c10000 CR4: 00000000003506e0
[12923.340925] Call Trace:
[12923.393409] kvm_wait+0x58/0x60
[12923.397646] __pv_queued_spin_lock_slowpath+0x268/0x2a0
[12923.401519] _raw_spin_lock+0x1e/0x30
[12923.403835] wg_ratelimiter_gc_entries+0x49/0x170 [wireguard]
[12923.407317] process_one_work+0x1a7/0x360
[12923.410226] worker_thread+0x30/0x390
[12923.412572] ? create_worker+0x1a0/0x1a0
[12923.415222] kthread+0x10b/0x130
[12923.417588] ? set_kthread_struct+0x50/0x50
[12923.420876] ret_from_fork+0x35/0x40

What is strange is that this all worked for a few days (at least) before I rebooted the machine, at which point this soft lockup business started.

Can you verify that 4.18.0-425.3.1.el8.x86_64 also has the error? Maybe you could give akmod-wireguard a try?

FWIW:

Name : kmod-wireguard
Epoch : 7
Version : 1.0.20220627
Release : 4.el8_7.elrepo
Architecture: x86_64
Install Date: Tue 17 Jan 2023 05:57:12 PM GMT
Group : System Environment/Kernel
Size : 363219
License : GPLv2
Signature : DSA/SHA256, Sat 14 Jan 2023 10:25:50 PM GMT, Key ID 309bc305baadae52
Source RPM : kmod-wireguard-1.0.20220627-4.el8_7.elrepo.src.rpm
Build Date : Sat 14 Jan 2023 10:23:28 PM GMT
Build Host : Build64R8.elrepo.org
Relocations : (not relocatable)
Packager : Philip J Perry <phil@elrepo.org>
Vendor : The ELRepo Project (https://elrepo.org)
URL : https://git.zx2c4.com/wireguard-linux-compat/

seems to be working. I'd rather only configure rpmfusion TBH though.

The latest wireguard package from elrepo is a rebuild against a newer kernel, but the release bump wasn't published in git: https://github.com/elrepo/packages/blob/master/wireguard-kmod/el8/wireguard-kmod.spec#L19 This breaks our assumption, as we expect to build for the GA kernel, not for a more or less random kernel during a minor release. That's for reproducibility across all our kmods. Now I can probably drop that and build with the newer kernel... The issue will remain, though: wireguard on RHEL <9 will break from time to time...

The EL Repo bug tracker references https://access.redhat.com/solutions/6985596 and https://elrepo.org/bugs/view.php?id=1316 as the cause of this and specifically counsels rebuilding the kmod on 4.18.0-425.10.1.el8_7.

OK, here is a new attempt with the newer kernel as a base: https://koji.rpmfusion.org/koji/buildinfo?buildID=24709

Can anyone confirm usability of the current package on the current kernel?

Upstream wireguard has stopped supporting the RHEL8 kernel, and RHEL9 has wireguard by default. Also, this kmod fails to build from sources, so I've removed it from the el8 repos. Closing as WONTFIX. |
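For anyone retesting a rebuilt kmod, the soft-lockup reports can be pulled out of a saved `dmesg` capture rather than read by eye. What follows is only a minimal sketch, assuming the log keeps the `watchdog: BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]` line format quoted in this report. The names `find_soft_lockups` and `SoftLockup` are hypothetical helpers invented for illustration and are not part of any package mentioned here.

```python
import re
from typing import List, NamedTuple

# Matches the watchdog line format seen in this report, e.g.
# "[12923.108393] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:4:60411]"
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    """One soft-lockup report (hypothetical helper type)."""
    timestamp: float
    cpu: int
    stuck_seconds: int
    comm: str
    pid: int


def find_soft_lockups(log_text: str) -> List[SoftLockup]:
    """Return every soft-lockup report found in a dmesg-style log."""
    return [
        SoftLockup(
            timestamp=float(m.group("ts")),
            cpu=int(m.group("cpu")),
            stuck_seconds=int(m.group("secs")),
            comm=m.group("comm"),
            pid=int(m.group("pid")),
        )
        for m in SOFT_LOCKUP.finditer(log_text)
    ]


# Two lines copied from the log in this report.
SAMPLE = (
    "[12897.424844] wireguard: WireGuard 1.0.20220627 loaded. "
    "See www.wireguard.com for information.\n"
    "[12923.108393] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! "
    "[kworker/1:4:60411]\n"
)

if __name__ == "__main__":
    for hit in find_soft_lockups(SAMPLE):
        print(hit)
```

An empty result after loading a rebuilt module and leaving the tunnel up for a while would suggest, though not prove, that the lockup in `wg_ratelimiter_gc_entries` is no longer triggering.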