aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-30ipv4: fix data-races around inet->inet_idEric Dumazet
[ Upstream commit f866fbc842de5976e41ba874b76ce31710b634b5 ] UDP sendmsg() is lockless, so ip_select_ident_segs() can very well be run from multiple cpus [1] Convert inet->inet_id to an atomic_t, but implement a dedicated path for TCP, avoiding cost of a locked instruction (atomic_add_return()) Note that this patch will cause a trivial merge conflict because we added inet->flags in net-next tree. v2: added missing change in drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c (David Ahern) [1] BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1: ip_select_ident_segs include/net/ip.h:542 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0: ip_select_ident_segs include/net/ip.h:541 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x184d -> 0x184e Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 ================================================================== Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30net: validate veth and vxcan peer ifindexesJakub Kicinski
[ Upstream commit f534f6581ec084fe94d6759f7672bd009794b07e ] veth and vxcan need to make sure the ifindexes of the peer are not negative, core does not validate this. Using iproute2 with user-space-level checking removed: Before: # ./ip link add index 10 type veth peer index -1 # ip link show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff 10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff Now: $ ./ip link add index 10 type veth peer index -1 Error: ifindex can't be negative. This problem surfaced in net-next because an explicit WARN() was added, the root cause is older. Fixes: e6f8f1a739b6 ("veth: Allow to create peer link with given ifindex") Fixes: a8f820a380a2 ("can: add Virtual CAN Tunnel driver (vxcan)") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30net: bcmgenet: Fix return value check for fixed_phy_register()Ruan Jinjie
[ Upstream commit 32bbe64a1386065ab2aef8ce8cae7c689d0add6e ] The fixed_phy_register() function returns error pointers and never returns NULL. Update the checks accordingly. Fixes: b0ba512e25d7 ("net: bcmgenet: enable driver to work without a device tree") Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30net: bgmac: Fix return value check for fixed_phy_register()Ruan Jinjie
[ Upstream commit 23a14488ea5882dea5851b65c9fce2127ee8fcad ] The fixed_phy_register() function returns error pointers and never returns NULL. Update the checks accordingly. Fixes: c25b23b8a387 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets") Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30net: dsa: mt7530: fix handling of 802.1X PAE framesArınç ÜNAL
[ Upstream commit e94b590abfff2cdbf0bdaa7d9904364c8d480af5 ] 802.1X PAE frames are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as regular multicast frames, therefore flooding them to user ports. To fix this, set 802.1X PAE frames to be trapped to the CPU port(s). Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30selftests: mlxsw: Fix test failure on Spectrum-4Ido Schimmel
[ Upstream commit f520489e99a35b0a5257667274fbe9afd2d8c50b ] Remove assumptions about shared buffer cell size and instead query the cell size from devlink. Adjust the test to send small packets that fit inside a single cell. Tested on Spectrum-{1,2,3,4}. Fixes: 4735402173e6 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30mlxsw: Fix the size of 'VIRT_ROUTER_MSB'Amit Cohen
[ Upstream commit 348c976be0a599918b88729def198a843701c9fe ] The field 'virtual router' was extended to 12 bits in Spectrum-4. Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for Spectrum < 4 and 4 bits for Spectrum >= 4. The elements are stored in an internal storage scratchpad. Currently, the MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs can be used for multicast routing, as the highest bit is not really used by the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust the definitions of 'virtual router' field in the blocks accordingly - use '_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask in parse function to use 4 bits. Fixes: 6d5d8ebb881c ("mlxsw: Rename virtual router flex key element") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30mlxsw: reg: Fix SSPR register layoutIdo Schimmel
[ Upstream commit 0dc63b9cfd4c5666ced52c829fdd65dcaeb9f0f1 ] The two most significant bits of the "local_port" field in the SSPR register are always cleared since they are overwritten by the deprecated and overlapping "sub_port" field. On systems with more than 255 local ports (e.g., Spectrum-4), this results in the firmware maintaining invalid mappings between system port and local port. Specifically, two different systems ports (0x1 and 0x101) point to the same local port (0x1), which eventually leads to firmware errors. Fix by removing the deprecated "sub_port" field. Fixes: fd24b29a1b74 ("mlxsw: reg: Align existing registers to use extended local_port field") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTCDanielle Ratson
[ Upstream commit bc2de151ab6ad0762a04563527ec42e54dde572a ] Currently, in Spectrum-2 and above, time stamps are extracted from the CQE into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE time stamp type is UTC. The time stamps are read directly from the CQE and software can get the time stamp in UTC format using CQEv2. From Spectrum-4, the time stamps that are read from the CQE are allowed to be also from MIRROR_UTC type. Therefore, we get a warning [1] from the driver that the time stamp fields were not set, when LLDP control packet is sent. Allow the time stamp type to be MIRROR_UTC and set the time stamp in this case as well. [1] WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum] [...] Call Trace: <IRQ> mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum] mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core] mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci] tasklet_action_common.constprop.0+0x9f/0x110 __do_softirq+0xbb/0x296 irq_exit_rcu+0x79/0xa0 common_interrupt+0x86/0xa0 </IRQ> <TASK> Fixes: 4735402173e6 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()Lu Wei
[ Upstream commit 043d5f68d0ccdda91029b4b6dce7eeffdcfad281 ] There are two network devices(veth1 and veth3) in ns1, and ipvlan1 with L3S mode and ipvlan2 with L2 mode are created based on them as figure (1). In this case, ipvlan_register_nf_hook() will be called to register nf hook which is needed by ipvlans in L3S mode in ns1 and value of ipvl_nf_hook_refcnt is set to 1. (1) ns1 ns2 ------------ ------------ veth1--ipvlan1 (L3S) veth3--ipvlan2 (L2) (2) ns1 ns2 ------------ ------------ veth1--ipvlan1 (L3S) ipvlan2 (L2) veth3 | | |------->-------->--------->-------- migrate When veth3 migrates from ns1 to ns2 as figure (2), veth3 will register in ns2 and calls call_netdevice_notifiers with NETDEV_REGISTER event: dev_change_net_namespace call_netdevice_notifiers ipvlan_device_event ipvlan_migrate_l3s_hook ipvlan_register_nf_hook(newnet) (I) ipvlan_unregister_nf_hook(oldnet) (II) In function ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0 since veth1 with ipvlan1 still in ns1, (I) and (II) will be called to register nf_hook in ns2 and unregister nf_hook in ns1. As a result, ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and this in ns2 is increased incorrectly. When the second net namespace is removed, a reference count leak warning in ipvlan_ns_exit() will be triggered. This patch add a check before ipvlan_migrate_l3s_hook() is called. The warning can be triggered as follows: $ ip netns add ns1 $ ip netns add ns2 $ ip netns exec ns1 ip link add veth1 type veth peer name veth2 $ ip netns exec ns1 ip link add veth3 type veth peer name veth4 $ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s $ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2 $ ip netns exec ns1 ip link set veth3 netns ns2 $ ip net del ns2 Fixes: 3133822f5ac1 ("ipvlan: use pernet operations and restrict l3s hooks to master netns") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30dccp: annotate data-races in dccp_poll()Eric Dumazet
[ Upstream commit cba3f1786916063261e3e5ccbb803abc325b24ef ] We changed tcp_poll() over time, bug never updated dccp. Note that we also could remove dccp instead of maintaining it. Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30sock: annotate data-races around prot->memory_pressureEric Dumazet
[ Upstream commit 76f33296d2e09f63118db78125c95ef56df438e9 ] *prot->memory_pressure is read/writen locklessly, we need to add proper annotations. A recent commit added a new race, it is time to audit all accesses. Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()") Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Abel Wu <wuyun.abel@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gatesVladimir Oltean
[ Upstream commit d44036cad31170da0cb9c728e80743f84267da6e ] The blamed commit resolved a bug where frames would still get stuck at egress, even though they're smaller than the maxSDU[tc], because the driver did not take into account the extra 33 ns that the queue system needs for scheduling the frame. It now takes that into account, but the arithmetic that we perform in vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on 64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS may become a very large integer if gate_len_ns < 33 ns. In practice, this means that we've introduced a regression where all traffic class gates which are permanently closed will not get detected by the driver, and we won't enable oversize frame dropping for them. Before: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS After: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30devlink: add missing unregister linecard notificationJiri Pirko
[ Upstream commit 2ebbc9752d06bb1d01201fe632cb6da033b0248d ] Cited fixes commit introduced linecard notifications for register, however it didn't add them for unregister. Fix that by adding them. Fixes: c246f9b5fd61 ("devlink: add support to create line card and expose to user") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30devlink: move code to a dedicated directoryJakub Kicinski
[ Upstream commit f05bd8ebeb69c803efd6d8a76d96b7fcd7011094 ] The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probably be too much to split it all up, but we can at least separate the core parts out of the per-cmd implementations and put it in a directory so that new commands can be separate files. Move the code, subsequent commit will do a partial split. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 2ebbc9752d06 ("devlink: add missing unregister linecard notification") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30octeontx2-af: SDP: fix receive link configHariprasad Kelam
[ Upstream commit 05f3d5bc23524bed6f043dfe6b44da687584f9fb ] On SDP interfaces, frame oversize and undersize errors are observed as driver is not considering packet sizes of all subscribers of the link before updating the link config. This patch fixes the same. Fixes: 9b7dd87ac071 ("octeontx2-af: Support to modify min/max allowed packet lengths") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230817063006.10366-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30tracing: Fix memleak due to race between current_tracer and traceZheng Yejian
[ Upstream commit eecb91b9f98d6427d4af5fdb8f108f52572a39e7 ] Kmemleak report a leak in graph_trace_open(): unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s) hex dump (first 32 bytes): e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}.......... f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e... backtrace: [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0 [<000000007df90faa>] graph_trace_open+0xb0/0x344 [<00000000737524cd>] __tracing_open+0x450/0xb10 [<0000000098043327>] tracing_open+0x1a0/0x2a0 [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0 [<000000004015bcd6>] vfs_open+0x98/0xd0 [<000000002b5f60c9>] do_open+0x520/0x8d0 [<00000000376c7820>] path_openat+0x1c0/0x3e0 [<00000000336a54b5>] do_filp_open+0x14c/0x324 [<000000002802df13>] do_sys_openat2+0x2c4/0x530 [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4 [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394 [<00000000313647bf>] do_el0_svc+0xac/0xec [<000000002ef1c651>] el0_svc+0x20/0x30 [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4 [<000000000c309c35>] el0_sync+0x160/0x180 The root cause is descripted as follows: __tracing_open() { // 1. File 'trace' is being opened; ... *iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is // currently set; ... iter->trace->open(iter); // 3. Call graph_trace_open() here, // and memory are allocated in it; ... } s_start() { // 4. The opened file is being read; ... *iter->trace = *tr->current_trace; // 5. If tracer is switched to // 'nop' or others, then memory // in step 3 are leaked!!! ... } To fix it, in s_start(), close tracer before switching then reopen the new tracer after switching. And some tracers like 'wakeup' may not update 'iter->private' in some cases when reopen, then it should be cleared to avoid being mistakenly closed again. Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30tracing: Fix cpu buffers unavailable due to 'record_disabled' missedZheng Yejian
[ Upstream commit b71645d6af10196c46cbe3732de2ea7d36b3ff6d ] Trace ring buffer can no longer record anything after executing following commands at the shell prompt: # cd /sys/kernel/tracing # cat tracing_cpumask fff # echo 0 > tracing_cpumask # echo 1 > snapshot # echo fff > tracing_cpumask # echo 1 > tracing_on # echo "hello world" > trace_marker -bash: echo: write error: Bad file descriptor The root cause is that: 1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask()); 2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped with 'tr->max_buffer.buffer', then the 'record_disabled' became 0 (see update_max_tr()); 3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1; Then array_buffer and max_buffer are both unavailable due to value of 'record_disabled' is not 0. To fix it, enable or disable both array_buffer and max_buffer at the same time in tracing_set_cpumask(). Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <vnagarnaik@google.com> Cc: <shuah@kernel.org> Fixes: 71babb2705e2 ("tracing: change CPU ring buffer state from tracing_cpumask") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/i915/gt: Support aux invalidation on all enginesAndi Shyti
[ Upstream commit 6a35f22d222528e1b157c6978c9424d2f8cbe0a1 ] Perform some refactoring with the purpose of keeping in one single place all the operations around the aux table invalidation. With this refactoring add more engines where the invalidation should be performed. Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines") Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-8-andi.shyti@linux.intel.com (cherry picked from commit 76ff7789d6e63d1a10b3b58f5c70b2e640c7a880) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/i915/gt: Poll aux invalidation register bit on invalidationJonathan Cavitt
[ Upstream commit 0fde2f23516a00fd90dfb980b66b4665fcbfa659 ] For platforms that use Aux CCS, wait for aux invalidation to complete by checking the aux invalidation register bit is cleared. Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines") Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-7-andi.shyti@linux.intel.com (cherry picked from commit d459c86f00aa98028d155a012c65dc42f7c37e76) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/i915/gt: Ensure memory quiesced before invalidationJonathan Cavitt
[ Upstream commit 78a6ccd65fa3a7cc697810db079cc4b84dff03d5 ] All memory traffic must be quiesced before requesting an aux invalidation on platforms that use Aux CCS. Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines") Requires: a2a4aa0eef3b ("drm/i915: Add the gen12_needs_ccs_aux_inv helper") Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-4-andi.shyti@linux.intel.com (cherry picked from commit ad8ebf12217e451cd19804b1c3e97ad56491c74a) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/i915: Add the gen12_needs_ccs_aux_inv helperAndi Shyti
[ Upstream commit b2f59e9026038a5bbcbc0019fa58f963138211ee ] We always assumed that a device might either have AUX or FLAT CCS, but this is an approximation that is not always true, e.g. PVC represents an exception. Set the basis for future finer selection by implementing a boolean gen12_needs_ccs_aux_inv() function that tells whether aux invalidation is needed or not. Currently PVC is the only exception to the above mentioned rule. Requires: 059ae7ae2a1c ("drm/i915/gt: Cleanup aux invalidation registers") Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-3-andi.shyti@linux.intel.com (cherry picked from commit c827655b87ad201ebe36f2e28d16b5491c8f7801) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30s390/zcrypt: fix reply buffer calculations for CCA repliesHarald Freudenberger
[ Upstream commit 4cfca532ddc3474b3fc42592d0e4237544344b1a ] The length information for available buffer space for CCA replies is covered with two fields in the T6 header prepended on each CCA reply: fromcardlen1 and fromcardlen2. The sum of these both values must not exceed the AP bus limit for this card (24KB for CEX8, 12KB CEX7 and older) minus the always present headers. The current code adjusted the fromcardlen2 value in case of exceeding the AP bus limit when there was a non-zero value given from userspace. Some tests now showed that this was the wrong assumption. Instead the userspace value given for this field should always be trusted and if the sum of the two fields exceeds the AP bus limit for this card the first field fromcardlen1 should be adjusted instead. So now the calculation is done with this new insight in mind. Also some additional checks for overflow have been introduced and some comments to provide some documentation for future maintainers of this complicated calculation code. Furthermore the 128 bytes of fix overhead which is used in the current code is not correct. Investigations showed that for a reply always the same two header structs are prepended before a possible payload. So this is also fixed with this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30s390/zcrypt: remove unnecessary (void *) conversionsYu Zhe
[ Upstream commit 72c2112ce9d72e6c40dd893f32187a3d34453113 ] Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20230303052155.21072-1-yuzhe@nfschina.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Stable-dep-of: 4cfca532ddc3 ("s390/zcrypt: fix reply buffer calculations for CCA replies") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30can: raw: fix lockdep issue in raw_release()Eric Dumazet
[ Upstream commit 11c9027c983e9e4b408ee5613b6504d24ebd85be ] syzbot complained about a lockdep issue [1] Since raw_bind() and raw_setsockopt() first get RTNL before locking the socket, we must adopt the same order in raw_release() [1] WARNING: possible circular locking dependency detected 6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0 Not tainted ------------------------------------------------------ syz-executor.0/14110 is trying to acquire lock: ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1708 [inline] ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: raw_bind+0xb1/0xab0 net/can/raw.c:435 but task is already holding lock: ffffffff8e3df368 (rtnl_mutex){+.+.}-{3:3}, at: raw_bind+0xa7/0xab0 net/can/raw.c:434 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:603 [inline] __mutex_lock+0x181/0x1340 kernel/locking/mutex.c:747 raw_release+0x1c6/0x9b0 net/can/raw.c:391 __sock_release+0xcd/0x290 net/socket.c:654 sock_close+0x1c/0x20 net/socket.c:1386 __fput+0x3fd/0xac0 fs/file_table.c:384 task_work_run+0x14d/0x240 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (sk_lock-AF_CAN){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492 lock_sock include/net/sock.h:1708 [inline] raw_bind+0xb1/0xab0 net/can/raw.c:435 __sys_bind+0x1ec/0x220 net/socket.c:1792 __do_sys_bind net/socket.c:1803 [inline] __se_sys_bind net/socket.c:1801 [inline] __x64_sys_bind+0x72/0xb0 net/socket.c:1801 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_CAN); lock(rtnl_mutex); lock(sk_lock-AF_CAN); *** DEADLOCK *** 1 lock held by syz-executor.0/14110: stack backtrace: CPU: 0 PID: 14110 Comm: syz-executor.0 Not tainted 6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 check_noncircular+0x311/0x3f0 kernel/locking/lockdep.c:2195 check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492 lock_sock include/net/sock.h:1708 [inline] raw_bind+0xb1/0xab0 net/can/raw.c:435 __sys_bind+0x1ec/0x220 net/socket.c:1792 __do_sys_bind net/socket.c:1803 [inline] __se_sys_bind net/socket.c:1801 [inline] __x64_sys_bind+0x72/0xb0 net/socket.c:1801 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd89007cb29 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd890d2a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 RAX: ffffffffffffffda RBX: 00007fd89019bf80 RCX: 00007fd89007cb29 RDX: 0000000000000010 RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007fd8900c847a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fd89019bf80 R15: 00007ffebf8124f8 </TASK> Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ziyang Xuan <william.xuanziyang@huawei.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: stable@vger.kernel.org Cc: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/all/20230720114438.172434-1-edumazet@google.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30can: raw: fix receiver memory leakZiyang Xuan
[ Upstream commit ee8b94c8510ce64afe0b87ef548d23e00915fb10 ] Got kmemleak errors with the following ltp can_filter testcase: for ((i=1; i<=100; i++)) do ./can_filter & sleep 0.1 done ============================================================== [<00000000db4a4943>] can_rx_register+0x147/0x360 [can] [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw] [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0 [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70 [<00000000fd468496>] do_syscall_64+0x33/0x40 [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 It's a bug in the concurrent scenario of unregister_netdevice_many() and raw_release() as following: cpu0 cpu1 unregister_netdevice_many(can_dev) unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this net_set_todo(can_dev) raw_release(can_socket) dev = dev_get_by_index(, ro->ifindex); // dev == NULL if (dev) { // receivers in dev_rcv_lists not free because dev is NULL raw_disable_allfilters(, dev, ); dev_put(dev); } ... ro->bound = 0; ... call_netdevice_notifiers(NETDEV_UNREGISTER, ) raw_notify(, NETDEV_UNREGISTER, ) if (ro->bound) // invalid because ro->bound has been set 0 raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed Add a net_device pointer member in struct raw_sock to record bound can_dev, and use rtnl_lock to serialize raw_socket members between raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use ro->dev to decide whether to free receivers in dev_rcv_lists. Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier") Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30jbd2: fix a race when checking checkpoint buffer busyZhang Yi
[ Upstream commit 46f881b5b1758dc4a35fba4a643c10717d0cf427 ] Before removing checkpoint buffer from the t_checkpoint_list, we have to check both BH_Dirty and BH_Lock bits together to distinguish buffers have not been or were being written back. But __cp_buffer_busy() checks them separately, it first check lock state and then check dirty, the window between these two checks could be raced by writing back procedure, which locks buffer and clears buffer dirty before I/O completes. So it cannot guarantee checkpointing buffers been written back to disk if some error happens later. Finally, it may clean checkpoint transactions and lead to inconsistent filesystem. jbd2_journal_forget() and __journal_try_to_free_buffer() also have the same problem (journal_unmap_buffer() escape from this issue since it's running under the buffer lock), so fix them through introducing a new helper to try holding the buffer lock and remove really clean buffer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Cc: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30jbd2: remove journal_clean_one_cp_list()Zhang Yi
[ Upstream commit b98dba273a0e47dbfade89c9af73c5b012a4eabb ] journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost the same, so merge them into journal_shrink_one_cp_list(), remove the nr_to_scan parameter, always scan and try to free the whole checkpoint list. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30jbd2: remove t_checkpoint_io_listZhang Yi
[ Upstream commit be22255360f80d3af789daad00025171a65424a5 ] Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint() now, it's time to remove the whole t_checkpoint_io_list logic. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30MIPS: cpu-features: Use boot_cpu_type for CPU type based featuresJiaxun Yang
[ Upstream commit 5487a7b60695a92cf998350e4beac17144c91fcd ] Some CPU feature macros were using current_cpu_type to mark feature availability. However current_cpu_type will use smp_processor_id, which is prohibited under preemptable context. Since those features are all uniform on all CPUs in a SMP system, use boot_cpu_type instead of current_cpu_type to fix preemptable kernel. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30MIPS: cpu-features: Enable octeon_cache by cpu_typeJiaxun Yang
[ Upstream commit f641519409a73403ee6612b8648b95a688ab85c2 ] cpu_has_octeon_cache was tied to 0 for generic cpu-features, whith this generic kernel built for octeon CPU won't boot. Just enable this flag by cpu_type. It won't hurt orther platforms because compiler will eliminate the code path on other processors. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Stable-dep-of: 5487a7b60695 ("MIPS: cpu-features: Use boot_cpu_type for CPU type based features") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30PCI: acpiphp: Reassign resources on bridge if necessaryIgor Mammedov
[ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ] When using ACPI PCI hotplug, hotplugging a device with large BARs may fail if bridge windows programmed by firmware are not large enough. Reproducer: $ qemu-kvm -monitor stdio -M q35 -m 4G \ -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \ -device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \ disk_image wait till linux guest boots, then hotplug device: (qemu) device_add qxl,bus=rp1 hotplug on guest side fails with: pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000 pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff] pci 0000:01:00.0: reg 0x1c: [io 0x0000-0x001f] pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000] pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000] pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000] pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff] pci 0000:01:00.0: BAR 3: assigned [io 0x1000-0x101f] qxl 0000:01:00.0: enabling device (0000 -> 0003) Unable to create vram_mapping qxl: probe of 0000:01:00.0 failed with error -12 However when using native PCIe hotplug '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' it works fine, since kernel attempts to reassign unused resources. Use the same machinery as native PCIe hotplug to (re)assign resources. Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30video/aperture: Move vga handling to pci functionDaniel Vetter
[ Upstream commit f1d599d315fb7b7343cddaf365e671aaa8453aca ] A few reasons for this: - It's really the only one where this matters. I tried looking around, and I didn't find any non-pci vga-compatible controllers for x86 (since that's the only platform where we had this until a few patches ago), where a driver participating in the aperture claim dance would interfere. - I also don't expect that any future bus anytime soon will not just look like pci towards the OS, that's been the case for like 25+ years by now for practically everything (even non non-x86). - Also it's a bit funny if we have one part of the vga removal in the pci function, and the other in the generic one. v2: Rebase. v4: - fix Daniel's S-o-b address v5: - add back an S-o-b tag with Daniel's Intel address Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-6-tzimmermann@suse.de Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30video/aperture: Only kick vgacon when the pdev is decoding vgaDaniel Vetter
[ Upstream commit 7450cd235b45d43ee6f3c235f89e92623458175d ] Otherwise it's a bit silly, and we might throw out the driver for the screen the user is actually looking at. I haven't found a bug report for this case yet, but we did get bug reports for the analog case where we're throwing out the efifb driver. v2: Flip the check around to make it clear it's a special case for kicking out the vgacon driver only (Thomas) v4: - fixes to commit message - fix Daniel's S-o-b address v5: - add back an S-o-b tag with Daniel's Intel address Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-5-tzimmermann@suse.de Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/aperture: Remove primary argumentDaniel Vetter
[ Upstream commit 62aeaeaa1b267c5149abee6b45967a5df3feed58 ] Only really pci devices have a business setting this - it's for figuring out whether the legacy vga stuff should be nuked too. And with the preceding two patches those are all using the pci version of this. Which means for all other callers primary == false and we can remove it now. v2: - Reorder to avoid compile fail (Thomas) - Include gma500, which retained it's called to the non-pci version. v4: - fix Daniel's S-o-b address v5: - add back an S-o-b tag with Daniel's Intel address Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Deepak Rawat <drawat.floss@gmail.com> Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Emma Anholt <emma@anholt.net> Cc: Helge Deller <deller@gmx.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: linux-hyperv@vger.kernel.org Cc: linux-amlogic@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-tegra@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-4-tzimmermann@suse.de Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffersDaniel Vetter
[ Upstream commit 80e993988b97fe794f3ec2be6db05fe30f9353c3 ] This one nukes all framebuffers, which is a bit much. In reality gma500 is igpu and never shipped with anything discrete, so there should not be any difference. v2: Unfortunately the framebuffer sits outside of the pci bars for gma500, and so only using the pci helpers won't be enough. Otoh if we only use non-pci helper, then we don't get the vga handling, and subsequent refactoring to untangle these special cases won't work. It's not pretty, but the simplest fix (since gma500 really is the only quirky pci driver like this we have) is to just have both calls. v4: - fix Daniel's S-o-b address v5: - add back an S-o-b tag with Daniel's Intel address Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-2-tzimmermann@suse.de Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30fbdev/radeon: use pci aperture helpersDaniel Vetter
[ Upstream commit 9b539c4d1b921bc9c8c578d4d50f0a7e7874d384 ] It's not exactly the same since the open coded version doesn't set primary correctly. But that's a bugfix, so shouldn't hurt really. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux-fbdev@vger.kernel.org Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230111154112.90575-7-daniel.vetter@ffwll.ch Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30drm/ast: Use drm_aperture_remove_conflicting_pci_framebuffersDaniel Vetter
[ Upstream commit c1ebead36099deb85384f6fb262fe619a04cee73 ] It's just open coded and matches. Note that Thomas said that his version apparently failed for some reason, but hey maybe we should try again. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Tested-by: Thomas Zimmmermann <tzimmermann@suse.de> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230111154112.90575-1-daniel.vetter@ffwll.ch Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30xprtrdma: Remap Receive buffers after a reconnectChuck Lever
[ Upstream commit 895cedc1791916e8a98864f12b656702fad0bb67 ] On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA- unmapping the Receive buffers, but rpcrdma_post_recvs() neglected to remap them after a new connection had been established. The result was immediate failure of the new connection with the Receives flushing with LOCAL_PROT_ERR. Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30NFSv4: fix out path in __nfs4_get_acl_uncachedFedor Pchelkin
[ Upstream commit f4e89f1a6dab4c063fc1e823cc9dddc408ff40cf ] Another highly rare error case when a page allocating loop (inside __nfs4_get_acl_uncached, this time) is not properly unwound on error. Since pages array is allocated being uninitialized, need to free only lower array indices. NULL checks were useful before commit 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had been initialized to zero on stack. Found by Linux Verification Center (linuxtesting.org). Fixes: 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30NFSv4.2: fix error handling in nfs42_proc_getxattrFedor Pchelkin
[ Upstream commit 4e3733fd2b0f677faae21cf838a43faf317986d3 ] There is a slight issue with error handling code inside nfs42_proc_getxattr(). If page allocating loop fails then we free the failing page array element which is NULL but __free_page() can't deal with NULL args. Found by Linux Verification Center (linuxtesting.org). Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-27Linux 6.1.49Greg Kroah-Hartman
Link: https://lore.kernel.org/r/20230826154625.450325166@linuxfoundation.org Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"Greg Kroah-Hartman
This reverts commit a78a8bcdc26de5ef3a0ee27c9c6c512e54a6051c which is commit a6ec83786ab9f13f25fb18166dee908845713a95 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "f2fs: fix to set flush_merge opt and show noflush_merge"Greg Kroah-Hartman
This reverts commit 6ba0594a81f91d6fd8ca9bd4ad23aa1618635a0f which is commit 967eaad1fed5f6335ea97a47d45214744dc57925 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Yangtao Li <frank.li@vivo.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"Greg Kroah-Hartman
This reverts commit e2fb24ce37caeaecff08af4e9967c8462624312b which is commit 458c15dfbce62c35fefd9ca637b20a051309c9f1 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27objtool/x86: Fix SRSO messPeter Zijlstra
commit 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40 upstream. Objtool --rethunk does two things: - it collects all (tail) call's of __x86_return_thunk and places them into .return_sites. These are typically compiler generated, but RET also emits this same. - it fudges the validation of the __x86_return_thunk symbol; because this symbol is inside another instruction, it can't actually find the instruction pointed to by the symbol offset and gets upset. Because these two things pertained to the same symbol, there was no pressing need to separate these two separate things. However, alas, along comes SRSO and more crazy things to deal with appeared. The SRSO patch itself added the following symbol names to identify as rethunk: 'srso_untrain_ret', 'srso_safe_ret' and '__ret' Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a new similarly embedded return thunk, and 'srso_untrain_ret' is completely unrelated to anything the above does (and was only included because of that INT3 vs UD2 issue fixed previous). Clear things up by adding a second category for the embedded instruction thing. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26Linux 6.1.48Greg Kroah-Hartman
Link: https://lore.kernel.org/r/20230824141447.155846739@linuxfoundation.org Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/srso: Correct the mitigation status when SMT is disabledBorislav Petkov (AMD)
commit 6405b72e8d17bd1875a56ae52d23ec3cd51b9d66 upstream. Specify how is SRSO mitigated when SMT is disabled. Also, correct the SMT check for that. Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations") Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26objtool/x86: Fixup frame-pointer vs rethunkPeter Zijlstra
commit dbf46008775516f7f25c95b7760041c286299783 upstream. For stack-validation of a frame-pointer build, objtool validates that every CALL instruction is preceded by a frame-setup. The new SRSO return thunks violate this with their RSB stuffing trickery. Extend the __fentry__ exception to also cover the embedded_insn case used for this. This cures: vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANGPetr Pavlu
commit 79cd2a11224eab86d6673fe8a11d2046ae9d2757 upstream. The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows: .text { [...] TEXT_TEXT [...] __indirect_thunk_start = .; *(.text.__x86.*) __indirect_thunk_end = .; [...] } Macro TEXT_TEXT references TEXT_MAIN which normally expands to only ".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes ".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk sections. The output layout is then different than expected. For instance, the currently defined range [__indirect_thunk_start, __indirect_thunk_end] becomes empty. Prevent the problem by using ".." as the first separator, for example, ".text..__x86.indirect_thunk". This pattern is utilized by other explicit section names which start with one of the standard prefixes, such as ".text" or ".data", and that need to be individually selected in the linker script. [ nathan: Fix conflicts with SRSO and fold in fix issue brought up by Andrew Cooper in post-review: https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ] Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>