aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-07-19wifi: cfg80211: fix regulatory disconnect for non-MLOJohannes Berg
commit b22552fcaf1970360005c805d7fba4046cf2ab4a upstream. The multi-link loop here broke disconnect when multi-link operation (MLO) isn't active for a given interface, since in that case valid_links is 0 (indicating no links, i.e. no MLO.) Fix this by taking that into account properly and skipping the link only if there are valid_links in the first place. Cc: stable@vger.kernel.org Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230616222844.eb073d650c75.I72739923ef80919889ea9b50de9e4ba4baa836ae@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19net: dsa: sja1105: always enable the send_meta optionsVladimir Oltean
[ Upstream commit a372d66af48506d9f7aaae2a474cd18f14d98cb8 ] incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless PTP RX timestamping is enabled. The incl_srcpt option was initially unconditionally enabled, then that changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that it now needs to be always enabled, otherwise the driver does not have a reliable source of information regarding source_port and switch_id for link-local traffic (tag_8021q VLANs may be imprecise since now they identify an entire bridging domain when ports are not standalone). If we accept that PTP RX timestamping (and therefore, meta frame generation) is always enabled in hardware, then that limitation could be avoided and packets with any MAC DA can be properly received, because meta frames do contain the original bytes from the MAC DA of their associated link-local packet. This change enables meta frame generation unconditionally, which also has the nice side effects of simplifying the switch control path (a switch reset is no longer required on hwtstamping settings change) and the tagger data path (it no longer needs to be informed whether to expect meta frames or not - it always does). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net: dsa: tag_sja1105: fix MAC DA patching from meta framesVladimir Oltean
[ Upstream commit 1dcf6efd5f0c1f4496b3ef7ec5a7db104a53b38c ] The SJA1105 manual says that at offset 4 into the meta frame payload we have "MAC destination byte 2" and at offset 5 we have "MAC destination byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2] aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3]. The sja1105_meta_unpack() function decodes these the other way around, so a frame with MAC DA 01:80:c2:11:22:33 is received by the network stack as having 01:80:c2:22:11:33. Fixes: e53e18a6fe4d ("net: dsa: sja1105: Receive and decode meta frames") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EXLin Ma
[ Upstream commit 30c45b5361d39b4b793780ffac5538090b9e2eb1 ] The attribute TCA_PEDIT_PARMS_EX is not be included in pedit_policy and one malicious user could fake a TCA_PEDIT_PARMS_EX whose length is smaller than the intended sizeof(struct tc_pedit). Hence, the dereference in tcf_pedit_init() could access dirty heap data. static int tcf_pedit_init(...) { // ... pattr = tb[TCA_PEDIT_PARMS]; // TCA_PEDIT_PARMS is included if (!pattr) pattr = tb[TCA_PEDIT_PARMS_EX]; // but this is not // ... parm = nla_data(pattr); index = parm->index; // parm is able to be smaller than 4 bytes // and this dereference gets dirty skb_buff // data created in netlink_sendmsg } This commit adds TCA_PEDIT_PARMS_EX length in pedit_policy which avoid the above case, just like the TCA_PEDIT_PARMS. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230703110842.590282-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19xsk: Honor SO_BINDTODEVICE on bindIlya Maximets
[ Upstream commit f7306acec9aae9893d15e745c8791124d42ab10a ] Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A privileged process might create the socket and pass it to a non-privileged process for later use. However, that process will be able to bind the socket to any network interface. Even though it will not be able to receive any traffic without modification of the BPF map, the situation is not ideal. Sockets already have a mechanism that can be used to restrict what interface they can be attached to. That is SO_BINDTODEVICE. To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer workflow when non-privileged process is using AF_XDP. The intended workflow is following: 1. First process creates a bare socket with socket(AF_XDP, ...). 2. First process loads the XSK program to the interface. 3. First process adds the socket fd to a BPF map. 4. First process ties socket fd to a particular interface using SO_BINDTODEVICE. 5. First process sends socket fd to a second process. 6. Second process allocates UMEM. 7. Second process binds socket to the interface with bind(...). 8. Second process sends/receives the traffic. All the steps above are possible today if the first process is privileged and the second one has sufficient RLIMIT_MEMLOCK and no capabilities. However, the second process will be able to bind the socket to any interface it wants on step 7 and send traffic from it. With the proposed change, the second process will be able to bind the socket only to a specific interface chosen by the first process at step 4. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/20230703175329.3259672-1-i.maximets@ovn.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19tcp: annotate data races in __tcp_oow_rate_limited()Eric Dumazet
[ Upstream commit 998127cdb4699b9d470a9348ffe9f1154346be5f ] request sockets are lockless, __tcp_oow_rate_limited() could be called on the same object from different cpus. This is harmless. Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report. Fixes: 4ce7e93cb3fe ("tcp: rate limit ACK sent by SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge modeVladimir Oltean
[ Upstream commit a398b9ea0c3b791b7a0f4c6029a62cf628f97f22 ] There was a regression introduced by the blamed commit, where pinging to a VLAN-unaware bridge would fail with the repeated message "Couldn't decode source port" coming from the tagging protocol driver. When receiving packets with a bridge_vid as determined by dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode: - source_port = 0 (which isn't really valid, more like "don't know") - switch_id = 0 (which isn't really valid, more like "don't know") - vbid = value in range 1-7 Since the blamed patch has reversed the order of the checks, we are now going to believe that source_port != -1 and switch_id != -1, so they're valid, but they aren't. The minimal solution to the problem is to only populate source_port and switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero, i.e. the source port information is trustworthy. Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC modeVladimir Oltean
[ Upstream commit 6ca3c005d0604e8d2b439366e3923ea58db99641 ] According to the synchronization rules for .ndo_get_stats() as seen in Documentation/networking/netdevices.rst, acquiring a plain spin_lock() should not be illegal, but the bridge driver implementation makes it so. After running these commands, I am being faced with the following lockdep splat: $ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up $ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set macsec0 master br0 && ip link set macsec0 up ======================================================== WARNING: possible irq lock inversion dependency detected 6.4.0-04295-g31b577b4bd4a #603 Not tainted -------------------------------------------------------- swapper/1/0 just changed the state of lock: ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198 but this lock took another, SOFTIRQ-unsafe lock in the past: (&ocelot->stats_lock){+.+.}-{3:3} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &br->lock --> &br->hash_lock --> &ocelot->stats_lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ocelot->stats_lock); local_irq_disable(); lock(&br->lock); lock(&br->hash_lock); <Interrupt> lock(&br->lock); *** DEADLOCK *** (details about the 3 locks skipped) swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this only matters to the extent that its .ndo_get_stats64() method calls spin_lock(&ocelot->stats_lock). Documentation/locking/lockdep-design.rst says: | A lock is irq-safe means it was ever used in an irq context, while a lock | is irq-unsafe means it was ever acquired with irq enabled. (...) | Furthermore, the following usage based lock dependencies are not allowed | between any two lock-classes:: | | <hardirq-safe> -> <hardirq-unsafe> | <softirq-safe> -> <softirq-unsafe> Lockdep marks br->hash_lock as softirq-safe, because it is sometimes taken in softirq context (for example br_fdb_update() which runs in NET_RX softirq), and when it's not in softirq context it blocks softirqs by using spin_lock_bh(). Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never blocks softirqs from running, and it is never taken from softirq context. So it can always be interrupted by softirqs. There is a call path through which a function that holds br->hash_lock: fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock: ocelot_port_get_stats64(). This can be seen below: ocelot_port_get_stats64+0x3c/0x1e0 felix_get_stats64+0x20/0x38 dsa_slave_get_stats64+0x3c/0x60 dev_get_stats+0x74/0x2c8 rtnl_fill_stats+0x4c/0x150 rtnl_fill_ifinfo+0x5cc/0x7b8 rtmsg_ifinfo_build_skb+0xe4/0x150 rtmsg_ifinfo+0x5c/0xb0 __dev_notify_flags+0x58/0x200 __dev_set_promiscuity+0xa0/0x1f8 dev_set_promiscuity+0x30/0x70 macsec_dev_change_rx_flags+0x68/0x88 __dev_set_promiscuity+0x1a8/0x1f8 __dev_set_rx_mode+0x74/0xa8 dev_uc_add+0x74/0xa0 fdb_add_hw_addr+0x68/0xd8 fdb_add_local+0xc4/0x110 br_fdb_add_local+0x54/0x88 br_add_if+0x338/0x4a0 br_add_slave+0x20/0x38 do_setlink+0x3a4/0xcb8 rtnl_newlink+0x758/0x9d0 rtnetlink_rcv_msg+0x2f0/0x550 netlink_rcv_skb+0x128/0x148 rtnetlink_rcv+0x24/0x38 the plain English explanation for it is: The macsec0 bridge port is created without p->flags & BR_PROMISC, because it is what br_manage_promisc() decides for a VLAN filtering bridge with a single auto port. As part of the br_add_if() procedure, br_fdb_add_local() is called for the MAC address of the device, and this results in a call to dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken. Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up calling __dev_set_promiscuity() for macsec0, which is propagated by its implementation, macsec_dev_change_rx_flags(), to the lower device: swp0. This triggers the call path: dev_set_promiscuity(swp0) -> rtmsg_ifinfo() -> dev_get_stats() -> ocelot_port_get_stats64() with a calling context that lockdep doesn't like (br->hash_lock held). Normally we don't see this, because even though many drivers that can be bridge ports don't support IFF_UNICAST_FLT, we need a driver that (a) doesn't support IFF_UNICAST_FLT, *and* (b) it forwards the IFF_PROMISC flag to another driver, and (c) *that* driver implements ndo_get_stats64() using a softirq-unsafe spinlock. Condition (b) is necessary because the first __dev_set_rx_mode() calls __dev_set_promiscuity() with "bool notify=false", and thus, the rtmsg_ifinfo() code path won't be entered. The same criteria also hold true for DSA switches which don't report IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its ndo_get_stats64() method, the same lockdep splat can be seen. I think the deadlock possibility is real, even though I didn't reproduce it, and I'm thinking of the following situation to support that claim: fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally disabled and br->hash_lock held, and may end up attempting to acquire ocelot->stats_lock. In parallel, ocelot->stats_lock is currently held by a thread B (say, ocelot_check_stats_work()), which is interrupted while holding it by a softirq which attempts to lock br->hash_lock. Thread B cannot make progress because br->hash_lock is held by A. Whereas thread A cannot make progress because ocelot->stats_lock is held by B. When taking the issue at face value, the bridge can avoid that problem by simply making the ports promiscuous from a code path with a saner calling context (br->hash_lock not held). A bridge port without IFF_UNICAST_FLT is going to become promiscuous as soon as we call dev_uc_add() on it (which we do unconditionally), so why not be preemptive and make it promiscuous right from the beginning, so as to not be taken by surprise. With this, we've broken the links between code that holds br->hash_lock or br->lock and code that calls into the ndo_change_rx_flags() or ndo_get_stats64() ops of the bridge port. Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Bluetooth: MGMT: Fix marking SCAN_RSP as not connectableLuiz Augusto von Dentz
[ Upstream commit 73f55453ea5236a586a7f1b3d5e2ee051d655351 ] When receiving a scan response there is no way to know if the remote device is connectable or not, so when it cannot be merged don't make any assumption and instead just mark it with a new flag defined as MGMT_DEV_FOUND_SCAN_RSP so userspace can tell it is a standalone SCAN_RSP. Link: https://lore.kernel.org/linux-bluetooth/CABBYNZ+CYMsDSPTxBn09Js3BcdC-x7vZFfyLJ3ppZGGwJKmUTw@mail.gmail.com/ Fixes: c70a7e4cc8d2 ("Bluetooth: Add support for Not Connectable flag for Device Found events") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Bluetooth: MGMT: add CIS feature bits to controller informationPauli Virtanen
[ Upstream commit 2394186a2cefb9a45a029281a55749804dd8c556 ] Userspace needs to know whether the adapter has feature support for Connected Isochronous Stream - Central/Peripheral, so it can set up LE Audio features accordingly. Expose these feature bits as settings in MGMT controller info. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 73f55453ea52 ("Bluetooth: MGMT: Fix marking SCAN_RSP as not connectable") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Bluetooth: ISO: use hci_sync for setting CIG parametersPauli Virtanen
[ Upstream commit 6b9545dc9f8ff01d8bc1229103960d9cd265343f ] When reconfiguring CIG after disconnection of the last CIS, LE Remove CIG shall be sent before LE Set CIG Parameters. Otherwise, it fails because CIG is in the inactive state and not configurable (Core v5.3 Vol 6 Part B Sec. 4.5.14.3). This ordering is currently wrong under suitable timing conditions, because LE Remove CIG is sent via the hci_sync queue and may be delayed, but Set CIG Parameters is via hci_send_cmd. Make the ordering well-defined by sending also Set CIG Parameters via hci_sync. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Bluetooth: fix invalid-bdaddr quirk for non-persistent setupJohan Hovold
[ Upstream commit 0cb7365850bacb8c2a9975cae672d65714d8daa1 ] Devices that lack persistent storage for the device address can indicate this by setting the HCI_QUIRK_INVALID_BDADDR which causes the controller to be marked as unconfigured until user space has set a valid address. Once configured, the device address must be set on every setup for controllers with HCI_QUIRK_NON_PERSISTENT_SETUP to avoid marking the controller as unconfigured and requiring the address to be set again. Fixes: 740011cfe948 ("Bluetooth: Add new quirk for non-persistent setup settings") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPTVladimir Oltean
[ Upstream commit c1ae02d876898b1b8ca1e12c6f84d7b406263800 ] Currently the sja1105 tagging protocol prefers using the source port information from the VLAN header if that is available, falling back to the INCL_SRCPT option if it isn't. The VLAN header is available for all frames except for META frames initiated by the switch (containing RX timestamps), and thus, the "if (is_link_local)" branch is practically dead. The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. With incorrect source port reporting, for example PTP traffic over 2 bridged ports will all be seen on sockets opened on the first such port, which is incorrect. Now that the tagging protocol has been changed to make link-local frames always contain source port information, we can reverse the order of the checks so that we always give precedence to that information (which is always precise) in lieu of the tag_8021q VID which is only precise for a standalone port. Fixes: d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21fcec ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net/sched: act_ipt: add sanity checks on skb before calling targetFlorian Westphal
[ Upstream commit b2dc32dcba08bf55cec600caa76f4afd2e3614df ] Netfilter targets make assumptions on the skb state, for example iphdr is supposed to be in the linear area. This is normally done by IP stack, but in act_ipt case no such checks are made. Some targets can even assume that skb_dst will be valid. Make a minimum effort to check for this: - Don't call the targets eval function for non-ipv4 skbs. - Don't call the targets eval function for POSTROUTING emulation when the skb has no dst set. v3: use skb_protocol helper (Davide Caratti) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net/sched: act_ipt: add sanity checks on table name and hook locationsFlorian Westphal
[ Upstream commit b4ee93380b3c891fea996af8d1d3ca0e36ad31f0 ] Looks like "tc" hard-codes "mangle" as the only supported table name, but on kernel side there are no checks. This is wrong. Not all xtables targets are safe to call from tc. E.g. "nat" targets assume skb has a conntrack object assigned to it. Normally those get called from netfilter nat core which consults the nat table to obtain the address mapping. "tc" userspace either sets PRE or POSTROUTING as hook number, but there is no validation of this on kernel side, so update netlink policy to reject bogus numbers. Some targets may assume skb_dst is set for input/forward hooks, so prevent those from being used. act_ipt uses the hook number in two places: 1. the state hook number, this is fine as-is 2. to set par.hook_mask The latter is a bit mask, so update the assignment to make xt_check_target() to the right thing. Followup patch adds required checks for the skb/packet headers before calling the targets evaluation function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19sctp: fix potential deadlock on &net->sctp.addr_wq_lockChengfeng Ye
[ Upstream commit 6feb37b3b06e9049e20dcf7e23998f92c9c5be9a ] As &net->sctp.addr_wq_lock is also acquired by the timer sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition at sctp_auto_asconf_init() seems should disable irq since it is called from sctp_accept() under process context. Possible deadlock scenario: sctp_accept() -> sctp_sock_migrate() -> sctp_auto_asconf_init() -> spin_lock(&net->sctp.addr_wq_lock) <timer interrupt> -> sctp_addr_wq_timeout_handler() -> spin_lock_bh(&net->sctp.addr_wq_lock); (deadlock here) This flaw was found using an experimental static analysis tool we are developing for irq-related deadlock. The tentative patch fix the potential deadlock by spin_lock_bh(). Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Fixes: 34e5b0118685 ("sctp: delay auto_asconf init until binding the first addr") Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230627120340.19432-1-dg573847474@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19SUNRPC: Fix UAF in svc_tcp_listen_data_ready()Ding Hui
commit fc80fc2d4e39137869da3150ee169b40bf879287 upstream. After the listener svc_sock is freed, and before invoking svc_tcp_accept() for the established child sock, there is a window that the newsock retaining a freed listener svc_sock in sk_user_data which cloning from parent. In the race window, if data is received on the newsock, we will observe use-after-free report in svc_tcp_listen_data_ready(). Reproduce by two tasks: 1. while :; do rpc.nfsd 0 ; rpc.nfsd; done 2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] Read of size 8 at addr ffff888139d96228 by task nc/102553 CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Call Trace: <IRQ> dump_stack_lvl+0x33/0x50 print_address_description.constprop.0+0x27/0x310 print_report+0x3e/0x70 kasan_report+0xae/0xe0 svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] tcp_data_queue+0x9f4/0x20e0 tcp_rcv_established+0x666/0x1f60 tcp_v4_do_rcv+0x51c/0x850 tcp_v4_rcv+0x23fc/0x2e80 ip_protocol_deliver_rcu+0x62/0x300 ip_local_deliver_finish+0x267/0x350 ip_local_deliver+0x18b/0x2d0 ip_rcv+0x2fb/0x370 __netif_receive_skb_one_core+0x166/0x1b0 process_backlog+0x24c/0x5e0 __napi_poll+0xa2/0x500 net_rx_action+0x854/0xc90 __do_softirq+0x1bb/0x5de do_softirq+0xcb/0x100 </IRQ> <TASK> ... </TASK> Allocated by task 102371: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7b/0x90 svc_setup_socket+0x52/0x4f0 [sunrpc] svc_addsock+0x20d/0x400 [sunrpc] __write_ports_addfd+0x209/0x390 [nfsd] write_ports+0x239/0x2c0 [nfsd] nfsctl_transaction_write+0xac/0x110 [nfsd] vfs_write+0x1c3/0xae0 ksys_write+0xed/0x1c0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 102551: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0x106/0x190 __kmem_cache_free+0x133/0x270 svc_xprt_free+0x1e2/0x350 [sunrpc] svc_xprt_destroy_all+0x25a/0x440 [sunrpc] nfsd_put+0x125/0x240 [nfsd] nfsd_svc+0x2cb/0x3c0 [nfsd] write_threads+0x1ac/0x2a0 [nfsd] nfsctl_transaction_write+0xac/0x110 [nfsd] vfs_write+0x1c3/0xae0 ksys_write+0xed/0x1c0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Fix the UAF by simply doing nothing in svc_tcp_listen_data_ready() if state != TCP_LISTEN, that will avoid dereferencing svsk for all child socket. Link: https://lore.kernel.org/lkml/20230507091131.23540-1-dinghui@sangfor.com.cn/ Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19netlink: Add __sock_i_ino() for __netlink_diag_dump().Kuniyuki Iwashima
[ Upstream commit 25a9c8a4431c364f97f75558cb346d2ad3f53fbb ] syzbot reported a warning in __local_bh_enable_ip(). [0] Commit 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") converted read_lock(&nl_table_lock) to read_lock_irqsave() in __netlink_diag_dump() to prevent a deadlock. However, __netlink_diag_dump() calls sock_i_ino() that uses read_lock_bh() and read_unlock_bh(). If CONFIG_TRACE_IRQFLAGS=y, read_unlock_bh() finally enables IRQ even though it should stay disabled until the following read_unlock_irqrestore(). Using read_lock() in sock_i_ino() would trigger a lockdep splat in another place that was fixed in commit f064af1e500a ("net: fix a lockdep splat"), so let's add __sock_i_ino() that would be safe to use under BH disabled. [0]: WARNING: CPU: 0 PID: 5012 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Modules linked in: CPU: 0 PID: 5012 Comm: syz-executor487 Not tainted 6.4.0-rc7-syzkaller-00202-g6f68fc395f49 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Code: 45 bf 01 00 00 00 e8 91 5b 0a 00 e8 3c 15 3d 00 fb 65 8b 05 ec e9 b5 7e 85 c0 74 58 5b 5d c3 65 8b 05 b2 b6 b4 7e 85 c0 75 a2 <0f> 0b eb 9e e8 89 15 3d 00 eb 9f 48 89 ef e8 6f 49 18 00 eb a8 0f RSP: 0018:ffffc90003a1f3d0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 1ffffffff1cf5996 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffff8805c6f3 RBP: ffffffff8805c6f3 R08: 0000000000000001 R09: ffff8880152b03a3 R10: ffffed1002a56074 R11: 0000000000000005 R12: 00000000000073e4 R13: dffffc0000000000 R14: 0000000000000002 R15: 0000000000000000 FS: 0000555556726300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 000000007c646000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_i_ino+0x83/0xa0 net/core/sock.c:2559 __netlink_diag_dump+0x45c/0x790 net/netlink/diag.c:171 netlink_diag_dump+0xd6/0x230 net/netlink/diag.c:207 netlink_dump+0x570/0xc50 net/netlink/af_netlink.c:2269 __netlink_dump_start+0x64b/0x910 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:329 [inline] netlink_diag_handler_dump+0x1ae/0x250 net/netlink/diag.c:238 __sock_diag_cmd net/core/sock_diag.c:238 [inline] sock_diag_rcv_msg+0x31e/0x440 net/core/sock_diag.c:269 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2547 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x71c/0x900 net/socket.c:2503 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2557 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2586 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5303aaabb9 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc7506e548 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5303aaabb9 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f5303a6ed60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5303a6edf0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") Reported-by: syzbot+5da61cf6a9bc1902d422@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=5da61cf6a9bc1902d422 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230626164313.52528-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return ↵Ilia.Gavrilov
value. [ Upstream commit f188d30087480eab421cd8ca552fb15f55d57f4d ] ct_sip_parse_numerical_param() returns only 0 or 1 now. But process_register_request() and process_register_response() imply checking for a negative value if parsing of a numerical header parameter failed. The invocation in nf_nat_sip() looks correct: if (ct_sip_parse_numerical_param(...) > 0 && ...) { ... } Make the return value of the function ct_sip_parse_numerical_param() a tristate to fix all the cases a) return 1 if value is found; *val is set b) return 0 if value is not found; *val is unchanged c) return -1 on error; *val is undefined Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0f32a40fc91a ("[NETFILTER]: nf_conntrack_sip: create signalling expectations") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19netfilter: conntrack: dccp: copy entire header to stack buffer, not just ↵Florian Westphal
basic one [ Upstream commit ff0a3a7d52ff7282dbd183e7fc29a1fe386b0c30 ] Eric Dumazet says: nf_conntrack_dccp_packet() has an unique: dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh); And nothing more is 'pulled' from the packet, depending on the content. dh->dccph_doff, and/or dh->dccph_x ...) So dccp_ack_seq() is happily reading stuff past the _dh buffer. BUG: KASAN: stack-out-of-bounds in nf_conntrack_dccp_packet+0x1134/0x11c0 Read of size 4 at addr ffff000128f66e0c by task syz-executor.2/29371 [..] Fix this by increasing the stack buffer to also include room for the extra sequence numbers and all the known dccp packet type headers, then pull again after the initial validation of the basic header. While at it, mark packets invalid that lack 48bit sequence bit but where RFC says the type MUST use them. Compile tested only. v2: first skb_header_pointer() now needs to adjust the size to only pull the generic header. (Eric) Heads-up: I intend to remove dccp conntrack support later this year. Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19net: nfc: Fix use-after-free caused by nfc_llcp_find_localLin Ma
[ Upstream commit 6709d4b7bc2e079241fdef15d1160581c5261c10 ] This commit fixes several use-after-free that caused by function nfc_llcp_find_local(). For example, one UAF can happen when below buggy time window occurs. // nfc_genl_llc_get_params | // nfc_unregister_device | dev = nfc_get_device(idx); | device_lock(...) if (!dev) | dev->shutting_down = true; return -ENODEV; | device_unlock(...); | device_lock(...); | // nfc_llcp_unregister_device | nfc_llcp_find_local() nfc_llcp_find_local(...); | | local_cleanup() if (!local) { | rc = -ENODEV; | // nfc_llcp_local_put goto exit; | kref_put(.., local_release) } | | // local_release | list_del(&local->list) // nfc_genl_send_params | kfree() local->dev->idx !!!UAF!!! | | and the crash trace for the one of the discussed UAF like: BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045 Read of size 8 at addr ffff888105b0e410 by task 20114 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:319 [inline] print_report+0xcc/0x620 mm/kasan/report.c:430 kasan_report+0xb2/0xe0 mm/kasan/report.c:536 nfc_genl_send_params net/nfc/netlink.c:999 [inline] nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045 genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968 genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline] genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065 netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b6/0x200 net/socket.c:747 ____sys_sendmsg+0x6e9/0x890 net/socket.c:2501 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2555 __sys_sendmsg+0xf7/0x1d0 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f34640a2389 RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006 RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000 </TASK> Allocated by task 20116: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383 kmalloc include/linux/slab.h:580 [inline] kzalloc include/linux/slab.h:720 [inline] nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567 nfc_register_device+0x61/0x260 net/nfc/core.c:1124 nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257 virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148 misc_open+0x379/0x4a0 drivers/char/misc.c:165 chrdev_open+0x26c/0x780 fs/char_dev.c:414 do_dentry_open+0x6c4/0x12a0 fs/open.c:920 do_open fs/namei.c:3560 [inline] path_openat+0x24fe/0x37e0 fs/namei.c:3715 do_filp_open+0x1ba/0x410 fs/namei.c:3742 do_sys_openat2+0x171/0x4c0 fs/open.c:1356 do_sys_open fs/open.c:1372 [inline] __do_sys_openat fs/open.c:1388 [inline] __se_sys_openat fs/open.c:1383 [inline] __x64_sys_openat+0x143/0x200 fs/open.c:1383 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 20115: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:162 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x7a/0x190 mm/slub.c:3800 local_release net/nfc/llcp_core.c:174 [inline] kref_put include/linux/kref.h:65 [inline] nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline] nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline] nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620 nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179 virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163 __fput+0x252/0xa20 fs/file_table.c:321 task_work_run+0x174/0x270 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297 do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc Last potentially related work creation: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 __kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491 kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328 drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735 unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773 unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753 neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895 addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684 notifier_call_chain+0xbe/0x210 kernel/notifier.c:87 call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937 call_netdevice_notifiers_extack net/core/dev.c:1975 [inline] call_netdevice_notifiers net/core/dev.c:1989 [inline] dev_change_name+0x3c3/0x870 net/core/dev.c:1211 dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376 dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542 sock_do_ioctl+0x160/0x260 net/socket.c:1213 sock_ioctl+0x3f9/0x670 net/socket.c:1316 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888105b0e400 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 16 bytes inside of freed 1024-byte region [ffff888105b0e400, ffff888105b0e800) The buggy address belongs to the physical page: head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10 raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb In summary, this patch solves those use-after-free by 1. Re-implement the nfc_llcp_find_local(). The current version does not grab the reference when getting the local from the linked list. For example, the llcp_sock_bind() gets the reference like below: // llcp_sock_bind() local = nfc_llcp_find_local(dev); // A ..... \ | raceable ..... / llcp_sock->local = nfc_llcp_local_get(local); // B There is an apparent race window that one can drop the reference and free the local object fetched in (A) before (B) gets the reference. 2. Some callers of the nfc_llcp_find_local() do not grab the reference at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions. We add the nfc_llcp_local_put() for them. Moreover, we add the necessary error handling function to put the reference. 3. Add the nfc_llcp_remove_local() helper. The local object is removed from the linked list in local_release() when all reference is gone. This patch removes it when nfc_llcp_unregister_device() is called. Therefore, every caller of nfc_llcp_find_local() will get a reference even when the nfc_llcp_unregister_device() is called. This promises no use-after-free for the local object is ever possible. Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support") Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19netlink: do not hard code device address lenth in fdb dumpsEric Dumazet
[ Upstream commit aa5406950726e336c5c9585b09799a734b6e77bf ] syzbot reports that some netdev devices do not have a six bytes address [1] Replace ETH_ALEN by dev->addr_len. [1] (Case of a device where dev->addr_len = 4) BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in copyout+0xb8/0x100 lib/iov_iter.c:169 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copyout+0xb8/0x100 lib/iov_iter.c:169 _copy_to_iter+0x6d8/0x1d00 lib/iov_iter.c:536 copy_to_iter include/linux/uio.h:206 [inline] simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:513 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:527 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] netlink_recvmsg+0x4ae/0x15a0 net/netlink/af_netlink.c:1970 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg net/socket.c:1040 [inline] ____sys_recvmsg+0x283/0x7f0 net/socket.c:2722 ___sys_recvmsg+0x223/0x840 net/socket.c:2764 do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Uninit was stored to memory at: __nla_put lib/nlattr.c:1009 [inline] nla_put+0x1c6/0x230 lib/nlattr.c:1067 nlmsg_populate_fdb_fill+0x2b8/0x600 net/core/rtnetlink.c:4071 nlmsg_populate_fdb net/core/rtnetlink.c:4418 [inline] ndo_dflt_fdb_dump+0x616/0x840 net/core/rtnetlink.c:4456 rtnl_fdb_dump+0x14ff/0x1fc0 net/core/rtnetlink.c:4629 netlink_dump+0x9d1/0x1310 net/netlink/af_netlink.c:2268 netlink_recvmsg+0xc5c/0x15a0 net/netlink/af_netlink.c:1995 sock_recvmsg_nosec+0x7a/0x120 net/socket.c:1019 ____sys_recvmsg+0x664/0x7f0 net/socket.c:2720 ___sys_recvmsg+0x223/0x840 net/socket.c:2764 do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Uninit was created at: slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:716 slab_alloc_node mm/slub.c:3451 [inline] __kmem_cache_alloc_node+0x4ff/0x8b0 mm/slub.c:3490 kmalloc_trace+0x51/0x200 mm/slab_common.c:1057 kmalloc include/linux/slab.h:559 [inline] __hw_addr_create net/core/dev_addr_lists.c:60 [inline] __hw_addr_add_ex+0x2e5/0x9e0 net/core/dev_addr_lists.c:118 __dev_mc_add net/core/dev_addr_lists.c:867 [inline] dev_mc_add+0x9a/0x130 net/core/dev_addr_lists.c:885 igmp6_group_added+0x267/0xbc0 net/ipv6/mcast.c:680 ipv6_mc_up+0x296/0x3b0 net/ipv6/mcast.c:2754 ipv6_mc_remap+0x1e/0x30 net/ipv6/mcast.c:2708 addrconf_type_change net/ipv6/addrconf.c:3731 [inline] addrconf_notify+0x4d3/0x1d90 net/ipv6/addrconf.c:3699 notifier_call_chain kernel/notifier.c:93 [inline] raw_notifier_call_chain+0xe4/0x430 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1935 [inline] call_netdevice_notifiers_extack net/core/dev.c:1973 [inline] call_netdevice_notifiers+0x1ee/0x2d0 net/core/dev.c:1987 bond_enslave+0xccd/0x53f0 drivers/net/bonding/bond_main.c:1906 do_set_master net/core/rtnetlink.c:2626 [inline] rtnl_newlink_create net/core/rtnetlink.c:3460 [inline] __rtnl_newlink net/core/rtnetlink.c:3660 [inline] rtnl_newlink+0x378c/0x40e0 net/core/rtnetlink.c:3673 rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6395 netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0xf28/0x1230 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x122f/0x13d0 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x999/0xd50 net/socket.c:2503 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2557 __sys_sendmsg net/socket.c:2586 [inline] __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x304/0x490 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Bytes 2856-2857 of 3500 are uninitialized Memory access of size 3500 starts at ffff888018d99104 Data copied to user address 0000000020000480 Fixes: d83b06036048 ("net: add fdb generic dump routine") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230621174720.1845040-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19netlink: fix potential deadlock in netlink_set_err()Eric Dumazet
[ Upstream commit 8d61f926d42045961e6b65191c09e3678d86a9cf ] syzbot reported a possible deadlock in netlink_set_err() [1] A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()") in netlink_lock_table() This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump() which were not covered by cited commit. [1] WARNING: possible irq lock inversion dependency detected 6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted syz-executor.2/23011 just changed the state of lock: ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612 but this lock was taken by another, SOFTIRQ-safe lock in the past: (&local->queue_stop_reason_lock){..-.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nl_table_lock); local_irq_disable(); lock(&local->queue_stop_reason_lock); lock(nl_table_lock); <Interrupt> lock(&local->queue_stop_reason_lock); *** DEADLOCK *** Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()") Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindingsGilad Sever
[ Upstream commit 9a5cb79762e0eda17ca15c2a6eaca4622383c21c ] When calling bpf_sk_lookup_tcp(), bpf_sk_lookup_udp() or bpf_skc_lookup_tcp() from tc/xdp ingress, VRF socket bindings aren't respoected, i.e. unbound sockets are returned, and bound sockets aren't found. VRF binding is determined by the sdif argument to sk_lookup(), however when called from tc the IP SKB control block isn't initialized and thus inet{,6}_sdif() always returns 0. Fix by calculating sdif for the tc/xdp flows by observing the device's l3 enslaved state. The cg/sk_skb hooking points which are expected to support inet{,6}_sdif() pass sdif=-1 which makes __bpf_skc_lookup() use the existing logic. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Gilad Sever <gilad9366@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/20230621104211.301902-4-gilad9366@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpointGilad Sever
[ Upstream commit 97fbfeb86917bdbe9c41d5143e335a929147f405 ] skb->dev always exists in the tc flow. There is no need to use bpf_skc_lookup(), bpf_sk_lookup() from this code path. This change facilitates fixing the tc flow to be VRF aware. Signed-off-by: Gilad Sever <gilad9366@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230621104211.301902-3-gilad9366@gmail.com Stable-dep-of: 9a5cb79762e0 ("bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Factor out socket lookup functions for the TC hookpoint.Gilad Sever
[ Upstream commit 6e98730bc0b44acaf86eccc75f823128aa9c9e79 ] Change BPF helper socket lookup functions to use TC specific variants: bpf_tc_sk_lookup_tcp() / bpf_tc_sk_lookup_udp() / bpf_tc_skc_lookup_tcp() instead of sharing implementation with the cg / sk_skb hooking points. This allows introducing a separate logic for the TC flow. The tc functions are identical to the original code. Signed-off-by: Gilad Sever <gilad9366@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230621104211.301902-2-gilad9366@gmail.com Stable-dep-of: 9a5cb79762e0 ("bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: cfg80211: fix regulatory disconnect with OCB/NANJohannes Berg
[ Upstream commit e8c2af660ba0790afd14d5cbc2fd05c6dc85e207 ] Since regulatory disconnect was added, OCB and NAN interface types were added, which made it completely unusable for any driver that allowed OCB/NAN. Add OCB/NAN (though NAN doesn't do anything, we don't have any info) and also remove all the logic that opts out, so it won't be broken again if/when new interface types are added. Fixes: 6e0bd6c35b02 ("cfg80211: 802.11p OCB mode handling") Fixes: cb3b7d87652a ("cfg80211: add start / stop NAN commands") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230616222844.2794d1625a26.I8e78a3789a29e6149447b3139df724a6f1b46fc3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: cfg80211: drop incorrect nontransmitted BSS update codeBenjamin Berg
[ Upstream commit 39432f8a3752a87a53fd8d5e51824a43aaae5cab ] The removed code ran for any BSS that was not included in the MBSSID element in order to update it. However, instead of using the correct inheritance rules, it would simply copy the elements from the transmitting AP. The result is that we would report incorrect elements in this case. After some discussions, it seems that there are likely not even APs actually using this feature. Either way, removing the code decreases complexity and makes the cfg80211 behaviour more correct. Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230616094949.cfd6d8db1f26.Ia1044902b86cd7d366400a4bfb93691b8f05d68c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: cfg80211: rewrite merging of inherited elementsBenjamin Berg
[ Upstream commit dfd9aa3e7a456d57b18021d66472ab7ff8373ab7 ] The cfg80211_gen_new_ie function merges the IEs using inheritance rules. Rewrite this function to fix issues around inheritance rules. In particular, vendor elements do not require any special handling, as they are either all inherited or overridden by the subprofile. Also, add fragmentation handling as this may be needed in some cases. This also changes the function to not require making a copy. The new version could be optimized a bit by explicitly tracking which IEs have been handled already rather than looking that up again every time. Note that a small behavioural change is the removal of the SSID special handling. This should be fine for the MBSSID element, as the SSID must be included in the subelement. Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230616094949.bc6152e146db.I2b5f3bc45085e1901e5b5192a674436adaf94748@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: mac80211: Remove "Missing iftype sband data/EHT cap" spamNicolas Cavallari
[ Upstream commit 6e21e7b8cd897193cee3c2649640efceb3004ba5 ] In mesh mode, ieee80211_chandef_he_6ghz_oper() is called by mesh_matches_local() for every received mesh beacon. On a 6 GHz mesh of a HE-only phy, this spams that the hardware does not have EHT capabilities, even if the received mesh beacon does not have an EHT element. Unlike HE, not supporting EHT in the 6 GHz band is not an error so do not print anything in this case. Fixes: 5dca295dd767 ("mac80211: Add initial support for EHT and 320 MHz channels") Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230614132648.28995-1-nicolas.cavallari@green-communications.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFOEdwin Peer
[ Upstream commit fa0e21fa44438a0e856d42224bfa24641d37b979 ] This filter already exists for excluding IPv6 SNMP stats. Extend its definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK. This patch constitutes a partial fix for a netlink attribute nesting overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the requester doesn't need them, the truncation of the VF list is avoided. While it was technically only the stats added in commit c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics") breaking the camel's back, the appreciable size of the stats data should never have been included without due consideration for the maximum number of VFs supported by PCI. Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice") Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Cc: Edwin Peer <espeer@gmail.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20230611105108.122586-1-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: mac80211: Fix permissions for valid_links debugfs entryIlan Peer
[ Upstream commit 4cacadc0dbd8013e6161aa8843d8e9d8ad435b47 ] The entry should be a read only one and not a write only one. Fix it. Fixes: 3d9011029227 ("wifi: mac80211: implement link switching") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230611121219.c75316990411.I1565a7fcba8a37f83efffb0cc6b71c572b896e94@changeid [remove x16 change since it doesn't work yet] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: mac80211: recalc min chandef for new STA linksJohannes Berg
[ Upstream commit ba7af2654e3b7b810c750b3c6106f6f20b81cc88 ] When adding a new link to a station, this needs to cause a recalculation of the minimum chandef since otherwise we can have a higher bandwidth station connected on that link than the link is operating at. Do the appropriate recalc. Fixes: cb71f1d136a6 ("wifi: mac80211: add sta link addition/removal") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.377adf3c789a.I91bf28f399e16e6ac1f83bacd1029a698b4e6685@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()Krzysztof Kozlowski
[ Upstream commit 0d9b41daa5907756a31772d8af8ac5ff25cf17c1 ] If sock->service_name is NULL, the local variable service_name_tlv_length will not be assigned by nfc_llcp_build_tlv(), later leading to using value frmo the stack. Smatch warning: net/nfc/llcp_commands.c:442 nfc_llcp_send_connect() error: uninitialized symbol 'service_name_tlv_length'. Fixes: de9e5aeb4f40 ("NFC: llcp: Fix usage of llcp_add_tlv()") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19sctp: add bpf_bypass_getsockopt proto callbackAlexander Mikhalitsyn
[ Upstream commit 2598619e012cee5273a2821441b9a051ad931249 ] Implement ->bpf_bypass_getsockopt proto callback and filter out SCTP_SOCKOPT_PEELOFF, SCTP_SOCKOPT_PEELOFF_FLAGS and SCTP_SOCKOPT_CONNECTX3 socket options from running eBPF hook on them. SCTP_SOCKOPT_PEELOFF and SCTP_SOCKOPT_PEELOFF_FLAGS options do fd_install(), and if BPF_CGROUP_RUN_PROG_GETSOCKOPT hook returns an error after success of the original handler sctp_getsockopt(...), userspace will receive an error from getsockopt syscall and will be not aware that fd was successfully installed into a fdtable. As pointed by Marcelo Ricardo Leitner it seems reasonable to skip bpf getsockopt hook for SCTP_SOCKOPT_CONNECTX3 sockopt too. Because internaly, it triggers connect() and if error is masked then userspace will be confused. This patch was born as a result of discussion around a new SCM_PIDFD interface: https://lore.kernel.org/all/20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com/ Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Christian Brauner <brauner@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Suggested-by: Stanislav Fomichev <sdf@google.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19svcrdma: Prevent page release when nothing was receivedChuck Lever
[ Upstream commit baf6d18b116b7dc84ed5e212c3a89f17cdc3f28c ] I noticed that svc_rqst_release_pages() was still unnecessarily releasing a page when svc_rdma_recvfrom() returns zero. Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-01can: isotp: isotp_sendmsg(): fix return error fix on TX pathOliver Hartkopp
commit e38910c0072b541a91954682c8b074a93e57c09b upstream. With commit d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path") the missing correct return value in the case of a protocol error was introduced. But the way the error value has been read and sent to the user space does not follow the common scheme to clear the error after reading which is provided by the sock_error() function. This leads to an error report at the following write() attempt although everything should be working. Fixes: d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path") Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mptcp: ensure listener is unhashed before updating the sk statusPaolo Abeni
commit 57fc0f1ceaa4016354cf6f88533e20b56190e41a upstream. The MPTCP protocol access the listener subflow in a lockless manner in a couple of places (poll, diag). That works only if the msk itself leaves the listener status only after that the subflow itself has been closed/disconnected. Otherwise we risk deadlock in diag, as reported by Christoph. Address the issue ensuring that the first subflow (the listener one) is always disconnected before updating the msk socket status. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407 Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28revert "net: align SO_RCVMARK required privileges with SO_MARK"Maciej Żenczykowski
[ Upstream commit a9628e88776eb7d045cf46467f1afdd0f7fe72ea ] This reverts commit 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK") because the reasoning in the commit message is not really correct: SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which sets the socket mark and does require privs. Additionally incoming skb->mark may already be visible if sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled. Furthermore, it is easier to block the getsockopt via bpf (either cgroup setsockopt hook, or via syscall filters) then to unblock it if it requires CAP_NET_RAW/ADMIN. On Android the socket mark is (among other things) used to store the network identifier a socket is bound to. Setting it is privileged, but retrieving it is not. We'd like unprivileged userspace to be able to read the network id of incoming packets (where mark is set via iptables [to be moved to bpf])... An alternative would be to add another sysctl to control whether setting SO_RCVMARK is privilged or not. (or even a MASK of which bits in the mark can be exposed) But this seems like over-engineering... Note: This is a non-trivial revert, due to later merged commit e42c7beee71d ("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()") which changed both 'ns_capable' into 'sockopt_ns_capable' calls. Fixes: 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK") Cc: Larysa Zaremba <larysa.zaremba@intel.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eyal Birger <eyal.birger@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Patrick Rohr <prohr@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28sch_netem: acquire qdisc lock in netem_change()Eric Dumazet
[ Upstream commit 2174a08db80d1efeea382e25ac41c4e7511eb6d6 ] syzbot managed to trigger a divide error [1] in netem. It could happen if q->rate changes while netem_enqueue() is running, since q->rate is read twice. It turns out netem_change() always lacked proper synchronization. [1] divide error: 0000 [#1] SMP KASAN CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 RIP: 0010:div64_u64 include/linux/math64.h:69 [inline] RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline] RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576 Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03 RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246 RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000 RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297 R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358 R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000 FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0 Call Trace: <TASK> [<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline] [<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290 [<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline] [<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline] [<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline] [<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235 [<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0 [<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323 [<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437 [<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline] [<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline] [<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542 [<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nfnetlink_osf: fix module autoloadPablo Neira Ayuso
[ Upstream commit 62f9a68a36d4441a6c412b81faed102594bc6670 ] Move the alias from xt_osf to nfnetlink_osf. Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: disallow updates of anonymous setsPablo Neira Ayuso
[ Upstream commit b770283c98e0eee9133c47bc03b6cc625dc94723 ] Disallow updates of set timeout and garbage collection parameters for anonymous sets. Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: reject unbound chain set before commit phasePablo Neira Ayuso
[ Upstream commit 62e1e94b246e685d89c3163aaef4b160e42ceb02 ] Use binding list to track set transaction and to check for unbound chains before entering the commit phase. Bail out if chain binding remain unused before entering the commit step. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: reject unbound anonymous set before commit phasePablo Neira Ayuso
[ Upstream commit 938154b93be8cd611ddfd7bafc1849f3c4355201 ] Add a new list to track set transaction and to check for unbound anonymous sets before entering the commit phase. Bail out at the end of the transaction handling if an anonymous set remains unbound. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: disallow element updates of bound anonymous setsPablo Neira Ayuso
[ Upstream commit c88c535b592d3baeee74009f3eceeeaf0fdd5e1b ] Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API allows to create anonymous sets without NFT_SET_CONSTANT, it makes no sense to allow to add and to delete elements for bound anonymous sets. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nft_set_pipapo: .walk does not deal with generationsPablo Neira Ayuso
[ Upstream commit 2b84e215f87443c74ac0aa7f76bb172d43a87033 ] The .walk callback iterates over the current active set, but it might be useful to iterate over the next generation set. Use the generation mask to determine what set view (either current or next generation) is use for the walk iteration. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: drop map element references from preparation phasePablo Neira Ayuso
[ Upstream commit 628bd3e49cba1c066228e23d71a852c23e26da73 ] set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path. Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets. Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chainPablo Neira Ayuso
[ Upstream commit 26b5a5712eb85e253724e56a54c17f8519bd8e4e ] Add a new state to deal with rule expressions deactivation from the newrule error path, otherwise the anonymous set remains in the list in inactive state for the next generation. Mark the set/chain transaction as unbound so the abort path releases this object, set it as inactive in the next generation so it is not reachable anymore from this transaction and reference counter is dropped. Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28netfilter: nf_tables: fix chain binding transaction logicPablo Neira Ayuso
[ Upstream commit 4bedf9eee016286c835e3d8fa981ddece5338795 ] Add bound flag to rule and chain transactions as in 6a0a8d10a366 ("netfilter: nf_tables: use-after-free in failing rule with bound set") to skip them in case that the chain is already bound from the abort path. This patch fixes an imbalance in the chain use refcnt that triggers a WARN_ON on the table and chain destroy path. This patch also disallows nested chain bindings, which is not supported from userspace. The logic to deal with chain binding in nft_data_hold() and nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a special handling in case a chain is bound but next expressions in the same rule fail to initialize as described by 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE"). The chain is left bound if rule construction fails, so the objects stored in this chain (and the chain itself) are released by the transaction records from the abort path, follow up patch ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") completes this error handling. When deleting an existing rule, chain bound flag is set off so the rule expression .destroy path releases the objects. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28ipvs: align inner_mac_header for encapsulationTerin Stock
[ Upstream commit d7fce52fdf96663ddc2eb21afecff3775588612a ] When using encapsulation the original packet's headers are copied to the inner headers. This preserves the space for an inner mac header, which is not used by the inner payloads for the encapsulation types supported by IPVS. If a packet is using GUE or GRE encapsulation and needs to be segmented, flow can be passed to __skb_udp_tunnel_segment() which calculates a negative tunnel header length. A negative tunnel header length causes pskb_may_pull() to fail, dropping the packet. This can be observed by attaching probes to ip_vs_in_hook(), __dev_queue_xmit(), and __skb_udp_tunnel_segment(): perf probe --add '__dev_queue_xmit skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen' perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' These probes the headers and tunnel header length for packets which traverse the IPVS encapsulation path. A TCP packet can be forced into the segmentation path by being smaller than a calculated clamped MSS, but larger than the advertised MSS. probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52 probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2 When using veth-based encapsulation, the interfaces are set to be mac-less, which does not preserve space for an inner mac header. This prevents this issue from occurring. In our real-world testing of sending a 32KB file we observed operation time increasing from ~75ms for veth-based encapsulation to over 1.5s using IPVS encapsulation due to retries from dropped packets. This changeset modifies the packet on the encapsulation path in ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac header offset. This fixes UDP segmentation for both encapsulation types, and corrects the inner headers for any IPIP flows that may use it. Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation") Signed-off-by: Terin Stock <terin@cloudflare.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>