Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter.
Current release - regressions:
- smc: fix af_ops of child socket pointing to released memory
- wifi: ath9k: fix usage of driver-private space in tx_info
Previous releases - regressions:
- ipv6: fix panic when forwarding a pkt with no in6 dev
- sctp: use the correct skb for security_sctp_assoc_request
- smc: fix NULL pointer dereference in smc_pnet_find_ib()
- sched: fix initialization order when updating chain 0 head
- phy: don't defer probe forever if PHY IRQ provider is missing
- dsa: revert "net: dsa: setup master before ports"
- dsa: felix: fix tagging protocol changes with multiple CPU ports
- eth: ice:
- fix use-after-free when freeing @rx_cpu_rmap
- revert "iavf: fix deadlock occurrence during resetting VF
interface"
- eth: lan966x: stop processing the MAC entry is port is wrong
Previous releases - always broken:
- sched:
- flower: fix parsing of ethertype following VLAN header
- taprio: check if socket flags are valid
- nfc: add flush_workqueue to prevent uaf
- veth: ensure eth header is in skb's linear part
- eth: stmmac: fix altr_tse_pcs function when using a fixed-link
- eth: macb: restart tx only if queue pointer is lagging
- eth: macvlan: fix leaking skb in source mode with nodst option"
* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
net: dsa: felix: fix tagging protocol changes with multiple CPU ports
tun: annotate access to queue->trans_start
nfc: nci: add flush_workqueue to prevent uaf
net: dsa: realtek: don't parse compatible string for RTL8366S
net: dsa: realtek: fix Kconfig to assure consistent driver linkage
net: ftgmac100: access hardware register after clock ready
Revert "net: dsa: setup master before ports"
macvlan: Fix leaking skb in source mode with nodst option
netfilter: nf_tables: nft_parse_register can return a negative value
net: lan966x: Stop processing the MAC entry is port is wrong.
net: lan966x: Fix when a port's upper is changed.
net: lan966x: Fix IGMP snooping when frames have vlan tag
net: lan966x: Update lan966x_ptp_get_nominal_value
sctp: Initialize daddr on peeled off socket
net/smc: Fix af_ops of child socket pointing to released memory
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
net/smc: use memcpy instead of snprintf to avoid out of bounds read
net: macb: Restart tx only if queue pointer is lagging
...
|
|
When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
size of 0, which is supposed to be an indication that the corresponding
attribute should not be emitted. However, instead, the current code
reserves a 0-byte attribute.
The reason this does not show up as a citation on a kasan kernel is that
netdev_offload_xstats_get(), which is supposed to fill in the data, never
ends up getting called, because rtnl_offload_xstats_get_stats() notices
that the stats are not actually used and skips the call.
Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
response, confusing the userspace.
Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().
Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).
The race can be demonstrated below:
Thread-1 Thread-2
| nci_dev_up()
| nci_open_device()
| __nci_request(nci_reset_req)
| nci_send_cmd
| queue_work(cmd_work)
nci_unregister_device() |
nci_close_device() | ...
del_timer_sync(cmd_timer)[1] |
... | Worker
nci_free_device() | nci_cmd_work()
kfree(ndev)[3] | mod_timer(cmd_timer)[2]
In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.
This UAF is easy to trigger, crash trace by POC is like below
[ 66.703713] ==================================================================
[ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[ 66.703974]
[ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[ 66.703974] Call Trace:
[ 66.703974] <TASK>
[ 66.703974] dump_stack_lvl+0x57/0x7d
[ 66.703974] print_report.cold+0x5e/0x5db
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] kasan_report+0xbe/0x1c0
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] enqueue_timer+0x448/0x490
[ 66.703974] __mod_timer+0x5e6/0xb80
[ 66.703974] ? mark_held_locks+0x9e/0xe0
[ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 66.703974] ? queue_work_on+0x61/0x80
[ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130
[ 66.703974] process_one_work+0x8bb/0x1510
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230
[ 66.703974] ? rwlock_bug.part.0+0x90/0x90
[ 66.703974] ? _raw_spin_lock_irq+0x41/0x50
[ 66.703974] worker_thread+0x575/0x1190
[ 66.703974] ? process_one_work+0x1510/0x1510
[ 66.703974] kthread+0x2a0/0x340
[ 66.703974] ? kthread_complete_and_exit+0x20/0x20
[ 66.703974] ret_from_fork+0x22/0x30
[ 66.703974] </TASK>
[ 66.703974]
[ 66.703974] Allocated by task 267:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] __kasan_kmalloc+0x81/0xa0
[ 66.703974] nci_allocate_device+0xd3/0x390
[ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0
[ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd
[ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0
[ 66.703974] tty_ioctl+0x764/0x1310
[ 66.703974] __x64_sys_ioctl+0x122/0x190
[ 66.703974] do_syscall_64+0x3b/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 66.703974]
[ 66.703974] Freed by task 406:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] kasan_set_track+0x21/0x30
[ 66.703974] kasan_set_free_info+0x20/0x30
[ 66.703974] __kasan_slab_free+0x108/0x170
[ 66.703974] kfree+0xb0/0x330
[ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0
[ 66.703974] nci_uart_tty_close+0xdf/0x180
[ 66.703974] tty_ldisc_kill+0x73/0x110
[ 66.703974] tty_ldisc_hangup+0x281/0x5b0
[ 66.703974] __tty_hangup.part.0+0x431/0x890
[ 66.703974] tty_release+0x3a8/0xc80
[ 66.703974] __fput+0x1f0/0x8c0
[ 66.703974] task_work_run+0xc9/0x170
[ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0
[ 66.703974] syscall_exit_to_user_mode+0x19/0x50
[ 66.703974] do_syscall_64+0x48/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v5.18
First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae.
dsa_slave_change_mtu() updates the MTU of the DSA master and of the
associated CPU port, but only if it detects a change to the master MTU.
The blamed commit in the Fixes: tag below addressed a regression where
dsa_slave_change_mtu() would return early and not do anything due to
ds->ops->port_change_mtu() not being implemented.
However, that commit also had the effect that the master MTU got set up
to the correct value by dsa_master_setup(), but the associated CPU port's
MTU did not get updated. This causes breakage for drivers that rely on
the ->port_change_mtu() DSA call to account for the tagging overhead on
the CPU port, and don't set up the initial MTU during the setup phase.
Things actually worked before because they were in a fragile equilibrium
where dsa_slave_change_mtu() was called before dsa_master_setup() was.
So dsa_slave_change_mtu() could actually detect a change and update the
CPU port MTU too.
Restore the code to the way things used to work by reverting the reorder
of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did
not have a concrete motivation going for it anyway, it just looked
better.
Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a write performance regression
- Fix crashes during request deferral on RDMA transports
* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix the svc_deferred_event trace class
SUNRPC: Fix NFSD's request deferral on RDMA transports
nfsd: Clean up nfsd_file_put()
nfsd: Fix a write performance regression
SUNRPC: Return true/false (not 1/0) from bool functions
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix cgroupv2 from the input path, from Florian Westphal.
2) Fix incorrect return value of nft_parse_register(), from Antoine Tenart.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: nft_parse_register can return a negative value
netfilter: nft_socket: make cgroup match work in input too
====================
Link: https://lore.kernel.org/r/20220412094246.448055-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 6e1acfa387b9 ("netfilter: nf_tables: validate registers
coming from userspace.") nft_parse_register can return a negative value,
but the function prototype is still returning an unsigned int.
Fixes: 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Function sctp_do_peeloff() wrongly initializes daddr of the original
socket instead of the peeled off socket, which makes getpeername()
return zeroes instead of the primary address. Initialize the new socket
instead.
Fixes: d570ee490fb1 ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Child sockets may inherit the af_ops from the parent listen socket.
When the listen socket is released then the af_ops of the child socket
points to released memory.
Solve that by restoring the original af_ops for child sockets which
inherited the parent af_ops. And clear any inherited user_data of the
parent socket.
Fixes: 8270d9c21041 ("net/smc: Limit backlog connections")
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_name() was called with dev.parent as argument but without to
NULL-check it before.
Solve this by checking the pointer before the call to dev_name().
Fixes: af5f60c7e3d5 ("net/smc: allow PCI IDs as ib device names in the pnet table")
Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using snprintf() to convert not null-terminated strings to null
terminated strings may cause out of bounds read in the source string.
Therefore use memcpy() and terminate the target string with a null
afterwards.
Fixes: fa0866625543 ("net/smc: add support for user defined EIDs")
Fixes: 3c572145c24e ("net/smc: add generic netlink support for system EID")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
kongweibin reported a kernel panic in ip6_forward() when input interface
has no in6 dev associated.
The following tc commands were used to reproduce this panic:
tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%
CC: stable@vger.kernel.org
Fixes: ccd27f05ae7b ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
cgroupv2 helper function ignores the already-looked up sk
and uses skb->sk instead.
Just pass sk from the calling function instead; this will
make cgroup matching work for udp and tcp in input even when
edemux did not set skb->sk already.
Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Don't use sizeof(pointer) when calculating scnprintf offset.
Fixes: 01f84f0ed3b4 ("mac80211: reduce stack usage in debugfs")
Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220406175659.20611-1-greearb@candelatech.com
[correct the Fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Synchronize additions to nontrans_list of transmitting BSS with
bss_lock to avoid races. Also when cfg80211_add_nontrans_list() fails
__cfg80211_unlink_bss() needs bss_lock to be held (has lockdep assert
on bss_lock). So protect the whole block with bss_lock to avoid
races and warnings. Found during code review.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/1649668071-9370-1-git-send-email-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
We need this to be at least two bytes, so we can access
alpha2[0] and alpha2[1]. It may be three in case some
userspace used NUL-termination since it was NLA_STRING
(and we also push it out with NUL-termination).
Cc: stable@vger.kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A user may set the SO_TXTIME socket option to ensure a packet is send
at a given time. The taprio scheduler has to confirm, that it is allowed
to send a packet at that given time, by a check against the packet time
schedule. The scheduler drop the packet, if the gates are closed at the
given send time.
The check, if SO_TXTIME is set, may fail since sk_flags are part of an
union and the union is used otherwise. This happen, if a socket is not
a full socket, like a request socket for example.
Add a check to verify, if the union is used for sk_flags.
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when inserting a new filter that needs to sit at the head
of chain 0, it will first update the heads pointer on all devices using
the (shared) block, and only then complete the initialization of the new
element so that it has a "next" element.
This can lead to a situation that the chain 0 head is propagated to
another CPU before the "next" initialization is done. When this race
condition is triggered, packets being matched on that CPU will simply
miss all other filters, and will flow through the stack as if there were
no other filters installed. If the system is using OVS + TC, such
packets will get handled by vswitchd via upcall, which results in much
higher latency and reordering. For other applications it may result in
packet drops.
This is reproducible with a tc only setup, but it varies from system to
system. It could be reproduced with a shared block amongst 10 veth
tunnels, and an ingress filter mirroring packets to another veth.
That's because using the last added veth tunnel to the shared block to
do the actual traffic, it makes the race window bigger and easier to
trigger.
The fix is rather simple, to just initialize the next pointer of the new
filter instance (tp) before propagating the head change.
The fixes tag is pointing to the original code though this issue should
only be observed when using it unlocked.
Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.1649341369.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Yi Chen reported an unexpected sctp connection abort, and it occurred when
COOKIE_ECHO is bundled with DATA Fragment by SCTP HW GSO. As the IP header
is included in chunk->head_skb instead of chunk->skb, it failed to check
IP header version in security_sctp_assoc_request().
According to Ondrej, SELinux only looks at IP header (address and IPsec
options) and XFRM state data, and these are all included in head_skb for
SCTP HW GSO packets. So fix it by using head_skb when calling
security_sctp_assoc_request() in processing COOKIE_ECHO.
v1->v2:
- As Ondrej noticed, chunk->head_skb should also be used for
security_sctp_assoc_established() in sctp_sf_do_5_1E_ca().
Fixes: e215dab1c490 ("security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ce")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/71becb489e51284edf0c11fc15246f4ed4cef5b6.1649337862.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
Bugfixes:
- Fix an Oopsable condition due to SLAB_ACCOUNT setting in the
NFSv4.2 xattr code.
- Fix for open() using an file open mode of '3' in NFSv4
- Replace readdir's use of xxhash() with hash_64()
- Several patches to handle malloc() failure in SUNRPC"
* tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()
SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
SUNRPC: Handle allocation failure in rpc_new_task()
NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
SUNRPC: Handle low memory situations in call_status()
SUNRPC: Handle ENOMEM in call_transmit_status()
NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
NFS: Replace readdir's use of xxhash() with hash_64()
SUNRPC: handle malloc failure in ->request_prepare
NFSv4: fix open failure with O_ACCMODE flag
Revert "NFSv4: Handle the special Linux file open access mode"
|
|
Bounds checking is unhappy that we try to copy both Ethernet
addresses but pass pointer to the first one. Luckily destination
address is the first field so pass the pointer to the entire header,
whatever.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to
match the L2 ethertype following the first VLAN header, as confirmed by
linked discussion with the maintainer. However, such rule also matches
packets that have additional second VLAN header, even though filter has
both eth_type and vlan_ethtype set to "ipv4". Looking at the code this
seems to be mostly an artifact of the way flower uses flow dissector.
First, even though looking at the uAPI eth_type and vlan_ethtype appear
like a distinct fields, in flower they are all mapped to the same
key->basic.n_proto. Second, flow dissector skips following VLAN header as
no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the
value of n_proto to last parsed header. With these, such filters ignore any
headers present between first VLAN header and first "non magic"
header (ipv4 in this case) that doesn't result
FLOW_DISSECT_RET_PROTO_AGAIN.
Fix the issue by extending flow dissector VLAN key structure with new
'vlan_eth_type' field that matches first ethertype following previously
parsed VLAN header. Modify flower classifier to set the new
flow_dissector_key_vlan->vlan_eth_type with value obtained from
TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs.
Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/
Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support")
Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The client and server have different requirements for their memory
allocation, so move the allocation of the send buffer out of the socket
send code that is common to both.
Reported-by: NeilBrown <neilb@suse.de>
Fixes: b2648015d452 ("SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The allocation is done with GFP_KERNEL, but it could still fail in a low
memory situation.
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the call to rpc_alloc_task() fails, then ensure that the calldata is
released, and that rpc_run_task() and rpc_run_bc_task() bail out early.
Reported-by: NeilBrown <neilb@suse.de>
Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We need to handle ENFILE, ENOBUFS, and ENOMEM, because
xprt_wake_pending_tasks() can be called with any one of these due to
socket creation failures.
Fixes: b61d59fffd3e ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Both call_transmit() and call_bc_transmit() can now return ENOMEM, so
let's make sure that we handle the errors gracefully.
Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.
Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2022-04-06
We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).
The main changes are:
1) rethook related fixes, from Jiri and Masami.
2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf: selftests: Test fentry tracing a struct_ops program
bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
rethook: Fix to use WRITE_ONCE() for rethook:: Handler
selftests/bpf: Fix warning comparing pointer to 0
bpf: Fix sparse warnings in kprobe_multi_resolve_syms
bpftool: Explicit errno handling in skeletons
====================
Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
investigation shows that the crash occurred while NFSD was handling
a deferred request.
This patch addresses two inter-related issues that prevent request
deferral from working correctly for RPC/RDMA requests:
1. Prevent the crash by ensuring that the original
svc_rqst::rq_xprt_ctxt value is available when the request is
revisited. Otherwise svc_rdma_sendto() does not have a Receive
context available with which to construct its reply.
2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
compute .len and .buflen for received RPC Calls"),
svc_rdma_recvfrom() did not include the transport header in the
returned xdr_buf. There should have been no need for svc_defer()
and friends to save and restore that header, as of that commit.
This issue is addressed in a backport-friendly way by simply
having svc_rdma_recvfrom() set rq_xprt_hlen to zero
unconditionally, just as svc_tcp_recvfrom() does. This enables
svc_deferred_recv() to correctly reconstruct an RPC message
received via RPC/RDMA.
Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
|
|
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
|
|
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'
Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.
Fixes: 4b340a5a726d ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net()
rxrpc_peer_keepalive_worker()
if (rxnet->live)
rxnet->live = false;
del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
kfree+0xd6/0x310 mm/slab.c:3809
ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
ops_free_list net/core/net_namespace.c:174 [inline]
cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new
actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.
Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
Reported-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel. It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump. However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:
# ovs-dpctl add-dp system@ovs-system
# ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
ct(commit),0)
With the fix:
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(ct(commit),0)
Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden.
2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects,
from Vasily Averin.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: memcg accounting for dynamically allocated objects
netfilter: bitwise: fix reduce comparisons
====================
Link: https://lore.kernel.org/r/20220405100923.7231-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to
use __GFP_ACCOUNT flag for objects that are dynamically
allocated from the packet path.
Such objects are allocated inside nft_expr_ops->init() callbacks
executed in task context while processing netlink messages.
In addition, this patch adds accounting to nft_set_elem_expr_clone()
used for the same purposes.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN-
COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks"
counter available to the assoc owner.
These are all control chunks so they should be counted as such.
Add counting of singleton chunks so they are properly accounted for.
Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
FRR folks have hit a kernel warning[1] while deleting routes[2] which is
caused by trying to delete a route pointing to a nexthop id without
specifying nhid but matching on an interface. That is, a route is found
but we hit a warning while matching it. The warning is from
fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
with nexthop object. The call chain is:
inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning). The fix is to not do any
matching in that branch if the fi has a nexthop object because those are
managed separately. I.e. we should match when deleting without nh spec and
should fail when deleting a nexthop route with old-style nh spec because
nexthop objects are managed separately, e.g.:
$ ip r show 1.2.3.4/32
1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32
$ ip r del 1.2.3.4/32 nhid 12
<both should work>
$ ip r del 1.2.3.4/32 dev dummy0
<should fail with ESRCH>
[1]
[ 523.462226] ------------[ cut here ]------------
[ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
[ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
[ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
[ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1
[ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
[ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460
[ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
[ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
[ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
[ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
[ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
[ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
[ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
[ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
[ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
[ 523.462315] Call Trace:
[ 523.462316] <TASK>
[ 523.462320] fib_table_delete+0x1a9/0x310
[ 523.462323] inet_rtm_delroute+0x93/0x110
[ 523.462325] rtnetlink_rcv_msg+0x133/0x370
[ 523.462327] ? _copy_to_iter+0xb5/0x6f0
[ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110
[ 523.462331] netlink_rcv_skb+0x50/0xf0
[ 523.462334] netlink_unicast+0x211/0x330
[ 523.462336] netlink_sendmsg+0x23f/0x480
[ 523.462338] sock_sendmsg+0x5e/0x60
[ 523.462340] ____sys_sendmsg+0x22c/0x270
[ 523.462341] ? import_iovec+0x17/0x20
[ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90
[ 523.462344] ? __mod_lruvec_page_state+0x85/0x110
[ 523.462348] ___sys_sendmsg+0x81/0xc0
[ 523.462350] ? netlink_seq_start+0x70/0x70
[ 523.462352] ? __dentry_kill+0x13a/0x180
[ 523.462354] ? __fput+0xff/0x250
[ 523.462356] __sys_sendmsg+0x49/0x80
[ 523.462358] do_syscall_64+0x3b/0x90
[ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 523.462364] RIP: 0033:0x7f24552aa337
[ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
[ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
[ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
[ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
[ 523.462373] </TASK>
[ 523.462374] ---[ end trace ba537bc16f6bf4ed ]---
[2] https://github.com/FRRouting/frr/issues/6412
Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN,
but that isn't sufficient if we are using devs that are not MCTP
specific.
This also adds a check that the smctp_halen provided to sendmsg for
extended addressing is the correct size for the netdev.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation")
Reported-by: Matthew Rinaldi <mjrinal@g.clemson.edu>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_hard_header() returns the length of the header, so
we need to test for negative errors rather than non-zero.
Fixes: 889b7da23abf ("mctp: Add initial routing framework")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit a1ff94c2973c43bc1e2677ac63ebb15b1d1ff846.
Switch drivers that don't implement ->port_change_mtu() will cause the
DSA master to remain with an MTU of 1500, since we've deleted the other
code path. In turn, this causes a regression for those systems, where
MTU-sized traffic can no longer be terminated.
Revert the change taking into account the fact that rtnl_lock() is now
taken top-level from the callers of dsa_master_setup() and
dsa_master_teardown(). Also add a comment in order for it to be
absolutely clear why it is still needed.
Fixes: a1ff94c2973c ("net: dsa: stop updating master MTU from master.c")
Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:
(1) Initially we have three descriptors in the RX queue. The first one
allocates PAGE1 through page_pool, and the other two allocate one
half of PAGE2 each. Page references look like this:
RX_BD1 _______ PAGE1
RX_BD2 _______ PAGE2
RX_BD3 _________/
(2) Handle RX on the first descriptor. Allocate SKB1, eventually added
to the receive queue by tcp_queue_rcv().
(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
netif_receive_skb():
netif_receive_skb(SKB2)
ip_rcv(SKB2)
SKB3 = skb_clone(SKB2)
SKB2 and SKB3 share a reference to PAGE2 through
skb_shinfo()->dataref. The other ref to PAGE2 is still held by
RX_BD3:
SKB2 ---+- PAGE2
SKB3 __/ /
RX_BD3 _________/
(3b) Now while handling TCP, coalesce SKB3 with SKB1:
tcp_v4_rcv(SKB3)
tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds
kfree_skb_partial(SKB3)
skb_release_data(SKB3) // drops one dataref
SKB1 _____ PAGE1
\____
SKB2 _____ PAGE2
/
RX_BD3 _________/
In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
PAGE2, where it should instead have increased the page_pool frag
reference, pp_frag_count. Without coalescing, when releasing both
SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
when releasing SKB1 and SKB2, two references to PAGE2 will be
dropped, resulting in underflow.
(3c) Drop SKB2:
af_packet_rcv(SKB2)
consume_skb(SKB2)
skb_release_data(SKB2) // drops second dataref
page_pool_return_skb_page(PAGE2) // drops one pp_frag_count
SKB1 _____ PAGE1
\____
PAGE2
/
RX_BD3 _________/
(4) Userspace calls recvmsg()
Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
release the SKB3 page as well:
tcp_eat_recv_skb(SKB1)
skb_release_data(SKB1)
page_pool_return_skb_page(PAGE1)
page_pool_return_skb_page(PAGE2) // drops second pp_frag_count
(5) PAGE2 is freed, but the third RX descriptor was still using it!
In our case this causes IOMMU faults, but it would silently corrupt
memory if the IOMMU was disabled.
Change the logic that checks whether pp_recycle SKBs can be coalesced.
We still reject differing pp_recycle between 'from' and 'to' SKBs, but
in order to avoid the situation described above, we also reject
coalescing when both 'from' and 'to' are pp_recycled and 'from' is
cloned.
The new logic allows coalescing a cloned pp_recycle SKB into a page
refcounted one, because in this case the release (4) will drop the right
reference, the one taken by skb_try_coalesce().
Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
tls_set_sw_offload(). The return value of crypto_aead_ivsize()
for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
memory space will trigger slab-out-of-bounds bug as following:
==================================================================
BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
Read of size 16 at addr ffff888114e84e60 by task tls/10911
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x5e/0x5db
? decrypt_internal+0x385/0xc40 [tls]
kasan_report+0xab/0x120
? decrypt_internal+0x385/0xc40 [tls]
kasan_check_range+0xf9/0x1e0
memcpy+0x20/0x60
decrypt_internal+0x385/0xc40 [tls]
? tls_get_rec+0x2e0/0x2e0 [tls]
? process_rx_list+0x1a5/0x420 [tls]
? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
decrypt_skb_update+0x9d/0x400 [tls]
tls_sw_recvmsg+0x3c8/0xb50 [tls]
Allocated by task 10911:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
tls_set_sw_offload+0x2eb/0xa20 [tls]
tls_setsockopt+0x68c/0x700 [tls]
__sys_setsockopt+0xfe/0x1b0
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking updates from Jakub Kicinski:
"Networking fixes and rethook patches.
Features:
- kprobes: rethook: x86: replace kretprobe trampoline with rethook
Current release - regressions:
- sfc: avoid null-deref on systems without NUMA awareness in the new
queue sizing code
Current release - new code bugs:
- vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
- eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
interface is down
Previous releases - always broken:
- openvswitch: correct neighbor discovery target mask field in the
flow dump
- wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
- rxrpc: fix call timer start racing with call destruction
- rxrpc: fix null-deref when security type is rxrpc_no_security
- can: fix UAF bugs around echo skbs in multiple drivers
Misc:
- docs: move netdev-FAQ to the 'process' section of the
documentation"
* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
openvswitch: Add recirc_id to recirc warning
rxrpc: fix some null-ptr-deref bugs in server_key.c
rxrpc: Fix call timer start racing with call destruction
net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
net: hns3: fix the concurrency between functions reading debugfs
docs: netdev: move the netdev-FAQ to the process pages
docs: netdev: broaden the new vs old code formatting guidelines
docs: netdev: call out the merge window in tag checking
docs: netdev: add missing back ticks
docs: netdev: make the testing requirement more stringent
docs: netdev: add a question about re-posting frequency
docs: netdev: rephrase the 'should I update patchwork' question
docs: netdev: rephrase the 'Under review' question
docs: netdev: shorten the name and mention msgid for patch status
docs: netdev: note that RFC postings are allowed any time
docs: netdev: turn the net-next closed into a Warning
docs: netdev: move the patch marking section up
docs: netdev: minor reword
docs: netdev: replace references to old archives
...
|
|
When hitting the recirculation limit, the kernel would currently log
something like this:
[ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action
Which isn't all that useful to debug as we only have the interface name
to go on but can't track it down to a specific flow.
With this change, we now instead get:
[ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action (recirc_id=0x9e)
Which can now be correlated with the flow entries from OVS.
Suggested-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Tested-by: Stephane Graber <stgraber@ubuntu.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220330194244.3476544-1-stgraber@ubuntu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-03-31
The first patch is by Oliver Hartkopp and fixes MSG_PEEK feature in
the CAN ISOTP protocol (broken in net-next for v5.18 only).
Tom Rix's patch for the mcp251xfd driver fixes the propagation of an
error value in case of an error.
A patch by me for the m_can driver fixes a use-after-free in the xmit
handler for m_can IP cores v3.0.x.
Hangyu Hua contributes 3 patches fixing the same double free in the
error path of the xmit handler in the ems_usb, usb_8dev and mcba_usb
USB CAN driver.
Pavel Skripkin contributes a patch for the mcba_usb driver to properly
check the endpoint type.
The last patch is by me and fixes a mem leak in the gs_usb, which was
introduced in net-next for v5.18.
* tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_make_candev(): fix memory leak for devices with extended bit timing configuration
can: mcba_usb: properly check endpoint type
can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path
can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
can: m_can: m_can_tx_handler(): fix use after free of skb
can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value
can: isotp: restore accidentally removed MSG_PEEK feature
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some function calls are not implemented in rxrpc_no_security, there are
preparse_server_key, free_preparse_server_key and destroy_server_key.
When rxrpc security type is rxrpc_no_security, user can easily trigger a
null-ptr-deref bug via ioctl. So judgment should be added to prevent it
The crash log:
user@syzkaller:~$ ./rxrpc_preparse_s
[ 37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 37.957645][T15626] #PF: supervisor instruction fetch in kernel mode
[ 37.958229][T15626] #PF: error_code(0x0010) - not-present page
[ 37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0
[ 37.959321][T15626] Oops: 0010 [#1] PREEMPT SMP
[ 37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d #43
[ 37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 37.961474][T15626] RIP: 0010:0x0
[ 37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286
[ 37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000
[ 37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48
[ 37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002
[ 37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680
[ 37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80
[ 37.966441][T15626] FS: 00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 37.966979][T15626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0
[ 37.967864][T15626] Call Trace:
[ 37.968062][T15626] <TASK>
[ 37.968240][T15626] rxrpc_preparse_s+0x59/0x90
[ 37.968541][T15626] key_create_or_update+0x174/0x510
[ 37.968863][T15626] __x64_sys_add_key+0x139/0x1d0
[ 37.969165][T15626] do_syscall_64+0x35/0xb0
[ 37.969451][T15626] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 37.969824][T15626] RIP: 0033:0x43a1f9
Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Tested-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html
Fixes: 12da59fcab5a ("rxrpc: Hand server key parsing off to the security class")
Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|