aboutsummaryrefslogtreecommitdiff
path: root/net/mptcp
AgeCommit message (Collapse)Author
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-01Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Introduce crypto_shash_tfm_digest() and use it wherever possible. - Fix use-after-free and race in crypto_spawn_alg. - Add support for parallel and batch requests to crypto_engine. Algorithms: - Update jitter RNG for SP800-90B compliance. - Always use jitter RNG as seed in drbg. Drivers: - Add Arm CryptoCell driver cctrng. - Add support for SEV-ES to the PSP driver in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits) crypto: hisilicon - fix driver compatibility issue with different versions of devices crypto: engine - do not requeue in case of fatal error crypto: cavium/nitrox - Fix a typo in a comment crypto: hisilicon/qm - change debugfs file name from qm_regs to regs crypto: hisilicon/qm - add DebugFS for xQC and xQE dump crypto: hisilicon/zip - add debugfs for Hisilicon ZIP crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC crypto: hisilicon/qm - add debugfs to the QM state machine crypto: hisilicon/qm - add debugfs for QM crypto: stm32/crc32 - protect from concurrent accesses crypto: stm32/crc32 - don't sleep in runtime pm crypto: stm32/crc32 - fix multi-instance crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: hisilicon/zip - Use temporary sqe when doing work crypto: hisilicon - add device error report through abnormal irq crypto: hisilicon - remove codes of directly report device errors through MSI crypto: hisilicon - QM memory management optimization crypto: hisilicon - unify initial value assignment into QM ...
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30mptcp: fix NULL ptr dereference in MP_JOIN error pathPaolo Abeni
When token lookup on MP_JOIN 3rd ack fails, the server socket closes with a reset the incoming child. Such socket has the 'is_mptcp' flag set, but no msk socket associated - due to the failed lookup. While crafting the reset packet mptcp_established_options_mp() will try to dereference the child's master socket, causing a NULL ptr dereference. This change addresses the issue with explicit fallback to TCP in such error path. Fixes: 729cd6436f35 ("mptcp: cope better with MP_JOIN failure") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30mptcp: remove msk from the token container at destruction time.Paolo Abeni
Currently we remote the msk from the token container only via mptcp_close(). The MPTCP master socket can be destroyed also via other paths (e.g. if not yet accepted, when shutting down the listener socket). When we hit the latter scenario, dangling msk references are left into the token container, leading to memory corruption and/or UaF. This change addresses the issue by moving the token removal into the msk destructor. Fixes: 79c0949e9a09 ("mptcp: Add key generation and token tree") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30mptcp: fix race between MP_JOIN and closePaolo Abeni
If a MP_JOIN subflow completes the 3whs while another CPU is closing the master msk, we can hit the following race: CPU1 CPU2 close() mptcp_close subflow_syn_recv_sock mptcp_token_get_sock mptcp_finish_join inet_sk_state_load mptcp_token_destroy inet_sk_state_store(TCP_CLOSE) __mptcp_flush_join_list() mptcp_sock_graft list_add_tail sk_common_release sock_orphan() <socket free> The MP_JOIN socket will be leaked. Additionally we can hit UaF for the msk 'struct socket' referenced via the 'conn' field. This change try to address the issue introducing some synchronization between the MP_JOIN 3whs and mptcp_close via the join_list spinlock. If we detect the msk is closing the MP_JOIN socket is closed, too. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30mptcp: fix unblocking connect()Paolo Abeni
Currently unblocking connect() on MPTCP sockets fails frequently. If mptcp_stream_connect() is invoked to complete a previously attempted unblocking connection, it will still try to create the first subflow via __mptcp_socket_create(). If the 3whs is completed and the 'can_ack' flag is already set, the latter will fail with -EINVAL. This change addresses the issue checking for pending connect and delegating the completion to the first subflow. Additionally do msk addresses and sk_state changes only when needed. Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26mptcp: attempt coalescing when moving skbs to mptcp rx queueFlorian Westphal
We can try to coalesce skbs we take from the subflows rx queue with the tail of the mptcp rx queue. If successful, the skb head can be discarded early. We can also free the skb extensions, we do not access them after this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26mptcp: avoid NULL-ptr derefence on fallbackPaolo Abeni
In the MPTCP receive path we must cope with TCP fallback on blocking recvmsg(). Currently in such code path we detect the fallback condition, but we don't fetch the struct socket required for fallback. The above allowed syzkaller to trigger a NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline] RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904 Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d RSP: 0018:ffffc90001077b98 EFLAGS: 00010202 RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000 FS: 00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891 inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:886 [inline] sock_recvmsg net/socket.c:904 [inline] __sys_recvfrom+0x2f3/0x470 net/socket.c:2057 __do_sys_recvfrom net/socket.c:2075 [inline] __se_sys_recvfrom net/socket.c:2071 [inline] __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071 do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Address the issue initializing the struct socket reference before entering the fallback code. Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Fixes: 8ab183deb26a ("mptcp: cope with later TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22mptcp: use untruncated hash in ADD_ADDR HMACTodd Malsbary
There is some ambiguity in the RFC as to whether the ADD_ADDR HMAC is the rightmost 64 bits of the entire hash or of the leftmost 160 bits of the hash. The intention, as clarified with the author of the RFC, is the entire hash. This change returns the entire hash from mptcp_crypto_hmac_sha (instead of only the first 160 bits), and moves any truncation/selection operation on the hash to the caller. Fixes: 12555a2d97e5 ("mptcp: use rightmost 64 bits in ADD_ADDR HMAC") Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19mptcp: use rightmost 64 bits in ADD_ADDR HMACTodd Malsbary
This changes the HMAC used in the ADD_ADDR option from the leftmost 64 bits to the rightmost 64 bits as described in RFC 8684, section 3.4.1. This issue was discovered while adding support to packetdrill for the ADD_ADDR v1 option. Fixes: 3df523ab582c ("mptcp: Add ADD_ADDR handling") Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctlChristoph Hellwig
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17net: allow __skb_ext_alloc to sleepFlorian Westphal
mptcp calls this from the transmit side, from process context. Allow a sleeping allocation instead of unconditional GFP_ATOMIC. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: remove inner wait loop from mptcp_sendmsg_fragFlorian Westphal
previous patches made sure we only call into this function when these prerequisites are met, so no need to wait on the subflow socket anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: fill skb page frag cache outside of mptcp_sendmsg_fragFlorian Westphal
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is another preparation patch that makes sure we call mptcp_sendmsg_frag only if the page frag cache has been refilled. Followup patch will remove the wait loop from mptcp_sendmsg_frag(). The retransmit worker doesn't need to do this refill as it won't transmit new mptcp-level data. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: fill skb extension cache outside of mptcp_sendmsg_fragFlorian Westphal
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is a preparation patch that makes sure we call mptcp_sendmsg_frag only if a skb extension has been allocated. Moreover, such allocation currently uses GFP_ATOMIC while it could use sleeping allocation instead. Followup patches will remove the wait loop from mptcp_sendmsg_frag() and will allow to do a sleeping allocation for the extension. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: avoid blocking in tcp_sendpagesFlorian Westphal
The transmit loop continues to xmit new data until an error is returned or all data was transmitted. For the blocking i/o case, this means that tcp_sendpages() may block on the subflow until more space becomes available, i.e. we end up sleeping with the mptcp socket lock held. Instead we should check if a different subflow is ready to be used. This restarts the subflow sk lookup when the tx operation succeeded and the tcp subflow can't accept more data or if tcp_sendpages indicates -EAGAIN on a blocking mptcp socket. In that case we also need to set the NOSPACE bit to make sure we get notified once memory becomes available. In case all subflows are busy, the existing logic will wait until a subflow is ready, releasing the mptcp socket lock while doing so. The mptcp worker already sets DONTWAIT, so no need to make changes there. v2: * set NOSPACE bit * add a comment to clarify that mptcp-sk sndbuf limits need to be checked as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: break and restart in case mptcp sndbuf is fullFlorian Westphal
Its not enough to check for available tcp send space. We also hold on to transmitted data for mptcp-level retransmits. Right now we will send more and more data if the peer can ack data at the tcp level fast enough, since that frees up tcp send buffer space. But we also need to check that data was acked and reclaimed at the mptcp level. Therefore add needed check in mptcp_sendmsg, flush tcp data and wait until more mptcp snd space becomes available if we are over the limit. Before we wait for more data, also make sure we start the retransmit timer if we ran out of sndbuf space. Otherwise there is a very small chance that we wait forever: * receiver is waiting for data * sender is blocked because mptcp socket buffer is full * at tcp level, all data was acked * mptcp-level snd_una was not updated, because last ack that acknowledged the last data packet carried an older MPTCP-ack. Restarting the retransmit timer avoids this problem: if TCP subflow is idle, data is retransmitted from the RTX queue. New data will make the peer send a new, updated MPTCP-Ack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17mptcp: move common nospace-pattern to a helperFlorian Westphal
Paolo noticed that ssk_check_wmem() has same pattern, so add/use common helper for both places. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16mptcp: Use 32-bit DATA_ACK when possibleChristoph Paasch
RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not sending 64-bit data-sequence numbers. The 64-bit DSN is only there for extreme scenarios when a very high throughput subflow is combined with a long-RTT subflow such that the high-throughput subflow wraps around the 32-bit sequence number space within an RTT of the high-RTT subflow. It is thus a rare scenario and we should try to use the 32-bit DATA_ACK instead as long as possible. It allows to reduce the TCP-option overhead by 4 bytes, thus makes space for an additional SACK-block. It also makes tcpdumps much easier to read when the DSN and DATA_ACK are both either 32 or 64-bit. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15mptcp: cope better with MP_JOIN failurePaolo Abeni
Currently, on MP_JOIN failure we reset the child socket, but leave the request socket untouched. tcp_check_req will deal with it according to the 'tcp_abort_on_overflow' sysctl value - by default the req socket will stay alive. The above leads to inconsistent behavior on MP JOIN failure, and bad listener overflow accounting. This patch addresses the issue leveraging the infrastructure just introduced to ask the TCP stack to drop the req on failure. The child socket is not freed anymore by subflow_syn_recv_sock(), instead it's moved to a dead state and will be disposed by the next sock_put done by the TCP stack, so that listener overflow accounting is not affected by MP JOIN failure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15mptcp: add new sock flag to deal with join subflowsPaolo Abeni
MP_JOIN subflows must not land into the accept queue. Currently tcp_check_req() calls an mptcp specific helper to detect such scenario. Such helper leverages the subflow context to check for MP_JOIN subflows. We need to deal also with MP JOIN failures, even when the subflow context is not available due allocation failure. A possible solution would be changing the syn_recv_sock() signature to allow returning a more descriptive action/ error code and deal with that in tcp_check_req(). Since the above need is MPTCP specific, this patch instead uses a TCP request socket hole to add a MPTCP specific flag. Such flag is used by the MPTCP syn_recv_sock() to tell tcp_check_req() how to deal with the request socket. This change is a no-op for !MPTCP build, and makes the MPTCP code simpler. It allows also the next patch to deal correctly with MP JOIN failure. v1 -> v2: - be more conservative on drop_req initialization (Mat) RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12mptcp: Initialize map_seq upon subflow establishmentChristoph Paasch
When the other MPTCP-peer uses 32-bit data-sequence numbers, we rely on map_seq to indicate how to expand to a 64-bit data-sequence number in expand_seq() when receiving data. For new subflows, this field is not initialized, thus results in an "invalid" mapping being discarded. Fix this by initializing map_seq upon subflow establishment time. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-08mptcp: use SHA256_BLOCK_SIZE, not SHA_MESSAGE_BYTESEric Biggers
In preparation for naming the SHA-1 stuff in <linux/cryptohash.h> properly and moving it to a more appropriate header, fix the HMAC-SHA256 code in mptcp_crypto_hmac_sha() to use SHA256_BLOCK_SIZE instead of "SHA_MESSAGE_BYTES" which is actually the SHA-1 block size. (Fortunately these are both 64 bytes, so this wasn't a "real" bug...) Cc: Paolo Abeni <pabeni@redhat.com> Cc: mptcp@lists.01.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-07mptcp: set correct vfs info for subflowsPaolo Abeni
When a subflow is created via mptcp_subflow_create_socket(), a new 'struct socket' is allocated, with a new i_ino value. When inspecting TCP sockets via the procfs and or the diag interface, the above ones are not related to the process owning the MPTCP master socket, even if they are a logical part of it ('ss -p' shows an empty process field) Additionally, subflows created by the path manager get the uid/gid from the running workqueue. Subflows are part of the owning MPTCP master socket, let's adjust the vfs info to reflect this. After this patch, 'ss' correctly displays subflows as belonging to the msk socket creator. Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: fix uninitialized value accessPaolo Abeni
tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning a not NULL 'child', let's check 'own_req' only if child is available to avoid an - unharmful - UBSAN splat. v1 -> v2: - reference the correct hash Fixes: 4c8941de781c ("mptcp: avoid flipping mp_capable field in syn_recv_sock()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: initialize the data_fin field for mpc packetsPaolo Abeni
When parsing MPC+data packets we set the dss field, so we must also initialize the data_fin, or we can find stray value there. Fixes: 9a19371bf029 ("mptcp: fix data_fin handing in RX path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: fix 'use_ack' option access.Paolo Abeni
The mentioned RX option field is initialized only for DSS packet, we must access it only if 'dss' is set too, or the subflow will end-up in a bad status, leading to RFC violations. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: avoid a WARN on bad input.Paolo Abeni
Syzcaller has found a way to trigger the WARN_ON_ONCE condition in check_fully_established(). The root cause is a legit fallback to TCP scenario, so replace the WARN with a plain message on a more strict condition. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: move option parsing into mptcp_incoming_options()Paolo Abeni
The mptcp_options_received structure carries several per packet flags (mp_capable, mp_join, etc.). Such fields must be cleared on each packet, even on dropped ones or packet not carrying any MPTCP options, but the current mptcp code clears them only on TCP option reset. On several races/corner cases we end-up with stray bits in incoming options, leading to WARN_ON splats. e.g.: [ 171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713 [ 171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata [ 171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95 [ 171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c [ 171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282 [ 171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e [ 171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955 [ 171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca [ 171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020 [ 171.228460] FS: 00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000 [ 171.230065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0 [ 171.232586] Call Trace: [ 171.233109] <IRQ> [ 171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691) [ 171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832) [ 171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1)) [ 171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217) [ 171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822) [ 171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730) [ 171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009) [ 171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1)) [ 171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232) [ 171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252) [ 171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539) [ 171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135) [ 171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640) [ 171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293) [ 171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083) [ 171.282358] </IRQ> We could address the issue clearing explicitly the relevant fields in several places - tcp_parse_option, tcp_fast_parse_options, possibly others. Instead we move the MPTCP option parsing into the already existing mptcp ingress hook, so that we need to clear the fields in a single place. This allows us dropping an MPTCP hook from the TCP code and removing the quite large mptcp_options_received from the tcp_sock struct. On the flip side, the MPTCP sockets will traverse the option space twice (in tcp_parse_option() and in mptcp_incoming_options(). That looks acceptable: we already do that for syn and 3rd ack packets, plain TCP socket will benefit from it, and even MPTCP sockets will experience better code locality, reducing the jumps between TCP and MPTCP code. v1 -> v2: - rebased on current '-net' tree Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: consolidate synack processing.Paolo Abeni
Currently the MPTCP code uses 2 hooks to process syn-ack packets, mptcp_rcv_synsent() and the sk_rx_dst_set() callback. We can drop the first, moving the relevant code into the latter, reducing the hooking into the TCP code. This is also needed by the next patch. v1 -> v2: - use local tcp sock ptr instead of casting the sk variable several times - DaveM Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29mptcp: replace mptcp_disconnect with a stubFlorian Westphal
Paolo points out that mptcp_disconnect is bogus: "lock_sock(sk); looks suspicious (lock should be already held by the caller) And call to: tcp_disconnect(sk, flags); too, sk is not a tcp socket". ->disconnect() gets called from e.g. inet_stream_connect when one tries to disassociate a connected socket again (to re-connect without closing the socket first). MPTCP however uses mptcp_stream_connect, not inet_stream_connect, for the mptcp-socket connect call. inet_stream_connect only gets called indirectly, for the tcp socket, so any ->disconnect() calls end up calling tcp_disconnect for that tcp subflow sk. This also explains why syzkaller has not yet reported a problem here. So for now replace this with a stub that doesn't do anything. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25mptcp: fix race in msk status updatePaolo Abeni
Currently subflow_finish_connect() changes unconditionally any msk socket status other than TCP_ESTABLISHED. If an unblocking connect() races with close(), we can end-up triggering: IPv4: Attempt to release TCP socket in state 1 00000000e32b8b7e when the msk socket is disposed. Be sure to enter the established status only from SYN_SENT. Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25tcp: mptcp: use mptcp receive buffer space to select rcv windowFlorian Westphal
In MPTCP, the receive window is shared across all subflows, because it refers to the mptcp-level sequence space. MPTCP receivers already place incoming packets on the mptcp socket receive queue and will charge it to the mptcp socket rcvbuf until userspace consumes the data. Update __tcp_select_window to use the occupancy of the parent/mptcp socket instead of the subflow socket in case the tcp socket is part of a logical mptcp connection. This commit doesn't change choice of initial window for passive or active connections. While it would be possible to change those as well, this adds complexity (especially when handling MP_JOIN requests). Furthermore, the MPTCP RFC specifically says that a MPTCP sender 'MUST NOT use the RCV.WND field of a TCP segment at the connection level if it does not also carry a DSS option with a Data ACK field.' SYN/SYNACK packets do not carry a DSS option with a Data ACK field. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23mptcp/pm_netlink.c : add check for nla_put_in/6_addrBo YU
Normal there should be checked for nla_put_in6_addr like other usage in net. Detected by CoverityScan, CID# 1461639 Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Signed-off-by: Bo YU <tsu.yubo@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22mptcp: fix data_fin handing in RX pathPaolo Abeni
The data fin flag is set only via a DSS option, but mptcp_incoming_options() copies it unconditionally from the provided RX options. Since we do not clear all the mptcp sock RX options in a socket free/alloc cycle, we can end-up with a stray data_fin value while parsing e.g. MPC packets. That would lead to mapping data corruption and will trigger a few WARN_ON() in the RX path. Instead of adding a costly memset(), fetch the data_fin flag only for DSS packets - when we always explicitly initialize such bit at option parsing time. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20mptcp: drop req socket remote_key* fieldsPaolo Abeni
We don't need them, as we can use the current ingress opt data instead. Setting them in syn_recv_sock() may causes inconsistent mptcp socket status, as per previous commit. Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20mptcp: avoid flipping mp_capable field in syn_recv_sock()Paolo Abeni
If multiple CPUs races on the same req_sock in syn_recv_sock(), flipping such field can cause inconsistent child socket status. When racing, the CPU losing the req ownership may still change the mptcp request socket mp_capable flag while the CPU owning the request is cloning the socket, leaving the child socket with 'is_mptcp' set but no 'mp_capable' flag. Such socket will stay with 'conn' field cleared, heading to oops in later mptcp callback. Address the issue tracking the fallback status in a local variable. Fixes: 58b09919626b ("mptcp: create msk early") Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20mptcp: handle mptcp listener destruction via rcuFlorian Westphal
Following splat can occur during self test: BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160 Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808 subflow_data_ready+0x156/0x160 tcp_child_process+0x6a3/0xb30 tcp_v4_rcv+0x2231/0x3730 ip_protocol_deliver_rcu+0x5c/0x860 ip_local_deliver_finish+0x220/0x360 ip_local_deliver+0x1c8/0x4e0 ip_rcv_finish+0x1da/0x2f0 ip_rcv+0xd0/0x3c0 __netif_receive_skb_one_core+0xf5/0x160 __netif_receive_skb+0x27/0x1c0 process_backlog+0x21e/0x780 net_rx_action+0x35f/0xe90 do_softirq+0x4c/0x50 [..] This occurs when accessing subflow_ctx->conn. Problem is that tcp_child_process() calls listen sockets' sk_data_ready() notification, but it doesn't hold the listener lock. Another cpu calling close() on the listener will then cause transition of refcount to 0. Fixes: 58b09919626bf ("mptcp: create msk early") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-18mptcp: fix 'Attempt to release TCP socket in state' warningsFlorian Westphal
We need to set sk_state to CLOSED, else we will get following: IPv4: Attempt to release TCP socket in state 3 00000000b95f109e IPv4: Attempt to release TCP socket in state 10 00000000b95f109e First one is from inet_sock_destruct(), second one from mptcp_sk_clone failure handling. Setting sk_state to CLOSED isn't enough, we also need to orphan sk so it has DEAD flag set. Otherwise, a very similar warning is printed from inet_sock_destruct(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-18mptcp: fix splat when incoming connection is never accepted before exit/closeFlorian Westphal
Following snippet (replicated from syzkaller reproducer) generates warning: "IPv4: Attempt to release TCP socket in state 1". int main(void) { struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20, .sin_addr.s_addr = 0x010000e0, }; struct sockaddr_in sin2 = { .sin_family = 2, .sin_addr.s_addr = 0x0100007f, }; struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20, .sin_addr.s_addr = 0x0100007f, }; int r0 = socket(0x2, 0x1, 0x106); int r1 = socket(0x2, 0x1, 0x106); bind(r1, (void *)&sin1, sizeof(sin1)); connect(r1, (void *)&sin2, sizeof(sin2)); listen(r1, 3); return connect(r0, (void *)&sin3, 0x4d); } Reason is that the newly generated mptcp socket is closed via the ulp release of the tcp listener socket when its accept backlog gets purged. To fix this, delay setting the ESTABLISHED state until after userspace calls accept and via mptcp specific destructor. Fixes: 58b09919626bf ("mptcp: create msk early") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-12mptcp: fix double-unlock in mptcp_pollFlorian Westphal
mptcp_connect/28740 is trying to release lock (sk_lock-AF_INET) at: [<ffffffff82c15869>] mptcp_poll+0xb9/0x550 but there are no more locks to release! Call Trace: lock_release+0x50f/0x750 release_sock+0x171/0x1b0 mptcp_poll+0xb9/0x550 sock_poll+0x157/0x470 ? get_net_ns+0xb0/0xb0 do_sys_poll+0x63c/0xdd0 Problem is that __mptcp_tcp_fallback() releases the mptcp socket lock, but after recent change it doesn't do this in all of its return paths. To fix this, remove the unlock from __mptcp_tcp_fallback() and always do the unlock in the caller. Also add a small comment as to why we have this __mptcp_needs_tcp_fallback(). Fixes: 0b4f33def7bbde ("mptcp: fix tcp fallback crash") Reported-by: syzbot+e56606435b7bfeea8cf5@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-03mptcp: add some missing pr_fmt definesGeliang Tang
Some of the mptcp logs didn't print out the format string: [ 185.651493] DSS [ 185.651494] data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 185.651494] data_ack=13792750332298763796 [ 185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=0 skb=0000000063dc595d [ 185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 status=0 [ 185.651495] MPTCP: msk ack_seq=9bbc894565aa2f9a subflow ack_seq=9bbc894565aa2f9a [ 185.651496] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=1 skb=0000000012e809e1 So this patch added these missing pr_fmt defines. Then we can get the same format string "MPTCP" in all mptcp logs like this: [ 142.795829] MPTCP: DSS [ 142.795829] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 142.795829] MPTCP: data_ack=8089704603109242421 [ 142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=0 skb=00000000d5f230df [ 142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 status=0 [ 142.795831] MPTCP: msk ack_seq=66790290f1199d9b subflow ack_seq=66790290f1199d9b [ 142.795831] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=1 skb=00000000de5aca2e Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: fix "fn parameter not described" warningsMatthieu Baerts
Obtained with: $ make W=1 net/mptcp/token.o net/mptcp/token.c:53: warning: Function parameter or member 'req' not described in 'mptcp_token_new_request' net/mptcp/token.c:98: warning: Function parameter or member 'sk' not described in 'mptcp_token_new_connect' net/mptcp/token.c:133: warning: Function parameter or member 'conn' not described in 'mptcp_token_new_accept' net/mptcp/token.c:178: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy_request' net/mptcp/token.c:191: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy' Fixes: 79c0949e9a09 (mptcp: Add key generation and token tree) Fixes: 58b09919626b (mptcp: create msk early) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: re-check dsn before reading from subflowFlorian Westphal
mptcp_subflow_data_available() is commonly called via ssk->sk_data_ready(), in this case the mptcp socket lock cannot be acquired. Therefore, while we can safely discard subflow data that was already received up to msk->ack_seq, we cannot be sure that 'subflow->data_avail' will still be valid at the time userspace wants to read the data -- a previous read on a different subflow might have carried this data already. In that (unlikely) event, msk->ack_seq will have been updated and will be ahead of the subflow dsn. We can check for this condition and skip/resync to the expected sequence number. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: subflow: check parent mptcp socket on subflow state changeFlorian Westphal
This is needed at least until proper MPTCP-Level fin/reset signalling gets added: We wake parent when a subflow changes, but we should do this only when all subflows have closed, not just one. Schedule the mptcp worker and tell it to check eof state on all subflows. Only flag mptcp socket as closed and wake userspace processes blocking in poll if all subflows have closed. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: fix tcp fallback crashFlorian Westphal
Christoph Paasch reports following crash: general protection fault [..] CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62 RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471 [..] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline] do_raw_spin_lock include/linux/spinlock.h:181 [inline] spin_lock_bh include/linux/spinlock.h:343 [inline] __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278 mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882 [..] Problem is that mptcp_shutdown() socket isn't an mptcp socket, its a plain tcp_sk. Thus, trying to access mptcp_sk specific members accesses garbage. Root cause is that accept() returns a fallback (tcp) socket, not an mptcp one. There is code in getpeername to detect this and override the sockets stream_ops. But this will only run when accept() caller provided a sockaddr struct. "accept(fd, NULL, 0)" will therefore result in mptcp stream ops, but with sock->sk pointing at a tcp_sk. Update the existing fallback handling to detect this as well. Moreover, mptcp_shutdown did not have fallback handling, and mptcp_poll did it too late so add that there as well. Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>