aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-11selftests/bpf: use proper type when passing prog_typeStanislav Fomichev
Use bpf_prog_type instead of bpf_map_type when passing prog_type. -Wenum-conversion might be unhappy about it: error: implicit conversion from enumeration type 'enum bpf_map_type' to different enumeration type 'enum bpf_prog_type' Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-10selftests/bpf: add missing pointer dereference for map stacktrace fixupStanislav Fomichev
I get a segfault without it, other fixups always do dereference, and without dereference I don't understand how it can ever work. Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-09bpf: powerpc: fix broken uapi for BPF_PROG_TYPE_PERF_EVENTSandipan Das
Now that there are different variants of pt_regs for userspace and kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must be changed by exporting the user_pt_regs structure instead of the pt_regs structure that is in-kernel only. Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07net/flow_dissector: correctly cap nhoff and thoff in case of BPFStanislav Fomichev
We want to make sure that the following condition holds: 0 <= nhoff <= thoff <= skb->len BPF program can set out-of-bounds nhoff and thoff, which is dangerous, see recent commit d0c081b49137 ("flow_dissector: properly cap thoff field")'. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07selftests/bpf: use thoff instead of nhoff in BPF flow dissectorStanislav Fomichev
We are returning thoff from the flow dissector, not the nhoff. Pass thoff along with nhoff to the bpf program (initially thoff == nhoff) and expect flow dissector amend/return thoff, not nhoff. This avoids confusion, when by the time bpf flow dissector exits, nhoff == thoff, which doesn't make much sense. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07ipv6: sr: properly initialize flowi6 prior passing to ip6_route_outputShmulik Ladkani
In 'seg6_output', stack variable 'struct flowi6 fl6' was missing initialization. Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06Merge branch 'mlxsw-Various-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Various fixes Patches #1 and #2 fix two VxLAN related issues. The first patch removes warnings that can currently be triggered from user space. Second patch avoids leaking a FID in an error path. Patch #3 fixes a too strict check that causes certain host routes not to be promoted to perform GRE decapsulation in hardware. Last patch avoids a use-after-free when deleting a VLAN device via an ioctl when it is enslaved to a bridge. I have a patchset for net-next that reworks this code and makes the driver more robust. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctlIdo Schimmel
When deleting a VLAN device using an ioctl the netdev is unregistered before the VLAN filter is updated via ndo_vlan_rx_kill_vid(). It can lead to a use-after-free in mlxsw in case the VLAN device is deleted while being enslaved to a bridge. The reason for the above is that when mlxsw receives the CHANGEUPPER event, it wrongly assumes that the VLAN device is no longer its upper and thus destroys the internal representation of the bridge port despite the reference count being non-zero. Fix this by checking if the VLAN device is our upper using its real device. In net-next I'm going to remove this trick and instead make mlxsw completely agnostic to the order of the events. Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06mlxsw: spectrum_router: Relax GRE decap matching checkNir Dotan
GRE decap offload is configured when local routes prefix correspond to the local address of one of the offloaded GRE tunnels. The matching check was found to be too strict, such that for a flat GRE configuration, in which the overlay and underlay traffic share the same non-default VRF, decap flow was not offloaded. Relax the check for decap flow offloading. A match occurs if the local address of the tunnel matches the local route address while both share the same VRF table. Fixes: 4607f6d26950 ("mlxsw: spectrum_router: Support IPv4 underlay decap") Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06mlxsw: spectrum_switchdev: Avoid leaking FID's reference countIdo Schimmel
It should never be possible for a user to set a VNI on a FID in case one is already set. The driver therefore returns an error, but fails to drop the reference count taken earlier when calling mlxsw_sp_fid_8021d_lookup(). Drop the reference when this unlikely error is hit. Fixes: 1c30d1836aeb ("mlxsw: spectrum: Enable VxLAN enslavement to bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06mlxsw: spectrum_nve: Remove easily triggerable warningsIdo Schimmel
It is possible to trigger a warning in mlxsw in case a flood entry which mlxsw is not aware of is deleted from the VxLAN device. This is because mlxsw expects to find a singly linked list where the flood entry is present in. Fix by removing these warnings for now. Will re-add them in the next release after we teach mlxsw to ask for a dump of FDB entries from the VxLAN device, once it is enslaved to a bridge mlxsw cares about. Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changesJiri Wiesner
The *_frag_reasm() functions are susceptible to miscalculating the byte count of packet fragments in case the truesize of a head buffer changes. The truesize member may be changed by the call to skb_unclone(), leaving the fragment memory limit counter unbalanced even if all fragments are processed. This miscalculation goes unnoticed as long as the network namespace which holds the counter is not destroyed. Should an attempt be made to destroy a network namespace that holds an unbalanced fragment memory limit counter the cleanup of the namespace never finishes. The thread handling the cleanup gets stuck in inet_frags_exit_net() waiting for the percpu counter to reach zero. The thread is usually in running state with a stacktrace similar to: PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 #10 [ffff880621563e38] process_one_work at ffffffff81096f14 It is not possible to create new network namespaces, and processes that call unshare() end up being stuck in uninterruptible sleep state waiting to acquire the net_mutex. The bug was observed in the IPv6 netfilter code by Per Sundstrom. I thank him for his analysis of the problem. The parts of this patch that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Reported-by: Per Sundstrom <per.sundstrom@redqube.se> Acked-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05sctp: frag_point sanity checkJakub Audykowicz
If for some reason an association's fragmentation point is zero, sctp_datamsg_from_user will try to endlessly try to divide a message into zero-sized chunks. This eventually causes kernel panic due to running out of memory. Although this situation is quite unlikely, it has occurred before as reported. I propose to add this simple last-ditch sanity check due to the severity of the potential consequences. Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05tcp: fix NULL ref in tail loss probeYuchung Cheng
TCP loss probe timer may fire when the retranmission queue is empty but has a non-zero tp->packets_out counter. tcp_send_loss_probe will call tcp_rearm_rto which triggers NULL pointer reference by fetching the retranmission queue head in its sub-routines. Add a more detailed warning to help catch the root cause of the inflight accounting inconsistency. Reported-by: Rafael Tinoco <rafael.tinoco@linaro.org> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05tcp: Do not underestimate rwnd_limitedEric Dumazet
If available rwnd is too small, tcp_tso_should_defer() can decide it is worth waiting before splitting a TSO packet. This really means we are rwnd limited. Fixes: 5615f88614a4 ("tcp: instrument how long TCP is limited by receive window") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-05 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix bpf uapi pointers for 32-bit architectures, from Daniel. 2) improve verifer ability to handle progs with a lot of branches, from Alexei. 3) strict btf checks, from Yonghong. 4) bpf_sk_lookup api cleanup, from Joe. 5) other misc fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05net: use skb_list_del_init() to remove from RX sublistsEdward Cree
list_del() leaves the skb->next pointer poisoned, which can then lead to a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN forwarding bridge on sfc as per: ======== $ ovs-vsctl show 5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30 Bridge "br0" Port "br0" Interface "br0" type: internal Port "enp6s0f0" Interface "enp6s0f0" Port "vxlan0" Interface "vxlan0" type: vxlan options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"} ovs_version: "2.5.0" ======== (where 10.0.0.5 is an address on enp6s0f1) and sending traffic across it will lead to the following panic: ======== general protection fault: 0000 [#1] SMP PTI CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701 Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013 RIP: 0010:dev_hard_start_xmit+0x38/0x200 Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95 RSP: 0018:ffff888627b437e0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000 RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8 R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000 R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000 FS: 0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0 Call Trace: <IRQ> __dev_queue_xmit+0x623/0x870 ? masked_flow_lookup+0xf7/0x220 [openvswitch] ? ep_poll_callback+0x101/0x310 do_execute_actions+0xaba/0xaf0 [openvswitch] ? __wake_up_common+0x8a/0x150 ? __wake_up_common_lock+0x87/0xc0 ? queue_userspace_packet+0x31c/0x5b0 [openvswitch] ovs_execute_actions+0x47/0x120 [openvswitch] ovs_dp_process_packet+0x7d/0x110 [openvswitch] ovs_vport_receive+0x6e/0xd0 [openvswitch] ? dst_alloc+0x64/0x90 ? rt_dst_alloc+0x50/0xd0 ? ip_route_input_slow+0x19a/0x9a0 ? __udp_enqueue_schedule_skb+0x198/0x1b0 ? __udp4_lib_rcv+0x856/0xa30 ? __udp4_lib_rcv+0x856/0xa30 ? cpumask_next_and+0x19/0x20 ? find_busiest_group+0x12d/0xcd0 netdev_frame_hook+0xce/0x150 [openvswitch] __netif_receive_skb_core+0x205/0xae0 __netif_receive_skb_list_core+0x11e/0x220 netif_receive_skb_list+0x203/0x460 ? __efx_rx_packet+0x335/0x5e0 [sfc] efx_poll+0x182/0x320 [sfc] net_rx_action+0x294/0x3c0 __do_softirq+0xca/0x297 irq_exit+0xa6/0xb0 do_IRQ+0x54/0xd0 common_interrupt+0xf/0xf </IRQ> ======== So, in all listified-receive handling, instead pull skbs off the lists with skb_list_del_init(). Fixes: 9af86f933894 ("net: core: fix use-after-free in __netif_receive_skb_list_core") Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing") Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()") Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05Merge tag 'mac80211-for-davem-2018-12-05' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg: ==================== As it's been a while, we have various fixes for * hwsim * AP mode (client powersave related) * CSA/FTM interaction * a busy loop in IE handling * and similar ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05cfg80211: Fix busy loop regression in ieee80211_ie_split_ric()Jouni Malinen
This function was modified to support the information element extension case (WLAN_EID_EXTENSION) in a manner that would result in an infinite loop when going through set of IEs that include WLAN_EID_RIC_DATA and contain an IE that is in the after_ric array. The only place where this can currently happen is in mac80211 ieee80211_send_assoc() where ieee80211_ie_split_ric() is called with after_ric[]. This can be triggered by valid data from user space nl80211 association/connect request (i.e., requiring GENL_UNS_ADMIN_PERM). The only known application having an option to include WLAN_EID_RIC_DATA in these requests is wpa_supplicant and it had a bug that prevented this specific contents from being used (and because of that, not triggering this kernel bug in an automated test case ap_ft_ric) and now that this bug is fixed, it has a workaround to avoid this kernel issue. WLAN_EID_RIC_DATA is currently used only for testing purposes, so this does not cause significant harm for production use cases. Fixes: 2512b1b18d07 ("mac80211: extend ieee80211_ie_split to support EXTENSION") Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-12-05mac80211: ignore NullFunc frames in the duplicate detectionEmmanuel Grumbach
NullFunc packets should never be duplicate just like QoS-NullFunc packets. We saw a client that enters / exits power save with NullFunc frames (and not with QoS-NullFunc) despite the fact that the association supports HT. This specific client also re-uses a non-zero sequence number for different NullFunc frames. At some point, the client had to send a retransmission of the NullFunc frame and we dropped it, leading to a misalignment in the power save state. Fix this by never consider a NullFunc frame as duplicate, just like we do for QoS NullFunc frames. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201449 CC: <stable@vger.kernel.org> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-12-05mac80211: fix reordering of buffered broadcast packetsFelix Fietkau
If the buffered broadcast queue contains packets, letting new packets bypass that queue can lead to heavy reordering, since the driver is probably throttling transmission of buffered multicast packets after beacons. Keep buffering packets until the buffer has been cleared (and no client is in powersave mode). Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-12-05mac80211: ignore tx status for PS stations in ieee80211_tx_status_extFelix Fietkau
Make it behave like regular ieee80211_tx_status calls, except for the lack of filtered frame processing. This fixes spurious low-ack triggered disconnections with powersave clients connected to an AP. Fixes: f027c2aca0cf4 ("mac80211: add ieee80211_tx_status_noskb") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-12-04net: mvpp2: fix phylink handling of invalid PHY modesBaruch Siach
The .validate phylink callback should empty the supported bitmap when the interface mode is invalid. Cc: Maxime Chevallier <maxime.chevallier@bootlin.com> Cc: Antoine Tenart <antoine.tenart@bootlin.com> Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04net: mvpp2: fix detection of 10G SFP modulesBaruch Siach
The mvpp2_phylink_validate() relies on the interface field of phylink_link_state to determine valid link modes. However, when called from phylink_sfp_module_insert() this field in not initialized. The default switch case then excludes 10G link modes. This allows 10G SFP modules that are detected correctly to be configured at max rate of 2.5G. Catch the uninitialized PHY mode case, and allow 10G rates. Fixes: d97c9f4ab000b ("net: mvpp2: 1000baseX support") Cc: Maxime Chevallier <maxime.chevallier@bootlin.com> Cc: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04ethernet: fman: fix wrong of_node_put() in probe functionNicolas Saenz Julienne
After getting a reference to the platform device's of_node the probe function ends up calling of_find_matching_node() using the node as an argument. The function takes care of decreasing the refcount on it. We are then incorrectly decreasing the refcount on that node again. This patch removes the unwarranted call to of_node_put(). Fixes: 414fd46e7762 ("fsl/fman: Add FMan support") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devicesEric Dumazet
kmsan was able to trigger a kernel-infoleak using a gre device [1] nlmsg_populate_fdb_fill() has a hard coded assumption that dev->addr_len is ETH_ALEN, as normally guaranteed for ARPHRD_ETHER devices. A similar issue was fixed recently in commit da71577545a5 ("rtnetlink: Disallow FDB configuration for non-Ethernet device") [1] BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576 CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x32d/0x480 lib/dump_stack.c:113 kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683 kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743 kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634 copyout lib/iov_iter.c:143 [inline] _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576 copy_to_iter include/linux/uio.h:143 [inline] skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431 skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline] netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975 sock_recvmsg_nosec net/socket.c:794 [inline] sock_recvmsg+0x1d1/0x230 net/socket.c:801 ___sys_recvmsg+0x444/0xae0 net/socket.c:2278 __sys_recvmsg net/socket.c:2327 [inline] __do_sys_recvmsg net/socket.c:2337 [inline] __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x441119 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119 RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100 R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080 R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline] kmsan_save_stack mm/kmsan/kmsan.c:261 [inline] kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469 kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362 __msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162 __nla_put lib/nlattr.c:744 [inline] nla_put+0x20a/0x2d0 lib/nlattr.c:802 nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466 nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline] ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807 rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979 netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116 __sys_sendmsg net/socket.c:2154 [inline] __do_sys_sendmsg net/socket.c:2163 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2161 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline] kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170 kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186 __kmalloc+0x14c/0x4d0 mm/slub.c:3825 kmalloc include/linux/slab.h:551 [inline] __hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline] __hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline] __dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670 dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687 ip_mc_filter_add net/ipv4/igmp.c:1128 [inline] igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311 __ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444 ip_mc_inc_group net/ipv4/igmp.c:1453 [inline] ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775 inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522 notifier_call_chain kernel/notifier.c:93 [inline] __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401 __dev_notify_flags+0x3da/0x860 net/core/dev.c:1733 dev_change_flags+0x1ac/0x230 net/core/dev.c:7569 do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492 rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111 rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116 __sys_sendmsg net/socket.c:2154 [inline] __do_sys_sendmsg net/socket.c:2163 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2161 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 Bytes 36-37 of 105 are uninitialized Memory access of size 105 starts at ffff88819686c000 Data copied to user address 0000000020000380 Fixes: d83b06036048 ("net: add fdb generic dump routine") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Ido Schimmel <idosch@mellanox.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04phy: Revert toggling reset changes.David S. Miller
This reverts: ef1b5bf506b1 ("net: phy: Fix not to call phy_resume() if PHY is not attached") 8c85f4b81296 ("net: phy: micrel: add toggling phy reset if PHY is not attached") Andrew Lunn informs me that there are alternative efforts underway to fix this more properly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04Merge branch 'bpf-verifier-resilience'Daniel Borkmann
Alexei Starovoitov says: ==================== Three patches to improve verifier ability to handle pathological bpf programs with a lot of branches: - make sure prog_load syscall can be aborted - improve branch taken analysis - introduce per-insn complexity limit for unprivileged programs ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: add per-insn complexity limitAlexei Starovoitov
malicious bpf program may try to force the verifier to remember a lot of distinct verifier states. Put a limit to number of per-insn 'struct bpf_verifier_state'. Note that hitting the limit doesn't reject the program. It potentially makes the verifier do more steps to analyze the program. It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner instead of spending cpu time walking long link list. The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs with slight increase in number of "steps" it takes to successfully verify the programs: before after bpf_lb-DLB_L3.o 1940 1940 bpf_lb-DLB_L4.o 3089 3089 bpf_lb-DUNKNOWN.o 1065 1065 bpf_lxc-DDROP_ALL.o 28052 | 28162 bpf_lxc-DUNKNOWN.o 35487 | 35541 bpf_netdev.o 10864 10864 bpf_overlay.o 6643 6643 bpf_lcx_jit.o 38437 38437 But it also makes malicious program to be rejected in 0.4 seconds vs 6.5 Hence apply this limit to unprivileged programs only. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: improve verifier branch analysisAlexei Starovoitov
pathological bpf programs may try to force verifier to explode in the number of branch states: 20: (d5) if r1 s<= 0x24000028 goto pc+0 21: (b5) if r0 <= 0xe1fa20 goto pc+2 22: (d5) if r1 s<= 0x7e goto pc+0 23: (b5) if r0 <= 0xe880e000 goto pc+0 24: (c5) if r0 s< 0x2100ecf4 goto pc+0 25: (d5) if r1 s<= 0xe880e000 goto pc+1 26: (c5) if r0 s< 0xf4041810 goto pc+0 27: (d5) if r1 s<= 0x1e007e goto pc+0 28: (b5) if r0 <= 0xe86be000 goto pc+0 29: (07) r0 += 16614 30: (c5) if r0 s< 0x6d0020da goto pc+0 31: (35) if r0 >= 0x2100ecf4 goto pc+0 Teach verifier to recognize always taken and always not taken branches. This analysis is already done for == and != comparison. Expand it to all other branches. It also helps real bpf programs to be verified faster: before after bpf_lb-DLB_L3.o 2003 1940 bpf_lb-DLB_L4.o 3173 3089 bpf_lb-DUNKNOWN.o 1080 1065 bpf_lxc-DDROP_ALL.o 29584 28052 bpf_lxc-DUNKNOWN.o 36916 35487 bpf_netdev.o 11188 10864 bpf_overlay.o 6679 6643 bpf_lcx_jit.o 39555 38437 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: check pending signals while verifying programsAlexei Starovoitov
Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-03Merge branch 'mlx4-fixes'David S. Miller
Tariq Toukan says: ==================== mlx4 fixes for 4.20-rc This patchset includes small fixes for the mlx4_en driver. First patch by Eran fixes the value used to init the netdevice's min_mtu field. Please queue it to -stable >= v4.10. Second patch by Saeed adds missing Kconfig build dependencies. Series generated against net commit: 35b827b6d061 tun: forbid iface creation with rtnl ops ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/mlx4_en: Fix build break when CONFIG_INET is offSaeed Mahameed
MLX4_EN depends on NETDEVICES, ETHERNET and INET Kconfigs. Make sure they are listed in MLX4_EN Kconfig dependencies. This fixes the following build break: drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: ‘struct iphdr’ declared inside parameter list [enabled by default] struct iphdr *iph) ^ drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function ‘get_fixed_ipv4_csum’: drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type _u8 ipproto = iph->protocol; Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/mlx4_en: Change min MTU size to ETH_MIN_MTUEran Ben Elisha
NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in the RFC791 and in the network stack. Remove old mlx4_en only define for it, which was set to wrong value. Fixes: b80f71f5816f ("ethernet/mellanox: use core min/max MTU checking") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/core: tidy up an error messageQian Cai
netif_napi_add() could report an error like this below due to it allows to pass a format string for wildcarding before calling dev_get_valid_name(), "netif_napi_add() called with weight 256 on device eth%d" For example, hns_enet_drv module does this. hns_nic_try_get_ae hns_nic_init_ring_data netif_napi_add register_netdev dev_get_valid_name Hence, make it a bit more human-readable by using netdev_err_once() instead. Signed-off-by: Qian Cai <cai@gmx.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03mv88e6060: disable hardware level MAC learningAnderson Luiz Alves
Disable hardware level MAC learning because it breaks station roaming. When enabled it drops all frames that arrive from a MAC address that is on a different port at learning table. Signed-off-by: Anderson Luiz Alves <alacn1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03macvlan: return correct error valueMatteo Croce
A MAC address must be unique among all the macvlan devices with the same lower device. The only exception is the passthru [sic] mode, which shares the lower device address. When duplicate addresses are detected, EBUSY is returned when bringing the interface up: # ip link add macvlan0 link eth0 type macvlan # read addr </sys/class/net/eth0/address # ip link set macvlan0 address $addr # ip link set macvlan0 up RTNETLINK answers: Device or resource busy Use correct error code which is EADDRINUSE, and do the check also earlier, on address change: # ip link set macvlan0 address $addr RTNETLINK answers: Address already in use Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03sctp: kfree_rcu asocXin Long
In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences a transport's asoc under rcu_read_lock while asoc is freed not after a grace period, which leads to a use-after-free panic. This patch fixes it by calling kfree_rcu to make asoc be freed after a grace period. Note that only the asoc's memory is delayed to free in the patch, it won't cause sk to linger longer. Thanks Neil and Marcelo to make this clear. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport") Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com Suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net/ibmvnic: Fix RTNL deadlock during device resetThomas Falcon
Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset") made the change to hold the RTNL lock during driver reset but still calls netdev_notify_peers, which results in a deadlock. Instead, use call_netdevice_notifiers, which is functionally the same except that it does not take the RTNL lock again. Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03vhost: fix IOTLB lockingJean-Philippe Brucker
Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq lock to improve scalability, but introduced a possible deadlock in vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the spinlock is taken while holding vq->mutex. Since calling vhost_poll_queue() doesn't require any lock, avoid the deadlock by not taking vq->mutex. Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one") Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge branch 'phy-micrel-toggling-reset'David S. Miller
Yoshihiro Shimoda says: ==================== net: phy: micrel: add toggling phy reset This patch set is for R-Car Gen3 Salvator-XS boards. If we do the following method, the phy cannot link up correctly. 1) Kernel boots by using initramfs. --> No open the nic, so phy_device_register() and phy_probe() deasserts the reset. 2) Kernel enters the suspend. --> So, keep the reset signal as deassert. --> On R-Car Salvator-XS board, unfortunately, the board power is turned off. 3) Kernel returns from suspend. 4) ifconfig eth0 up --> Then, since edge signal of the reset doesn't happen, it cannot link up. 5) ifconfig eth0 down 6) ifconfig eth0 up --> In this case, it can link up. When resolving this issue after I got feedback from Andrew and Heiner, I found an issue that the phy_device.c didn't call phy_resume() if the PHY was not attached. So, patch 1 fixes it and add toggling the phy reset to the micrel phy driver. Changes from v1 (as RFC): - No remove the current code of phy_device.c to avoid any side effects. - Fix the mdio_bus_phy_resume() in phy_device.c. - Add toggling the phy reset in micrel.c if the PHY is not attached. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: micrel: add toggling phy reset if PHY is not attachedYoshihiro Shimoda
This patch adds toggling phy reset if PHY is not attached. Otherwise, some boards (e.g. R-Car H3 Salvator-XS) cannot link up correctly if we do the following method: 1) Kernel boots by using initramfs. --> No open the nic, so phy_device_register() and phy_probe() deasserts the reset. 2) Kernel enters the suspend. --> So, keep the reset signal as deassert. --> On R-Car Salvator-XS board, unfortunately, the board power is turned off. 3) Kernel returns from suspend. 4) ifconfig eth0 up --> Then, since edge signal of the reset doesn't happen, it cannot link up. 5) ifconfig eth0 down 6) ifconfig eth0 up --> In this case, it can link up. Reported-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: Fix not to call phy_resume() if PHY is not attachedYoshihiro Shimoda
This patch fixes an issue that mdio_bus_phy_resume() doesn't call phy_resume() if the PHY is not attached. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03bpf: powerpc64: optimize JIT passes for bpf function callsSandipan Das
Once the JITed images for each function in a multi-function program are generated after the first three JIT passes, we only need to fix the target address for the branch instruction corresponding to each bpf-to-bpf function call. This introduces the following optimizations for reducing the work done by the JIT compiler when handling multi-function programs: [1] Instead of doing two extra passes to fix the bpf function calls, do just one as that would be sufficient. [2] During the extra pass, only overwrite the instruction sequences for the bpf-to-bpf function calls as everything else would still remain exactly the same. This also reduces the number of writes to the JITed image. [3] Do not regenerate the prologue and the epilogue during the extra pass as that would be redundant. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-03tun: remove skb access after netif_receive_skbPrashant Bhole
In tun.c skb->len was accessed while doing stats accounting after a call to netif_receive_skb. We can not access skb after this call because buffers may be dropped. The fix for this bug would be to store skb->len in local variable and then use it after netif_receive_skb(). IMO using xdp data size for accounting bytes will be better because input for tun_xdp_one() is xdp_buff. Hence this patch: - fixes a bug by removing skb access after netif_receive_skb() - uses xdp data size for accounting bytes [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun] [613.021062] Read of size 4 at addr ffff8881da9ab7c0 by task vhost-1115/1155 [613.023073] [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232 [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [613.029116] Call Trace: [613.031145] dump_stack+0x5b/0x90 [613.032219] print_address_description+0x6c/0x23c [613.034156] ? tun_sendmsg+0x77c/0xc50 [tun] [613.036141] kasan_report.cold.5+0x241/0x308 [613.038125] tun_sendmsg+0x77c/0xc50 [tun] [613.040109] ? tun_get_user+0x1960/0x1960 [tun] [613.042094] ? __isolate_free_page+0x270/0x270 [613.045173] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.047127] ? peek_head_len.part.13+0x90/0x90 [vhost_net] [613.049096] ? get_tx_bufs+0x5a/0x2c0 [vhost_net] [613.051106] ? vhost_enable_notify+0x2d8/0x420 [vhost] [613.053139] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.053139] ? vhost_net_buf_peek+0x340/0x340 [vhost_net] [613.053139] ? __mutex_lock+0x8d9/0xb30 [613.053139] ? finish_task_switch+0x8f/0x3f0 [613.053139] ? handle_tx+0x32/0x120 [vhost_net] [613.053139] ? mutex_trylock+0x110/0x110 [613.053139] ? finish_task_switch+0xcf/0x3f0 [613.053139] ? finish_task_switch+0x240/0x3f0 [613.053139] ? __switch_to_asm+0x34/0x70 [613.053139] ? __switch_to_asm+0x40/0x70 [613.053139] ? __schedule+0x506/0xf10 [613.053139] handle_tx+0xc7/0x120 [vhost_net] [613.053139] vhost_worker+0x166/0x200 [vhost] [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] ? __kthread_parkme+0x77/0x90 [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] kthread+0x1b1/0x1d0 [613.053139] ? kthread_park+0xb0/0xb0 [613.053139] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Allocated by task 1155: [613.088705] kasan_kmalloc+0xbf/0xe0 [613.088705] kmem_cache_alloc+0xdc/0x220 [613.088705] __build_skb+0x2a/0x160 [613.088705] build_skb+0x14/0xc0 [613.088705] tun_sendmsg+0x4f0/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Freed by task 1155: [613.088705] __kasan_slab_free+0x12e/0x180 [613.088705] kmem_cache_free+0xa0/0x230 [613.088705] ip6_mc_input+0x40f/0x5a0 [613.088705] ipv6_rcv+0xc9/0x1e0 [613.088705] __netif_receive_skb_one_core+0xc1/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] br_pass_frame_up+0x2b9/0x2e0 [613.088705] br_handle_frame_finish+0x2fb/0x7a0 [613.088705] br_handle_frame+0x30f/0x6c0 [613.088705] __netif_receive_skb_core+0x61a/0x15b0 [613.088705] __netif_receive_skb_one_core+0x8e/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] tun_sendmsg+0x738/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] The buggy address belongs to the object at ffff8881da9ab740 [613.088705] which belongs to the cache skbuff_head_cache of size 232 Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: 8139cp: fix a BUG triggered by changing mtu with network trafficSu Yanjun
When changing mtu many times with traffic, a bug is triggered: [ 1035.684037] kernel BUG at lib/dynamic_queue_limits.c:26! [ 1035.684042] invalid opcode: 0000 [#1] SMP [ 1035.684049] Modules linked in: loop binfmt_misc 8139cp(OE) macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag tcp_lp fuse uinput xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter devlink ip6_tables iptable_filter sunrpc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep ppdev snd_seq iosf_mbi crc32_pclmul parport_pc snd_seq_device ghash_clmulni_intel parport snd_pcm aesni_intel joydev lrw snd_timer virtio_balloon sg gf128mul glue_helper ablk_helper cryptd snd soundcore i2c_piix4 pcspkr ip_tables xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic [ 1035.684102] pata_acpi virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt floppy fb_sys_fops crct10dif_pclmul crct10dif_common ttm crc32c_intel serio_raw ata_piix drm libata 8139too virtio_pci drm_panel_orientation_quirks virtio_ring virtio mii dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 8139cp] [ 1035.684132] CPU: 9 PID: 25140 Comm: if-mtu-change Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.el7.x86_64 #1 [ 1035.684134] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1035.684136] task: ffff8f59b1f5a080 ti: ffff8f5a2e32c000 task.ti: ffff8f5a2e32c000 [ 1035.684149] RIP: 0010:[<ffffffffba3a40d0>] [<ffffffffba3a40d0>] dql_completed+0x180/0x190 [ 1035.684162] RSP: 0000:ffff8f5a75483e50 EFLAGS: 00010093 [ 1035.684162] RAX: 00000000000000c2 RBX: ffff8f5a6f91c000 RCX: 0000000000000000 [ 1035.684162] RDX: 0000000000000000 RSI: 0000000000000184 RDI: ffff8f599fea3ec0 [ 1035.684162] RBP: ffff8f5a75483ea8 R08: 00000000000000c2 R09: 0000000000000000 [ 1035.684162] R10: 00000000000616ef R11: ffff8f5a75483b56 R12: ffff8f599fea3e00 [ 1035.684162] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000184 [ 1035.684162] FS: 00007fa8434de740(0000) GS:ffff8f5a75480000(0000) knlGS:0000000000000000 [ 1035.684162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1035.684162] CR2: 00000000004305d0 CR3: 000000024eb66000 CR4: 00000000001406e0 [ 1035.684162] Call Trace: [ 1035.684162] <IRQ> [ 1035.684162] [<ffffffffc08cbaf8>] ? cp_interrupt+0x478/0x580 [8139cp] [ 1035.684162] [<ffffffffba14a294>] __handle_irq_event_percpu+0x44/0x1c0 [ 1035.684162] [<ffffffffba14a442>] handle_irq_event_percpu+0x32/0x80 [ 1035.684162] [<ffffffffba14a4cc>] handle_irq_event+0x3c/0x60 [ 1035.684162] [<ffffffffba14db29>] handle_fasteoi_irq+0x59/0x110 [ 1035.684162] [<ffffffffba02e554>] handle_irq+0xe4/0x1a0 [ 1035.684162] [<ffffffffba7795dd>] do_IRQ+0x4d/0xf0 [ 1035.684162] [<ffffffffba76b362>] common_interrupt+0x162/0x162 [ 1035.684162] <EOI> [ 1035.684162] [<ffffffffba0c2ae4>] ? __wake_up_bit+0x24/0x70 [ 1035.684162] [<ffffffffba1e46f5>] ? do_set_pte+0xd5/0x120 [ 1035.684162] [<ffffffffba1b64fb>] unlock_page+0x2b/0x30 [ 1035.684162] [<ffffffffba1e4879>] do_read_fault.isra.61+0x139/0x1b0 [ 1035.684162] [<ffffffffba1e9134>] handle_pte_fault+0x2f4/0xd10 [ 1035.684162] [<ffffffffba1ebc6d>] handle_mm_fault+0x39d/0x9b0 [ 1035.684162] [<ffffffffba76f5e3>] __do_page_fault+0x203/0x500 [ 1035.684162] [<ffffffffba76f9c6>] trace_do_page_fault+0x56/0x150 [ 1035.684162] [<ffffffffba76ef42>] do_async_page_fault+0x22/0xf0 [ 1035.684162] [<ffffffffba76b788>] async_page_fault+0x28/0x30 [ 1035.684162] Code: 54 c7 47 54 ff ff ff ff 44 0f 49 ce 48 8b 35 48 2f 9c 00 48 89 77 58 e9 fe fe ff ff 0f 1f 80 00 00 00 00 41 89 d1 e9 ef fe ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 8d 42 ff 48 [ 1035.684162] RIP [<ffffffffba3a40d0>] dql_completed+0x180/0x190 [ 1035.684162] RSP <ffff8f5a75483e50> It's not the same as in 7fe0ee09 patch described. As 8139cp uses shared irq mode, other device irq will trigger cp_interrupt to execute. cp_change_mtu -> cp_close -> cp_open In cp_close routine just before free_irq(), some interrupt may occur. In my environment, cp_interrupt exectutes and IntrStatus is 0x4, exactly TxOk. That will cause cp_tx to wake device queue. As device queue is started, cp_start_xmit and cp_open will run at same time which will cause kernel BUG. For example: [#] for tx descriptor At start: [#][#][#] num_queued=3 After cp_init_hw->cp_start_hw->netdev_reset_queue: [#][#][#] num_queued=0 When 8139cp starts to work then cp_tx will check num_queued mismatchs the complete_bytes. The patch will check IntrMask before check IntrStatus in cp_interrupt. When 8139cp interrupt is disabled, just return. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03net: phy: don't allow __set_phy_supported to add unsupported modesHeiner Kallweit
Currently __set_phy_supported allows to add modes w/o checking whether the PHY supports them. This is wrong, it should never add modes but only remove modes we don't want to support. The commit marked as fixed didn't do anything wrong, it just copied existing functionality to the helper which is being fixed now. Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-01bpf: refactor bpf_test_run() to separate own failures and test program resultRoman Gushchin
After commit f42ee093be29 ("bpf/test_run: support cgroup local storage") the bpf_test_run() function may fail with -ENOMEM, if it's not possible to allocate memory for a cgroup local storage. This error shouldn't be mixed with the return value of the testing program. Let's add an additional argument with a pointer where to store the testing program's result; and make bpf_test_run() return either 0 or -ENOMEM. Fixes: f42ee093be29 ("bpf/test_run: support cgroup local storage") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-30tun: forbid iface creation with rtnl opsNicolas Dichtel
It's not supported right now (the goal of the initial patch was to support 'ip link del' only). Before the patch: $ ip link add foo type tun [ 239.632660] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [snip] [ 239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0 This panic occurs because dev->netdev_ops is not set by tun_setup(). But to have something usable, it will require more than just setting netdev_ops. Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX") CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30virtio-net: keep vnet header zeroed after processing XDPJason Wang
We copy vnet header unconditionally in page_to_skb() this is wrong since XDP may modify the packet data. So let's keep a zeroed vnet header for not confusing the conversion between vnet header and skb metadata. In the future, we should able to detect whether or not the packet was modified and keep using the vnet header when packet was not touched. Fixes: f600b6905015 ("virtio_net: Add XDP support") Reported-by: Pavel Popa <pashinho1990@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>