aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-01-04selftests: net: fix/improve ip_defrag selftestPeter Oskolkov
Commit ade446403bfb ("net: ipv4: do not handle duplicate fragments as overlapping") changed IPv4 defragmentation so that duplicate fragments, as well as _some_ fragments completely covered by previously delivered fragments, do not lead to the whole frag queue being discarded. This makes the existing ip_defrag selftest flaky. This patch * makes sure that negative IPv4 defrag tests generate truly overlapping fragments that trigger defrag queue drops; * tests that duplicate IPv4 fragments do not trigger defrag queue drops; * makes a couple of minor tweaks to the test aimed at increasing its code coverage and reduce flakiness. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04qmi_wwan: add MTU default to qmap network interfaceDaniele Palmas
This patch adds MTU default value to qmap network interface in order to avoid "RTNETLINK answers: No buffer space available" error when setting an ipv6 address. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04Merge branch 'hns-fixes'David S. Miller
Huazhong Tan says: ==================== net: hns: Bugfixes for HNS driver This patchset includes bugfixes for the HNS ethernet controller driver. Every patch is independent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net: hns: Fix use after free identified by SLUB debugYonglong Liu
When enable SLUB debug, than remove hns_enet_drv module, SLUB debug will identify a use after free bug: [134.189505] Unable to handle kernel paging request at virtual address 006b6b6b6b6b6b6b [134.197553] Mem abort info: [134.200381] ESR = 0x96000004 [134.203487] Exception class = DABT (current EL), IL = 32 bits [134.209497] SET = 0, FnV = 0 [134.212596] EA = 0, S1PTW = 0 [134.215777] Data abort info: [134.218701] ISV = 0, ISS = 0x00000004 [134.222596] CM = 0, WnR = 0 [134.225606] [006b6b6b6b6b6b6b] address between user and kernel address ranges [134.232851] Internal error: Oops: 96000004 [#1] SMP [134.237798] CPU: 21 PID: 27834 Comm: rmmod Kdump: loaded Tainted: G OE 4.19.5-1.2.34.aarch64 #1 [134.247856] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018 [134.255181] pstate: 20000005 (nzCv daif -PAN -UAO) [134.260044] pc : hns_ae_put_handle+0x38/0x60 [134.264372] lr : hns_ae_put_handle+0x24/0x60 [134.268700] sp : ffff00001be93c50 [134.272054] x29: ffff00001be93c50 x28: ffff802faaec8040 [134.277442] x27: 0000000000000000 x26: 0000000000000000 [134.282830] x25: 0000000056000000 x24: 0000000000000015 [134.288284] x23: ffff0000096fe098 x22: ffff000001050070 [134.293671] x21: ffff801fb3c044a0 x20: ffff80afb75ec098 [134.303287] x19: ffff80afb75ec098 x18: 0000000000000000 [134.312945] x17: 0000000000000000 x16: 0000000000000000 [134.322517] x15: 0000000000000002 x14: 0000000000000000 [134.332030] x13: dead000000000100 x12: ffff7e02bea3c988 [134.341487] x11: ffff80affbee9e68 x10: 0000000000000000 [134.351033] x9 : 6fffff8000008101 x8 : 0000000000000000 [134.360569] x7 : dead000000000100 x6 : ffff000009579748 [134.370059] x5 : 0000000000210d00 x4 : 0000000000000000 [134.379550] x3 : 0000000000000001 x2 : 0000000000000000 [134.388813] x1 : 6b6b6b6b6b6b6b6b x0 : 0000000000000000 [134.397993] Process rmmod (pid: 27834, stack limit = 0x00000000d474b7fd) [134.408498] Call trace: [134.414611] hns_ae_put_handle+0x38/0x60 [134.422208] hnae_put_handle+0xd4/0x108 [134.429563] hns_nic_dev_remove+0x60/0xc0 [hns_enet_drv] [134.438342] platform_drv_remove+0x2c/0x70 [134.445958] device_release_driver_internal+0x174/0x208 [134.454810] driver_detach+0x70/0xd8 [134.461913] bus_remove_driver+0x64/0xe8 [134.469396] driver_unregister+0x34/0x60 [134.476822] platform_driver_unregister+0x20/0x30 [134.485130] hns_nic_dev_driver_exit+0x14/0x6e4 [hns_enet_drv] [134.494634] __arm64_sys_delete_module+0x238/0x290 struct hnae_handle is a member of struct hnae_vf_cb, so when vf_cb is freed, than use hnae_handle will cause use after free panic. This patch frees vf_cb after hnae_handle used. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net: hns: Fix WARNING when hns modules installedYonglong Liu
Commit 308c6cafde01 ("net: hns: All ports can not work when insmod hns ko after rmmod.") add phy_stop in hns_nic_init_phy(), In the branch of "net", this method is effective, but in the branch of "net-next", it will cause a WARNING when hns modules loaded, reference to commit 2b3e88ea6528 ("net: phy: improve phy state checking"): [10.092168] ------------[ cut here ]------------ [10.092171] called from state READY [10.092189] WARNING: CPU: 4 PID: 1 at ../drivers/net/phy/phy.c:854 phy_stop+0x90/0xb0 [10.092192] Modules linked in: [10.092197] CPU: 4 PID:1 Comm:swapper/0 Not tainted 4.20.0-rc7-next-20181220 #1 [10.092200] Hardware name: Huawei TaiShan 2280 /D05, BIOS Hisilicon D05 UEFI 16.12 Release 05/15/2017 [10.092202] pstate: 60000005 (nZCv daif -PAN -UAO) [10.092205] pc : phy_stop+0x90/0xb0 [10.092208] lr : phy_stop+0x90/0xb0 [10.092209] sp : ffff00001159ba90 [10.092212] x29: ffff00001159ba90 x28: 0000000000000007 [10.092215] x27: ffff000011180068 x26: ffff0000110a5620 [10.092218] x25: ffff0000113b6000 x24: ffff842f96dac000 [10.092221] x23: 0000000000000000 x22: 0000000000000000 [10.092223] x21: ffff841fb8425e18 x20: ffff801fb3a56438 [10.092226] x19: ffff801fb3a56000 x18: ffffffffffffffff [10.092228] x17: 0000000000000000 x16: 0000000000000000 [10.092231] x15: ffff00001122d6c8 x14: ffff00009159b7b7 [10.092234] x13: ffff00001159b7c5 x12: ffff000011245000 [10.092236] x11: 0000000005f5e0ff x10: ffff00001159b750 [10.092239] x9 : 00000000ffffffd0 x8 : 0000000000000465 [10.092242] x7 : ffff0000112457f8 x6 : ffff0000113bd7ce [10.092245] x5 : 0000000000000000 x4 : 0000000000000000 [10.092247] x3 : 00000000ffffffff x2 : ffff000011245828 [10.092250] x1 : 4b5860bd05871300 x0 : 0000000000000000 [10.092253] Call trace: [10.092255] phy_stop+0x90/0xb0 [10.092260] hns_nic_init_phy+0xf8/0x110 [10.092262] hns_nic_try_get_ae+0x4c/0x3b0 [10.092264] hns_nic_dev_probe+0x1fc/0x480 [10.092268] platform_drv_probe+0x50/0xa0 [10.092271] really_probe+0x1f4/0x298 [10.092273] driver_probe_device+0x58/0x108 [10.092275] __driver_attach+0xdc/0xe0 [10.092278] bus_for_each_dev+0x74/0xc8 [10.092280] driver_attach+0x20/0x28 [10.092283] bus_add_driver+0x1b8/0x228 [10.092285] driver_register+0x60/0x110 [10.092288] __platform_driver_register+0x40/0x48 [10.092292] hns_nic_dev_driver_init+0x18/0x20 [10.092296] do_one_initcall+0x5c/0x180 [10.092299] kernel_init_freeable+0x198/0x240 [10.092303] kernel_init+0x10/0x108 [10.092306] ret_from_fork+0x10/0x18 [10.092308] ---[ end trace 1396dd0278e397eb ]--- This WARNING occurred because of calling phy_stop before phy_start. The root cause of the problem in commit '308c6cafde01' is: Reference to hns_nic_init_phy, the flag phydev->supported is changed after phy_connect_direct. The flag phydev->supported is 0x6ff when hns modules is loaded, so will not change Fiber Port power(Reference to marvell.c), which is power on at default. Then the flag phydev->supported is changed to 0x6f, so Fiber Port power is off when removing hns modules. When hns modules installed again, the flag phydev->supported is default value 0x6ff, so will not change Fiber Port power(now is off), causing mac link not up problem. So the solution is change phy flags before phy_connect_direct. Fixes: 308c6cafde01 ("net: hns: All ports can not work when insmod hns ko after rmmod.") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net: dsa: mt7530: Drop unused GPIO includeLinus Walleij
This driver uses GPIO descriptors only, <linux/of_gpio.h> is not used so drop the include. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04Merge branch 'GUE-error-recursion'David S. Miller
Stefano Brivio says: ==================== Fix two further potential unbounded recursions in GUE error handlers Patch 1/2 takes care of preventing the issue fixed by commit 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler") also with UDP-Lite payloads -- I just realised this might happen from a syzbot report. Patch 2/2 fixes the issue for both UDP and UDP-Lite on IPv6, which I also forgot to deal with in that same commit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04fou6: Prevent unbounded recursion in GUE error handlerStefano Brivio
I forgot to deal with IPv6 in commit 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler"). Now syzbot reported what might be the same type of issue, caused by gue6_err(), that is, handling exceptions for direct UDP encapsulation in GUE (UDP-in-UDP) leads to unbounded recursion in the GUE exception handler. As it probably doesn't make sense to set up GUE this way, and it's currently not even possible to configure this, skip exception handling for UDP (or UDP-Lite) packets encapsulated in UDP (or UDP-Lite) packets with GUE on IPv6. Reported-by: syzbot+4ad25edc7a33e4ab91e0@syzkaller.appspotmail.com Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: b8a51b38e4d4 ("fou, fou6: ICMP error handlers for FoU and GUE") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04fou: Prevent unbounded recursion in GUE error handler also with UDP-LiteStefano Brivio
In commit 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler"), I didn't take care of the case where UDP-Lite is encapsulated into UDP or UDP-Lite with GUE. From a syzbot report about a possibly similar issue with GUE on IPv6, I just realised the same thing might happen with a UDP-Lite inner payload. Also skip exception handling for inner UDP-Lite protocol. Fixes: 11789039da53 ("fou: Prevent unbounded recursion in GUE error handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04openvswitch: Fix IPv6 later frags parsingYi-Hung Wei
The previous commit fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") introduces IP protocol number parsing for IPv6 later frags that can mess up the network header length calculation logic, i.e. nh_len < 0. However, the network header length calculation is mainly for deriving the transport layer header in the key extraction process which the later fragment does not apply. Therefore, this commit skips the network header length calculation to fix the issue. Reported-by: Chris Mi <chrism@mellanox.com> Reported-by: Greg Rose <gvrose8192@gmail.com> Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net: macb: remove unnecessary codeClaudiu Beznea
Commit 653e92a9175e ("net: macb: add support for padding and fcs computation") introduced a bug fixed by commit 899ecaedd155 ("net: ethernet: cadence: fix socket buffer corruption problem"). Code removed in this patch is not reachable at all so remove it. Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation") Cc: Tristram Ha <Tristram.Ha@microchip.com> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net: dsa: microchip: Drop unused GPIO includesLinus Walleij
This driver does not use the old GPIO includes so drop them. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04Merge branch 'qed-fixes'David S. Miller
Denis Bolotin says: ==================== qed: Misc fixes in qed This patch series fixes 2 potential bugs in qed. Please consider applying to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrierDenis Bolotin
Make sure chain element is updated before ringing the doorbell. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page countDenis Bolotin
In PBL chains with non power of 2 page count, the producer is not at the beginning of the chain when index is 0 after a wrap. Therefore, after the producer index wrap around, page index should be calculated more carefully. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04net, skbuff: do not prefer skb allocation fails earlyDavid Rientjes
Commit dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may directly reclaim. The previous behavior would require reclaim up to 1 << order pages for skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing, otherwise the allocations in alloc_skb() would loop in the page allocator looking for memory. __GFP_RETRY_MAYFAIL makes both allocations failable under memory pressure, including for the HEAD allocation. This can cause, among many other things, write() to fail with ENOTCONN during RPC when under memory pressure. These allocations should succeed as they did previous to dcda9b04713c even if it requires calling the oom killer and additional looping in the page allocator to find memory. There is no way to specify the previous behavior of __GFP_REPEAT, but it's unlikely to be necessary since the previous behavior only guaranteed that 1 << order pages would be reclaimed before failing for order > PAGE_ALLOC_COSTLY_ORDER. That reclaim is not guaranteed to be contiguous memory, so repeating for such large orders is usually not beneficial. Removing the setting of __GFP_RETRY_MAYFAIL to restore the previous behavior, specifically not allowing alloc_skb() to fail for small orders and oom kill if necessary rather than allowing RPCs to fail. Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic") Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04soc/fsl/qe: fix err handling of ucc_of_parse_tdmWen Yang
Currently there are some issues with the ucc_of_parse_tdm function: 1, a possible null pointer dereference in ucc_of_parse_tdm, detected by the semantic patch deref_null.cocci, with the following warning: drivers/soc/fsl/qe/qe_tdm.c:177:21-24: ERROR: pdev is NULL but dereferenced. 2, dev gets modified, so in any case that devm_iounmap() will fail even when the new pdev is valid, because the iomap was done with a different pdev. 3, there is no driver bind with the "fsl,t1040-qe-si" or "fsl,t1040-qe-siram" device. So allocating resources using devm_*() with these devices won't provide a cleanup path for these resources when the caller fails. This patch fixes them. Suggested-by: Li Yang <leoyang.li@nxp.com> Suggested-by: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Peng Hao <peng.hao2@zte.com.cn> CC: Julia Lawall <julia.lawall@lip6.fr> CC: Zhao Qiang <qiang.zhao@nxp.com> CC: David S. Miller <davem@davemloft.net> CC: netdev@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04r8169: Add support for new Realtek EthernetKai-Heng Feng
There are two new Realtek Ethernet devices which are re-branded r8168h. Add the IDs to to support them. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04netlink: fixup regression in RTM_GETADDRArthur Gautier
This commit fixes a regression in AF_INET/RTM_GETADDR and AF_INET6/RTM_GETADDR. Before this commit, the kernel would stop dumping addresses once the first skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error shouldn't be sent back to netlink_dump so the callback is kept alive. The userspace is expected to call back with a new empty skb. Changes from V1: - The error is not handled in netlink_dump anymore but rather in inet_dump_ifaddr and inet6_dump_addr directly as suggested by David Ahern. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Arthur Gautier <baloo@gandi.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-04octeontx2-af: Fix a resource leak in an error handling path in 'cgx_probe()'Christophe JAILLET
If an error occurs after the call to 'pci_alloc_irq_vectors()', we must call 'pci_free_irq_vectors()' in order to avoid a resource leak. The same sequence is already in place in the corresponding 'cgx_remove()' function. Fixes: 1463f382f58d ("octeontx2-af: Add support for CGX link management") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03Merge tag 'locks-v4.21-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking bugfix from Jeff Layton: "This is a one-line fix for a bug that syzbot turned up in the new patches to mitigate the thundering herd when a lock is released" * tag 'locks-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix error in locks_move_blocks()
2019-01-03Merge tag 'sound-fix-4.21-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Among a few HD-audio fixes, the only significant one is the regression fix on some machines like Dell XPS due to the default binding changes. We ended up reverting the whole since the fix for ASoC HD-audio driver won't be available immediately" * tag 'sound-fix-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Revert DSP detection on legacy HD-audio driver ALSA: hda/tegra: clear pending irq handlers ALSA: hda/realtek: Enable the headset mic auto detection for ASUS laptops
2019-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ...
2019-01-02ipv6: Consider sk_bound_dev_if when binding a socket to an addressDavid Ahern
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF. Resolve by considering the device if sk_bound_dev_if is set. This problem exists from the beginning of git history. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02ipv6: Fix dump of specific table with strict checkingDavid Ahern
Dump of a specific table with strict checking enabled is looping. The problem is that the end of the table dump is not marked in the cb. When dumping a specific table, cb args 0 and 1 are not used (they are the hash index and entry with an hash table index when dumping all tables). Re-use args[0] to hold a 'done' flag for the specific table dump. Fixes: 13e38901d46ca ("net/ipv6: Plumb support for filtering route dumps") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "A tiny pull request this merge window unfortunately, should get more material in for the next release: - new driver for Raspberry Pi's touchscreen (firmware interface) - miscellaneous input driver fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G Input: atmel_mxt_ts - don't try to free unallocated kernel memory Input: drv2667 - fix indentation issues Input: touchscreen - fix coding style issue Input: add official Raspberry Pi's touchscreen driver Input: nomadik-ske-keypad - fix a loop timeout test Input: rotary-encoder - don't log EPROBE_DEFER to kernel log Input: olpc_apsp - remove set but not used variable 'np' Input: olpc_apsp - enable the SP clock Input: olpc_apsp - check FIFO status on open(), not probe() Input: olpc_apsp - drop CONFIG_OLPC dependency clk: mmp2: add SP clock dt-bindings: marvell,mmp2: Add clock id for the SP clock Input: ad7879 - drop platform data support
2019-01-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost updates from Michael Tsirkin: "Features, fixes, cleanups: - discard in virtio blk - misc fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct the related warning message vhost: split structs into a separate header file virtio: remove deprecated VIRTIO_PCI_CONFIG() vhost/vsock: switch to a mutex for vhost_vsock_hash virtio_blk: add discard and write zeroes support
2019-01-02Merge tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: - Dead code removal for loop/sunvdc (Chengguang) - Mark BIDI support for bsg as deprecated, logging a single dmesg warning if anyone is actually using it (Christoph) - blkcg cleanup, killing a dead function and making the tryget_closest variant easier to read (Dennis) - Floppy fixes, one fixing a regression in swim3 (Finn) - lightnvm use-after-free fix (Gustavo) - gdrom leak fix (Wenwen) - a set of drbd updates (Lars, Luc, Nathan, Roland) * tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits) block/swim3: Fix regression on PowerBook G3 block/swim3: Fix -EBUSY error when re-opening device after unmount block/swim3: Remove dead return statement block/amiflop: Don't log error message on invalid ioctl gdrom: fix a memory leak bug lightnvm: pblk: fix use-after-free bug block: sunvdc: remove redundant code block: loop: remove redundant code bsg: deprecate BIDI support in bsg blkcg: remove unused __blkg_release_rcu() blkcg: clean up blkg_tryget_closest() drbd: Change drbd_request_detach_interruptible's return type to int drbd: Avoid Clang warning about pointless switch statment drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") drbd: skip spurious timeout (ping-timeo) when failing promote drbd: don't retry connection if peers do not agree on "authentication" settings drbd: fix print_st_err()'s prototype to match the definition drbd: avoid spurious self-outdating with concurrent disconnect / down drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix comment typos ...
2019-01-02Merge tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull libata fix from Jens Axboe: "This libata change missed the original libata pull request. Just a single fix in here, fixing a missed reference drop" * tag 'for-4.21/libata-20190102' of git://git.kernel.dk/linux-block: ata: pata_macio: add of_node_put()
2019-01-02Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: "One more patch to generalize a set of DT binding defines now before -rc1 comes out. This way the SoC DTS files can use the proper defines from a stable tag" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8qxp: make the name of clock ID generic
2019-01-02Merge tag 'devprop-4.21-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fixes from Rafael Wysocki: "Fix two potential NULL pointer dereferences found by Coverity in the software nodes code introduced recently (Colin Ian King)" * tag 'devprop-4.21-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: drivers: base: swnode: check if swnode is NULL before dereferencing it drivers: base: swnode: check if pointer p is NULL before dereferencing it
2019-01-02Merge tag 'mailbox-v4.21' of ↵Linus Torvalds
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - Introduce device-managed registration devm_mbox_controller_un/register and convert drivers to use it - Introduce flush api to support clients that must busy-wait in atomic context - Support multiple controllers per device - Hi3660: a bugfix and constify ops structure - TI-MsgMgr: off by one bugfix. - BCM: switch to spdx license - Tegra-HSP: support for shared mailboxes and suspend/resume. * tag 'mailbox-v4.21' of git://git.linaro.org/landing-teams/working/fujitsu/integration: (30 commits) mailbox: tegra-hsp: Use device-managed registration API mailbox: tegra-hsp: use devm_kstrdup_const() mailbox: tegra-hsp: Add suspend/resume support mailbox: tegra-hsp: Add support for shared mailboxes dt-bindings: tegra186-hsp: Add shared mailboxes mailbox: Allow multiple controllers per device mailbox: Support blocking transfers in atomic context mailbox: ti-msgmgr: Use device-managed registration API mailbox: stm32-ipcc: Use device-managed registration API mailbox: rockchip: Use device-managed registration API mailbox: qcom-apcs: Use device-managed registration API mailbox: platform-mhu: Use device-managed registration API mailbox: omap: Use device-managed registration API mailbox: mtk-cmdq: Remove needless devm_kfree() calls mailbox: mtk-cmdq: Use device-managed registration API mailbox: xgene-slimpro: Use device-managed registration API mailbox: sti: Use device-managed registration API mailbox: altera: Use device-managed registration API mailbox: imx: Use device-managed registration API mailbox: hi6220: Use device-managed registration API ...
2019-01-02Merge branch 'for-linus-4.21-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - DISCARD support for our block device driver - Many TLB flush optimizations - Various smaller fixes - And most important, Anton agreed to help me maintaining UML * 'for-linus-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Remove obsolete reenable_XX calls um: writev needs <sys/uio.h> Add Anton Ivanov to UML maintainers um: remove redundant generic-y um: Optimize Flush TLB for force/fork case um: Avoid marking pages with "changed protection" um: Skip TLB flushing where not needed um: Optimize TLB operations v2 um: Remove unnecessary faulted check in uaccess.c um: Add support for DISCARD in the UBD Driver um: Remove unsafe printks from the io thread um: Clean-up command processing in UML UBD driver um: Switch to block-mq constants in the UML UBD driver um: Make GCOV depend on !KCOV um: Include sys/uio.h to have writev() um: Add HAVE_DEBUG_BUGVERBOSE um: Update maintainers file entry
2019-01-02Merge tag 's390-4.21-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - A larger update for the zcrypt / AP bus code: + Update two inline assemblies in the zcrypt driver to make gcc happy + Add a missing reply code for invalid special commands for zcrypt + Allow AP device reset to be triggered from user space + Split the AP scan function into smaller, more readable functions - Updates for vfio-ccw and vfio-ap + Add maintainers and reviewer for vfio-ccw + Include facility.h in vfio_ap_drv.c to avoid fragile include chain + Simplicy vfio-ccw state machine - Use the common code version of bust_spinlocks - Make use of the DEFINE_SHOW_ATTRIBUTE - Fix three incorrect file permissions in the DASD driver - Remove bit spin-lock from the PCI interrupt handler - Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code * tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: rework ap scan bus code s390/zcrypt: make sysfs reset attribute trigger queue reset s390/pci: fix sleeping in atomic during hotplug s390/pci: remove bit_lock usage in interrupt handler s390/drivers: fix proc/debugfs file permissions s390: convert to DEFINE_SHOW_ATTRIBUTE MAINTAINERS/vfio-ccw: add Farhan and Eric, make Halil Reviewer vfio: ccw: Merge BUSY and BOXED states s390: use common bust_spinlocks() s390/zcrypt: improve special ap message cmd handling s390/ap: rework assembler functions to use unions for in/out register variables s390: vfio-ap: include <asm/facility> for test_facility()
2019-01-02locks: fix error in locks_move_blocks()NeilBrown
After moving all requests from fl->fl_blocked_requests to new->fl_blocked_requests it is nonsensical to do anything to all the remaining elements, there aren't any. This should do something to all the requests that have been moved. For simplicity, it does it to all requests in the target list. Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests is "obviously correct" as it preserves the invariant of the linkage among requests. Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-01-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-01-02 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) prevent out of bounds speculation on pointer arithmetic, from Daniel. 2) typo fix, from Xiaozhou. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Yet another double DMA-unmap # v4.20 Features: - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG - Per-xprt rdma receive workqueues - Drop support for FMR memory registration - Make port= mount option optional for RDMA mounts Other bugfixes and cleanups: - Remove unused nfs4_xdev_fs_type declaration - Fix comments for behavior that has changed - Remove generic RPC credentials by switching to 'struct cred' - Fix crossing mountpoints with different auth flavors - Various xprtrdma fixes from testing and auditing the close code - Fixes for disconnect issues when using xprtrdma with krb5 - Clean up and improve xprtrdma trace points - Fix NFS v4.2 async copy reboot recovery" * tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) sunrpc: convert to DEFINE_SHOW_ATTRIBUTE sunrpc: Add xprt after nfs4_test_session_trunk() sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS sunrpc: handle ENOMEM in rpcb_getport_async NFS: remove unnecessary test for IS_ERR(cred) xprtrdma: Prevent leak of rpcrdma_rep objects NFSv4.2 fix async copy reboot recovery xprtrdma: Don't leak freed MRs xprtrdma: Add documenting comment for rpcrdma_buffer_destroy xprtrdma: Replace outdated comment for rpcrdma_ep_post xprtrdma: Update comments in frwr_op_send SUNRPC: Fix some kernel doc complaints SUNRPC: Simplify defining common RPC trace events NFS: Fix NFSv4 symbolic trace point output xprtrdma: Trace mapping, alloc, and dereg failures xprtrdma: Add trace points for calls to transport switch methods xprtrdma: Relocate the xprtrdma_mr_map trace points xprtrdma: Clean up of xprtrdma chunk trace points xprtrdma: Remove unused fields from rpcrdma_ia xprtrdma: Cull dprintk() call sites ...
2019-01-02Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...
2019-01-02Merge branch 'prevent-oob-under-speculation'Alexei Starovoitov
Daniel Borkmann says: ==================== This set fixes an out of bounds case under speculative execution by implementing masking of pointer alu into the verifier. For details please see the individual patches. Thanks! v2 -> v3: - 8/9: change states_equal condition into old->speculative && !cur->speculative, thanks Jakub! - 8/9: remove incorrect speculative state test in propagate_liveness(), thanks Jakub! v1 -> v2: - Typo fixes in commit msg and a comment, thanks David! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: add various test cases to selftestsDaniel Borkmann
Add various map value pointer related test cases to test_verifier kselftest to reflect recent changes and improve test coverage. The tests include basic masking functionality, unprivileged behavior on pointer arithmetic which goes oob, mixed bounds tests, negative unknown scalar but resulting positive offset for access and helper range, handling of arithmetic from multiple maps, various masking scenarios with subsequent map value access and others including two test cases from Jann Horn for prior fixes. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: prevent out of bounds speculation on pointer arithmeticDaniel Borkmann
Jann reported that the original commit back in b2157399cc98 ("bpf: prevent out-of-bounds speculation") was not sufficient to stop CPU from speculating out of bounds memory access: While b2157399cc98 only focussed on masking array map access for unprivileged users for tail calls and data access such that the user provided index gets sanitized from BPF program and syscall side, there is still a more generic form affected from BPF programs that applies to most maps that hold user data in relation to dynamic map access when dealing with unknown scalars or "slow" known scalars as access offset, for example: - Load a map value pointer into R6 - Load an index into R7 - Do a slow computation (e.g. with a memory dependency) that loads a limit into R8 (e.g. load the limit from a map for high latency, then mask it to make the verifier happy) - Exit if R7 >= R8 (mispredicted branch) - Load R0 = R6[R7] - Load R0 = R6[R0] For unknown scalars there are two options in the BPF verifier where we could derive knowledge from in order to guarantee safe access to the memory: i) While </>/<=/>= variants won't allow to derive any lower or upper bounds from the unknown scalar where it would be safe to add it to the map value pointer, it is possible through ==/!= test however. ii) another option is to transform the unknown scalar into a known scalar, for example, through ALU ops combination such as R &= <imm> followed by R |= <imm> or any similar combination where the original information from the unknown scalar would be destroyed entirely leaving R with a constant. The initial slow load still precedes the latter ALU ops on that register, so the CPU executes speculatively from that point. Once we have the known scalar, any compare operation would work then. A third option only involving registers with known scalars could be crafted as described in [0] where a CPU port (e.g. Slow Int unit) would be filled with many dependent computations such that the subsequent condition depending on its outcome has to wait for evaluation on its execution port and thereby executing speculatively if the speculated code can be scheduled on a different execution port, or any other form of mistraining as described in [1], for example. Given this is not limited to only unknown scalars, not only map but also stack access is affected since both is accessible for unprivileged users and could potentially be used for out of bounds access under speculation. In order to prevent any of these cases, the verifier is now sanitizing pointer arithmetic on the offset such that any out of bounds speculation would be masked in a way where the pointer arithmetic result in the destination register will stay unchanged, meaning offset masked into zero similar as in array_index_nospec() case. With regards to implementation, there are three options that were considered: i) new insn for sanitation, ii) push/pop insn and sanitation as inlined BPF, iii) reuse of ax register and sanitation as inlined BPF. Option i) has the downside that we end up using from reserved bits in the opcode space, but also that we would require each JIT to emit masking as native arch opcodes meaning mitigation would have slow adoption till everyone implements it eventually which is counter-productive. Option ii) and iii) have both in common that a temporary register is needed in order to implement the sanitation as inlined BPF since we are not allowed to modify the source register. While a push / pop insn in ii) would be useful to have in any case, it requires once again that every JIT needs to implement it first. While possible, amount of changes needed would also be unsuitable for a -stable patch. Therefore, the path which has fewer changes, less BPF instructions for the mitigation and does not require anything to be changed in the JITs is option iii) which this work is pursuing. The ax register is already mapped to a register in all JITs (modulo arm32 where it's mapped to stack as various other BPF registers there) and used in constant blinding for JITs-only so far. It can be reused for verifier rewrites under certain constraints. The interpreter's tmp "register" has therefore been remapped into extending the register set with hidden ax register and reusing that for a number of instructions that needed the prior temporary variable internally (e.g. div, mod). This allows for zero increase in stack space usage in the interpreter, and enables (restricted) generic use in rewrites otherwise as long as such a patchlet does not make use of these instructions. The sanitation mask is dynamic and relative to the offset the map value or stack pointer currently holds. There are various cases that need to be taken under consideration for the masking, e.g. such operation could look as follows: ptr += val or val += ptr or ptr -= val. Thus, the value to be sanitized could reside either in source or in destination register, and the limit is different depending on whether the ALU op is addition or subtraction and depending on the current known and bounded offset. The limit is derived as follows: limit := max_value_size - (smin_value + off). For subtraction: limit := umax_value + off. This holds because we do not allow any pointer arithmetic that would temporarily go out of bounds or would have an unknown value with mixed signed bounds where it is unclear at verification time whether the actual runtime value would be either negative or positive. For example, we have a derived map pointer value with constant offset and bounded one, so limit based on smin_value works because the verifier requires that statically analyzed arithmetic on the pointer must be in bounds, and thus it checks if resulting smin_value + off and umax_value + off is still within map value bounds at time of arithmetic in addition to time of access. Similarly, for the case of stack access we derive the limit as follows: MAX_BPF_STACK + off for subtraction and -off for the case of addition where off := ptr_reg->off + ptr_reg->var_off.value. Subtraction is a special case for the masking which can be in form of ptr += -val, ptr -= -val, or ptr -= val. In the first two cases where we know that the value is negative, we need to temporarily negate the value in order to do the sanitation on a positive value where we later swap the ALU op, and restore original source register if the value was in source. The sanitation of pointer arithmetic alone is still not fully sufficient as is, since a scenario like the following could happen ... PTR += 0x1000 (e.g. K-based imm) PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON PTR += 0x1000 PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON [...] ... which under speculation could end up as ... PTR += 0x1000 PTR -= 0 [ truncated by mitigation ] PTR += 0x1000 PTR -= 0 [ truncated by mitigation ] [...] ... and therefore still access out of bounds. To prevent such case, the verifier is also analyzing safety for potential out of bounds access under speculative execution. Meaning, it is also simulating pointer access under truncation. We therefore "branch off" and push the current verification state after the ALU operation with known 0 to the verification stack for later analysis. Given the current path analysis succeeded it is likely that the one under speculation can be pruned. In any case, it is also subject to existing complexity limits and therefore anything beyond this point will be rejected. In terms of pruning, it needs to be ensured that the verification state from speculative execution simulation must never prune a non-speculative execution path, therefore, we mark verifier state accordingly at the time of push_stack(). If verifier detects out of bounds access under speculative execution from one of the possible paths that includes a truncation, it will reject such program. Given we mask every reg-based pointer arithmetic for unprivileged programs, we've been looking into how it could affect real-world programs in terms of size increase. As the majority of programs are targeted for privileged-only use case, we've unconditionally enabled masking (with its alu restrictions on top of it) for privileged programs for the sake of testing in order to check i) whether they get rejected in its current form, and ii) by how much the number of instructions and size will increase. We've tested this by using Katran, Cilium and test_l4lb from the kernel selftests. For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb we've used test_l4lb.o as well as test_l4lb_noinline.o. We found that none of the programs got rejected by the verifier with this change, and that impact is rather minimal to none. balancer_kern.o had 13,904 bytes (1,738 insns) xlated and 7,797 bytes JITed before and after the change. Most complex program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated and 18,538 bytes JITed before and after and none of the other tail call programs in bpf_lxc.o had any changes either. For the older bpf_lxc_opt_-DUNKNOWN.o object we found a small increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed after the change. Other programs from that object file had similar small increase. Both test_l4lb.o had no change and remained at 6,544 bytes (817 insns) xlated and 3,401 bytes JITed and for test_l4lb_noinline.o constant at 5,080 bytes (634 insns) xlated and 3,313 bytes JITed. This can be explained in that LLVM typically optimizes stack based pointer arithmetic by using K-based operations and that use of dynamic map access is not overly frequent. However, in future we may decide to optimize the algorithm further under known guarantees from branch and value speculation. Latter seems also unclear in terms of prediction heuristics that today's CPUs apply as well as whether there could be collisions in e.g. the predictor's Value History/Pattern Table for triggering out of bounds access, thus masking is performed unconditionally at this point but could be subject to relaxation later on. We were generally also brainstorming various other approaches for mitigation, but the blocker was always lack of available registers at runtime and/or overhead for runtime tracking of limits belonging to a specific pointer. Thus, we found this to be minimally intrusive under given constraints. With that in place, a simple example with sanitized access on unprivileged load at post-verification time looks as follows: # bpftool prog dump xlated id 282 [...] 28: (79) r1 = *(u64 *)(r7 +0) 29: (79) r2 = *(u64 *)(r7 +8) 30: (57) r1 &= 15 31: (79) r3 = *(u64 *)(r0 +4608) 32: (57) r3 &= 1 33: (47) r3 |= 1 34: (2d) if r2 > r3 goto pc+19 35: (b4) (u32) r11 = (u32) 20479 | 36: (1f) r11 -= r2 | Dynamic sanitation for pointer 37: (4f) r11 |= r2 | arithmetic with registers 38: (87) r11 = -r11 | containing bounded or known 39: (c7) r11 s>>= 63 | scalars in order to prevent 40: (5f) r11 &= r2 | out of bounds speculation. 41: (0f) r4 += r11 | 42: (71) r4 = *(u8 *)(r4 +0) 43: (6f) r4 <<= r1 [...] For the case where the scalar sits in the destination register as opposed to the source register, the following code is emitted for the above example: [...] 16: (b4) (u32) r11 = (u32) 20479 17: (1f) r11 -= r2 18: (4f) r11 |= r2 19: (87) r11 = -r11 20: (c7) r11 s>>= 63 21: (5f) r2 &= r11 22: (0f) r2 += r0 23: (61) r0 = *(u32 *)(r2 +0) [...] JIT blinding example with non-conflicting use of r10: [...] d5: je 0x0000000000000106 _ d7: mov 0x0(%rax),%edi | da: mov $0xf153246,%r10d | Index load from map value and e0: xor $0xf153259,%r10 | (const blinded) mask with 0x1f. e7: and %r10,%rdi |_ ea: mov $0x2f,%r10d | f0: sub %rdi,%r10 | Sanitized addition. Both use r10 f3: or %rdi,%r10 | but do not interfere with each f6: neg %r10 | other. (Neither do these instructions f9: sar $0x3f,%r10 | interfere with the use of ax as temp fd: and %r10,%rdi | in interpreter.) 100: add %rax,%rdi |_ 103: mov 0x0(%rdi),%eax [...] Tested that it fixes Jann's reproducer, and also checked that test_verifier and test_progs suite with interpreter, JIT and JIT with hardening enabled on x86-64 and arm64 runs successfully. [0] Speculose: Analyzing the Security Implications of Speculative Execution in CPUs, Giorgi Maisuradze and Christian Rossow, https://arxiv.org/pdf/1801.04084.pdf [1] A Systematic Evaluation of Transient Execution Attacks and Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, Daniel Gruss, https://arxiv.org/pdf/1811.05441.pdf Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: fix check_map_access smin_value test when pointer contains offsetDaniel Borkmann
In check_map_access() we probe actual bounds through __check_map_access() with offset of reg->smin_value + off for lower bound and offset of reg->umax_value + off for the upper bound. However, even though the reg->smin_value could have a negative value, the final result of the sum with off could be positive when pointer arithmetic with known and unknown scalars is combined. In this case we reject the program with an error such as "R<x> min value is negative, either use unsigned index or do a if (index >=0) check." even though the access itself would be fine. Therefore extend the check to probe whether the actual resulting reg->smin_value + off is less than zero. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict unknown scalars of mixed signed bounds for unprivilegedDaniel Borkmann
For unknown scalars of mixed signed bounds, meaning their smin_value is negative and their smax_value is positive, we need to reject arithmetic with pointer to map value. For unprivileged the goal is to mask every map pointer arithmetic and this cannot reliably be done when it is unknown at verification time whether the scalar value is negative or positive. Given this is a corner case, the likelihood of breaking should be very small. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict stack pointer arithmetic for unprivilegedDaniel Borkmann
Restrict stack pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a stack pointer as a destination we simulate a check_stack_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. This is analog to map value pointer arithmetic and needed for masking later on. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: restrict map value pointer arithmetic for unprivilegedDaniel Borkmann
Restrict map value pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a map value pointer as a destination it will simulate a check_map_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. We use this later on for masking any pointer arithmetic with the remainder of the map value space. The likelihood of breaking any existing real-world unprivileged eBPF program is very small for this corner case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: enable access to ax register also from verifier rewriteDaniel Borkmann
Right now we are using BPF ax register in JIT for constant blinding as well as in interpreter as temporary variable. Verifier will not be able to use it simply because its use will get overridden from the former in bpf_jit_blind_insn(). However, it can be made to work in that blinding will be skipped if there is prior use in either source or destination register on the instruction. Taking constraints of ax into account, the verifier is then open to use it in rewrites under some constraints. Note, ax register already has mappings in every eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: move tmp variable into ax register in interpreterDaniel Borkmann
This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run() into the hidden ax register. The latter is currently only used in JITs for constant blinding as a temporary scratch register, meaning the BPF interpreter will never see the use of ax. Therefore it is safe to use it for the cases where tmp has been used earlier. This is needed to later on allow restricted hidden use of ax in both interpreter and JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02bpf: move {prev_,}insn_idx into verifier envDaniel Borkmann
Move prev_insn_idx and insn_idx from the do_check() function into the verifier environment, so they can be read inside the various helper functions for handling the instructions. It's easier to put this into the environment rather than changing all call-sites only to pass it along. insn_idx is useful in particular since this later on allows to hold state in env->insn_aux_data[env->insn_idx]. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-02Merge tag '9p-for-4.21' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "Missing prototype warning fix and a syzkaller fix when a 9p server advertises a too small msize" * tag '9p-for-4.21' of git://github.com/martinetd/linux: 9p/net: put a lower bound on msize net/9p: include trans_common.h to fix missing prototype warning.