aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-08-09net: ipv6: lower ndisc notifier priority below addrconfDavid Ahern
ndisc_notify is used to send unsolicited neighbor advertisements (e.g., on a link up). Currently, the ndisc notifier is run before the addrconf notifer which means NA's are not sent for link-local addresses which are added by the addrconf notifier. Fix by lowering the priority of the ndisc notifier. Setting the priority to ADDRCONF_NOTIFY_PRIORITY - 5 means it runs after addrconf and before the route notifier which is ADDRCONF_NOTIFY_PRIORITY - 10. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: call newid/getid without rtnl mutex heldFlorian Westphal
Both functions take nsid_lock and don't rely on rtnl lock. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: add RTNL_FLAG_DOIT_UNLOCKEDFlorian Westphal
Allow callers to tell rtnetlink core that its doit callback should be invoked without holding rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: protect handler table with rcuFlorian Westphal
Note that netlink dumps still acquire rtnl mutex via the netlink dump infrastructure. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: small rtnl lock pushdownFlorian Westphal
instead of rtnl lock/unload at the top level, push it down to the called function. This is just an intermediate step, next commit switches protection of the rtnl_link ops table to rcu, in which case (for dumps) the rtnl lock is acquired only by the netlink dumper infrastructure (current lock/unlock/dump/lock/unlock rtnl sequence becomes rcu lock/rcu unlock/dump). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: add reference counting to prevent module unload while dump is in ↵Florian Westphal
progress I don't see what prevents rmmod (unregister_all is called) while a dump is active. Even if we'd add rtnl lock/unlock pair to unregister_all (as done here), thats not enough either as rtnl_lock is released right before the dump process starts. So this adds a refcount: * acquire rtnl mutex * bump refcount * release mutex * start the dump ... and make unregister_all remove the callbacks (no new dumps possible) and then wait until refcount is 0. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: make rtnl_register accept a flags parameterFlorian Westphal
This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: call rtnl_calcit directlyFlorian Westphal
There is only a single place in the kernel that regisers the "calcit" callback (to determine min allocation for dumps). This is in rtnetlink.c for PF_UNSPEC RTM_GETLINK. The function that checks for calcit presence at run time will first check the requested family (which will always fail for !PF_UNSPEC as no callsite registers this), then falls back to checking PF_UNSPEC. Therefore we can just check if type is RTM_GETLINK and then do a direct call. Because of fallback to PF_UNSPEC all RTM_GETLINK types used this regardless of family. This has the advantage that we don't need to allocate space for the function pointer for all the other families. A followup patch will drop the calcit function pointer from the rtnl_link callback structure. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf: add BPF_J{LT,LE,SLT,SLE} instructionsDaniel Borkmann
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=), BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that particularly *JLT/*JLE counterparts involving immediates need to be rewritten from e.g. X < [IMM] by swapping arguments into [IMM] > X, meaning the immediate first is required to be loaded into a register Y := [IMM], such that then we can compare with Y > X. Note that the destination operand is always required to be a register. This has the downside of having unnecessarily increased register pressure, meaning complex program would need to spill other registers temporarily to stack in order to obtain an unused register for the [IMM]. Loading to registers will thus also affect state pruning since we need to account for that register use and potentially those registers that had to be spilled/filled again. As a consequence slightly more stack space might have been used due to spilling, and BPF programs are a bit longer due to extra code involving the register load and potentially required spill/fills. Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=) counterparts to the eBPF instruction set. Modifying LLVM to remove the NegateCC() workaround in a PoC patch at [1] and allowing it to also emit the new instructions resulted in cilium's BPF programs that are injected into the fast-path to have a reduced program length in the range of 2-3% (e.g. accumulated main and tail call sections from one of the object file reduced from 4864 to 4729 insns), reduced complexity in the range of 10-30% (e.g. accumulated sections reduced in one of the cases from 116432 to 88428 insns), and reduced stack usage in the range of 1-5% (e.g. accumulated sections from one of the object files reduced from 824 to 784b). The modification for LLVM will be incorporated in a backwards compatible way. Plan is for LLVM to have i) a target specific option to offer a possibility to explicitly enable the extension by the user (as we have with -m target specific extensions today for various CPU insns), and ii) have the kernel checked for presence of the extensions and enable them transparently when the user is selecting more aggressive options such as -march=native in a bpf target context. (Other frontends generating BPF byte code, e.g. ply can probe the kernel directly for its code generation.) [1] https://github.com/borkmann/llvm/tree/bpf-insns Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09sock: fix zerocopy panic in mem accountingWillem de Bruijn
Only call mm_unaccount_pinned_pages when releasing a struct ubuf_info that has initialized its field uarg->mmp. Before this patch, a vhost-net with experimental_zcopytx can crash in mm_unaccount_pinned_pages sock_zerocopy_put skb_zcopy_clear skb_release_data Only sock_zerocopy_alloc initializes this field. Move the unaccount call from generic sock_zerocopy_put to its specific callback sock_zerocopy_callback. Fixes: a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The UDP offload conflict is dealt with by simply taking what is in net-next where we have removed all of the UFO handling code entirely. The TCP conflict was a case of local variables in a function being removed from both net and net-next. In netvsc we had an assignment right next to where a missing set of u64 stats sync object inits were added. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "The pull requests are getting smaller, that's progress I suppose :-) 1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi. 2) Fix remote checksum handling in VXLAN and GUE tunneling drivers, from Koichiro Den. 3) Missing u64_stats_init() calls in several drivers, from Florian Fainelli. 4) TCP can set the congestion window to an invalid ssthresh value after congestion window reductions, from Yuchung Cheng. 5) Fix BPF jit branch generation on s390, from Daniel Borkmann. 6) Correct MIPS ebpf JIT merge, from David Daney. 7) Correct byte order test in BPF test_verifier.c, from Daniel Borkmann. 8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins. 9) Handle SCTP checksums properly in mlx4 driver, from Davide Caratti. 10) We can potentially enter tcp_connect() with a cached route already, due to fastopen, so we have to explicitly invalidate it. 11) skb_warn_bad_offload() can bark in legitimate situations, fix from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) net: avoid skb_warn_bad_offload false positives on UFO qmi_wwan: fix NULL deref on disconnect ppp: fix xmit recursion detection on ppp channels rds: Reintroduce statistics counting tcp: fastopen: tcp_connect() must refresh the route net: sched: set xt_tgchk_param par.net properly in ipt_init_target net: dsa: mediatek: add adjust link support for user ports net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()' hysdn: fix to a race condition in put_log_buffer s390/qeth: fix L3 next-hop in xmit qeth hdr asix: Fix small memory leak in ax88772_unbind() asix: Ensure asix_rx_fixup_info members are all reset asix: Add rx->ax_skb = NULL after usbnet_skb_return() bpf: fix selftest/bpf/test_pkt_md_access on s390x netvsc: fix race on sub channel creation bpf: fix byte order test in test_verifier xgene: Always get clk source, but ignore if it's missing for SGMII ports MIPS: Add missing file for eBPF JIT. bpf, s390: fix build for libbpf and selftest suite ...
2017-08-08net: ipv6: avoid overhead when no custom FIB rules are installedVincent Bernat
If the user hasn't installed any custom rules, don't go through the whole FIB rules layer. This is pretty similar to f4530fa574df (ipv4: Avoid overhead when no custom FIB rules are installed). Using a micro-benchmark module [1], timing ip6_route_output() with get_cycles(), with 40,000 routes in the main routing table, before this patch: min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 199 880 │▒▒▒░░░░░░░░░░░░░░░░ 43 1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░ 48 1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░ 43 1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59 2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50 2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 31 2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 28 3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 11 4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 After: min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 201 800 │▒▒▒▒▒░░░░░░░░░░░░░░░░ 63 1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░ 68 1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░ 39 1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 34 2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 33 2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 22 3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 At the frequency of the host during the bench (~ 3.7 GHz), this is about a 100 ns difference on the median value. A next step would be to collapse local and main tables, as in 0ddcf43d5d4a (ipv4: FIB Local/MAIN table collapse). [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.c Signed-off-by: Vincent Bernat <vincent@bernat.im> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net: avoid skb_warn_bad_offload false positives on UFOWillem de Bruijn
skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set. Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE. When __skb_gso_segment is called from the receive path, this triggers the warning again. Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed. See also this thread for context: http://patchwork.ozlabs.org/patch/799015/ Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08rds: Reintroduce statistics countingHåkon Bugge
In commit 7e3f2952eeb1 ("rds: don't let RDS shutdown a connection while senders are present"), refilling the receive queue was removed from rds_ib_recv(), along with the increment of s_ib_rx_refill_from_thread. Commit 73ce4317bf98 ("RDS: make sure we post recv buffers") re-introduces filling the receive queue from rds_ib_recv(), but does not add the statistics counter. rds_ib_recv() was later renamed to rds_ib_recv_path(). This commit reintroduces the statistics counting of s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08tcp: fastopen: tcp_connect() must refresh the routeEric Dumazet
With new TCP_FASTOPEN_CONNECT socket option, there is a possibility to call tcp_connect() while socket sk_dst_cache is either NULL or invalid. +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(4, ..., ...) = 0 << sk->sk_dst_cache becomes obsolete, or even set to NULL >> +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 We need to refresh the route otherwise bad things can happen, especially when syzkaller is running on the host :/ Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net: sched: set xt_tgchk_param par.net properly in ipt_init_targetXin Long
Now xt_tgchk_param par in ipt_init_target is a local varibale, par.net is not initialized there. Later when xt_check_target calls target's checkentry in which it may access par.net, it would cause kernel panic. Jaroslav found this panic when running: # ip link add TestIface type dummy # tc qd add dev TestIface ingress handle ffff: # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \ action xt -j CONNMARK --set-mark 4 This patch is to pass net param into ipt_init_target and set par.net with it properly in there. v1->v2: As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix it by also passing net_id to __tcf_ipt_init. v2->v3: Missed the fixes tag, so add it. Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put") Reported-by: Jaroslav Aster <jaster@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net_sched: get rid of some forward declarationsWANG Cong
If we move up tcf_fill_node() we can get rid of these forward declarations. Also, move down tfilter_notify_chain() to group them together. Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net: dsa: lan9303: Only allocate 3 portsEgil Hjelmeland
Save 2628 bytes on arm eabi by allocate only the required 3 ports. Now that ds->num_ports is correct: In net/dsa/tag_lan9303.c eliminate duplicate LAN9303_MAX_PORTS, use ds->num_ports. (Matching the pattern of other net/dsa/tag_xxx.c files.) Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: switchdev: Remove bridge bypass support from switchdevArkadi Sharshevsky
Currently the bridge port flags, vlans, FDBs and MDBs can be offloaded through the bridge code, making the switchdev's SELF bridge bypass implementation to be redundant. This implies several changes: - No need for dump infra in switchdev, DSA's special case is handled privately. - Remove obj_dump from switchdev_ops. - FDBs are removed from obj_add/del routines, due to the fact that they are offloaded through the bridge notification chain. - The switchdev_port_bridge_xx() and switchdev_port_fdb_xx() functions can be removed. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: bridge: Remove FDB deletion through switchdev objectArkadi Sharshevsky
At this point no driver supports FDB add/del through switchdev object but rather via notification chain, thus, it is removed. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Move FDB dump implementation inside DSAArkadi Sharshevsky
>From all switchdev devices only DSA requires special FDB dump. This is due to lack of ability for syncing the hardware learned FDBs with the bridge. Due to this it is removed from switchdev and moved inside DSA. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Remove redundant MDB dump supportArkadi Sharshevsky
Currently the MDB HW database is synced with the bridge's one, thus, There is no need to support special dump functionality. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Remove support for bypass bridge port attributes/vlan setArkadi Sharshevsky
The bridge port attributes/vlan for DSA devices should be set only from bridge code. Furthermore, The vlans are synced totally with the bridge so there is no need for special dump support. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Add support for querying supported bridge flagsArkadi Sharshevsky
The DSA drivers do not support bridge flags offload. Yet, this attribute should be added in order for the bridge to fail when one tries set a flag on the port, as explained in commit dc0ecabd6231 ("net: switchdev: Add support for querying supported bridge flags by hardware"). Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Move FDB add/del implementation inside DSAArkadi Sharshevsky
Currently DSA uses switchdev's implementation of FDB add/del ndos. This patch moves the implementation inside DSA in order to support the legacy way for static FDB configuration. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Add support for learning FDB through notificationArkadi Sharshevsky
Add support for learning FDB through notification. The driver defers the hardware update via ordered work queue. In case of a successful FDB add a notification is sent back to bridge. In case of hw FDB del failure the static FDB will be deleted from the bridge, thus, the interface is moved to down state in order to indicate inconsistent situation. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Remove switchdev dependency from DSA switch notifier chainArkadi Sharshevsky
Currently, the switchdev objects are embedded inside the DSA notifier info. This patch removes this dependency. This is done as a preparation stage before adding support for learning FDB through the switchdev notification chain. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Remove prepare phase for FDBArkadi Sharshevsky
The prepare phase for FDB add is unneeded because most of DSA devices can have failures during bus transactions (SPI, I2C, etc.), thus, the prepare phase cannot guarantee success of the commit stage. The support for learning FDB through notification chain, which will be introduced in the following patches, will provide the ability to notify back the bridge about successful offload. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: Change DSA slave FDB API to be switchdev independentArkadi Sharshevsky
In order to support FDB add/del to be on a notifier chain the slave API need to be changed to be switchdev independent. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07xfrm: check that cached bundle is still validFlorian Westphal
Quoting Ilan Tayari: 1. Set up a host-to-host IPSec tunnel (or transport, doesn't matter) 2. Ping over IPSec, or do something to populate the pcpu cache 3. Join a MC group, then leave MC group 4. Try to ping again using same CPU as before -> traffic doesn't egress the machine at all Ilan debugged the problem down to the fact that one of the path dsts devices point to lo due to earlier dst_dev_put(). In this case, dst is marked as DEAD and we cannot reuse the bundle. The cache only asserted that the requested policy and that of the cached bundle match, but its not enough - also verify the path is still valid. Fixes: ec30d78c14a813 ("xfrm: add xdst pcpu cache") Reported-by: Ayham Masood <ayhamm@mellanox.com> Tested-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: remove useless args of dsa_slave_createVivien Didelot
dsa_slave_create currently takes 4 arguments while it only needs the related dsa_port and its name. Remove all other arguments. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: remove useless args of dsa_cpu_dsa_setupVivien Didelot
dsa_cpu_dsa_setup currently takes 4 arguments but they are all available from the dsa_port argument. Remove all others. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: dsa: remove useless argument in legacy setupVivien Didelot
dsa_switch_alloc() already assigns ds-dev, which can be used in dsa_switch_setup_one and dsa_cpu_dsa_setups instead of requiring an additional struct device argument. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ipv6: sr: implement several seg6local actionsDavid Lebrun
This patch implements the following seg6local actions. - SEG6_LOCAL_ACTION_END: regular SRH processing. The DA of the packet is updated to the next segment and forwarded accordingly. - SEG6_LOCAL_ACTION_END_X: same as above, except that the packet is forwarded to the specified IPv6 next-hop. - SEG6_LOCAL_ACTION_END_DX6: decapsulate the packet and forward to inner IPv6 packet to the specified IPv6 next-hop. - SEG6_LOCAL_ACTION_END_B6: insert the specified SRH directly after the IPv6 header of the packet. - SEG6_LOCAL_ACTION_END_B6_ENCAP: encapsulate the packet within an outer IPv6 header, containing the specified SRH. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ipv6: sr: add rtnetlink functions for seg6local action parametersDavid Lebrun
This patch adds the necessary functions to parse, fill, and compare seg6local rtnetlink attributes, for all defined action parameters. - The SRH parameter defines an SRH to be inserted or encapsulated. - The TABLE parameter defines the table to use for the route lookup of the next segment or the inner decapsulated packet. - The NH4 parameter defines the IPv4 next-hop for an inner decapsulated IPv4 packet. - The NH6 parameter defines the IPv6 next-hop for the next segment or for an inner decapsulated IPv6 packet - The IIF parameter defines an ingress interface index. - The OIF parameter defines an egress interface index. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ipv6: sr: define core operations for seg6local lightweight tunnelDavid Lebrun
This patch implements a new type of lightweight tunnel named seg6local. A seg6local lwt is defined by a type of action and a set of parameters. The action represents the operation to perform on the packets matching the lwt's route, and is not necessarily an encapsulation. The set of parameters are arguments for the processing function. Each action is defined in a struct seg6_action_desc within seg6_action_table[]. This structure contains the action, mandatory attributes, the processing function, and a static headroom size required by the action. The mandatory attributes are encoded as a bitmask field. The static headroom is set to a non-zero value when the processing function always add a constant number of bytes to the skb (e.g. the header size for encapsulations). To facilitate rtnetlink-related operations such as parsing, fill_encap, and cmp_encap, each type of action parameter is associated to three function pointers, in seg6_action_params[]. All actions defined in seg6_local.h are detailed in [1]. [1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01 Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ipv6: sr: export SRH insertion functionsDavid Lebrun
This patch exports the seg6_do_srh_encap() and seg6_do_srh_inline() functions. It also removes the CONFIG_IPV6_SEG6_INLINE knob that enabled the compilation of seg6_do_srh_inline(). This function is now built-in. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ipv6: sr: allow SRH insertion with arbitrary segments_left valueDavid Lebrun
The seg6_validate_srh() function only allows SRHs whose active segment is the first segment of the path. However, an application may insert an SRH whose active segment is not the first one. Such an application might be for example an SR-aware Virtual Network Function. This patch enables to insert SRHs with an arbitrary active segment. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net_sched: use void pointer for filter handleWANG Cong
Now we use 'unsigned long fh' as a pointer in every place, it is safe to convert it to a void pointer now. This gets rid of many casts to pointer. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net_sched: refactor notification code for RTM_DELTFILTERWANG Cong
It is confusing to use 'unsigned long fh' as both a handle and a pointer, especially commit 9ee7837449b3 ("net sched filters: fix notification of filter delete with proper handle"). This patch introduces tfilter_del_notify() so that we can pass it as a pointer as before, and we don't need to check RTM_DELTFILTER in tcf_fill_node() any more. This prepares for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPLRoopa Prabhu
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to raw socket lookupsDavid Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to inet6 socket lookupsDavid Ahern
Add a second device index, sdif, to inet6 socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the ingress index is obtained from IPCB using inet_sdif and after tcp_v4_rcv tcp_v4_sdif is used. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to udp socket lookupsDavid Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Early demux lookups are handled in the next patch as part of INET_MATCH changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to multicast source filterDavid Ahern
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to raw socket lookupsDavid Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to inet socket lookupsDavid Ahern
Add a second device index, sdif, to inet socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the ingress index is obtained from IPCB using inet_sdif and after the cb move in tcp_v4_rcv the tcp_v4_sdif helper is used. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to udp socket lookupsDavid Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Early demux lookups are handled in the next patch as part of INET_MATCH changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge tag 'mlx5-shared-2017-08-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-shared-2017-08-07 This series includes some mlx5 updates for both net-next and rdma trees. From Saeed, Core driver updates to allow selectively building the driver with or without some large driver components, such as - E-Switch (Ethernet SRIOV support). - Multi-Physical Function Switch (MPFs) support. For that we split E-Switch and MPFs functionalities into separate files. From Erez, Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration is taking place and until it completes. From Rabie, Increase the maximum supported flow counters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>