aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/vxlan.c
AgeCommit message (Collapse)Author
2015-01-18netlink: make nlmsg_end() and genlmsg_end() voidJohannes Berg
Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15vxlan: Only bind to sockets with compatible flags enabledThomas Graf
A VXLAN net_device looking for an appropriate socket may only consider a socket which has a matching set of flags/extensions enabled. If incompatible flags are enabled, return a conflict to have the caller create a distinct socket with distinct port. The OVS VXLAN port is kept unaware of extensions at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15vxlan: Group Policy extensionThomas Graf
Implements supports for the Group Policy VXLAN extension [0] to provide a lightweight and simple security label mechanism across network peers based on VXLAN. The security context and associated metadata is mapped to/from skb->mark. This allows further mapping to a SELinux context using SECMARK, to implement ACLs directly with nftables, iptables, OVS, tc, etc. The group membership is defined by the lower 16 bits of skb->mark, the upper 16 bits are used for flags. SELinux allows to manage label to secure local resources. However, distributed applications require ACLs to implemented across hosts. This is typically achieved by matching on L2-L4 fields to identify the original sending host and process on the receiver. On top of that, netlabel and specifically CIPSO [1] allow to map security contexts to universal labels. However, netlabel and CIPSO are relatively complex. This patch provides a lightweight alternative for overlay network environments with a trusted underlay. No additional control protocol is required. Host 1: Host 2: Group A Group B Group B Group A +-----+ +-------------+ +-------+ +-----+ | lxc | | SELinux CTX | | httpd | | VM | +--+--+ +--+----------+ +---+---+ +--+--+ \---+---/ \----+---/ | | +---+---+ +---+---+ | vxlan | | vxlan | +---+---+ +---+---+ +------------------------------+ Backwards compatibility: A VXLAN-GBP socket can receive standard VXLAN frames and will assign the default group 0x0000 to such frames. A Linux VXLAN socket will drop VXLAN-GBP frames. The extension is therefore disabled by default and needs to be specifically enabled: ip link add [...] type vxlan [...] gbp In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket must run on a separate port number. Examples: iptables: host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200 host2# iptables -I INPUT -m mark --mark 0x200 -j DROP OVS: # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL' # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop' [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy [1] http://lwn.net/Articles/204905/ Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14vxlan: Remote checksum offloadTom Herbert
Add support for remote checksum offload in VXLAN. This uses a reserved bit to indicate that RCO is being done, and uses the low order reserved eight bits of the VNI to hold the start and offset values in a compressed manner. Start is encoded in the low order seven bits of VNI. This is start >> 1 so that the checksum start offset is 0-254 using even values only. Checksum offset (transport checksum field) is indicated in the high order bit in the low order byte of the VNI. If the bit is set, the checksum field is for UDP (so offset = start + 6), else checksum field is for TCP (so offset = start + 16). Only TCP and UDP are supported in this implementation. Remote checksum offload for VXLAN is described in: https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4). With UDP checksums and Remote Checksum Offload IPv4 Client 11.84% CPU utilization Server 12.96% CPU utilization 9197 Mbps IPv6 Client 12.46% CPU utilization Server 14.48% CPU utilization 8963 Mbps With UDP checksums, no remote checksum offload IPv4 Client 15.67% CPU utilization Server 14.83% CPU utilization 9094 Mbps IPv6 Client 16.21% CPU utilization Server 14.32% CPU utilization 9058 Mbps No UDP checksums IPv4 Client 15.03% CPU utilization Server 23.09% CPU utilization 9089 Mbps IPv6 Client 16.18% CPU utilization Server 26.57% CPU utilization 8954 Mbps Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14udp: pass udp_offload struct to UDP gro callbacksTom Herbert
This patch introduces udp_offload_callbacks which has the same GRO functions (but not a GSO function) as offload_callbacks, except there is an argument to a udp_offload struct passed to gro_receive and gro_complete functions. This additional argument can be used to retrieve the per port structure of the encapsulation for use in gro processing (mostly by doing container_of on the structure). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13net: rename vlan_tx_* helpers since "tx" is misleading thereJiri Pirko
The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12vxlan: Improve support for header flagsTom Herbert
This patch cleans up the header flags of VXLAN in anticipation of defining some new ones: - Move header related definitions from vxlan.c to vxlan.h - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag) - Move check for unknown flags to after we find vxlan_sock, this assumes that some flags may be processed based on tunnel configuration - Add a comment about why the stack treating unknown set flags as an error instead of ignoring them Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02net: Add Transparent Ethernet Bridging GRO support.Jesse Gross
Currently the only tunnel protocol that supports GRO with encapsulated Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer so that it can be used by other tunnel protocols such as GRE and Geneve. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-23vxlan: Fix double free of skb.Pravin B Shelar
In case of error vxlan_xmit_one() can free already freed skb. Also fixes memory leak of dst-entry. Fixes: acbf74a7630 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11Fix race condition between vxlan_sock_add and vxlan_sock_releaseMarcelo Leitner
Currently, when trying to reuse a socket, vxlan_sock_add will grab vn->sock_lock, locate a reusable socket, inc refcount and release vn->sock_lock. But vxlan_sock_release() will first decrement refcount, and then grab that lock. refcnt operations are atomic but as currently we have deferred works which hold vs->refcnt each, this might happen, leading to a use after free (specially after vxlan_igmp_leave): CPU 1 CPU 2 deferred work vxlan_sock_add ... ... spin_lock(&vn->sock_lock) vs = vxlan_find_sock(); vxlan_sock_release dec vs->refcnt, reaches 0 spin_lock(&vn->sock_lock) vxlan_sock_hold(vs), refcnt=1 spin_unlock(&vn->sock_lock) hlist_del_rcu(&vs->hlist); vxlan_notify_del_rx_port(vs) spin_unlock(&vn->sock_lock) So when we look for a reusable socket, we check if it wasn't freed already before reusing it. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Fixes: 7c47cedf43a8b3 ("vxlan: move IGMP join/leave to work queue") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net: make vid as a parameter for ndo_fdb_add/ndo_fdb_delJiri Pirko
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple u16 vid to drivers from there. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-11-25vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]Alexander Duyck
In "vxlan: Call udp_sock_create" there was a logic error that resulted in the default for IPv6 VXLAN tunnels going from using checksums to not using checksums. Since there is currently no support in iproute2 for setting these values it means that a kernel after the change cannot talk over a IPv6 VXLAN tunnel to a kernel prior the change. Fixes: 3ee64f3 ("vxlan: Call udp_sock_create") Cc: Tom Herbert <therbert@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ieee802154/fakehard.c A bug fix went into 'net' for ieee802154/fakehard.c, which is removed in 'net-next'. Add build fix into the merge from Stephen Rothwell in openvswitch, the logging macros take a new initial 'log' argument, a new call was added in 'net' so when we merge that in here we have to explicitly add the new 'log' arg to it else the build fails. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21vlan: introduce *vlan_hwaccel_push_inside helpersJiri Pirko
Use them to push skb->vlan_tci into the payload and avoid code duplication. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21vlan: rename __vlan_put_tag to vlan_insert_tag_set_protoJiri Pirko
Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18vxlan: Inline vxlan_gso_check().Joe Stringer
Suggested-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14net: Add vxlan_gso_check() helperJoe Stringer
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other UDP-based encapsulation protocols where the format and size of the header differs. This patch implements a generic ndo_gso_check() for VXLAN which will only advertise GSO support when the skb looks like it contains VXLAN (or no UDP tunnelling at all). Implementation shamelessly stolen from Tom Herbert: http://thread.gmane.org/gmane.linux.network/332428/focus=333111 Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/chelsio/cxgb4vf/sge.c drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c sge.c was overlapping two changes, one to use the new __dev_alloc_page() in net-next, and one to use s->fl_pg_order in net. ixgbe_phy.c was a set of overlapping whitespace changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13vxlan: Do not reuse sockets for a different address familyMarcelo Leitner
Currently, we only match against local port number in order to reuse socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will follow. The following steps reproduce it: # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \ srcport 5000 6000 dev eth0 # ip link add vxlan7 type vxlan id 43 group ff0e::110 \ srcport 5000 6000 dev eth0 # ip link set vxlan6 up # ip link set vxlan7 up <panic> [ 4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 ... [ 4.188076] Call Trace: [ 4.188085] [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630 [ 4.188098] [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan] [ 4.188113] [<ffffffff810a3430>] process_one_work+0x220/0x710 [ 4.188125] [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710 [ 4.188138] [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0 [ 4.188149] [<ffffffff810a3920>] ? process_one_work+0x710/0x710 So address family must also match in order to reuse a socket. Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.Jesse Gross
When doing GRO processing for UDP tunnels, we never add SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol is added (such as SKB_GSO_TCPV4). The result is that if the packet is later resegmented we will do GSO but not treat it as a tunnel. This results in UDP fragmentation of the outer header instead of (i.e.) TCP segmentation of the inner header as was originally on the wire. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06vxlan: Fix to enable UDP checksums on interfaceTom Herbert
Add definition to vxlan nla_policy for UDP checksum. This is necessary to enable UDP checksums on VXLAN. In some instances, enabling UDP checksums can improve performance on receive for devices that return legacy checksum-unnecessary for UDP/IP. Also, UDP checksum provides some protection against VNI corruption. Testing: Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x. TCP_STREAM IPv4, without UDP checksums 14.41% TX CPU utilization 25.71% RX CPU utilization 9083.4 Mbps IPv4, with UDP checksums 13.99% TX CPU utilization 13.40% RX CPU utilization 9095.65 Mbps TCP_RR IPv4, without UDP checksums 94.08% TX CPU utilization 156/248/462 90/95/99% latencies 1.12743e+06 tps IPv4, with UDP checksums 94.43% TX CPU utilization 158/250/462 90/95/99% latencies 1.13345e+06 tps Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17vxlan: fix a free after useLi RongQing
pskb_may_pull maybe change skb->data and make eth pointer oboslete, so eth needs to reload Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15vxlan: using pskb_may_pull as early as possibleLi RongQing
pskb_may_pull should be used to check if skb->data has enough space, skb->len can not ensure that. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15vxlan: fix a use after free in vxlan_encap_bypassLi RongQing
when netif_rx() is done, the netif_rx handled skb maybe be freed, and should not be used. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07net: better IFF_XMIT_DST_RELEASE supportEric Dumazet
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01vxlan: Set inner protocol before transmitTom Herbert
Call skb_set_inner_protocol to set inner Ethernet protocol to ETH_P_TEB before transmit. This is needed for GSO with UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23vxlan: Fix bug introduced by commit acbf74a76300Andy Zhou
Commit acbf74a76300 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions." introduced a bug in vxlan_xmit_one() function, causing it to transmit Vxlan packets without proper Vxlan header inserted. The change was not needed in the first place. Revert it. Reported-by: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.Andy Zhou
Simplify vxlan implementation using common UDP tunnel APIs. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01vxlan: Enable checksum unnecessary conversions for vxlan/UDP socketsTom Herbert
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29net: Clarification of CHECKSUM_UNNECESSARYTom Herbert
This patch: - Clarifies the specific requirements of devices returning CHECKSUM_UNNECESSARY (comments in skbuff.h). - Adds csum_level field to skbuff. This is used to express how many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1). This replaces the overloading of skb->encapsulation, that field is is now only used to indicate inner headers are valid. - Change __skb_checksum_validate_needed to "consume" each checksum as indicated by csum_level as layers of the the packet are parsed. - Remove skb_pop_rcv_encapsulation, no longer needed in the new csum_level model. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22vxlan: fix incorrect initializer in union vxlan_addrGerhard Stenzel
The first initializer in the following union vxlan_addr ipa = { .sin.sin_addr.s_addr = tip, .sa.sa_family = AF_INET, }; is optimised away by the compiler, due to the second initializer, therefore initialising .sin.sin_addr.s_addr always to 0. This results in netlink messages indicating a L3 miss never contain the missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions. The problem affects user space programs relying on an IP address being sent as part of a netlink message indicating a L3 miss. Changing .sa.sa_family = AF_INET, to .sin.sin_family = AF_INET, fixes the problem. Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28neighbour : fix ndm_type type error issueJun Zhao
ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST. NDA_DST is for netlink TLV type, hence it's not right value in this context. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14vxlan: Call udp_sock_createTom Herbert
In vxlan driver call common function udp_sock_create to create the listener UDP port. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-10bridge: fdb dumping takes a filter deviceJamal Hadi Salim
Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07vxlan: Call udp_flow_src_portTom Herbert
In vxlan and OVS vport-vxlan call common function to get source port for a UDP tunnel. Removed vxlan_src_port since the functionality is now in udp_flow_src_port. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15vxlan: Checksum fixesTom Herbert
Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet header to work properly with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13vxlan: use dev->needed_headroom instead of dev->hard_header_lenCong Wang
When we mirror packets from a vxlan tunnel to other device, the mirror device should see the same packets (that is, without outer header). Because vxlan tunnel sets dev->hard_header_len, tcf_mirred() resets mac header back to outer mac, the mirror device actually sees packets with outer headers Vxlan tunnel should set dev->needed_headroom instead of dev->hard_header_len, like what other ip tunnels do. This fixes the above problem. Cc: "David S. Miller" <davem@davemloft.net> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: Add skb_gro_postpull_rcsum to udp and vxlanTom Herbert
Need to gro_postpull_rcsum for GRO to work with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)Tom Herbert
Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13net: get rid of SET_ETHTOOL_OPSWilfried Klaebe
net: get rid of SET_ETHTOOL_OPS Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that. Mostly done via coccinelle script: @@ struct ethtool_ops *ops; struct net_device *dev; @@ - SET_ETHTOOL_OPS(dev, ops); + dev->ethtool_ops = ops; Compile tested only, but I'd seriously wonder if this broke anything. Suggested-by: Dave Miller <davem@davemloft.net> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24vxlan: add x-netns supportNicolas Dichtel
This patch allows to switch the netns when packet is encapsulated or decapsulated. The vxlan socket is openned into the i/o netns, ie into the netns where encapsulated packets are received. The socket lookup is done into this netns to find the corresponding vxlan tunnel. After decapsulation, the packet is injecting into the corresponding interface which may stand to another netns. When one of the two netns is removed, the tunnel is destroyed. Configuration example: ip netns add netns1 ip netns exec netns1 ip link set lo up ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 ip link set vxlan10 netns netns1 ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10 ip netns exec netns1 ip link set vxlan10 up Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-23vxlan: ensure to advertise the right fdb remoteNicolas Dichtel
The goal of this patch is to fix rtnelink notification. The main problem was about notification for fdb entry with more than one remote. Before the patch, when a remote was added to an existing fdb entry, the kernel advertised the first remote instead of the added one. Also when a remote was removed from a fdb entry with several remotes, the deleted remote was not advertised. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ipv4: add a sock pointer to dst->output() path.Eric Dumazet
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03net: vxlan: fix crash when interface is created with no groupMike Rapoport
If the vxlan interface is created without explicit group definition, there are corner cases which may cause kernel panic. For instance, in the following scenario: node A: $ ip link add dev vxlan42 address 2c:c2:60:00:10:20 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.1/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02 $ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address> $ ping 10.0.0.2 node B: $ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.2/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20 node B crashes: vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) BUG: unable to handle kernel NULL pointer dereference at 0000000000000046 IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82 PGD 7bd89067 PUD 7bd4e067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221-dirty #154 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000 RIP: 0010:[<ffffffff8143c459>] [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP: 0018:ffff88007fd03668 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040 RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754 RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000 R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740 R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740 ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360 ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360 Call Trace: <IRQ> [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4 [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12 [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14 [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8 [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392 [<ffffffff813b380f>] ? eth_header+0x28/0xb5 [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152 [<ffffffff813db741>] ip_finish_output2+0x236/0x299 [<ffffffff813dc074>] ip_finish_output+0x98/0x9d [<ffffffff813dc749>] ip_output+0x62/0x67 [<ffffffff813da9f2>] dst_output+0xf/0x11 [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37 [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33 [<ffffffff813ff732>] icmp_push_reply+0x106/0x115 [<ffffffff813ff9e4>] icmp_reply+0x142/0x164 [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48 [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80 [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813ffb62>] icmp_echo+0x25/0x27 [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a [<ffffffff813d7d28>] ? inet_add_protocol+0x3e/0x3e [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478 [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27 [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a [<ffffffff8139aaaa>] process_backlog+0x76/0x12f [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab [<ffffffff81047847>] __do_softirq+0xca/0x1d1 [<ffffffff81047ace>] irq_exit+0x3e/0x85 [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4 [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d <EOI> [<ffffffff810378db>] ? native_safe_halt+0x6/0x8 [<ffffffff810110c7>] default_idle+0x9/0xd [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137 [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5 Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89 RIP [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP <ffff88007fd03668> CR2: 0000000000000046 ---[ end trace 4612329caab37efd ]--- When vxlan interface is created without explicit group definition, the default_dst protocol family is initialiazed to AF_UNSPEC and the driver assumes IPv4 configuration. On the other side, the default_dst protocol family is used to differentiate between IPv4 and IPv6 cases and, since, AF_UNSPEC != AF_INET, the processing takes the IPv6 path. Making the IPv4 assumption explicit by settting default_dst protocol family to AF_INET4 and preventing mixing of IPv4 and IPv6 addresses in snooped fdb entries fixes the corner case crashes. Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24vxlan: fix nonfunctional neigh_reduce()David Stevens
The VXLAN neigh_reduce() code is completely non-functional since check-in. Specific errors: 1) The original code drops all packets with a multicast destination address, even though neighbor solicitations are sent to the solicited-node address, a multicast address. The code after this check was never run. 2) The neighbor table lookup used the IPv6 header destination, which is the solicited node address, rather than the target address from the neighbor solicitation. So neighbor lookups would always fail if it got this far. Also for L3MISSes. 3) The code calls ndisc_send_na(), which does a send on the tunnel device. The context for neigh_reduce() is the transmit path, vxlan_xmit(), where the host or a bridge-attached neighbor is trying to transmit a neighbor solicitation. To respond to it, the tunnel endpoint needs to do a *receive* of the appropriate neighbor advertisement. Doing a send, would only try to send the advertisement, encapsulated, to the remote destinations in the fdb -- hosts that definitely did not do the corresponding solicitation. 4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the isrouter flag in the advertisement. This has nothing to do with whether or not the target is a router, and generally won't be set since the tunnel endpoint is bridging, not routing, traffic. The patch below creates a proxy neighbor advertisement to respond to neighbor solicitions as intended, providing proper IPv6 support for neighbor reduction. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18vxlan: fix potential NULL dereference in arp_reduce()David Stevens
This patch fixes a NULL pointer dereference in the event of an skb allocation failure in arp_reduce(). Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26vxlan: remove unused port variable in vxlan_udp_encap_recv()Pablo Neira Ayuso
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>