linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2015-11-16	raw: increment correct SNMP counters for ICMP messages	Ben Cartwright-Cox
	Sending ICMP packets with raw sockets ends up in the SNMP counters logging the type as the first byte of the IPv4 header rather than the ICMP header. This is fixed by adding the IP Header Length to the casting into a icmphdr struct. Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	sfc: constify pci_error_handlers structures	Julia Lawall
	This pci_error_handlers structure is never modified, like all the other pci_error_handlers structures, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: cavium: liquidio: constify pci_error_handlers structures	Julia Lawall
	This pci_error_handlers structure is never modified, like all the other pci_error_handlers structures, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	Driver: Vmxnet3: Fix use of mfTableLen for big endian architectures	Shrikrishna Khare
	Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Reported-by: Masao Uebayashi <uebayasi@gmail.com> Signed-off-by: Bhavesh Davda <bhavesh@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: usb: cdc_ether: add Dell DW5580 as a mobile broadband adapter	Daniele Palmas
	Since Dell DW5580 is a 3G modem, this patch adds the device as a mobile broadband adapter Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: fix __netdev_update_features return on ndo_set_features failure	Nikolay Aleksandrov
	If ndo_set_features fails __netdev_update_features() will return -1 but this is wrong because it is expected to return 0 if no features were changed (see netdev_update_features()), which will cause a netdev notifier to be called without any actual changes. Fix this by returning 0 if ndo_set_features fails. Fixes: 6cb6a27c45ce ("net: Call netdev_features_change() from netdev_update_features()") CC: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: fix feature changes on devices without ndo_set_features	Nikolay Aleksandrov
	When __netdev_update_features() was updated to ensure some features are disabled on new lower devices, an error was introduced for devices which don't have the ndo_set_features() method set. Before we'll just set the new features, but now we return an error and don't set them. Fix this by returning the old behaviour and setting err to 0 when ndo_set_features is not present. Fixes: e7868a85e1b2 ("net/core: ensure features get disabled on new lower devs") CC: Jarod Wilson <jarod@redhat.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Ido Schimmel <idosch@mellanox.com> CC: Sander Eikelenboom <linux@eikelenboom.it> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Dave Young <dyoung@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	switchdev: bridge: Check return code is not EOPNOTSUPP	Ido Schimmel
	When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP. In this case we should not emit errors and warnings to the kernel log. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 0bc05d585d38 ("switchdev: allow caller to explicitly request attr_set as deferred") Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	be2net: replace hardcoded values with existing define	Ivan Vecera
	Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	be2net: remove unused local rsstable array	Ivan Vecera
	Remove rsstable array and its initialization from be_set_rss_hash_opts(). The array became unused after "e255787 be2net: Support for configurable RSS hash key". The initial RSS table is now filled and stored for later usage during Rx queue creation. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	ravb: Fix int mask value overwritten issue	Masaru Nagai
	When RX/TX interrupt for Network Control queue and Best Effort queue is issued at the same time, the interrupt mask of Network Control queue will be reset when the mask of Best Effort queue is set. This patch fixes this problem. Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: smsc911x: Reset PHY during initialization	Pavel Fedin
	On certain hardware after software reboot the chip may get stuck and fail to reinitialize during reset. This can be fixed by ensuring that PHY is reset too. Old PHY resetting method required operational MDIO interface, therefore the chip should have been already set up. In order to be able to function during probe, it is changed to use PMT_CTRL register. The problem could be observed on SMDK5410 board. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	bpf, arm64: start flushing icache range from header	Daniel Borkmann
	While recently going over ARM64's BPF code, I noticed that the icache range we're flushing should start at header already and not at ctx.image. Reason is that after b569c1c622c5 ("net: bpf: arm64: address randomize and write protect JIT code"), we also want to make sure to flush the random-sized trap in front of the start of the actual program (analogous to x86). No operational differences from user side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	bpf, arm: start flushing icache range from header	Daniel Borkmann
	During review I noticed that the icache range we're flushing should start at header already and not at ctx.image. Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize and write protect JIT code"), we also want to make sure to flush the random-sized trap in front of the start of the actual program (analogous to x86). No operational differences from user side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	bpf: samples: exclude asm/sysreg.h for arm64	Yang Shi
	commit 338d4f49d6f7114a017d294ccf7374df4f998edc ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is incompatible with llvm so it will cause BPF samples build failure for ARM64. Since sysreg.h is useless for BPF samples, just exclude it from Makefile via defining __ASM_SYSREG_H. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	arm64: bpf: fix JIT frame pointer setup	Yang Shi
	BPF fp should point to the top of the BPF prog stack. The original implementation made it point to the bottom incorrectly. Move A64_SP to fp before reserve BPF prog stack space. CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: phy: vitesse: add support for VSC8601	Måns Rullgård
	This adds support for the Vitesse VSC8601 PHY. Generic functions are used for everything except interrupt handling. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	net: phy: at803x: support interrupt on 8030 and 8035	Måns Rullgård
	Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode" added interrupt support for the 8031 PHY but left out the other two chips supported by this driver. This patch sets the .ack_interrupt and .config_intr functions for the 8030 and 8035 drivers as well. Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16	ip_tunnel: disable preemption when updating per-cpu tstats	Jason A. Donenfeld
	Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	Merge branch 'mv88e6060-fixes'	David S. Miller
	Neil Armstrong says: ==================== net: dsa: mv88e6060: cleanup and fix setup This patchset introduces some fixes and a registers addressing cleanup for the mv88e6060 DSA driver. The first patch removes the poll_link as mv88e6xxx. The 3 following patches fixes the setup in regards of the datasheet. The 2 last patches introduces a clean header and replaces all magic values. v2: cleanup InitReady patch, add missing Acked-by and fix header copyright notice ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: replace magic values with register defines	Neil Armstrong
	To align with the mv88e6xxx code, use the register defines to access all the register addresses and bit fields. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: add register defines header file	Neil Armstrong
	To align with the mv88e6xxx code, add a similar header file with all the register defines. The file is based on the mv88e6xxx header for coherency. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: use the correct bit shift for mac0	Neil Armstrong
	According to the mv88e6060 datasheet, the first mac byte must be at position 9 instead of 8 since the bit 8 is used to select if the mac address must differ for each port for Pause frames. Use the correct shift and set the same mac address for all port. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: use the correct MaxFrameSize bit	Neil Armstrong
	According to the mv88e6060 datasheet, the MaxFrameSize bit position is 10 instead of 11 which is reserved. Use the bit correctly to setup max frame size to 1536. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: use the correct InitReady bit	Neil Armstrong
	According to the mv88e6060 datasheet, the InitReady bit position is 11 and the polarity is inverted. Use the bit correctly to detect the end of initialization. Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: dsa: mv88e6060: remove poll_link callback	Neil Armstrong
	As of mv88e6xxx remove the poll_link callback since the link state change polling is now handled by the phylib. Tested on a mv88e6060 B0 device with a TI DM816X SoC. Suggested-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	Merge branch 'mellanox-net-fixes'	David S. Miller
	Or Gerlitz says: ==================== Mellanox NIC driver update, Nov 12, 2015 Few small mlx5 and mlx4 fixes from the team... done over net commit c5a3788 "Merge branch 'akpm' (patches from Andrew)" Eran's patch needs to go to 4.2 and 4.3 stable kernels. Tariq's patch need to go to 4.3 stable too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx4_core: Avoid returning success in case of an error flow	Noa Osherovich
	The err variable wasn't set with the correct error value in some cases. Fixes: 47605df95398 ('mlx4: Modify proxy/tunnel QP mechanism [..]') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters	Eran Ben Elisha
	When cleaning slave's counter resources, we hold a spinlock that protects the slave's counters list. As part of the clean, we call __mlx4_clear_if_stat which calls mlx4_alloc_cmd_mailbox which is a sleepable function. In order to fix this issue, hold the spinlock, and copy all counter indices into a temporary array, and release the spinlock. Afterwards, iterate over this array and free every counter. Repeat this scenario until the original list is empty (a new counter might have been added while releasing the counters from the temporary array). Fixes: b72ca7e96acf ("net/mlx4_core: Reset counters data when freed") Reported-by: Moni Shoua <monis@mellanox.com> Tested-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx5e: Use the right DMA free function on TX path	Achiad Shochat
	On xmit path we use skb_frag_dma_map() which is using dma_map_page(), while upon completion we dma-unmap the skb fragments using dma_unmap_single() rather than dma_unmap_page(). To fix this, we now save the dma map type on xmit path and use this info to call the right dma unmap method upon TX completion. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx5e: Max mtu comparison fix	Doron Tsur
	On change mtu the driver compares between hardware queried mtu and software requested mtu. We need to compare between software representation of the queried mtu and the requested mtu. Fixes: facc9699f0fe ('net/mlx5e: Fix HW MTU settings') Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx5e: Added self loopback prevention	Tariq Toukan
	Prevent outgoing multicast frames from looping back to the RX queue. By introducing new HW capability self_lb_en_modifiable, which indicates the support to modify self_lb_en bit in modify_tir command. When this capability is set we can prevent TIRs from sending back loopback multicast traffic to their own RQs, by "refreshing TIRs" with modify_tir command, on every time new channels (SQs/RQs) are created at device open. This is needed since TIRs are static and only allocated once on driver load, and the loopback decision is under their responsibility. Fixes issues of the kind: "IPv6: eth2: IPv6 duplicate address fe80::e61d:2dff:fe5c:f2e9 detected!" The issue is seen since the IPv6 solicitations multicast messages are loopedback and the network stack thinks they are coming from another host. Fixes: 5c50368f3831 ("net/mlx5e: Light-weight netdev open/stop") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net/mlx5e: Fix inline header size calculation	Saeed Mahameed
	mlx5e_get_inline_hdr_size didn't take into account the vlan insertion into the inline WQE segment. This could lead to max inline violation in cases where skb_headlen(skb) + VLAN_HLEN >= sq->max_inline. Fixes: 3ea4891db8d0 ("net/mlx5e: Fix LSO vlan insertion") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	ipvs: use skb_to_full_sk() helper	Eric Dumazet
	SYNACK packets might be attached to request sockets. Use skb_to_full_sk() helper to avoid illegal accesses to inet_sk(skb->sk) Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	tcp: ensure proper barriers in lockless contexts	Eric Dumazet
	Some functions access TCP sockets without holding a lock and might output non consistent data, depending on compiler and or architecture. tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ... Introduce sk_state_load() and sk_state_store() to fix the issues, and more clearly document where this lack of locking is happening. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	net: thunder: Fix crash upon shutdown after failed probe	Pavel Fedin
	If device probe fails, driver remains bound to the PCI device. However, driver data has been reset to NULL. This causes crash upon dereferencing it in nicvf_remove() Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	sctp: translate host order to network order when setting a hmacid	lucien
	now sctp auth cannot work well when setting a hmacid manually, which is caused by that we didn't use the network order for hmacid, so fix it by adding the transformation in sctp_auth_ep_set_hmacs. even we set hmacid with the network order in userspace, it still can't work, because of this condition in sctp_auth_ep_set_hmacs(): if (id > SCTP_AUTH_HMAC_ID_MAX) return -EOPNOTSUPP; so this wasn't working before and thus it won't break compatibility. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	Merge branch 'packet-fixes'	David S. Miller
	Daniel Borkmann says: ==================== packet fixes Fixes a couple of issues in packet sockets, i.e. on TX ring side. See individual patches for details. v2 -> v3: - First two patches unchanged, kept Jason's Ack - Reworked 3rd patch and split into 3: - check for dev type as discussed with Willem - infer skb->protocol - fix max len for dgram v1 -> v2: - Added patch 2 as suggested by Dave - Rest is unchanged from previous submission ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	packet: fix tpacket_snd max frame len	Daniel Borkmann
	Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW side. When used with SOCK_DGRAM only, the size_max > dev->mtu + reserve check should have reserve as 0, but currently, this is unconditionally set (in it's original form as dev->hard_header_len). I think this is not correct since tpacket_fill_skb() would then take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM, the extra VLAN_HLEN could be possible in both cases. Presumably, the reserve code was copied from packet_snd(), but later on missed the check. Make it similar as we have it in packet_snd(). Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	packet: infer protocol from ethernet header if unset	Daniel Borkmann
	In case no struct sockaddr_ll has been passed to packet socket's sendmsg() when doing a TX_RING flush run, then skb->protocol is set to po->num instead, which is the protocol passed via socket(2)/bind(2). Applications only xmitting can go the path of allocating the socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the TX_RING with sll_protocol of 0. That way, register_prot_hook() is neither called on creation nor on bind time, which saves cycles when there's no interest in capturing anyway. That leaves us however with po->num 0 instead and therefore the TX_RING flush run sets skb->protocol to 0 as well. Eric reported that this leads to problems when using tools like trafgen over bonding device. I.e. the bonding's hash function could invoke the kernel's flow dissector, which depends on skb->protocol being properly set. In the current situation, all the traffic is then directed to a single slave. Fix it up by inferring skb->protocol from the Ethernet header when not set and we have ARPHRD_ETHER device type. This is only done in case of SOCK_RAW and where we have a dev->hard_header_len length. In case of ARPHRD_ETHER devices, this is guaranteed to cover ETH_HLEN, and therefore being accessed on the skb after the skb_store_bits(). Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	packet: only allow extra vlan len on ethernet devices	Daniel Borkmann
	Packet sockets can be used by various net devices and are not really restricted to ARPHRD_ETHER device types. However, when currently checking for the extra 4 bytes that can be transmitted in VLAN case, our assumption is that we generally probe on ARPHRD_ETHER devices. Therefore, before looking into Ethernet header, check the device type first. This also fixes the issue where non-ARPHRD_ETHER devices could have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus the check would test unfilled linear part of the skb (instead of non-linear). Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.") Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	packet: always probe for transport header	Daniel Borkmann
	We concluded that the skb_probe_transport_header() should better be called unconditionally. Avoiding the call into the flow dissector has also not really much to do with the direct xmit mode. While it seems that only virtio_net code makes use of GSO from non RX/TX ring packet socket paths, we should probe for a transport header nevertheless before they hit devices. Reference: http://thread.gmane.org/gmane.linux.network/386173/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	packet: do skb_probe_transport_header when we actually have data	Daniel Borkmann
	In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport header before doing xmit") and later on 40893fd0fd4e ("net: switch to use skb_probe_transport_header()") was probing for a transport header on the skb from a ring buffer slot, but at a time, where the skb has _not even_ been filled with data yet. So that call into the flow dissector is pretty useless. Lets do it after we've set up the skb frags. Fixes: c1aad275b029 ("packet: set transport header before doing xmit") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	tools/net: Use include/uapi with __EXPORTED_HEADERS__	Kamal Mostafa
	Use the local uapi headers to keep in sync with "recently" added #define's (e.g. SKF_AD_VLAN_TPID). Refactored CFLAGS, and bpf_asm doesn't need -I. Fixes: 3f356385e8a4 ("filter: bpf_asm: add minimal bpf asm tool") Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	Merge branch 'ipv6-route-fixes'	David S. Miller
	Martin KaFai Lau says: ==================== ipv6: Fixes for pmtu update and DST_NOCACHE route This patchset fixes: 1. An oops during IPv6 pmtu update on a IPv4 GRE running in an IPSec setup 2. Misc fixes on DST_NOCACHE route ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	ipv6: Check rt->dst.from for the DST_NOCACHE route	Martin KaFai Lau
	All DST_NOCACHE rt6_info used to have rt->dst.from set to its parent. After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"), DST_NOCACHE is also set to rt6_info which does not have a parent (i.e. rt->dst.from is NULL). This patch catches the rt->dst.from == NULL case. Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	ipv6: Check expire on DST_NOCACHE route	Martin KaFai Lau
	Since the expires of the DST_NOCACHE rt can be set during the ip6_rt_update_pmtu(), we also need to consider the expires value when doing ip6_dst_check(). This patches creates __rt6_check_expired() to only check the expire value (if one exists) of the current rt. In rt6_dst_from_check(), it adds __rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree	Martin KaFai Lau
	The original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1272571 The setup has a IPv4 GRE tunnel running in a IPSec. The bug happens when ndisc starts sending router solicitation at the gre interface. The simplified oops stack is like: __lock_acquire+0x1b2/0x1c30 lock_acquire+0xb9/0x140 _raw_write_lock_bh+0x3f/0x50 __ip6_ins_rt+0x2e/0x60 ip6_ins_rt+0x49/0x50 ~~~~~~~~ __ip6_rt_update_pmtu.part.54+0x145/0x250 ip6_rt_update_pmtu+0x2e/0x40 ~~~~~~~~ ip_tunnel_xmit+0x1f1/0xf40 __gre_xmit+0x7a/0x90 ipgre_xmit+0x15a/0x220 dev_hard_start_xmit+0x2bd/0x480 __dev_queue_xmit+0x696/0x730 dev_queue_xmit+0x10/0x20 neigh_direct_output+0x11/0x20 ip6_finish_output2+0x21f/0x770 ip6_finish_output+0xa7/0x1d0 ip6_output+0x56/0x190 ~~~~~~~~ ndisc_send_skb+0x1d9/0x400 ndisc_send_rs+0x88/0xc0 ~~~~~~~~ The rt passed to ip6_rt_update_pmtu() is created by icmp6_dst_alloc() and it is not managed by the fib6 tree, so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL and it causes the ip6_ins_rt() oops. During pmtu update, we only want to create a RTF_CACHE clone from a rt which is currently managed (or owned) by the fib6 tree. It means either rt->rt6i_node != NULL or rt is a RTF_PCPU clone. It is worth to note that rt6i_table may not be NULL even it is not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()). Hence, rt6i_node is a better check instead of rt6i_table. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	fjes: fix inconsistent indenting	Colin Ian King
	minor change, indenting is one tab out. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15	af-unix: fix use-after-free with concurrent readers while splicing	Hannes Frederic Sowa
	During splicing an af-unix socket to a pipe we have to drop all af-unix socket locks. While doing so we allow another reader to enter unix_stream_read_generic which can read, copy and finally free another skb. If exactly this skb is just in process of being spliced we get a use-after-free report by kasan. First, we must make sure to not have a free while the skb is used during the splice operation. We simply increment its use counter before unlocking the reader lock. Stream sockets have the nice characteristic that we don't care about zero length writes and they never reach the peer socket's queue. That said, we can take the UNIXCB.consumed field as the indicator if the skb was already freed from the socket's receive queue. If the skb was fully consumed after we locked the reader side again we know it has been dropped by a second reader. We indicate a short read to user space and abort the current splice operation. This bug has been found with syzkaller (http://github.com/google/syzkaller) by Dmitry Vyukov. Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>