linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2015-09-01	flow_dissector: Add flag to stop parsing when an IPv6 flow label is seen	Tom Herbert
	Add an input flag to flow dissector on rather dissection should be stopped when a flow label is encountered. Presumably, the flow label is derived from a sufficient hash of an inner transport packet so further dissection is not needed (that is ports are not included in the flow hash). Using the flow label instead of ports has the additional benefit that packet fragments should hash to same value as non-fragments for a flow (assuming that the same flow label is used). We set this flag by default in for skb_get_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Add flag to stop parsing at L3	Tom Herbert
	Add an input flag to flow dissector on rather dissection should be stopped when an L3 packet is encountered. This would be useful if a caller just wanted to get IP addresses of the outermost header (e.g. to do an L3 hash). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Support IPv6 fragment header	Tom Herbert
	Parse NEXTHDR_FRAGMENT. When seen account for it in the fragment bits of key_control. Also, check if first fragment should be parsed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Add control/reporting of fragmentation	Tom Herbert
	Add an input flag to flow dissector on rather dissection should be attempted on a first fragment. Also add key_control flags to indicate that a packet is a fragment or first fragment. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Add flags argument to skb_flow_dissector functions	Tom Herbert
	The flags argument will allow control of the dissection process (for instance whether to parse beyond L3). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Jump to exit code in __skb_flow_dissect	Tom Herbert
	Instead of returning immediately (on a parsing failure for instance) we jump to cleanup code. This always sets protocol values in key_control (even on a failure there is still valid information in the key_tags that was set before the problem was hit). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flowi: Abstract out functions to get flow hash based on flowi	Tom Herbert
	Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the flow keys and hash based on flowi structures. These are called by __skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created get_hash_from_flowi6 and get_hash_from_flowi4 which can be called when just the hash value for a flowi is needed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	skbuff: Make __skb_set_sw_hash a general function	Tom Herbert
	Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is a common method (between __skb_set_sw_hash and skb_set_hash) to set the hash in an skbuff. Also, move skb_clear_hash to be closer to __skb_set_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	flow_dissector: Move skb related functions to skbuff.h	Tom Herbert
	Move the flow dissector functions that are specific to skbuffs into skbuff.h out of flow_dissector.h. This makes flow_dissector.h have no dependencies on skbuff.h. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	tg3: Fix temperature reporting	Jean Delvare
	The temperature registers appear to report values in degrees Celsius while the hwmon API mandates values to be exposed in millidegrees Celsius. Do the conversion so that the values reported by "sensors" are correct. Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature") Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: stable@vger.kernel.org [v3.6+] Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	phylib: fix device deletion order in mdiobus_unregister()	Mark Salter
	commit 8b63ec1837fa ("phylib: Make PHYs children of their MDIO bus, not the bus' parent.") uncovered a problem in mdiobus_unregister() which leads to this warning when I reboot an APM Mustang (arm64) platform: WARNING: CPU: 7 PID: 4239 at fs/sysfs/group.c:224 sysfs_remove_group+0xa0/0xa4() sysfs group fffffe0000e07a10 not found for kobject 'xgene-mii-eth0:03' ... CPU: 7 PID: 4239 Comm: reboot Tainted: G E 4.2.0-0.18.el7.test15.aarch64 #1 Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Aug 26 2015 Call Trace: [<fffffe000009739c>] dump_backtrace+0x0/0x170 [<fffffe000009752c>] show_stack+0x20/0x2c [<fffffe00007436f0>] dump_stack+0x78/0x9c [<fffffe00000c2cb4>] warn_slowpath_common+0xa0/0xd8 [<fffffe00000c2d60>] warn_slowpath_fmt+0x74/0x88 [<fffffe0000293d3c>] sysfs_remove_group+0x9c/0xa4 [<fffffe00004a8bac>] dpm_sysfs_remove+0x5c/0x70 [<fffffe000049b388>] device_del+0x44/0x208 [<fffffe000049b578>] device_unregister+0x2c/0x7c [<fffffe000050dc68>] mdiobus_unregister+0x48/0x94 [<fffffe000052afd0>] xgene_enet_mdio_remove+0x28/0x44 [<fffffe000052d3f0>] xgene_enet_remove+0xd0/0xd8 [<fffffe000052d424>] xgene_enet_shutdown+0x2c/0x3c [<fffffe00004a204c>] platform_drv_shutdown+0x24/0x40 [<fffffe000049d4f4>] device_shutdown+0xf0/0x1b4 [<fffffe00000e31ec>] kernel_restart_prepare+0x40/0x4c [<fffffe00000e32f8>] kernel_restart+0x1c/0x80 [<fffffe00000e3670>] SyS_reboot+0x17c/0x250 The problem is that mdiobus_unregister() deletes the bus device before unregistering the phy devices on the bus. This wasn't a problem before because the phys were not children of the bus: /sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0:03 /sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0 But now that they are: /sys/devices/platform/APMC0D05:00/net/eth0/xgene-mii-eth0/xgene-mii-eth0:03 when mdiobus_unregister deletes the bus device, the phy subdirs are removed from sysfs also. So when the phys are unregistered afterward, we get the warning. This patch changes the order so that phys are unregistered before the bus device is deleted. Fixes: 8b63ec1837fa ("phylib: Make PHYs children of their MDIO bus, not the bus' parent.") Signed-off-by: Mark Salter <msalter@redhat.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Mark Langsdorf <mlangsdo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01	net: Make table id type u32	David Ahern
	A number of VRF patches used 'int' for table id. It should be u32 to be consistent with the rest of the stack. Fixes: 4e3c89920cd3a ("net: Introduce VRF related flags and helpers") 15be405eb2ea9 ("net: Add inet_addr lookup by table") 30bbaa1950055 ("net: Fix up inet_addr_type checks") 021dd3b8a142d ("net: Add routes to the table associated with the device") dc028da54ed35 ("inet: Move VRF table lookup to inlined function") f6d3c19274c74 ("net: FIB tracepoints") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	tun_dst: Remove opts_size	Pravin B Shelar
	opts_size is only written and never read. Following patch removes this unused variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	ipv6: send only one NEWLINK when RA causes changes	Marius Tomaschewski
	Signed-off-by: Marius Tomaschewski <mt@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	gro_cells: remove spinlock protecting receive queues	Eric Dumazet
	As David pointed out, spinlock are no longer needed to protect the per cpu queues used in gro cells infrastructure. Also use new napi_complete_done() API so that gro_flush_timeout tweaks have an effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: qmi_wwan: Sierra Wireless MC73xx -> Sierra Wireless MC7304/MC7354	David Ward
	Other Sierra Wireless MC73xx devices exist, with different USB IDs. Cc: Bjørn Mork <bjorn@mork.no> Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	ipv6: send NEWLINK on RA managed/otherconf changes	Marius Tomaschewski
	The kernel is applying the RA managed/otherconf flags silently and forgets to send ifinfo notify to inform about their change when the router provides a zero reachable_time and retrans_timer as dnsmasq and many routers send it, which just means unspecified by this router and the host should continue using whatever value it is already using. Userspace may monitor the ifinfo notifications to activate dhcpv6. Signed-off-by: Marius Tomaschewski <mt@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	Merge branch 'dsa-port-config'	David S. Miller
	Andrew Lunn says: ==================== DSA port configuration and status This patchset allows various switch port settings to be configured and port status to be sampled. Some of these patches have been posted before. The first three patches provide infrastructure for configuring a switch ports link speed and duplex from a fixed_link phy. Patch four then uses this infrastructure to allow the CPU and DSA ports of a switch to be configured using a fixed-link property in the device tree. Patches five and six allow a phy-mode property to be specified in the device tree, and allow this to be used for configuring RGMII delays. Patches seven through nine allow link status, for example that of an SFP module, to be read from a gpio. Changes since v1: Rewrite 9/9 so that it hopefully does not regression on 868a4215be9a6d80 ("net: phy: fixed_phy: handle link-down case") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: phy: fixed_phy: Set phy capabilities even when link down.	Andrew Lunn
	What features a phy supports is masked in genphy_config_init() by looking at the PHYs BMSR register. If the link is down, fixed_phy_update_regs() will only set the auto- negotiation capable bit in BMSR. Thus genphy_config_init() comes to the conclusion the PHY can only perform 10/Half, and masks out the higher speed features. If however the link it up, BMSR is set to indicate the speed the PHY is capable of auto-negotiating, and genphy_config_init() does not mask out the high speed features. To fix this, when the link is down, have fixed_phy_update_regs() leave the link status, auto-negotiation complete, and link partner capabilities unset, but set all the local capabilities depending on the fixed phy speed. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	phy: fixed_phy: Add gpio to determine link up/down.	Andrew Lunn
	An SFP module may have a link up/down status pin which can be connection to a GPIO line of the host. Add support for reading such an GPIO in the fixed_phy driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	dsa: mv88e6xxx: Don't poll forced interfaces for state changes	Andrew Lunn
	When polling for link status, don't consider ports which have a forced link. Such ports don't monitor their phy or may not even have a phy. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	dsa: mv88e6xxx: Set the RGMII delay based on phy interface	Andrew Lunn
	Some Marvell switches allow the RGMII Rx and Tx clock to be delayed when the port is using RGMII. Have the adjust_link function look at the phy interface type and enable this delay as requested. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: dsa: Allow DSA and CPU ports to have a phy-mode property	Andrew Lunn
	It can be useful for DSA and CPU ports to have a phy-mode property, in particular to specify RGMII delays. Parse the property and set it in the fixed-link phydev. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: dsa: Allow configuration of CPU & DSA port speeds/duplex	Andrew Lunn
	By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer devices port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	phy: fixed_phy: Set supported speed in phydev	Andrew Lunn
	Set the supported field of the phydev to indicate the speed features of the phy. If the phy is never attached to a netdev, but used in an adjust_link() function, the speed will be incorrectly evaluated to 10/half rather than the correct speed/duplex. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	dsa: mv88e6xxx: Allow speed/duplex of port to be configured	Andrew Lunn
	The current code sets user ports to perform auto negotiation using the phy. CPU and DSA ports are configured to full duplex and maximum speed the switch supports. There are however use cases where the CPU has a slower port, and when user ports have SFP modules with fixed speed. In these cases, port settings to be read from a fixed_phy devices. The switch driver then needs to implement the adjust_link op, so the port settings can be set. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: phy: Allow PHY devices to identify themselves as Ethernet switches, etc.	Florian Fainelli
	Some Ethernet MAC drivers using the PHY library require the hardcoding of link parameters when interfaced to a switch device, SFP module, switch to switch port, etc. This has typically lead to various ad-hoc implementations looking like this: - using a "fixed PHY" emulated device, which will provide link indication towards the Ethernet MAC driver and hardware - pretend there is no PHY and hardcode link parameters, ala mv643x_eth Based on that, it is desireable to have the PHY drivers advertise the correct link parameters, just like regular Ethernet PHYs towards their CPU Ethernet MAC drivers, however, Ethernet MAC drivers should be able to tell whether this link should be monitored or not. In the context of an Ethernet switch, SFP module, switch to switch link, we do not need to monitor this link since it should be always up. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	mpls: fix mpls_net_init memory leak	Nikolay Aleksandrov
	Fix a memory leak in the mpls netns init function in case of failure. If register_net_sysctl fails then we need to free the ctl_table. Fixes: 7720c01f3f59 ("mpls: Add a sysctl to control the size of the mpls label table") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: Add tos to validate source tracepoint	David Ahern
	TOS is another key aspect of the lookup passed to fib_validate_source. Add it to the tracepoint. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	lib: move strncpy_from_unsafe() into mm/maccess.c	Alexei Starovoitov
	To fix build errors: kernel/built-in.o: In function `bpf_trace_printk': bpf_trace.c:(.text+0x11a254): undefined reference to `strncpy_from_unsafe' kernel/built-in.o: In function `fetch_memory_string': trace_kprobe.c:(.text+0x11acf8): undefined reference to `strncpy_from_unsafe' move strncpy_from_unsafe() next to probe_kernel_read/write() which use the same memory access style. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 1a6877b9c0c2 ("lib: introduce strncpy_from_unsafe()") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	Merge branch 'per-route-dctcp-receive-side'	David S. Miller
	Daniel Borkmann says: ==================== tcp: receive-side per route dctcp handling Original cover letter: Currently, the following case doesn't use DCTCP, even if it should: - responder has f.e. cubic as system wide default - 'ip route congctl dctcp $src' was set Then, DCTCP is NOT used if a DCTCP sender attempts to connect from a host in the $src range: ECT(0) is set, but listen_sk is not dctcp, so we fail the INET_ECN_is_not_ect sanity check. We also have to examine the dst used for the SYN/ACK reply to make this case work. In order to minimize additional cost, store the 'ecn is must have' information is the dst_features field. The set targets -next instead of -net since this doesn't seem to be a serious bug and to give the change more soak time until it hits linus tree. v1 -> v2: - Addressed Dave's feedback, not exposing any bits to user space - Added patch 3 to reject incorrect configurations - Rest as is, rebased and retested ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	tcp: use dctcp if enabled on the route to the initiator	Daniel Borkmann
	Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	fib, fib6: reject invalid feature bits	Daniel Borkmann
	Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: fib6: reduce identation in ip6_convert_metrics	Daniel Borkmann
	Reduce the identation a bit, there's no need to artificically have it increased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	net: fib: move metrics parsing to a helper	Florian Westphal
	fib_create_info() is already quite large, so before adding more code to the metrics section move that to a helper, similar to ip6_convert_metrics. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	IGMP: Document igmp_link_local_mcast_reports	Philip Downey
	Document the addition of a new sysctl variable which controls the generation of IGMP reports for link local multicast groups in the 224.0.0.X range. IGMP reports for local multicast groups can now be optionally inhibited by setting the value to zero e.g.: echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports To retain backwards compatibility the previous behaviour is retained by default on system boot or reverted by setting the value back to non-zero. Signed-off-by: Philip Downey <pdowney@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	ip-tunnel: Use API to access tunnel metadata options.	Pravin B Shelar
	Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31	ipv4: fix 32b build	Madalin Bucur
	Address remaining issue after 80ec192. Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	ipv4: Fix 32-bit build.	David S. Miller
	net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64': >> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function) v = (((u64 )bhptr) + offt); ^ net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in net/ipv4/af_inet.c: In function 'snmp_fold_field64': >> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function) res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ >> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field' res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset); ^ net/ipv4/af_inet.c:1455:5: note: declared here u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt) ^ Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	netlink: rx mmap: fix POLLIN condition	Ken-ichirou MATSUZAWA
	Poll() returns immediately after setting the kernel current frame (ring->head) to SKIP from user space even though there is no new frame. And in a case of all frames is VALID, user space program unintensionally sets (only) kernel current frame to UNUSED, then calls poll(), it will not return immediately even though there are VALID frames. To avoid situations like above, I think we need to scan all frames to find VALID frames at poll() like netlink_alloc_skb(), netlink_forward_ring() finding an UNUSED frame at skb allocation. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	Merge branch 'thunderx-features-fixes'	David S. Miller
	Aleksey Makarov says: ==================== net: thunderx: New features and fixes v2: - The unused affinity_mask field of the structure cmp_queue has been deleted. (thanks to David Miller) - The unneeded initializers have been dropped. (thanks to Alexey Klimov) - The commit message "net: thunderx: Rework interrupt handling" has been fixed. (thanks to Alexey Klimov) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Support for internal loopback mode	Sunil Goutham
	Support for setting VF's corresponding BGX LMAC in internal loopback mode. This mode can be used for verifying basic HW functionality such as packet I/O, RX checksum validation, CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external network connectivity. 'loopback' mode can be enabled or disabled via ethtool. Note: This feature is not supported when no of VFs enabled are morethan no of physical interfaces i.e active BGX LMACs Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Support for upto 96 queues for a VF	Sunil Goutham
	This patch adds support for handling multiple qsets assigned to a single VF. There by increasing no of queues from earlier 8 to max no of CPUs in the system i.e 48 queues on a single node and 96 on dual node system. User doesn't have option to assign which Qsets/VFs to be merged. Upon request from VF, PF assigns next free Qsets as secondary qsets. To maintain current behavior no of queues is kept to 8 by default which can be increased via ethtool. If user wants to unbind NICVF driver from a secondary Qset then it should be done after tearing down primary VF's interface. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Rework interrupt handling	Sunil Goutham
	Rework interrupt handler to avoid checking IRQ affinity of CQ interrupts. Now separate handlers are registered for each IRQ including RBDR. Register interrupt handlers for only those which are being used. Add nicvf_dump_intr_status() and use it in irq handlers. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Support for HW VLAN stripping	Sunil Goutham
	This patch configures HW to strip 802.1Q header if found in a receiving packet. The stripped VLAN ID and TCI information is passed on to software via CQE_RX. Also sets netdev's 'vlan_features' so that other HW offload features can be used for tagged packets. This offload feature can be enabled or disabled via ethtool. Network stack normally ignores RPS for 802.1Q packets and hence low throughput. With this offload enabled throughput for tagged packets will be almost same as normal packets. Note: This patch doesn't enable HW VLAN insertion for transmit packets. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Receive hashing HW offload support	Sunil Goutham
	Adding support for receive hashing HW offload by using RSS_ALG and RSS_TAG fields of CQE_RX descriptor. Also removed dependency on minimum receive queue count to configure RSS so that hash is always generated. This hash is used by RPS logic to distribute flows across multiple CPUs. Offload can be disabled via ethtool. Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: mailboxes: remove code duplication	Sunil Goutham
	Use the nicvf_send_msg_to_pf() function in the mailbox code. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: Add receive error stats reporting via ethtool	Sunil Goutham
	Added ethtool support to dump receive packet error statistics reported in CQE. Also made some small fixes Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	net: thunderx: fix MAINTAINERS	Aleksey Makarov
	The liquidio and thunder drivers have different maintainers. Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30	Merge branch 'snmp-stat-aggregation'	David S. Miller
	Raghavendra K T says: ==================== Optimize the snmp stat aggregation for large cpus While creating 1000 containers, perf is showing lot of time spent in snmp_fold_field on a large cpu system. The current patch tries to improve by reordering the statistics gathering. Please note that similar overhead was also reported while creating veth pairs https://lkml.org/lkml/2013/3/19/556 Changes in V4: - remove 'item' variable and use IPSTATS_MIB_MAX to avoid sparse warning (Eric) also remove 'item' parameter (Joe) - add missing memset of padding. Changes in V3: - use memset to initialize temp buffer in leaf function. (David) - use memcpy to copy the buffer data to stat instead of unalign_pu (Joe) - Move buffer definition to leaf function __snmp6_fill_stats64() (Eric) - Changes in V2: - Allocate the stat calculation buffer in stack. (Eric) Setup: 160 cpu (20 core) baremetal powerpc system with 1TB memory 1000 docker containers was created with command docker run -itd ubuntu:15.04 /bin/bash in loop observation: Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) perf data showed, creating veth interfaces resulting in the below code path was taking more time. rtnl_fill_ifinfo -> inet6_fill_link_af -> inet6_fill_ifla6_attrs -> snmp_fold_field proposed idea: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data to of an item (iteratively for around 36 items). The patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Performance of docker creation improved by around more than 2x after the patch. before the patch: ================ 3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617 real 0m6.836s user 0m0.095s sys 0m0.011s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock 1.37% docker docker [.] backtrace_qsort 1.31% docker docker [.] strings.FieldsFunc cache-misses: 2.7% after the patch: ============= 9178273e9df399c8290b6c196e4aef9273be2876225f63b14a60cf97eacfafb5 real 0m3.249s user 0m0.088s sys 0m0.020s perf record -a docker run -itd ubuntu:15.04 /bin/bash ======================================================= 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one 3.96% docker docker [.] runtime_MSpan_Sweep 2.47% docker docker [.] strings.FieldsFunc cache-misses: 1.41 % Please let me know if you have suggestions/comments. Thanks Eric, Joe and David for the comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>