linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2018-03-29	ipv6: export ip6 fragments sysctl to unprivileged users	Eric Dumazet
	IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment sysctl to unprivileged users") The only sysctl that is not per-netns is not used : ip6frag_secret_interval Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	liquidio: Prioritize control messages	Intiyaz Basha
	During heavy tx traffic, control messages (sent by liquidio driver to NIC firmware) sometimes do not get processed in a timely manner. Reason is: the low-level metadata of control messages and that of egress network packets indicate that they have the same priority. Fix it by setting a higher priority for control messages through the new ctrl_qpg field in the oct_txpciq struct. It is the NIC firmware that does the actual setting of priority by writing to the new ctrl_qpg field; the host driver treats that value as opaque and just assigns it to pki_ih3->qpg Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'	David S. Miller
	David Ahern says: ==================== net: Allow FIB notifiers to fail add and replace I wanted to revisit how resource overload is handled for hardware offload of FIB entries and rules. At the moment, the in-kernel fib notifier can tell a driver about a route or rule add, replace, and delete, but the notifier can not affect the action. Specifically, in the case of mlxsw if a route or rule add is going to overflow the ASIC resources the only recourse is to abort hardware offload. Aborting offload is akin to taking down the switch as the path from data plane to the control plane simply can not support the traffic bandwidth of the front panel ports. Further, the current state of FIB notifiers is inconsistent with other resources where a driver can affect a user request - e.g., enslavement of a port into a bridge or a VRF. As a result of the work done over the past 3+ years, I believe we are at a point where we can bring consistency to the stack and offloads, and reliably allow the FIB notifiers to fail a request, pushing an error along with a suitable error message back to the user. Rather than aborting offload when the switch is out of resources, userspace is simply prevented from adding more routes and has a clear indication of why. This set does not resolve the corner case where rules or routes not supported by the device are installed prior to the driver getting loaded and registering for FIB notifications. In that case, hardware offload has not been established and it can refuse to offload anything, sending errors back to userspace via extack. Since conceptually the driver owns the netdevices associated with its asic, this corner case mainly applies to unsupported rules and any races during the bringup phase. Patch 1 fixes call_fib_notifiers to extract the errno from the encoded response from handlers. Patches 2-5 allow the call to call_fib_notifiers to fail the add or replace of a route or rule. Patch 6 adds a simple resource controller to netdevsim to illustrate how a FIB resource controller can limit the number of route entries. Changes since RFC - correct return code for call_fib_notifier - dropped patch 6 exporting devlink symbols - limited example resource controller to init_net only - updated Kconfig for netdevsim to use MAY_USE_DEVLINK - updated cover letter regarding startup case noted by Ido ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	netdevsim: Add simple FIB resource controller via devlink	David Ahern
	Add devlink support to netdevsim and use it to implement a simple, profile based resource controller. Only one controller is needed per namespace, so the first netdevsim netdevice in a namespace registers with devlink. If that device is deleted, the resource settings are deleted. The resource controller allows a user to limit the number of IPv4 and IPv6 FIB entries and FIB rules. The resource paths are: /IPv4 /IPv4/fib /IPv4/fib-rules /IPv6 /IPv6/fib /IPv6/fib-rules The IPv4 and IPv6 top level resources are unlimited in size and can not be changed. From there, the number of FIB entries and FIB rule entries are unlimited by default. A user can specify a limit for the fib and fib-rules resources: $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96 $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16 $ devlink dev reload netdevsim/netdevsim0 such that the number of rules or routes is limited (96 ipv4 routes in the example above): $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done Error: netdevsim: Exceeded number of supported fib entries. $ devlink resource show netdevsim/netdevsim0 netdevsim/netdevsim0: name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non resources: name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables ... With this template in place for resource management, it is fairly trivial to extend and shows one way to implement a simple counter based resource controller typical of network profiles. Currently, devlink only supports initial namespace. Code is in place to adapt netdevsim to a per namespace controller once the network namespace issues are resolved. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net/ipv6: Move call_fib6_entry_notifiers up for route adds	David Ahern
	Move call to call_fib6_entry_notifiers for new IPv6 routes to right before the insertion into the FIB. At this point notifier handlers can decide the fate of the new route with a clean path to delete the potential new entry if the notifier returns non-0. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net/ipv4: Allow notifier to fail route replace	David Ahern
	Add checking to call to call_fib_entry_notifiers for IPv4 route replace. Allows a notifier handler to fail the replace. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net/ipv4: Move call_fib_entry_notifiers up for new routes	David Ahern
	Move call to call_fib_entry_notifiers for new IPv4 routes to right before the call to fib_insert_alias. At this point the only remaining failure path is memory allocations in fib_insert_node. Handle that very unlikely failure with a call to call_fib_entry_notifiers to tell drivers about it. At this point notifier handlers can decide the fate of the new route with a clean path to delete the potential new entry if the notifier returns non-0. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: Move call_fib_rule_notifiers up in fib_nl_newrule	David Ahern
	Move call_fib_rule_notifiers up in fib_nl_newrule to the point right before the rule is inserted into the list. At this point there are no more failure paths within the core rule code, so if the notifier does not fail then the rule will be inserted into the list. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: Fix fib notifer to return errno	David Ahern
	Notifier handlers use notifier_from_errno to convert any potential error to an encoded format. As a consequence the other side, call_fib_notifier{s} in this case, needs to use notifier_to_errno to return the error from the handler back to its caller. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge tag 'mlx5-updates-2018-03-27' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2018-03-27 (Misc updates & SQ recovery) This series contains Misc updates and cleanups for mlx5e rx path and SQ recovery feature for tx path. From Tariq: (RX updates) - Disable Striding RQ when PCI devices, striding RQ limits the use of CQE compression feature, which is very critical for slow PCI devices performance, in this change we will prefer CQE compression over Striding RQ only on specific "slow" PCIe links. - RX path cleanups - Private flag to enable/disable striding RQ From Eran: (TX fast recovery) - TX timeout logic improvements, fast SQ recovery and TX error reporting if a HW error occurs while transmitting on a specific SQ, the driver will ignore such error and will wait for TX timeout to occur and reset all the rings. Instead, the current series improves the resiliency for such HW errors by detecting TX completions with errors, which will report them and perform a fast recover for the specific faulty SQ even before a TX timeout is detected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge branch 'Introduce-net_rwsem-to-protect-net_namespace_list'	David S. Miller
	Kirill Tkhai says: ==================== Introduce net_rwsem to protect net_namespace_list The series introduces fine grained rw_semaphore, which will be used instead of rtnl_lock() to protect net_namespace_list. This improves scalability and allows to do non-exclusive sleepable iteration for_each_net(), which is enough for most cases. scripts/get_maintainer.pl gives enormous list of people, and I add all to CC. Note, that this patch is independent of "Close race between {un, }register_netdevice_notifier and pernet_operations": https://patchwork.ozlabs.org/project/netdev/list/?series=36495 Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: Remove rtnl_lock() in nf_ct_iterate_destroy()	Kirill Tkhai
	rtnl_lock() doesn't protect net::ct::count, and it's not needed for__nf_ct_unconfirmed_destroy() and for nf_queue_nf_hook_drop(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	ovs: Remove rtnl_lock() from ovs_exit_net()	Kirill Tkhai
	Here we iterate for_each_net() and removes vport from alive net to the exiting net. ovs_net::dps are protected by ovs_mutex(), and the others, who change it (ovs_dp_cmd_new(), __dp_destroy()) also take it. The same with datapath::ports list. So, we remove rtnl_lock() here. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	security: Remove rtnl_lock() in selinux_xfrm_notify_policyload()	Kirill Tkhai
	rt_genid_bump_all() consists of ipv4 and ipv6 part. ipv4 part is incrementing of net::ipv4::rt_genid, and I see many places, where it's read without rtnl_lock(). ipv6 part calls __fib6_clean_all(), and it's also called without rtnl_lock() in other places. So, rtnl_lock() here was used to iterate net_namespace_list only, and we can remove it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: Don't take rtnl_lock() in wireless_nlevent_flush()	Kirill Tkhai
	This function iterates over net_namespace_list and flushes the queue for every of them. What does this rtnl_lock() protects?! Since we may add skbs to net::wext_nlevents without rtnl_lock(), it does not protects us about queuers. It guarantees, two threads can't flush the queue in parallel, that can change the order, but since skb can be queued in any order, it doesn't matter, how many threads do this in parallel. In case of several threads, this will be even faster. So, we can remove rtnl_lock() here, as it was used for iteration over net_namespace_list only. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: Introduce net_rwsem to protect net_namespace_list	Kirill Tkhai
	rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge branch 'net-bgmac-Couple-of-small-bgmac-changes'	David S. Miller
	Florian Fainelli says: ==================== net: bgmac: Couple of small bgmac changes This patch series addresses two minor issues with the bgmac driver: - provides the interface name through /proc/interrupts rather than "bgmac" - makes sure the interrupts are masked during probe, in case the block was not properly reset ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: bgmac: Mask interrupts during probe	Florian Fainelli
	We can have interrupts left enabled form e.g: the bootloader which used the network device for network boot. Make sure we have those disabled as early as possible to avoid spurious interrupts. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: bgmac: Use interface name to request interrupt	Florian Fainelli
	When the system contains several BGMAC adapters, it is nice to be able to tell which one is which by looking at /proc/interrupts. Use the network device name as a name to request_irq() with. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge tag 'rxrpc-next-20180327' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Tracing updates Here are some patches that update tracing in AF_RXRPC and AFS: (1) Add a tracepoint for tracking resend events. (2) Use debug_ids in traces rather than pointers (as pointers are now hashed) and allow use of the same debug_id in AFS calls as in the corresponding AF_RXRPC calls. This makes filtering the trace output much easier. (3) Add a tracepoint for tracking call completion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	net: ethernet: nixge: Add support for National Instruments XGE netdev	Moritz Fischer
	Add support for the National Instruments XGE 1/10G network device. It uses the EEPROM on the board via NVMEM. Signed-off-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	dt-bindings: net: Add bindings for National Instruments XGE netdev	Moritz Fischer
	This adds bindings for the NI XGE 1G/10G network device. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-03-29 1) Remove a redundant pointer initialization esp_input_set_header(). From Colin Ian King. 2) Mark the xfrm kmem_caches as __ro_after_init. From Alexey Dobriyan. 3) Do the checksum for an ipsec offlad packet in software if the device does not advertise NETIF_F_HW_ESP_TX_CSUM. From Shannon Nelson. 4) Use booleans for true and false instead of integers in xfrm_policy_cache_flush(). From Gustavo A. R. Silva Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	net/mlx5e: Recover Send Queue (SQ) from error state	Eran Ben Elisha
	An error TX completion (CQE) which arrived on a specific SQ indicates that this SQ got moved by the hardware to error state, which means all pending and incoming TX requests are dropped or will be dropped and no further "Good" CQEs will be generated for that SQ. Before this patch TX completions (CQEs) were not monitored and were handled as a regular CQE. This caused the SQ to stay in an error state, making it useless for xmiting new packets. Mitigation plan: In case of an error completion, schedule a recovery work which would do the following: - Mark the TXQ as DRV_XOFF to disable new packets to arrive from the stack - NAPI to flush all pending SQ WQEs (via flush_in_error_en bit) to release SW and HW resources(SKB, DMA, etc) and have the SQ and CQ consumer/producer indices synced. - Modify the SQ state ERR -> RST -> RDY (restart the SQ). - Reactivate the SQ and reset SQ cc and pc If we identify two consecutive requests for SQ recover in less than 500 msecs, drop the recover request to avoid CPU overload, as this scenario most likely happened due to a severe repeated bug. In addition, add SQ recover SW counter to monitor successful recoveries. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Dump xmit error completions	Eran Ben Elisha
	Monitor and dump xmit error completions. In addition, add err_cqe counter to track the number of error completion per send queue. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	mlx5: Move dump error CQE function out of mlx5_ib for code sharing	Eran Ben Elisha
	Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in order to use it in a downstream patch from mlx5e. In addition, use print_hex_dump instead of manual dumping of the buffer. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	mlx5_{ib,core}: Add query SQ state helper function	Eran Ben Elisha
	Move query SQ state function from mlx5_ib to mlx5_core in order to have it in shared code. It will be used in a downstream patch from mlx5e. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Move all TX timeout logic to be under state lock	Eran Ben Elisha
	Driver callback for handling TX timeout should access some internal resources (SQ, CQ) in order to decide if the tx timeout work should be scheduled. These resources might be unavailable if channels are closed in parallel (ifdown for example). The state lock is the mechanism to protect from such races. Move all TX timeout logic to be in the work under a state lock. In addition, Move the work from the global WQ to mlx5e WQ to make sure this work is flushed when device is detached.. Also, move the mlx5e_tx_timeout_work code to be next to the TX timeout NDO for better code locality. Fixes: 3947ca185999 ("net/mlx5e: Implement ndo_tx_timeout callback") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Remove unused max inline related code	Gal Pressman
	Commit 58d522912ac7 ("net/mlx5e: Support TX packet copy into WQE") introduced the max inline WQE as an ethtool tunable. One commit later, that functionality was made dependent on BlueFlame. Commit 6982ab609768 ("net/mlx5e: Xmit, no write combining") removed BlueFlame support, and with it the max inline WQE. This patch cleans up the leftovers from the removed feature. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Add ethtool priv-flag for Striding RQ	Tariq Toukan
	Add a control private flag in ethtool to enable/disable Striding RQ feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Do not reset Receive Queue params on every type change	Tariq Toukan
	Do not implicit a call to mlx5e_init_rq_type_params() upon every change in RQ type. It should be called only on channels creation. Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Remove rq_headroom field from params	Tariq Toukan
	It can be derived from other params, calculate it via the dedicated function when needed. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Remove RQ MPWQE fields from params	Tariq Toukan
	Introduce functions to calculate them when needed. They can be derived from other params. This will simplify transition between RQ configurations. In general, any parameter that is not explicitly set or controlled, but derived from other parameters, should not have a control-path field itself, but a getter function. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Use no-offset function in skb header copy	Tariq Toukan
	In copying skb header to skb->data, replace the call to skb_copy_to_linear_data_offset() with a zero offset with the call to the no-offset function skb_copy_to_linear_data(). Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Separate dma base address and offset in dma_sync call	Tariq Toukan
	Pass the base dma address and offset to dma_sync_single_range_for_cpu(), instead of doing the pre-calculation. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Remove unused define MLX5_MPWRQ_STRIDES_PER_PAGE	Tariq Toukan
	Clean it up as it's not in use. Fixes: d9d9f156f380 ("net/mlx5e: Expand WQE stride when CQE compression is enabled") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Disable Striding RQ when PCI is slower than link	Tariq Toukan
	We turn the feature off for servers with PCI BW bounded by a threshold (16G) and lower than MAX LINK BW. This improves the effectiveness of CQE compression feature, that is defaulted to ON for the same case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	net/mlx5e: Unify slow PCI heuristic	Tariq Toukan
	Get the link/pci speed query and logic into a single function. Unify the heuristics and use a single PCI threshold (16G) for all. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27	rxrpc: Trace call completion	David Howells
	Add a tracepoint to track rxrpc calls moving into the completed state and to log the completion type and the recorded error value and abort code. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27	rxrpc, afs: Use debug_ids rather than pointers in traces	David Howells
	In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27	rxrpc: Trace resend	David Howells
	Add a tracepoint to trace packet resend events and to dump the Tx annotation buffer for added illumination. Signed-off-by: David Howells <dhowells@rdhat.com>
2018-03-27	Merge branch 'sfc-filter-locking'	David S. Miller
	Edward Cree says: ==================== sfc: rework locking around filter management The use of a spinlock to protect filter state combined with the need for a sleeping operation (MCDI) to apply that state to the NIC (on EF10) led to unfixable race conditions, around the handling of filter restoration after an MC reboot. So, this patch series removes the requirement to be able to modify the SW filter table from atomic context, by using a workqueue to request asynchronous filter operations (which are needed for ARFS). Then, the filter table locks are changed to mutexes, replacing the dance of spinlocks and 'busy' flags. Also, a mutex is added to protect the RSS context state, since otherwise a similar race is possible around restoring that after an MC reboot. While we're at it, fix a couple of other related bugs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: fix flow type handling for RSS filters	Edward Cree
	The FLOW_RSS flag was causing us to insert UDP filters when TCP was wanted. Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: protect list of RSS contexts under a mutex	Edward Cree
	Otherwise races are possible between ethtool ops and efx_ef10_rx_restore_rss_contexts(). Also, don't try to perform the restore on every reset, only after an MC reboot, otherwise we'll leak RSS contexts on the NIC. Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: return a better error if filter insertion collides with MC reboot	Edward Cree
	If some other operation gets the MCDI lock ahead of us and performs an MC reboot, then our attempt to insert the filter will fail with EINVAL, because the destination VI (spec->dmaq_id, MC_CMD_FILTER_OP_IN_RX_QUEUE) does not exist. But the caller's request (which might e.g. be an ethtool ntuple request from userland) isn't invalid, it just got unlucky; so return EAGAIN. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: use a semaphore to lock farch filters too	Edward Cree
	With this change, the spinlock efx->filter_lock is no longer used and is thus removed. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: give ef10 its own rwsem in the filter table instead of filter_lock	Edward Cree
	efx->filter_lock remains in place for use on farch, but EF10 now ignores it. EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	sfc: replace asynchronous filter operations	Edward Cree
	Instead of having an efx->type->filter_rfs_insert() method, just use workitems with a worker function that calls efx->type->filter_insert(). The only user of this is efx_filter_rfs(), which now queues a call to efx_filter_rfs_work(). Similarly, efx_filter_rfs_expire() is now a worker function called on a new channel->filter_work work_struct, so the method efx->type->filter_rfs_expire_one() is no longer called in atomic context. We also add a new mutex efx->rps_mutex to protect the RPS state (efx-> rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so that the taking of efx->filter_lock can be moved to efx->type->filter_rfs_expire_one(). Thus, all filter table functions are now called in a sleepable context, allowing them to use sleeping locks in a future patch. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	Merge branch 'pernet-all-async'	David S. Miller
	Kirill Tkhai says: ==================== Make pernet_operations always read locked All the pernet_operations are converted, and the last one is in this patchset (nfsd_net_ops acked by J. Bruce Fields). So, it's the time to kill pernet_operations::async field, and make setup_net() and cleanup_net() always require the rwsem only read locked. All further pernet_operations have to be developed to fit this rule. Some of previous patches added a comment to struct pernet_operations about that. Also, this patchset renames net_sem to pernet_ops_rwsem to make the target area of the rwsem is more clear visible, and adds more comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27	net: Add more comments	Kirill Tkhai
	This adds comments to different places to improve readability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>