aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-08net: hns3: fix for not setting pause parametersFuyun Liang
Pause parameters include source address, transmit gap and pause time. The default value of the pause source address is zero in the hardware. Default pause parameters need to be set to the hardware. Also, when setting new mac address, the pause source address need to be updated. Fixes: 9dc2145d910e ("net: hns3: Add support for PFC setting in TM module") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: add MTU initialization for hardwareFuyun Liang
When initializing the MAC, the MTU vlaue need to be set to the hardware too. Otherwise, the MTU value of software will be different from the MTU value of hardware. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for changing MTUFuyun Liang
when changing MTU, The new MTU must need to be set to netdevice. Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for setting MTUFuyun Liang
When setting MTU, actually what we do is configuring the max frame size for the hardware. ETH_HLEN、ETH_FCS_LEN and VLAN_HLEN must need to be considered. And the frame size which is less than the default value should not be set to the hardware. Because in the hardware, the the max frame size not only controls the RX packet size, but also controls the TX packet size. the RX packets whose size are greater than the setting value will be dropped. This patch fixes the bug setting a error max frame size to hardware. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for updating fc_mode_last_timeFuyun Liang
commit a9c782822166 ("net: hns3: add support for set_pauseparam") adds set_pauseparam support for ethtool cmd, but forgets to update fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam(). The wrong fc_mode_last_time will be used to update flow control mode when lldpad has been running. As a result, when using the ethtool command "-a", user will get a wrong pause parameter. This patch adds the fc_mode_last_time update when PFC mode is disabled. Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix a response data read error of tqp statistics queryJian Shen
The result of tqp statistics query was read with an error position, fix it according to the user manual. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Add packet statistics of netdevJian Shen
Add packet statistics of netdev for ethtool -S, in order to show the statistics data for current net device. Remove update_stats() calling because it has been completed in hns3_get_netdev_stats(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Remove a useless member of struct hns3_statsJian Shen
The member "stats_size" of struct hns3_stats is useless, remove it and fix the macro definition which has uses this struct. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix an error macro definition of HNS3_TQP_STATJian Shen
The member "stats_offset" was designed to indicate the offset of each member of struct ring_stats in struct hns3_enet_ring, but forgot to add the offset of the member in struct ring_stats. Fixes: 496d03e960a ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix a loop index error of tqp statistics queryJian Shen
An error loop index was used while querying statistics data of tqps, which may cause call trace. Fixes: 496d03e960ae ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix an error of total drop packet statisticsJian Shen
The dropped tx/rx packets number of each tqp should also be counted into the total drop tx/rx packets numbers. Fixes: 76ad4f0ee74 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Mask the packet statistics query when NIC is downJian Shen
Update the HNS3_NIC_STATE_DOWN bit when NIC state changes. When NIC is down, mask the packet statistics for querying with ifconfig command. It's a common practice. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Modify the update period of packet statisticsJian Shen
It takes more than 200 query response messages between driver and IMP, while updating the packet statistics. It's too heavy for IMP to update it per second. Extend the update period of packet statistics data from 1 second to 300 seconds(if too long, the statistics may overflow). As a result, we need to update it while querying with ifconfig tool to keep the statistics data fresh. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Remove repeat statistic of rx_errorsJian Shen
The igu_rx_err_pkt indicates the same error with mac_rx_fcs_err_pkt_num, so remove it. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix spelling errorsJian Shen
Fix spelling error "overrsize" --> "oversize". Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Unify the strings display of packet statisticsJian Shen
Some members of packet statistics are named in different styles. This patch unifies them with new internal name rules, the main modification are below: trans --> tx rcv --> rx rcb_q%d_tx --> txq#%d rcb_q%d_rx --> rxq#%d sw_err_cnt(tx side) --> tx_dropped sw_err_cnt(rx side) --> rx_dropped pkts --> packets tx_err_cnt --> errors rx_err_cnt --> errors Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Disable VFs change rxvlan offload statusJian Shen
Rxvlan offload status can only be changed by PF. Initialize the value of NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for VFS to false, make sure user can't be able to change it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Add ethtool interface for vlan filterJian Shen
This patch adds vlan filter enable switch to support ethtool -K ethX rx-vlan-filter on/off. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08Merge branch 'net-qualcomm-rmnet-Enable-csum-offloads'David S. Miller
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Enable csum offloads This series introduces the MAPv4 packet format for checksum offload plus some other minor changes. Patches 1-3 are cleanups. Patch 4 renames the ingress format to data format so that all data formats can be configured using this going forward. Patch 5 uses the pacing helper to improve TCP transmit performance. Patch 6-9 defines the the MAPv4 for checksum offload for RX and TX. A new header and trailer format are used as part of MAPv4. For RX checksum offload, only the 1's complement of the IP payload portion is computed by hardware. The meta data from RX header is used to verify the checksum field in the packet. Note that the IP packet and its field itself is not modified by hardware. This gives metadata to help with the RX checksum. For TX, the required metadata is filled up so hardware can compute the checksum. Patch 10 enables GSO on rmnet devices v1->v2: Fix sparse errors reported by kbuild test robot v2->v3: Update the commit message for Patch 5 based on Eric's comments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for GSOSubash Abhinov Kasiviswanathan
Real devices may support scatter gather(SG), so enable SG on rmnet devices to use GSO. GSO reduces CPU cycles by 20% for a rate of 146Mpbs for a single stream TCP connection. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for TX checksum offloadSubash Abhinov Kasiviswanathan
TX checksum offload applies to TCP / UDP packets which are not fragmented using the MAPv4 checksum trailer. The following needs to be done to have checksum computed in hardware - 1. Set the checksum start offset and inset offset. 2. Set the csum_enabled bit 3. Compute and set 1's complement of partial checksum field in transport header. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Handle command packets with checksum trailerSubash Abhinov Kasiviswanathan
When using the MAPv4 packet format in conjunction with MAP commands, a dummy DL checksum trailer will be appended to the packet. Before this packet is sent out as an ACK, the DL checksum trailer needs to be removed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for RX checksum offloadSubash Abhinov Kasiviswanathan
When using the MAPv4 packet format, receive checksum offload can be enabled in hardware. The checksum computation over pseudo header is not offloaded but the rest of the checksum computation over the payload is offloaded. This applies only for TCP / UDP packets which are not fragmented. rmnet validates the TCP/UDP checksum for the packet using the checksum from the checksum trailer added to the packet by hardware. The validation performed is as following - 1. Perform 1's complement over the checksum value from the trailer 2. Compute 1's complement checksum over IPv4 / IPv6 header and subtracts it from the value from step 1 3. Computes 1's complement checksum over IPv4 / IPv6 pseudo header and adds it to the value from step 2 4. Subtracts the checksum value from the TCP / UDP header from the value from step 3. 5. Compares the value from step 4 to the checksum value from the TCP / UDP header. 6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set and the packet is passed on to network stack. If there is a failure, then the packet is passed on as such without modifying the ip_summed field. The checksum field is also checked for UDP checksum 0 as per RFC 768 and for unexpected TCP checksum of 0. If checksum offload is disabled when using MAPv4 packet format in receive path, the packet is queued as is to network stack without the validations above. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Define the MAPv4 packet formatsSubash Abhinov Kasiviswanathan
The MAPv4 packet format adds support for RX / TX checksum offload. For a bi-directional UDP stream at a rate of 570 / 146 Mbps, roughly 10% CPU cycles are saved. For receive path, there is a checksum trailer appended to the end of the MAP packet. The valid field indicates if hardware has computed the checksum. csum_start_offset indicates the offset from the start of the IP header from which hardware has computed checksum. csum_length is the number of bytes over which the checksum was computed and the resulting value is csum_value. In the transmit path, a header is appended between the end of the MAP header and the start of the IP packet. csum_start_offset is the offset in bytes from which hardware will compute the checksum if the csum_enabled bit is set. udp_ip4_ind indicates if the checksum value of 0 is valid or not. csum_insert_offset is the offset from the csum_start_offset where hardware will insert the computed checksum. The use of this additional packet format for checksum offload is explained in subsequent patches. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Set pacing shiftSubash Abhinov Kasiviswanathan
The real device over which the rmnet devices are installed also aggregate multiple IP packets and sends them as a single large aggregate frame to the hardware. This causes degraded throughput for TCP TX due to bufferbloat. To overcome this problem, pacing shift value of 8 is set using the sk_pacing_shift_update() helper. This value was determined based on experiments with a single stream TCP TX using iperf for a duration of 30s. Pacing shift | Observed data rate (Mbps) 10 | 9 9 | 140 8 | 146 (Max link rate) Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Rename ingress data format to data formatSubash Abhinov Kasiviswanathan
This is done so that we can use this field for both ingress and egress flags. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove unused function declarationSubash Abhinov Kasiviswanathan
rmnet_map_demultiplex() is only declared but not defined anywhere, so remove it. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove invalid condition while stamping mux idSubash Abhinov Kasiviswanathan
rmnet devices cannot have a mux id of 255. This is validated when assigning the mux id to the rmnet devices. As a result, checking for mux id 255 does not apply in egress path. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove redundant check when stamping map headerSubash Abhinov Kasiviswanathan
We already check the headroom once in rmnet_map_egress_handler(), so this is not needed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07Merge branch 'ipv6-ipv4-nexthop-align'David S. Miller
Ido Schimmel says: ==================== ipv6: Align nexthop behaviour with IPv4 This set tries to eliminate some differences between IPv4's and IPv6's treatment of nexthops. These differences are most likely a side effect of IPv6's data structures (specifically 'rt6_info') that incorporate both the route and the nexthop and the late addition of ECMP support in commit 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)"). IPv4 and IPv6 do not react the same to certain netdev events. For example, upon carrier change affected IPv4 nexthops are marked using the RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly. IPv6 on the other hand, does nothing which forces us to perform a carrier check during route lookup and dump. This makes it difficult to introduce features such as non-equal-cost multipath that are built on top of this set [1]. In addition, when a netdev is put administratively down IPv4 nexthops are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all the routes using these nexthops. To be consistent with IPv4, multipath routes should only be flushed when all nexthops in the group are considered dead. The first 12 patches introduce non-functional changes that store the RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev events, in a similar fashion to IPv4. This allows us to remove the carrier check performed during route lookup and dump. The next three patches make sure we only flush a multipath route when all of its nexthops are dead. Last three patches add test cases for IPv4/IPv6 FIB. These verify that both address families react similarly to netdev events. Finally, this series also serves as a good first step towards David Ahern's goal of treating nexthops as standalone objects [2], as it makes the code more in line with IPv4 where the nexthop and the nexthop group are separate objects from the route itself. 1. https://github.com/idosch/linux/tree/ipv6-nexthops 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf Changes since RFC (feedback from David Ahern): * Remove redundant declaration of rt6_ifdown() in patch 4 and adjust comment referencing it accordingly * Drop patch to flush multipath routes upon NETDEV_UNREGISTER. Reword cover letter accordingly * Use a temporary variable to make code more readable in patch 15 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07selftests: fib_tests: Add test cases for netdev carrier changeIdo Schimmel
Check that IPv4 and IPv6 react the same when the carrier of a netdev is toggled. Local routes should not be affected by this, whereas unicast routes should. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07selftests: fib_tests: Add test cases for netdev downIdo Schimmel
Check that IPv4 and IPv6 react the same when a netdev is being put administratively down. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07selftests: fib_tests: Add test cases for IPv4/IPv6 FIBIdo Schimmel
Add test cases to check that IPv4 and IPv6 react to a netdev being unregistered as expected. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Flush multipath routes when all siblings are deadIdo Schimmel
By default, IPv6 deletes nexthops from a multipath route when the nexthop device is put administratively down. This differs from IPv4 where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A multipath route is flushed when all of its nexthops become dead. Align IPv6 with IPv4 and have it conform to the same guidelines. In case the multipath route needs to be flushed, its siblings are flushed one by one. Otherwise, the nexthops are marked with the appropriate flags and the tree walker is instructed to skip all the siblings. As explained in previous patches, care is taken to update the sernum of the affected tree nodes, so as to prevent the use of wrong dst entries. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Take table lock outside of sernum update functionIdo Schimmel
The next patch is going to allow dead routes to remain in the FIB tree in certain situations. When this happens we need to be sure to bump the sernum of the nodes where these are stored so that potential copies cached in sockets are invalidated. The function that performs this update assumes the table lock is not taken when it is invoked, but that will not be the case when it is invoked by the tree walker. Have the function assume the lock is taken and make the single caller take the lock itself. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Export sernum update functionIdo Schimmel
We are going to allow dead routes to stay in the FIB tree (e.g., when they are part of a multipath route, directly connected route with no carrier) and revive them when their nexthop device gains carrier or when it is put administratively up. This is equivalent to the addition of the route to the FIB tree and we should therefore take care of updating the sernum of all the parent nodes of the node where the route is stored. Otherwise, we risk sockets caching and using sub-optimal dst entries. Export the function that performs the above, so that it could be invoked from fib6_ifup() later on. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Teach tree walker to skip multipath routesIdo Schimmel
As explained in previous patch, fib6_ifdown() needs to consider the state of all the sibling routes when a multipath route is traversed. This is done by evaluating all the siblings when the first sibling in a multipath route is traversed. If the multipath route does not need to be flushed (e.g., not all siblings are dead), then we should just skip the multipath route as our work is done. Have the tree walker jump to the last sibling when it is determined that the multipath route needs to be skipped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Add explicit flush indication to routesIdo Schimmel
When routes that are a part of a multipath route are evaluated by fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events the state of their sibling routes is not considered. This will change in subsequent patches in order to align IPv6 with IPv4's behavior. For example, when the last sibling in a multipath route becomes dead, the entire multipath route needs to be removed. To prevent the tree walker from re-evaluating all the sibling routes each time, we can simply evaluate them once - when the first sibling is traversed. If we determine the entire multipath route needs to be removed, then the 'should_flush' bit is set in all the siblings, which will cause the walker to flush them when it traverses them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Report dead flag during route dumpIdo Schimmel
Up until now the RTNH_F_DEAD flag was only reported in route dump when the 'ignore_routes_with_linkdown' sysctl was set. This is expected as dead routes were flushed otherwise. The reliance on this sysctl is going to be removed, so we need to report the flag regardless of the sysctl's value. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Ignore dead routes during lookupIdo Schimmel
Currently, dead routes are only present in the routing tables in case the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are flushed. Subsequent patches are going to remove the reliance on this sysctl and make IPv6 more consistent with IPv4. Before this is done, we need to make sure dead routes are skipped during route lookup, so as to not cause packet loss. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Check nexthop flags in route dump instead of carrierIdo Schimmel
Similar to previous patch, there is no need to check for the carrier of the nexthop device when dumping the route and we can instead check for the presence of the RTNH_F_LINKDOWN flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Check nexthop flags during route lookup instead of carrierIdo Schimmel
Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the need to dereference the nexthop device and check its carrier and instead check for the presence of the flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Set nexthop flags during route creationIdo Schimmel
It is valid to install routes with a nexthop device that does not have a carrier, so we need to make sure they're marked accordingly. As explained in the previous patch, host and anycast routes are never marked with the 'linkdown' flag. Note that reject routes are unaffected, as these use the loopback device which always has a carrier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Set nexthop flags upon carrier changeIdo Schimmel
Similar to IPv4, when the carrier of a netdev changes we should toggle the 'linkdown' flag on all the nexthops using it as their nexthop device. This will later allow us to test for the presence of this flag during route lookup and dump. Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") host and anycast routes used the loopback netdev as their nexthop device and thus were not marked with the 'linkdown' flag. The patch preserves this behavior and allows one to ping the local address even when the nexthop device does not have a carrier and the 'ignore_routes_with_linkdown' sysctl is set. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Prepare to handle multiple netdev eventsIdo Schimmel
To make IPv6 more in line with IPv4 we need to be able to respond differently to different netdev events. For example, when a netdev is unregistered all the routes using it as their nexthop device should be flushed, whereas when the netdev's carrier changes only the 'linkdown' flag should be toggled. Currently, this is not possible, as the function that traverses the routing tables is not aware of the triggering event. Propagate the triggering event down, so that it could be used in later patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Clear nexthop flags upon netdev upIdo Schimmel
Previous patch marked nexthops with the 'dead' and 'linkdown' flags. Clear these flags when the netdev comes back up. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Mark dead nexthops with appropriate flagsIdo Schimmel
When a netdev is put administratively down or unregistered all the nexthops using it as their nexthop device should be marked with the 'dead' and 'linkdown' flags. Currently, when a route is dumped its nexthop device is tested and the flags are set accordingly. A similar check is performed during route lookup. Instead, we can simply mark the nexthops based on netdev events and avoid checking the netdev's state during route dump and lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07ipv6: Remove redundant route flushing during namespace dismantleIdo Schimmel
By the time fib6_net_exit() is executed all the netdevs in the namespace have been either unregistered or pushed back to the default namespace. That is because pernet subsys operations are always ordered before pernet device operations and therefore invoked after them during namespace dismantle. Thus, all the routing tables in the namespace are empty by the time fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed. This allows us to simplify the condition in fib6_ifdown() as it's only ever called with an actual netdev. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-01-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a start of a framework for extending struct xdp_buff without having the overhead of populating every data at runtime. Idea is to have a new per-queue struct xdp_rxq_info that holds read mostly data (currently that is, queue number and a pointer to the corresponding netdev) which is set up during rxqueue config time. When a XDP program is invoked, struct xdp_buff holds a pointer to struct xdp_rxq_info that the BPF program can then walk. The user facing BPF program that uses struct xdp_md for context can use these members directly, and the verifier rewrites context access transparently by walking the xdp_rxq_info and net_device pointers to load the data, from Jesper. 2) Redo the reporting of offload device information to user space such that it works in combination with network namespaces. The latter is reported through a device/inode tuple as similarly done in other subsystems as well (e.g. perf) in order to identify the namespace. For this to work, ns_get_path() has been generalized such that the namespace can be retrieved not only from a specific task (perf case), but also from a callback where we deduce the netns (ns_common) from a netdevice. bpftool support using the new uapi info and extensive test cases for test_offload.py in BPF selftests have been added as well, from Jakub. 3) Add two bpftool improvements: i) properly report the bpftool version such that it corresponds to the version from the kernel source tree. So pick the right linux/version.h from the source tree instead of the installed one. ii) fix bpftool and also bpf_jit_disasm build with bintutils >= 2.9. The reason for the build breakage is that binutils library changed the function signature to select the disassembler. Given this is needed in multiple tools, add a proper feature detection to the tools/build/features infrastructure, from Roman. 4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the stacktrace map. It is currently unimplemented, but there are use cases where user space needs to walk all stacktrace map entries e.g. for dumping or deleting map entries w/o having to close and recreate the map. Add BPF selftests along with it, from Yonghong. 5) Few follow-up cleanups for the bpftool cgroup code: i) rename the cgroup 'list' command into 'show' as we have it for other subcommands as well, ii) then alias the 'show' command such that 'list' is accepted which is also common practice in iproute2, and iii) remove couple of newlines from error messages using p_err(), from Jakub. 6) Two follow-up cleanups to sockmap code: i) remove the unused bpf_compute_data_end_sk_skb() function and ii) only build the sockmap infrastructure when CONFIG_INET is enabled since it's only aware of TCP sockets at this time, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-06Merge branch 'bpf-stacktrace-map-next-key-support'Daniel Borkmann
Yonghong Song says: ==================== The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map. Patch #1 is the core implementation and Patch #2 implements a bpf test at tools/testing/selftests/bpf directory. Please see individual patch comments for details. Changelog: v1 -> v2: - For invalid key (key pointer is non-NULL), sets next_key to be the first valid key. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>