aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-11-13tcp: limit GSO packets to half cwndEric Dumazet
In DC world, GSO packets initially cooked by tcp_sendmsg() are usually big, as sk_pacing_rate is high. When network is congested, cwnd can be smaller than the GSO packets found in socket write queue. tcp_write_xmit() splits GSO packets using the available cwnd, and we end up sending a single GSO packet, consuming all available cwnd. With GRO aggregation on the receiver, we might handle a single GRO packet, sending back a single ACK. 1) This single ACK might be lost TLP or RTO are forced to attempt a retransmit. 2) This ACK releases a full cwnd, sender sends another big GSO packet, in a ping pong mode. This behavior does not fill the pipes in the best way, because of scheduling artifacts. Make sure we always have at least two GSO packets in flight. This allows us to safely increase GRO efficiency without risking spurious retransmits. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13rhashtable: Drop gfp_flags arg in insert/remove functionsThomas Graf
Reallocation is only required for shrinking and expanding and both rely on a mutex for synchronization and callers of rhashtable_init() are in non atomic context. Therefore, no reason to continue passing allocation hints through the API. Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow for silent fall back to vzalloc() without the OOM killer jumping in as pointed out by Eric Dumazet and Eric W. Biederman. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13Merge branch 'mlx4-next'David S. Miller
Or Gerlitz says: ==================== mlx4: Flexible (asymmetric) allocation of EQs and MSI-X vectors This series from Matan Barak is built as follows: The 1st two patches fix small bugs w.r.t firmware spec. Next are two patches which do more re-factoring of the init/fini flow and a patch that adds support for the QUERY_FUNC firmware command, these are all pre-steps for the major patch of the series. In this patch (#6) we change the order of talking/querying the firmware and enabling SRIOV. This allows to remote worst-case assumption w.r.t the number of available MSI-X vectors and EQs per function. The last patch easily enjoys this ordering change, to enable supports > 64 VFs over a firmware that allows for that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Support more than 64 VFsMatan Barak
We now allow up to 126 VFs. Note though that certain firmware versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs. In these cases, we limit the maximum number of VFs to 64. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for ↵Matan Barak
PF/VFs Previously, the driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Add QUERY_FUNC firmware commandMatan Barak
QUERY_FUNC firmware command could be used in order to query the number of EQs, reserved EQs, etc for a specific function. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Refactor mlx4_load_oneMatan Barak
Refactor mlx4_load_one, as a preparation step for a new and more complicated load function. The goal is to support both newer firmware that required init_hca to be done before enable_sriov and legacy firmwares that requires things to be done the other way around. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Refactor mlx4_cmd_init and mlx4_cmd_cleanupMatan Barak
Refactoring mlx4_cmd_init and mlx4_cmd_cleanup such that partial init and cleanup are possible. After this refactoring, calling mlx4_cmd_init several times is safe. This is necessary in the VF init flow when mlx4_init_hca returns -EACCESS, we need to issue cleanup and re-attempt to call it with the slave flag. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Use correct variable type for mlx4_slave_capMatan Barak
We've used an incorrect type for the loop counter and the mlx4_QUERY_FUNC_CAP function. The current input modifier is either a port or a boolean. Since the number of ports is always a positive value < 255, we should use u8 instead of an integer with casting. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net/mlx4_core: Fix wrong reading of reserved_eqsMatan Barak
We mistakenly read the reserved_eqs field as a standard numeric value rather than a log2 value. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13Merge branch 'rhash_prove_locking'David S. Miller
Herbert Xu says: ==================== rhashtable: Allow local locks to be used and tested This series moves mutex_is_held entirely under PROVE_LOCKING so there is zero foot print when we're not debugging. More importantly it adds a parrent argument to mutex_is_held so that we can test local locks rather than global ones (e.g., per-namespace locks). ==================== Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13rhashtable: Add parent argument to mutex_is_heldHerbert Xu
Currently mutex_is_held can only test locks in the that are global since it takes no arguments. This prevents rhashtable from being used in places where locks are lock, e.g., per-namespace locks. This patch adds a parent field to mutex_is_held and rhashtable_params so that local locks can be used (and tested). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13rhashtable: Move mutex_is_held under PROVE_LOCKINGHerbert Xu
The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch makes the mutex_is_held field in rhashtable optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13netfilter: Move mutex_is_held under PROVE_LOCKINGHerbert Xu
The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch modifies netfilter so that we can rhashtable.h itself can later make mutex_is_held optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13netlink: Move mutex_is_held under PROVE_LOCKINGHerbert Xu
The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch modifies netlink so that we can rhashtable.h itself can later make mutex_is_held optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net: sh_eth: Add r8a7793 supportHisashi Nakamura
The device tree probing for R-Car M2N (r8a7793) is added. Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net: sh_eth: Add RMII mode setting in probeHisashi Nakamura
When using RMMI mode, it is necessary to change in probe. Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net: generic dev_disable_lro() stacked device handlingMichal Kubeček
Large receive offloading is known to cause problems if received packets are passed to other host. Therefore the kernel disables it by calling dev_disable_lro() whenever a network device is enslaved in a bridge or forwarding is enabled for it (or globally). For virtual devices we need to disable LRO on the underlying physical device (which is actually receiving the packets). Current dev_disable_lro() code handles this propagation for a vlan (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan. It doesn't handle other stacked devices and their combinations, in particular propagation from a bond to its slaves which often causes problems in virtualization setups. As we now have generic data structures describing the upper-lower device relationship, dev_disable_lro() can be generalized to disable LRO also for all lower devices (if any) once it is disabled for the device itself. For bonding and teaming devices, it is necessary to disable LRO not only on current slaves at the moment when dev_disable_lro() is called but also on any slave (port) added later. v2: use lower device links for all devices (including vlan and macvlan) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13amd-xgbe: fix ->rss_hash_typeDan Carpenter
There was a missing break statement so we set everything to PKT_HASH_TYPE_L3 even when we intended to use PKT_HASH_TYPE_L4. Fixes: 5b9dfe299e55 ('amd-xgbe: Provide support for receive side scaling') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13cxgb4i/cxgb4 : Refactor macros to conform to uniform standardsAnish Bhatt
Refactored all macros used in cxgb4i as part of previously started cxgb4 macro names cleanup. Makes them more uniform and avoids namespace collision. Minor changes in other drivers where required as some of these macros are used by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13tun: fix issues of iovec iterators using in tun_put_user()Jason Wang
This patch fixes two issues after using iovec iterators: - vlan_offset should be initialized to zero, otherwise unexpected offset will be used in skb_copy_datagram_iter() - advance iovec iterator when vnet_hdr_sz is greater than sizeof(gso), this is the case when mergeable rx buffer were enabled for a virt guest. Fixes e0b46d0ee9c240c7430a47e9b0365674d4a04522 ("tun: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13FOU: Fix no return statement warning for !CONFIG_NET_FOU_IP_TUNNELSThomas Graf
net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’: net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type] Fixes: a8c5f90fb5 ("ip_tunnel: Ops registration for secondary encap (fou, gue)") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: systemport: fix tx work done in TX napi pollFlorian Fainelli
With commit d75b1ade567 ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. bcm_sysport_tx_poll() always returns 0 whether or not we completed the poll quantum. Fix this by returning either 0 when we did complete the TX ring reclaim, or budget to trigger a repoll. Fixes: d75b1ade567 ("net: less interrupt masking in NAPI") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12enic: fix work done in tx napi_pollGovindarajulu Varadarajan
With the commit d75b1ade567 ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. In tx napi poll we always return 0. So tx napi is not called again and we do not clean up the tx ring. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12irda: Fix build failures after IRDA_DEBUG->pr_debugJoe Perches
Fix the build failures that result from the use of pr_debug without the referenced char * arrays being defined. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12hyperv: Add processing of MTU reduced by the hostHaiyang Zhang
If the host uses packet encapsulation feature, the MTU may be reduced by the host due to headroom reservation for encapsulation. This patch handles this new MTU value. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12amd-xgbe: Fix sparse endian warningsLendacky, Thomas
Change the types of the descriptor entries in the xgbe_ring_desc struct from u32 to __le32 to fix endian warnings issued by sparse. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: pxa168_eth: move SET_NETDEV_DEV a bit earlierJisheng Zhang
This is to ensure the net_device's dev.parent is set before we used it in dma_zalloc_coherent() from init_hash_table(). Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12cxgb4: Fix static checker warningHariprasad Shenai
Fix static checker warning that got introduced in commit e2ac9628959cc152 ("cxgb4: Cleanup macros so they follow the same style and look consistent, part 2") due to accidental checkin of bogus line. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12ip_tunnel: Ops registration for secondary encap (fou, gue)Tom Herbert
Instead of calling fou and gue functions directly from ip_tunnel use ops for these that were previously registered. This patch adds the logic to add and remove encapsulation operations for ip_tunnel, and modified fou (and gue) to register with ip_tunnels. This patch also addresses a circular dependency between ip_tunnel and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y and CONFIG_NET_FOU=m. References to fou an gue have been removed from ip_tunnel.c Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12udp: Neaten function pointer calls and add bracesJoe Perches
Standardize function pointer uses. Convert calling style from: (*foo)(args...); to: foo(args...); Other miscellanea: o Add braces around loops with single ifs on multiple lines o Realign arguments around these functions o Invert logic in if to return immediately. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12Merge branch 'r8152-next'David S. Miller
Hayes Wang says: ==================== Code adjustment v3: Remove the test_bit for patch #2. v2: Correct the spelling error for the comment of patch #3. v1: Adjust some codes to make them more reasonable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12r8152: check RTL8152_UNPLUG and netif_running before autoresumehayeswang
If the device is unplugged or !netif_running(), the workqueue doesn't need to wake the device, and could return directly. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12r8152: clear the flag of SCHEDULE_TASKLET in tasklethayeswang
Clear the flag of SCHEDULE_TASKLET in bottom_half() to avoid re-schedule the tasklet again by workqueue. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12r8152: remove the duplicate init for the list of rx_donehayeswang
The INIT_LIST_HEAD(&tp->rx_done) would be done in rtl_start_rx(), so remove the unnecessary one in alloc_all_mem(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12Merge branch 'bcm7xxx-next'David S. Miller
Florian Fainelli says: ==================== net: phy: bcm7xxx: workaround updates This patch series contains some updates to the Broadcom BCM7xxx internal PHY driver, including: - removing an annonying print that would appear during interface up/down and suspend/resume cycles - drop a workaround sequence for a non-production PHY revision - add new workarounds for the latest and greatest PHY devices found out there ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: bcm7xxx: add workaround for PHY revision E0 and F0Florian Fainelli
PHY revisions E0 and F0 share the same shorter workaround initialization sequence. Dedicate a special function for these two PHY revisions to perform the needed workaround sequence. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: bcm7xxx: add PHY revision D0 workaround sequenceFlorian Fainelli
PHY revision D0 requires a specific workaround sequence which needs to be applied to get the HW to behave properly in all corner cases conditions. Do this based on the revision we just read out of the HW using a specific function. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: bcm7xxx: introduce r_rc_cal_reset helperFlorian Fainelli
This function performs a R/RC calibration reset and will start being used by more than one function in the next patches, create a helper function to factor code. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: bcm7xxx: drop A0 revision workaround and fix B0 workaroundFlorian Fainelli
bcm7445_config_init() was working around non-production version of the PHY HW block, so just remove it entirely. bcm7xxx_28nm_afe_config_init() was running for all PHY revisions greater than B0, but this workaround sequence is really specific to the B0 PHY revision, so rename the function accordingly and update the GPHY macro to use the generic config_init callback. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: bcm7xxx: only show PHY revision onceFlorian Fainelli
bcm7xxx_28nm_config_init() can be called as frequently as needed by the PHY library upon suspend/resume cycles and interface bring up/down, just print the PHY revision once and for all in order not to spam kernel logs. Fixes: d8ebfed3f11b ("net: phy: bcm7xxx: utilize PHY revision in config_init") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12irda: Convert IRDA_DEBUG to pr_debugJoe Perches
Use the normal kernel debugging mechanism which also enables dynamic_debug at the same time. Other miscellanea: o Remove sysctl for irda_debug o Remove function tracing like uses (use ftrace instead) o Coalesce formats o Realign arguments o Remove unnecessary OOM messages Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12Merge branch 'micrel-next'David S. Miller
Johan Hovold says: ==================== net: phy: micrel: refactoring and KSZ8081/KSZ8091 features This series cleans up and refactors parts of the micrel PHY driver, and adds support for broadcast-address-disable and led-mode configuration for KSZ8081 and KSZ8091 PHYs. Specifically, this enables dual KSZ8081 setups (which are limited to using address 0 and 3). A follow up series will add device-type abstraction which will allow for further refactoring and shared initialisation code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: enable led-mode for KSZ8081/KSZ8091Johan Hovold
Enable led-mode configuration for KSZ8081 and KSZ8091. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: clean up led-mode setupJohan Hovold
Clean up led-mode setup by introducing proper defines for PHY Control registers 1 and 2 and only passing the register to the setup function. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: refactor led-mode error handlingJohan Hovold
Refactor led-mode error handling. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: add led-mode sanity checkJohan Hovold
Make sure never to update more than two bits when setting the led mode, something which could for example change the reference-clock setting. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: disable broadcast for KSZ8081/KSZ8091Johan Hovold
Disable PHY address 0 as the broadcast address, so that it can be used as a unique (non-broadcast) address on a shared bus. Note that this can also be configured using the B-CAST_OFF pin on KSZ9091, but that KSZ8081 lacks this pin and is also limited to addresses 0 and 3. Specifically, this allows for dual KSZ8081 setups. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: refactor broadcast disableJohan Hovold
Refactor and clean up broadcast disable. Some Micrel PHYs have a broadcast-off bit in the Operation Mode Strap Override register which can be used to disable PHY address 0 as the broadcast address, so that it can be used as a unique (non-broadcast) address on a shared bus. Note that the KSZPHY_OMSO_RMII_OVERRIDE bit is set by default on KSZ8021/8031. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12net: phy: micrel: use BIT macroJohan Hovold
Use BIT macro for bitmask definitions. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>