aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Simple overlapping changes in bpf land wrt. bpf_helper_defs.h handling. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix big endian overflow in nf_flow_table, from Arnd Bergmann. 2) Fix port selection on big endian in nft_tproxy, from Phil Sutter. 3) Fix precision tracking for unbound scalars in bpf verifier, from Daniel Borkmann. 4) Fix integer overflow in socket rcvbuf check in UDP, from Antonio Messina. 5) Do not perform a neigh confirmation during a pmtu update over a tunnel, from Hangbin Liu. 6) Fix DMA mapping leak in dpaa_eth driver, from Madalin Bucur. 7) Various PTP fixes for sja1105 dsa driver, from Vladimir Oltean. 8) Add missing to dummy definition of of_mdiobus_child_is_phy(), from Geert Uytterhoeven * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() net/sched: add delete_empty() to filters and use it in cls_flower tcp: Fix highest_sack and highest_sack_seq ptp: fix the race between the release of ptp_clock and cdev net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation net: dsa: sja1105: Remove restriction of zero base-time for taprio offload net: dsa: sja1105: Really make the PTP command read-write net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot cxgb4/cxgb4vf: fix flow control display for auto negotiation mlxsw: spectrum: Use dedicated policer for VRRP packets mlxsw: spectrum_router: Skip loopback RIFs during MAC validation net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device net_sched: sch_fq: properly set sk->sk_pacing_status bnx2x: Fix accounting of vlan resources among the PFs bnx2x: Use appropriate define for vlan credit of: mdio: Add missing inline to of_mdiobus_child_is_phy() dummy net: phy: aquantia: add suspend / resume ops for AQR105 dpaa_eth: fix DMA mapping leak ...
2019-12-31Merge tag 'tomoyo-fixes-for-5.5' of ↵Linus Torvalds
git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 Pull tomoyo fixes from Tetsuo Handa: "Two bug fixes: - Suppress RCU warning at list_for_each_entry_rcu() - Don't use fancy names on sockets" * tag 'tomoyo-fixes-for-5.5' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: tomoyo: Suppress RCU warning at list_for_each_entry_rcu(). tomoyo: Don't use nifty names on sockets.
2019-12-30hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename()Taehee Yoo
hsr slave interfaces don't have debugfs directory. So, hsr_debugfs_rename() shouldn't be called when hsr slave interface name is changed. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 ip link set dummy0 name ap Splat looks like: [21071.899367][T22666] ap: renamed from dummy0 [21071.914005][T22666] ================================================================== [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666 [21071.926941][T22666] [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240 [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [21071.935094][T22666] Call Trace: [21071.935867][T22666] dump_stack+0x96/0xdb [21071.936687][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.937774][T22666] print_address_description.constprop.5+0x1be/0x360 [21071.939019][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.940081][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.940949][T22666] __kasan_report+0x12a/0x16f [21071.941758][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.942674][T22666] kasan_report+0xe/0x20 [21071.943325][T22666] hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.944187][T22666] hsr_netdev_notify+0x1fe/0x9b0 [hsr] [21071.945052][T22666] ? __module_text_address+0x13/0x140 [21071.945897][T22666] notifier_call_chain+0x90/0x160 [21071.946743][T22666] dev_change_name+0x419/0x840 [21071.947496][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [21071.948600][T22666] ? netdev_adjacent_rename_links+0x280/0x280 [21071.949577][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [21071.950672][T22666] ? lock_downgrade+0x6e0/0x6e0 [21071.951345][T22666] ? do_setlink+0x811/0x2ef0 [21071.951991][T22666] do_setlink+0x811/0x2ef0 [21071.952613][T22666] ? is_bpf_text_address+0x81/0xe0 [ ... ] Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com Fixes: 4c2d5e33dcd3 ("hsr: rename debugfs file when interface name is changed") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net/sched: add delete_empty() to filters and use it in cls_flowerDavide Caratti
Revert "net/sched: cls_u32: fix refcount leak in the error path of u32_change()", and fix the u32 refcount leak in a more generic way that preserves the semantic of rule dumping. On tc filters that don't support lockless insertion/removal, there is no need to guard against concurrent insertion when a removal is in progress. Therefore, for most of them we can avoid a full walk() when deleting, and just decrease the refcount, like it was done on older Linux kernels. This fixes situations where walk() was wrongly detecting a non-empty filter, like it happened with cls_u32 in the error path of change(), thus leading to failures in the following tdc selftests: 6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev 6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle 74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id On cls_flower, and on (future) lockless filters, this check is necessary: move all the check_empty() logic in a callback so that each filter can have its own implementation. For cls_flower, it's sufficient to check if no IDRs have been allocated. This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620. Changes since v1: - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED is used, thanks to Vlad Buslov - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov - squash revert and new fix in a single patch, to be nice with bisect tests that run tdc on u32 filter, thanks to Dave Miller Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()") Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Suggested-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net/ncsi: Fix gma flag setting after responseVijay Khemka
gma_flag was set at the time of GMA command request but it should only be set after getting successful response. Movinng this flag setting in GMA response handler. This flag is used mainly for not repeating GMA command once received MAC address. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30sctp: add enabled check for path tracepoint loop.Kevin Kou
sctp_outq_sack is the main function handles SACK, it is called very frequently. As the commit "move trace_sctp_probe_path into sctp_outq_sack" added below code to this function, sctp tracepoint is disabled most of time, but the loop of transport list will be always called even though the tracepoint is disabled, this is unnecessary. + /* SCTP path tracepoint for congestion control debugging. */ + list_for_each_entry(transport, transport_list, transports) { + trace_sctp_probe_path(transport, asoc); + } This patch is to add tracepoint enabled check at outside of the loop of transport list, and avoid traversing the loop when trace is disabled, it is a small optimization. Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30Merge branch 'Improvements-to-SJA1105-DSA-RX-timestamping'David S. Miller
Vladimir Oltean says: ==================== Improvements to SJA1105 DSA RX timestamping This series makes the sja1105 DSA driver use a dedicated kernel thread for RX timestamping, a process which is time-sensitive and otherwise a bit fragile. This allows users to customize their system (probabil an embedded PTP switch) fully and allocate the CPU bandwidth for the driver to expedite the RX timestamps as quickly as possible. While doing this conversion, add a function to the PTP core for cancelling this kernel thread (function which I found rather strange to be missing). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Empty the RX timestamping queue on PTP settings changeVladimir Oltean
When disabling PTP timestamping, don't reset the switch with the new static config until all existing PTP frames have been timestamped on the RX path or dropped. There's nothing we can do with these afterwards. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX timestampingVladimir Oltean
And move the queue of skb's waiting for RX timestamps into the ptp_data structure, since it isn't needed if PTP is not compiled. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30ptp: introduce ptp_cancel_worker_syncVladimir Oltean
In order to effectively use the PTP kernel thread for tasks such as timestamping packets, allow the user control over stopping it, which is needed e.g. when the timestamping queues must be drained. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30tcp: Fix highest_sack and highest_sack_seqCambda Zhu
>From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about setting tp->highest_sack to the head of the send queue was removed. Of course the logic is error prone, but it is logical. Before we remove the pointer to the highest sack skb and use the seq instead, we need to set tp->highest_sack to NULL when there is no skb after the last sack, and then replace NULL with the real skb when new skb inserted into the rtx queue, because the NULL means the highest sack seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent, the next ACK with sack option will increase tp->reordering unexpectedly. This patch sets tp->highest_sack to the tail of the rtx queue if it's NULL and new data is sent. The patch keeps the rule that the highest_sack can only be maintained by sack processing, except for this only case. Fixes: 50895b9de1d3 ("tcp: highest_sack fix") Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30ptp: fix the race between the release of ptp_clock and cdevVladis Dronov
In a case when a ptp chardev (like /dev/ptp0) is open but an underlying device is removed, closing this file leads to a race. This reproduces easily in a kvm virtual machine: ts# cat openptp0.c int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); } ts# uname -r 5.5.0-rc3-46cf053e ts# cat /proc/cmdline ... slub_debug=FZP ts# modprobe ptp_kvm ts# ./openptp0 & [1] 670 opened /dev/ptp0, sleeping 10s... ts# rmmod ptp_kvm ts# ls /dev/ptp* ls: cannot access '/dev/ptp*': No such file or directory ts# ...woken up [ 48.010809] general protection fault: 0000 [#1] SMP [ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25 [ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80 [ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202 [ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0 [ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b [ 48.019470] ... ^^^ a slub poison [ 48.023854] Call Trace: [ 48.024050] __fput+0x21f/0x240 [ 48.024288] task_work_run+0x79/0x90 [ 48.024555] do_exit+0x2af/0xab0 [ 48.024799] ? vfs_write+0x16a/0x190 [ 48.025082] do_group_exit+0x35/0x90 [ 48.025387] __x64_sys_exit_group+0xf/0x10 [ 48.025737] do_syscall_64+0x3d/0x130 [ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 48.026479] RIP: 0033:0x7f53b12082f6 [ 48.026792] ... [ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm] [ 48.045001] Fixing recursive fault but reboot is needed! This happens in: static void __fput(struct file *file) { ... if (file->f_op->release) file->f_op->release(inode, file); <<< cdev is kfree'd here if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL && !(mode & FMODE_PATH))) { cdev_put(inode->i_cdev); <<< cdev fields are accessed here Namely: __fput() posix_clock_release() kref_put(&clk->kref, delete_clock) <<< the last reference delete_clock() delete_ptp_clock() kfree(ptp) <<< cdev is embedded in ptp cdev_put module_put(p->owner) <<< *p is kfree'd, bang! Here cdev is embedded in posix_clock which is embedded in ptp_clock. The race happens because ptp_clock's lifetime is controlled by two refcounts: kref and cdev.kobj in posix_clock. This is wrong. Make ptp_clock's sysfs device a parent of cdev with cdev_device_add() created especially for such cases. This way the parent device with its ptp_clock is not released until all references to the cdev are released. This adds a requirement that an initialized but not exposed struct device should be provided to posix_clock_register() by a caller instead of a simple dev_t. This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix the race between the release of watchdog_core_data and cdev"). See details of the implementation in the commit 233ed09d7fda ("chardev: add helper function to register char devs with a struct device"). Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u Analyzed-by: Stephen Johnston <sjohnsto@redhat.com> Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/SVladimir Oltean
For first-generation switches (SJA1105E and SJA1105T): - TPID means C-Tag (typically 0x8100) - TPID2 means S-Tag (typically 0x88A8) While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R, SJA1105S) it is the other way around: - TPID means S-Tag (typically 0x88A8) - TPID2 means C-Tag (typically 0x8100) In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with TPID2. So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke it for E/T. We strive for a common code path for all switches in the family, so just lie in the static config packing functions that TPID and TPID2 are at swapped bit offsets than they actually are, for P/Q/R/S. This will make both switches understand TPID to be ETH_P_8021Q and TPID2 to be ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S because E/T is actually the one with public documentation available (UM10944.pdf). Fixes: f9a1a7646c0d ("net: dsa: sja1105: Reverse TPID and TPID2") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30Documentation: net: dsa: sja1105: Remove text about taprio base-time limitationVladimir Oltean
Since commit 86db36a347b4 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source"), this paragraph is no longer true. So remove it. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Remove restriction of zero base-time for taprio offloadVladimir Oltean
The check originates from the initial implementation which was not based on PTP time but on a standalone clock source. In the meantime we can now program the PTPSCHTM register at runtime with the dynamic base time (actually with a value that is 200 ns smaller, to avoid writing DELTA=0 in the Schedule Entry Points Parameters Table). And we also have logic for moving the actual base time in the future of the PHC's current time base, so the check for zero serves no purpose, since even if the user will specify zero, that's not what will end up in the static config table where the limitation is. Fixes: 86db36a347b4 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Really make the PTP command read-writeVladimir Oltean
When activating tc-taprio offload on the switch ports, the TAS state machine will try to check whether it is running or not, but will find both the STARTED and STOPPED bits as false in the sja1105_tas_check_running function. So the function will return -EINVAL (an abnormal situation) and the kernel will keep printing this from the TAS FSM workqueue: [ 37.691971] sja1105 spi0.1: An operation returned -22 The reason is that the underlying function that gets called, sja1105_ptp_commit, does not actually do a SPI_READ, but a SPI_WRITE. So the command buffer remains initialized with zeroes instead of retrieving the hardware state. Fix that. Fixes: 41603d78b362 ("net: dsa: sja1105: Make the PTP command read-write") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slotVladimir Oltean
The PTP egress timestamp N must be captured from register PTPEGR_TS[n], where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per port. We are only using TSREG=0. As opposed to the management slots, which are 4 in number (SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which includes PTP frames) can be sent to any non-CPU port through any management slot. When the CPU port is not the last port (#4), there will be a mismatch between the slot and the port number. Luckily, the only mainline occurrence with this switch (arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the issue did not manifest itself thus far. Fixes: 47ed985e97f5 ("net: dsa: sja1105: Add logic for TX timestamping") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30sfc: avoid duplicate error handling code in 'efx_ef10_sriov_set_vf_mac()'Christophe JAILLET
'eth_zero_addr()' is already called in the error handling path. This is harmless, but there is no point in calling it twice, so remove one. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30tcp_cubic: refactor code to perform a divide only when neededEric Dumazet
Neal Cardwell suggested to not change ca->delay_min and apply the ack delay cushion only when Hystart ACK train is still under consideration. This should avoid a 64bit divide unless needed. Tested: 40Gbit(mlx4) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 14815 16280 15293 15563 11574 15145 14789 18548 16972 12520 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 1396 0.0 $ dmesg | tail -10 [ 4873.951350] hystart_ack_train (116 > 93) delay_min 24 (+ ack_delay 69) cwnd 80 [ 4875.155379] hystart_ack_train (55 > 50) delay_min 21 (+ ack_delay 29) cwnd 160 [ 4876.333921] hystart_ack_train (69 > 62) delay_min 23 (+ ack_delay 39) cwnd 130 [ 4877.519037] hystart_ack_train (69 > 60) delay_min 22 (+ ack_delay 38) cwnd 130 [ 4878.701559] hystart_ack_train (87 > 63) delay_min 24 (+ ack_delay 39) cwnd 160 [ 4879.844597] hystart_ack_train (93 > 50) delay_min 21 (+ ack_delay 29) cwnd 216 [ 4880.956650] hystart_ack_train (74 > 67) delay_min 20 (+ ack_delay 47) cwnd 108 [ 4882.098500] hystart_ack_train (61 > 57) delay_min 23 (+ ack_delay 34) cwnd 130 [ 4883.262056] hystart_ack_train (72 > 67) delay_min 21 (+ ack_delay 46) cwnd 130 [ 4884.418760] hystart_ack_train (74 > 67) delay_min 29 (+ ack_delay 38) cwnd 152 10Gbit(bnx2x) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpk52 -l -4000000; done;nstat|egrep "Hystart" 7050 7065 7100 6900 7202 7263 7189 6869 7463 7034 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 3199 0.0 $ dmesg | tail -10 [ 176.920012] hystart_ack_train (161 > 141) delay_min 83 (+ ack_delay 58) cwnd 264 [ 179.144645] hystart_ack_train (164 > 159) delay_min 120 (+ ack_delay 39) cwnd 444 [ 181.354527] hystart_ack_train (214 > 168) delay_min 125 (+ ack_delay 43) cwnd 436 [ 183.539565] hystart_ack_train (170 > 147) delay_min 96 (+ ack_delay 51) cwnd 326 [ 185.727309] hystart_ack_train (177 > 160) delay_min 61 (+ ack_delay 99) cwnd 128 [ 187.947142] hystart_ack_train (184 > 167) delay_min 123 (+ ack_delay 44) cwnd 367 [ 190.166680] hystart_ack_train (230 > 153) delay_min 116 (+ ack_delay 37) cwnd 444 [ 192.327285] hystart_ack_train (210 > 206) delay_min 86 (+ ack_delay 120) cwnd 152 [ 194.511392] hystart_ack_train (173 > 151) delay_min 94 (+ ack_delay 57) cwnd 239 [ 196.736023] hystart_ack_train (149 > 146) delay_min 105 (+ ack_delay 41) cwnd 399 Fixes: 42f3a8aaae66 ("tcp_cubic: tweak Hystart detection for short RTT flows") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Link: https://www.spinics.net/lists/netdev/msg621886.html Link: https://www.spinics.net/lists/netdev/msg621797.html Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30cxgb4/cxgb4vf: fix flow control display for auto negotiationRahul Lakkireddy
As per 802.3-2005, Section Two, Annex 28B, Table 28B-2 [1], when _only_ Rx pause is enabled, both symmetric and asymmetric pause towards local device must be enabled. Also, firmware returns the local device's flow control pause params as part of advertised capabilities and negotiated params as part of current link attributes. So, fix up ethtool's flow control pause params fetch logic to read from acaps, instead of linkattr. [1] https://standards.ieee.org/standard/802_3-2005.html Fixes: c3168cabe1af ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities") Signed-off-by: Surendra Mobiya <surendra@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner. 2) Document ingress hook in netdevice, also from Lukas. 3) Remove htons() in tunnel metadata port netlink attributes, from Xin Long. 4) Missing erspan netlink attribute validation also from Xin Long. 5) Missing erspan version in tunnel, from Xin Long. 6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN} Patch from Xin Long. 7) Missing nla_nest_cancel() in tunnel netlink dump path, from Xin Long. 8) Remove two exported conntrack symbols with no clients, from Florian Westphal. 9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian. 10) Add nft_meta_pkttype helper for loopback, also from Florian. 11) Add nft_meta_socket uid helper, from Florian Westphal. 12) Add nft_meta_cgroup helper, from Florian. 13) Add nft_meta_ifkind helper, from Florian. 14) Group all interface related meta selector, from Florian. 15) Add nft_prandom_u32() helper, from Florian. 16) Add nft_meta_rtclassid helper, from Florian. 17) Add support for matching on the slave device index, from Florian. This batch, among other things, contains updates for the netfilter tunnel netlink interface: This extension is still incomplete and lacking proper userspace support which is actually my fault, I did not find the time to go back and finish this. This update is breaking tunnel UAPI in some aspects to fix it but do it better sooner than never. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29Linux 5.5-rc4Linus Torvalds
2019-12-29Merge branch 'mlxsw-fixes'David S. Miller
Ido Schimmel says: ==================== mlxsw: Couple of fixes This patch set contains two fixes for mlxsw. Please consider both for stable. Patch #1 from Amit fixes a wrong check during MAC validation when creating router interfaces (RIFs). Given a particular order of configuration this can result in the driver refusing to create new RIFs. Patch #2 fixes a wrong trap configuration in which VRRP packets and routing exceptions were policed by the same policer towards the CPU. In certain situations this can prevent VRRP packets from reaching the CPU. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29mlxsw: spectrum: Use dedicated policer for VRRP packetsIdo Schimmel
Currently, VRRP packets and packets that hit exceptions during routing (e.g., MTU error) are policed using the same policer towards the CPU. This means, for example, that misconfiguration of the MTU on a routed interface can prevent VRRP packets from reaching the CPU, which in turn can cause the VRRP daemon to assume it is the Master router. Fix this by using a dedicated policer for VRRP packets. Fixes: 11566d34f895 ("mlxsw: spectrum: Add VRRP traps") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29mlxsw: spectrum_router: Skip loopback RIFs during MAC validationAmit Cohen
When a router interface (RIF) is created the MAC address of the backing netdev is verified to have the same MSBs as existing RIFs. This is required in order to avoid changing existing RIF MAC addresses that all share the same MSBs. Loopback RIFs are special in this regard as they do not have a MAC address, given they are only used to loop packets from the overlay to the underlay. Without this change, an error is returned when trying to create a RIF after the creation of a GRE tunnel that is represented by a loopback RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which does not share the same MSBs as physical interfaces. Adding an IP address to any physical interface results in: Error: mlxsw_spectrum: All router interface MAC addresses must have the same prefix. Fix this by skipping loopback RIFs during MAC validation. Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29Merge tag 'riscv/for-v5.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "One important fix for RISC-V: - Redirect any incoming syscall with an ID less than -1 to sys_ni_syscall, rather than allowing them to fall through into the syscall handler. and two minor build fixes: - Export __asm_copy_{from,to}_user() from where they are defined. This fixes a build error triggered by some randconfigs. - Export flush_icache_all(). I'd resisted this before, since historically we didn't want modules to be able to flush the I$ directly; but apparently everyone else is doing it now" * tag 'riscv/for-v5.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: export flush_icache_all to modules riscv: reject invalid syscalls below -1 riscv: fix compile failure with EXPORT_SYMBOL() & !MMU
2019-12-29Merge tag 'locks-v5.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull /proc/locks formatting fix from Jeff Layton: "This is a trivial fix for a _very_ long standing bug in /proc/locks formatting. Ordinarily, I'd wait for the merge window for something like this, but it is making it difficult to validate some overlayfs fixes. I've also gone ahead and marked this for stable" * tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: print unsigned ino in /proc/locks
2019-12-29Merge tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "One performance fix for large directory searches, and one minor style cleanup noticed by Clang" * tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Optimize readdir on reparse points cifs: Adjust indentation in smb2_open_file
2019-12-29locks: print unsigned ino in /proc/locksAmir Goldstein
An ino is unsigned, so display it as such in /proc/locks. Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-12-28Merge branch 'DSA-TX-tstamp'David S. Miller
Vladimir Oltean says: ==================== The DSA TX timestamping situation This series is the moral v2 of "[PATCH net] net: dsa: sja1105: Fix double delivery of TX timestamps to socket error queue" [0] which did not manage to convince public opinion (actually it didn't convince me neither). This fixes PTP timestamping on one particular board, where the DSA switch is sja1105 and the master is gianfar. Unfortunately there is no way to make the fix more general without committing logical inaccuracies: the SKBTX_IN_PROGRESS flag does serve a purpose, even if the sja1105 driver is not using it now: it prevents delivering a SW timestamp to the app socket when the HW timestamp will be provided. So not setting this flag (the approach from v1) might create avoidable complications in the future (not to mention that there isn't any satisfactory explanation on why that would be the correct solution). So the goal of this change set is to create a more strict framework for DSA master devices when attached to PTP switches, and to fix the first master driver that is overstepping its duties and is delivering unsolicited TX timestamps. [0]: https://www.spinics.net/lists/netdev/msg619699.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28net: dsa: Deny PTP on master if switch supports itVladimir Oltean
It is possible to kill PTP on a DSA switch completely and absolutely, until a reboot, with a simple command: tcpdump -i eth2 -j adapter_unsynced where eth2 is the switch's DSA master. Why? Well, in short, the PTP API in place today is a bit rudimentary and relies on applications to retrieve the TX timestamps by polling the error queue and looking at the cmsg structure. But there is no timestamp identification of any sorts (except whether it's HW or SW), you don't know how many more timestamps are there to come, which one is this one, from whom it is, etc. In other words, the SO_TIMESTAMPING API is fundamentally limited in that you can get a single HW timestamp from the stack. And the "-j adapter_unsynced" flag of tcpdump enables hardware timestamping. So let's imagine what happens when the DSA master decides it wants to deliver TX timestamps to the skb's socket too: - The timestamp that the user space sees is taken by the DSA master. Whereas the RX timestamp will eventually be overwritten by the DSA switch. So the RX and TX timestamps will be in different time bases (aka garbage). - The user space applications have no way to deal with the second (real) TX timestamp finally delivered by the DSA switch, or even to know to wait for it. Take ptp4l from the linuxptp project, for example. This is its behavior after running tcpdump, before the patch: ptp4l[172]: [6469.594] Unexpected data on socket err queue: ptp4l[172]: [6469.693] rms 8 max 16 freq -21257 +/- 11 delay 748 +/- 0 ptp4l[172]: [6469.711] Unexpected data on socket err queue: ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.721] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.838] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.848] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02 ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05 ptp4l[172]: 0040 de 06 00 01 ptp4l[172]: [6469.855] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.974] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 The ptp4l program itself is heavily patched to show this (more details here [0]). Otherwise, by default it just hangs. On the other hand, with the DSA patch to disallow HW timestamping applied: tcpdump -i eth2 -j adapter_unsynced tcpdump: SIOCSHWTSTAMP failed: Device or resource busy So it is a fact of life that PTP timestamping on the DSA master is incompatible with timestamping on the switch MAC, at least with the current API. And if the switch supports PTP, taking the timestamps from the switch MAC is highly preferable anyway, due to the fact that those don't contain the queuing latencies of the switch. So just disallow PTP on the DSA master if there is any PTP-capable switch attached. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/ Fixes: 0336369d3a4d ("net: dsa: forward hardware timestamping ioctls to switch driver") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28gianfar: Fix TX timestamping with a stacked DSA driverVladimir Oltean
The driver wrongly assumes that it is the only entity that can set the SKBTX_IN_PROGRESS bit of the current skb. Therefore, in the gfar_clean_tx_ring function, where the TX timestamp is collected if necessary, the aforementioned bit is used to discriminate whether or not the TX timestamp should be delivered to the socket's error queue. But a stacked driver such as a DSA switch can also set the SKBTX_IN_PROGRESS bit, which is actually exactly what it should do in order to denote that the hardware timestamping process is undergoing. Therefore, gianfar would misinterpret the "in progress" bit as being its own, and deliver a second skb clone in the socket's error queue, completely throwing off a PTP process which is not expecting to receive it, _even though_ TX timestamping is not enabled for gianfar. There have been discussions [0] as to whether non-MAC drivers need or not to set SKBTX_IN_PROGRESS at all (whose purpose is to avoid sending 2 timestamps, a sw and a hw one, to applications which only expect one). But as of this patch, there are at least 2 PTP drivers that would break in conjunction with gianfar: the sja1105 DSA switch and the felix switch, by way of its ocelot core driver. So regardless of that conclusion, fix the gianfar driver to not do stuff based on flags set by others and not intended for it. [0]: https://www.spinics.net/lists/netdev/msg619699.html Fixes: f0ee7acfcdd4 ("gianfar: Add hardware TX timestamping support") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28net/wan/fsl_ucc_hdlc: remove set but not used variables 'ut_info' and 'ret'Chen Zhou
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/wan/fsl_ucc_hdlc.c: In function ucc_hdlc_irq_handler: drivers/net/wan/fsl_ucc_hdlc.c:643:23: warning: variable ut_info set but not used [-Wunused-but-set-variable] drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_suspend: drivers/net/wan/fsl_ucc_hdlc.c:880:23: warning: variable ut_info set but not used [-Wunused-but-set-variable] drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_resume: drivers/net/wan/fsl_ucc_hdlc.c:925:6: warning: variable ret set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27riscv: export flush_icache_all to modulesOlof Johansson
This is needed by LKDTM (crash dump test module), it calls flush_icache_range(), which on RISC-V turns into flush_icache_all(). On other architectures, the actual implementation is exported, so follow that precedence and export it here too. Fixes build of CONFIG_LKDTM that fails with: ERROR: "flush_icache_all" [drivers/misc/lkdtm/lkdtm.ko] undefined! Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27riscv: reject invalid syscalls below -1David Abdurachmanov
Running "stress-ng --enosys 4 -t 20 -v" showed a large number of kernel oops with "Unable to handle kernel paging request at virtual address" message. This happens when enosys stressor starts testing random non-valid syscalls. I forgot to redirect any syscall below -1 to sys_ni_syscall. With the patch kernel oops messages are gone while running stress-ng enosys stressor. Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com> Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27riscv: fix compile failure with EXPORT_SYMBOL() & !MMULuc Van Oostenryck
When support for !MMU was added, the declaration of __asm_copy_to_user() & __asm_copy_from_user() were #ifdefed out hence their EXPORT_SYMBOL() give an error message like: .../riscv_ksyms.c:13:15: error: '__asm_copy_to_user' undeclared here .../riscv_ksyms.c:14:15: error: '__asm_copy_from_user' undeclared here Since these symbols are not defined with !MMU it's wrong to export them. Same for __clear_user() (even though this one is also declared in include/asm-generic/uaccess.h and thus doesn't give an error message). Fix this by doing the EXPORT_SYMBOL() directly where these symbols are defined: inside lib/uaccess.S itself. Fixes: 6bd33e1ece52 ("riscv: fix compile failure with EXPORT_SYMBOL() & !MMU") Reported-by: kbuild test robot <lkp@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four fixes and one spelling update, all in drivers: two in lpfc and the rest in mp3sas, cxgbi and target" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target/iblock: Fix protection error with blocks greater than 512B scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy() scsi: lpfc: fix spelling mistakes of asynchronous scsi: lpfc: fix build failure with DEBUGFS disabled scsi: mpt3sas: Fix double free in attach error handling
2019-12-27Merge branch 'ethtool-netlink-part-one'David S. Miller
Michal Kubecek says: ==================== ethtool netlink interface, part 1 This is first part of netlink based alternative userspace interface for ethtool. It aims to address some long known issues with the ioctl interface, mainly lack of extensibility, raciness, limited error reporting and absence of notifications. The goal is to allow userspace ethtool utility to provide all features it currently does but without using the ioctl interface. However, some features provided by ethtool ioctl API will be available through other netlink interfaces (rtnetlink, devlink) if it's more appropriate. The interface uses generic netlink family "ethtool" and provides multicast group "monitor" which is used for notifications. Documentation for the interface is in Documentation/networking/ethtool-netlink.rst file. The netlink interface is optional, it is built when CONFIG_ETHTOOL_NETLINK (bool) option is enabled. There are three types of request messages distinguished by suffix "_GET" (query for information), "_SET" (modify parameters) and "_ACT" (perform an action). Kernel reply messages have name with additional suffix "_REPLY" (e.g. ETHTOOL_MSG_SETTINGS_GET_REPLY). Most "_SET" and "_ACT" message types do not have matching reply type as only some of them need additional reply data beyond numeric error code and extack. Kernel also broadcasts notification messages ("_NTF" suffix) on changes. Basic concepts: - make extensions easier not only by allowing new attributes but also by imposing as few artificial limits as possible, e.g. by using arbitrary size bit sets for most bitmap attributes or by not using fixed size strings - use extack for error reporting and warnings - send netlink notifications on changes (even if they were done using the ioctl interface) and actions - avoid the racy read/modify/write cycle between kernel and userspace by sending only attributes which userspace wants to change; there is still a read/modify/write cycle between generic kernel code and ethtool_ops handler in NIC driver but it is only in kernel and under RTNL lock - reduce the number of name lists that need to be kept in sync between kernel and userspace (e.g. recognized link modes) - where feasible, allow dump requests to query specific information for all network devices - as parsing and generating netlink messages is more complicated than simply copying data structures between userspace API and ethtool_ops handlers (which most ioctl commands do), split the code into multiple files in net/ethtool directory; move net/core/ethtool.c also to this directory and rename it to ioctl.c Changes between v8 and v9: - fix ethnl_update_u8() - fix description of ETHTOOL_A_LINKSTATE_LINK in rst file - add explanation of verbose vs. compact bitset usage to documentation - link ethtool-netlink.rst into toctree Main changes between v7 and v8: - preliminary patches sent as a separate series (already in net-next) - split notification related changes out of _SET patches - drop request specific flags from common header - use FLAG/flag rather than GFLAG/gflag for global flags (as there are only global flags now) - allow device names up to ALTIFNAMSIZ characters - rename ETHTOOL_A_BITSET_LIST to ETHTOOL_A_BITSET_NOMASK - rename ETHTOOL_A_BIT{,S}_* to ETHTOOL_A_BITSET_BIT{,S}_* - use standard bitset helpers for link modes (rather than in-place conversion) - use "default" rather than "standard" for unified _GET handlers - fixed 64-bit big endian bitset code Main changes between v6 and v7: - split complex messages into small single purpose ones (drop info and request masks and one level of nesting) - separate request information and reply data into two structures - refactor bitset handling (no simultaneous u32/ulong handling but avoid kmalloc() except for long bitmaps on 64-bit big endian architectures) - use only fixed size strings internally (will be replaced by char * eventually but that will require rewriting also existing ioctl code) - rework ethnl_update_* helpers to return error code - rename request flag constants (to ETHTOOL_[GR]FLAG_ prefix) - convert documentation to rst Main changes between v5 and v6: - use ETHTOOL_MSG_ prefix for message types - replace ETHA_ prefix for netlink attributes by ETHTOOL_A_ - replace ETH_x_IM_y for infomask bits by ETHTOOL_IM_x_y - split GET reply types from SET requests and notifications - split kernel and userspace message types into different enums - remove INFO_GET requests from submitted part - drop EVENT notifications (use rtnetlink and on-demand string set load) - reorganize patches to reduce the number of intermitent warnings - unify request/reply header and its processing - another nest around strings in a string set for consistency - more consistent identifier naming - coding style cleanup - get rid of some of the helpers - set bad attribute in extack where applicable - various bug fixes - improve documentation and code comments, more kerneldoc comments - more verbose commit messages Changes between v4 and v5: - do not panic on failed initialization, only WARN() Main changes between RFC v3 and v4: - use more kerneldoc style comments - strict attribute policy checking - use macros for tables of link mode names and parameters - provide permanent hardware address in rtnetlink - coding style cleanup - split too long patches, reorder - wrap more ETHA_SETTINGS_* attributes in nests - add also some SET_* implementation into submitted part Main changes between RFC v2 and RFC v3: - do not allow building as a module (no netdev notifiers needed) - drop some obsolete fields - add permanent hw address, timestamping and private flags support - rework bitset handling to get rid of variable length arrays - notify monitor on device renames - restructure GET_SETTINGS/SET_SETTINGS messages - split too long patches and submit only first part of the series Main changes between RFC v1 and RFC v2: - support dumps for all "get" requests - provide notifications for changes related to supported request types - support getting string sets (both global and per device) - support getting/setting device features - get rid of family specific header, everything passed as attributes - split netlink code into multiple files in net/ethtool/ directory ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link state with LINKSTATE_GET requestMichal Kubecek
Implement LINKSTATE_GET netlink request to get link state information. At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command is returned. LINKSTATE_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add LINKMODES_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKMODES_GET request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKMODES_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: set link modes related data with LINKMODES_SET requestMichal Kubecek
Implement LINKMODES_SET netlink request to set advertised linkmodes and related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do. The request allows setting autonegotiation flag, speed, duplex and advertised link modes. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link mode information with LINKMODES_GET requestMichal Kubecek
Implement LINKMODES_GET netlink request to get link modes related information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides supported, advertised and peer advertised link modes, autonegotiation flag, speed and duplex. LINKMODES_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add LINKINFO_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKINFO_GET request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKINFO_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add default notification handlerMichal Kubecek
The ethtool netlink notifications have the same format as related GET replies so that if generic GET handling framework is used to process GET requests, its callbacks and instance of struct get_request_ops can be also used to compose corresponding notification message. Provide function ethnl_std_notify() to be used as notification handler in ethnl_notify_handlers table. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: set link settings with LINKINFO_SET requestMichal Kubecek
Implement LINKINFO_SET netlink request to set link settings queried by LINKINFO_GET message. Only physical port, phy MDIO address and MDI(-X) control can be set, attempt to modify MDI(-X) status and transceiver is rejected. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link settings with LINKINFO_GET requestMichal Kubecek
Implement LINKINFO_GET netlink request to get basic link settings provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides settings not directly related to autonegotiation and link mode selection: physical port, phy MDIO address, MDI(-X) status, MDI(-X) control and transceiver. LINKINFO_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide string sets with STRSET_GET requestMichal Kubecek
Requests a contents of one or more string sets, i.e. indexed arrays of strings; this information is provided by ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all information can be retrieved with one request and mulitple string sets can be requested at once. There are three types of requests: - no NLM_F_DUMP, no device: get "global" stringsets - no NLM_F_DUMP, with device: get string sets related to the device - NLM_F_DUMP, no device: get device related string sets for all devices Client can request either all string sets of given type (global or device related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only set sizes (numbers of strings) are returned. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: default handlers for GET requestsMichal Kubecek
Significant part of GET request processing is common for most request types but unfortunately it cannot be easily separated from type specific code as we need to alternate between common actions (parsing common request header, allocating message and filling netlink/genetlink headers etc.) and specific actions (querying the device, composing the reply). The processing also happens in three different situations: "do" request, "dump" request and notification, each doing things in slightly different way. The request specific code is implemented in four or five callbacks defined in an instance of struct get_request_ops: parse_request() - parse incoming message prepare_data() - retrieve data from driver or NIC reply_size() - estimate reply message size fill_reply() - compose reply message cleanup_data() - (optional) clean up additional data Other members of struct get_request_ops describe the data structure holding information from client request and data used to compose the message. The default handlers ethnl_default_doit(), ethnl_default_dumpit(), ethnl_default_start() and ethnl_default_done() can be then used in genl_ops handler. Notification handler will be introduced in a later patch. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: support for netlink notificationsMichal Kubecek
Add infrastructure for ethtool netlink notifications. There is only one multicast group "monitor" which is used to notify userspace about changes and actions performed. Notification messages (types using suffix _NTF) share the format with replies to GET requests. Notifications are supposed to be broadcasted on every configuration change, whether it is done using the netlink interface or ioctl one. Netlink SET requests only trigger a notification if some data is actually changed. To trigger an ethtool notification, both ethtool netlink and external code use ethtool_notify() helper. This helper requires RTNL to be held and may sleep. Handlers sending messages for specific notification message types are registered in ethnl_notify_handlers array. As notifications can be triggered from other code, ethnl_ok flag is used to prevent an attempt to send notification before genetlink family is registered. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>