aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-07-30Merge branch 'clean-devlink-net-namespace-operations'Jakub Kicinski
Leon Romanovsky says: ==================== Clean devlink net namespace operations This short series continues my work on devlink core code to make devlink reload less prone to errors and harden it from API abuse. Despite first patch being a clear fix, I would ask you to apply it to net-next anyway, because the fixed patch is anyway old and it will help us to eliminate merge conflicts that will arise for following patches or even for the second one. ==================== Link: https://lore.kernel.org/r/cover.1627578998.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30devlink: Allocate devlink directly in requested net namespaceLeon Romanovsky
There is no need in extra call indirection and check from impossible flow where someone tries to set namespace without prior call to devlink_alloc(). Instead of this extra logic and additional EXPORT_SYMBOL, use specialized devlink allocation function that receives net namespace as an argument. Such specialized API allows clear view when devlink initialized in wrong net namespace and/or kernel users don't try to change devlink namespace under the hood. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30devlink: Break parameter notification sequence to be before/after ↵Leon Romanovsky
unload/load driver The change of namespaces during devlink reload calls to driver unload before it accesses devlink parameters. The commands below causes to use-after-free bug when trying to get flow steering mode. * ip netns add n1 * devlink dev reload pci/0000:00:09.0 netns n1 ================================================================== BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core] Read of size 4 at addr ffff888009d04308 by task devlink/275 CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x93/0xc2 print_address_description.constprop.0+0x18/0x140 ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core] ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core] kasan_report.cold+0x7c/0xd8 ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core] mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core] devlink_nl_param_fill+0x1c8/0xe80 ? __free_pages_ok+0x37a/0x8a0 ? devlink_flash_update_timeout_notify+0xd0/0xd0 ? lock_acquire+0x1a9/0x6d0 ? fs_reclaim_acquire+0xb7/0x160 ? lock_is_held_type+0x98/0x110 ? 0xffffffff81000000 ? lock_release+0x1f9/0x6c0 ? fs_reclaim_release+0xa1/0xf0 ? lock_downgrade+0x6d0/0x6d0 ? lock_is_held_type+0x98/0x110 ? lock_is_held_type+0x98/0x110 ? memset+0x20/0x40 ? __build_skb_around+0x1f8/0x2b0 devlink_param_notify+0x6d/0x180 devlink_reload+0x1c3/0x520 ? devlink_remote_reload_actions_performed+0x30/0x30 ? mutex_trylock+0x24b/0x2d0 ? devlink_nl_cmd_reload+0x62b/0x1070 devlink_nl_cmd_reload+0x66d/0x1070 ? devlink_reload+0x520/0x520 ? devlink_get_from_attrs+0x1bc/0x260 ? devlink_nl_pre_doit+0x64/0x4d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? mutex_lock_io_nested+0x1130/0x1130 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x6d0 ? devlink_reload+0x520/0x520 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11d/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x9f0/0x9f0 ? lock_release+0x1f9/0x6c0 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x730/0x730 ? _copy_from_iter_full+0x178/0x650 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6f1/0xbd0 ? netlink_unicast+0x700/0x700 ? lock_is_held_type+0x98/0x110 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x193/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? do_sys_openat2+0x10b/0x370 ? __up_read+0x1a1/0x7b0 ? do_user_addr_fault+0x219/0xdc0 ? __x64_sys_openat+0x120/0x1d0 ? __x64_sys_open+0x1a0/0x1a0 __x64_sys_sendto+0xdd/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fc69d0af14a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003 RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 146: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x99/0xc0 mlx5_init_fs+0xf0/0x1c50 [mlx5_core] mlx5_load+0xd2/0x180 [mlx5_core] mlx5_init_one+0x2f6/0x450 [mlx5_core] probe_one+0x47d/0x6e0 [mlx5_core] pci_device_probe+0x2a0/0x4a0 really_probe+0x20a/0xc90 driver_probe_device+0xd8/0x380 device_driver_attach+0x1df/0x250 __driver_attach+0xff/0x240 bus_for_each_dev+0x11e/0x1a0 bus_add_driver+0x309/0x570 driver_register+0x1ee/0x380 0xffffffffa06b8062 do_one_initcall+0xd5/0x410 do_init_module+0x1c8/0x760 load_module+0x6d8b/0x9650 __do_sys_finit_module+0x118/0x1b0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 275: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x102/0x140 slab_free_freelist_hook+0x74/0x1b0 kfree+0xd7/0x2a0 mlx5_unload+0x16/0xb0 [mlx5_core] mlx5_unload_one+0xae/0x120 [mlx5_core] mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core] devlink_reload+0x141/0x520 devlink_nl_cmd_reload+0x66d/0x1070 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x27f/0x4a0 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6f1/0xbd0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x193/0x240 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff888009d04300 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 8 bytes inside of 128-byte region [ffff888009d04300, ffff888009d04380) The buggy address belongs to the page: page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04 head:0000000086a64ecc order:1 compound_mapcount:0 flags: 0x4000000000010200(slab|head) raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0 raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The right solution to devlink reload is to notify about deletion of parameters, unload driver, change net namespaces, load driver and notify about addition of parameters. Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload") Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30sk_buff: avoid potentially clearing 'slow_gro' fieldPaolo Abeni
If skb_dst_set_noref() is invoked with a NULL dst, the 'slow_gro' field is cleared, too. That could lead to wrong behavior if the skb later enters the GRO stage. Fix the potential issue replacing preserving a non-zero value of the 'slow_gro' field. Additionally, fix a comment typo. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 8a886b142bd0 ("sk_buff: track dst status in slow_gro") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/aa42529252dc8bb02bd42e8629427040d1058537.1627662501.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30net: netlink: Remove unused functionYajun Deng
lockdep_genl_is_held() and its caller arm not used now, just remove them. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Link: https://lore.kernel.org/r/20210729074854.8968-1-yajun.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30Merge branch 'nfc-constify-pointed-data-missed-part'Jakub Kicinski
Krzysztof Kozlowski says: ==================== nfc: constify pointed data - missed part This was previously sent [1] but got lost. It was a prerequisite to part two of NFC const [2]. Changes since v2: 1. Drop patch previously 7/8 which cases new warnings "warning: Using plain integer as NULL pointer". Changes since v1: 1. Add patch 1/8 fixing up nfcmrvl_spi_parse_dt() [1] https://lore.kernel.org/lkml/20210726145224.146006-1-krzysztof.kozlowski@canonical.com/ [2] https://lore.kernel.org/linux-nfc/20210729104022.47761-1-krzysztof.kozlowski@canonical.com/T/#m199fbdde180fa005a10addf28479fcbdc6263eab ==================== Link: https://lore.kernel.org/r/20210730144202.255890-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: hci: cleanup unneeded spacesKrzysztof Kozlowski
No need for multiple spaces in variable declaration (the code does not use them in other places). No functional change. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: nci: constify several pointers to u8, sk_buff and other structsKrzysztof Kozlowski
Several functions receive pointers to u8, sk_buff or other structs but do not modify the contents so make them const. This allows doing the same for local variables and in total makes the code a little bit safer. This makes const also data passed as "unsigned long opt" argument to nci_request() function. Usual flow for such functions is: 1. Receive "u8 *" and store it (the pointer) in a structure allocated on stack (e.g. struct nci_set_config_param), 2. Call nci_request() or __nci_request() passing a callback function an the pointer to the structure via an "unsigned long opt", 3. nci_request() calls the callback which dereferences "unsigned long opt" in a read-only way. This converts all above paths to use proper pointer to const data, so entire flow is safer. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: constify local pointer variablesKrzysztof Kozlowski
Few pointers to struct nfc_target and struct nfc_se can be made const. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: constify several pointers to u8, char and sk_buffKrzysztof Kozlowski
Several functions receive pointers to u8, char or sk_buff but do not modify the contents so make them const. This allows doing the same for local variables and in total makes the code a little bit safer. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: hci: annotate nfc_llc_init() as __initKrzysztof Kozlowski
The nfc_llc_init() is used only in other __init annotated context. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: annotate af_nfc_exit() as __exitKrzysztof Kozlowski
The af_nfc_exit() is used only in other __exit annotated context (nfc_exit()). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30nfc: mrvl: correct nfcmrvl_spi_parse_dt() device_node argumentKrzysztof Kozlowski
The device_node in nfcmrvl_spi_parse_dt() cannot be const as it is passed to OF functions which modify it. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30net: convert fib_treeref from int to refcount_tYajun Deng
refcount_t type should be used instead of int when fib_treeref is used as a reference counter,and avoid use-after-free risks. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210729071350.28919-1-yajun.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-29bcm63xx_enet: delete a redundant assignmentTang Bin
In the function bcm_enetsw_probe(), 'ret' will be assigned by bcm_enet_change_mtu(), so 'ret = 0' make no sense. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: dsa: don't set skb->offload_fwd_mark when not offloading the bridgeVladimir Oltean
DSA has gained the recent ability to deal gracefully with upper interfaces it cannot offload, such as the bridge, bonding or team drivers. When such uppers exist, the ports are still in standalone mode as far as the hardware is concerned. But when we deliver packets to the software bridge in order for that to do the forwarding, there is an unpleasant surprise in that the bridge will refuse to forward them. This is because we unconditionally set skb->offload_fwd_mark = true, meaning that the bridge thinks the frames were already forwarded in hardware by us. Since dp->bridge_dev is populated only when there is hardware offload for it, but not in the software fallback case, let's introduce a new helper that can be called from the tagger data path which sets the skb->offload_fwd_mark accordingly to zero when there is no hardware offload for bridging. This lets the bridge forward packets back to other interfaces of our switch, if needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29ipvlan: Add handling of NETDEV_UP eventsDi Zhu
When an ipvlan device is created on a bond device, the link state of the ipvlan device may be abnormal. This is because bonding device allows to add physical network card device in the down state and so NETDEV_CHANGE event will not be notified to other listeners, so ipvlan has no chance to update its link status. The following steps can cause such problems: 1) bond0 is down 2) ip link add link bond0 name ipvlan type ipvlan mode l2 3) echo +enp2s7 >/sys/class/net/bond0/bonding/slaves 4) ip link set bond0 up After these steps, use ip link command, we found ipvlan has NO-CARRIER: ipvlan@bond0: <NO-CARRIER, BROADCAST,MULTICAST,UP,M-DOWN> mtu ...> We can deal with this problem like VLAN: Add handling of NETDEV_UP events. If we receive NETDEV_UP event, we will update the link status of the ipvlan. Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net/sched: store the last executed chain also for clsact egressDavide Caratti
currently, only 'ingress' and 'clsact ingress' qdiscs store the tc 'chain id' in the skb extension. However, userspace programs (like ovs) are able to setup egress rules, and datapath gets confused in case it doesn't find the 'chain id' for a packet that's "recirculated" by tc. Change tcf_classify() to have the same semantic as tcf_classify_ingress() so that a single function can be called in ingress / egress, using the tc ingress / egress block respectively. Suggested-by: Alaa Hleilel <alaa@nvidia.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29Merge branch 'dpaa2-switch-add-mirroring-support'David S. Miller
Ioana Ciornei says: ==================== dpaa2-switch: add mirroring support This patch set adds per port and per VLAN mirroring in dpaa2-switch. The first 4 patches are just cosmetic changes. We renamed the dpaa2_switch_acl_tbl structure into dpaa2_switch_filter_block so that we can reuse it for filters that do not use the ACL table and reorganized the addition of trap, redirect and drop filters into a separate function. All this just to make for a more streamlined addition of the support for mirroring. The next 4 patches are actually adding the advertised support. Mirroring rules can be added in shared blocks, the driver will replicate the same configuration on all the switch ports part of the same block. The last patch documents the feature, presents its behavior and limitations and gives a couple of examples. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29docs: networking: dpaa2: document mirroring support on the switchIoana Ciornei
Document the mirroring capabilities of the dpaa2-switch driver, any restrictions that are imposed and some example commands. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: offload shared block mirror filters when binding to a portIoana Ciornei
When mirroring rules are added in shared filter blocks, the same mirroring rule has to be configured on all the switch ports that are part of the same block. In case a switch port joins a shared block after mirroring filters have been already added to it, then all the mirror rules should be offloaded to the port. The reverse, removal of mirroring rules, has to be done at block unbind. For this purpose, the dpaa2_switch_block_offload_mirror() and dpaa2_switch_block_unoffload_mirror() functions are added and called upon binding and unbinding a switch port to/from a block. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: add VLAN based mirroringIoana Ciornei
Using the infrastructure added in the previous patch, extend tc-flower support with FLOW_ACTION_MIRRED based on VLAN. Tested with: tc qdisc add dev eth8 ingress_block 1 clsact tc filter add block 1 ingress protocol 802.1q flower skip_sw \ vlan_id 100 action mirred egress mirror dev eth6 tc filter del block 1 ingress pref 49152 Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: add support for port mirroringIoana Ciornei
Add support for per port mirroring for the DPAA2 switch. We support only single mirror port, therefore we allow mirroring rules only as long as the destination port is always the same. Unlike all the actions (drop, redirect, trap) already supported by the dpaa2-switch driver, adding mirroring filters in shared blocks is not achieved by a singular ACL entry added in a table shared by the ports. This is why, when a new mirror filter is added in a block we have to got through all the switch ports sharing it and configure the filter individually on all. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: add API for setting up mirroringIoana Ciornei
Add the necessary MC API for setting up and configuring the mirroring feature on the DPSW DPAA2 object. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: reorganize dpaa2_switch_cls_matchall_replaceIoana Ciornei
Extract the necessary steps to offload a filter by using the ACL table in a separate function - dpaa2_switch_cls_matchall_replace_acl(). This is intended to help with the code readability when the mirroring support is added. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: reorganize dpaa2_switch_cls_flower_replaceIoana Ciornei
Extract the necessary steps to offload a filter by using the ACL table in a separate function - dpaa2_switch_cls_flower_replace_acl(). This is intended to help with the code readability when the mirroring support is added. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: rename dpaa2_switch_acl_tbl into filter_blockIoana Ciornei
Until now, shared filter blocks were implemented only by ACL tables shared between ports. Going forward, when the mirroring support will be added, this will not be true anymore. Rename the dpaa2_switch_acl_tbl into dpaa2_switch_filter_block so that we make it clear that the structure is used not only for filters that use the ACL table but will be used for all the filters that are added in a block. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29dpaa2-switch: rename dpaa2_switch_tc_parse_action to specify the ACLIoana Ciornei
Until now, the dpaa2_switch_tc_parse_action() function was used for all the supported tc actions since all of them were implemented by adding ACL table entries. In the next commits, the dpaa2-switch driver will gain mirroring support which is not using the same HW feature. Make sure that we specify the ACL in the function name so that we make it clear that it's only used for specific actions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29qede: Remove the qede module versionShai Malin
Removing the qede module version which is not needed and not allowed with inbox drivers. Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29qed: Remove the qed module versionShai Malin
Removing the qed module version which is not needed and not allowed with inbox drivers. Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29Merge branch 'sja110-vlan-fixes'David S. Miller
Vladimir Oltean says: ==================== NXP SJA1105 VLAN regressions These are 3 patches to fix issues seen with some more varied testing done after the changes in the "Traffic termination for sja1105 ports under VLAN-aware bridge" series were made: https://patchwork.kernel.org/project/netdevbpf/cover/20210726165536.1338471-1-vladimir.oltean@nxp.com/ Issue 1: traffic no longer works on a port after leaving a VLAN-aware bridge Issue 2: untagged traffic not dropped if pvid is absent from a VLAN-aware port Issue 3: PTP and STP broken on ports under a VLAN-aware bridge ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: dsa: tag_sja1105: fix control packets on SJA1110 being received on an ↵Vladimir Oltean
imprecise port On RX, a control packet with SJA1110 will have: - an in-band control extension (DSA tag) composed of a header and an optional trailer (if it is a timestamp frame). We can (and do) deduce the source port and switch id from this. - a VLAN header, which can either be the tag_8021q RX VLAN (pvid) or the bridge VLAN. The sja1105_vlan_rcv() function attempts to deduce the source port and switch id a second time from this. The basic idea is that even though we don't need the source port information from the tag_8021q header if it's a control packet, we do need to strip that header before we pass it on to the network stack. The problem is that we call sja1105_vlan_rcv for ports under VLAN-aware bridges, and that function tells us it couldn't identify a tag_8021q header, so we need to perform imprecise RX by VID. Well, we don't, because we already know the source port and switch ID. This patch drops the return value from sja1105_vlan_rcv and we just look at the source_port and switch_id values from sja1105_rcv and sja1110_rcv which were initialized to -1. If they are still -1 it means we need to perform imprecise RX. Fixes: 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: dsa: sja1105: make sure untagged packets are dropped on ingress ports ↵Vladimir Oltean
with no pvid Surprisingly, this configuration: ip link add br0 type bridge vlan_filtering 1 ip link set swp2 master br0 bridge vlan del dev swp2 vid 1 still has the sja1105 switch sending untagged packets to the CPU (and failing to decode them, since dsa_find_designated_bridge_port_by_vid searches by VID 1 and rightfully finds no bridge VLAN 1 on a port). Dumping the switch configuration, the VLANs are managed properly: - the pvid of swp2 is 1 in the MAC Configuration Table, but - only the CPU port is in the port membership of VLANID 1 in the VLAN Lookup Table When the ingress packets are tagged with VID 1, they are properly dropped. But when they are untagged, they are able to reach the CPU port. Also, when the pvid in the MAC Configuration Table is changed to e.g. 55 (an unused VLAN), the untagged packets are also dropped. So it looks like: - the switch bypasses ingress VLAN membership checks for untagged traffic - the reason why the untagged traffic is dropped when I make the pvid 55 is due to the lack of valid destination ports in VLAN 55, rather than an ingress membership violation - the ingress VLAN membership cheks are only done for VLAN-tagged traffic Interesting. It looks like there is an explicit bit to drop untagged traffic, so we should probably be using that to preserve user expectations. Note that only VLAN-aware ports should drop untagged packets due to no pvid - when VLAN-unaware, the software bridge doesn't do this even if there is no pvid on any bridge port and on the bridge itself. So the new sja1105_drop_untagged() function cannot simply be called with "false" from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del. Instead, we need to also consider the VLAN awareness state. That means we need to hook the "drop untagged" setting in all the same places where the "commit pvid" logic is, and it needs to factor in all the state when flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup Table, and is the current port in that VLAN's port membership list? VLAN-unaware ports will never drop untagged frames because these checks always succeed by construction, and the tag_8021q VLANs cannot be changed by the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridgeVladimir Oltean
Now that we no longer have the ultra-central sja1105_build_vlan_table(), we need to be more careful about checking all corner cases manually. For example, when a port leaves a VLAN-aware bridge, it becomes standalone so its pvid should become a tag_8021q RX VLAN again. However, sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add() and from sja1105_vlan_filtering(), and no VLAN awareness change takes place (VLAN filtering is a global setting for sja1105, so the switch remains VLAN-aware overall). This means that we need to put another sja1105_commit_pvid() call in sja1105_bridge_member(). Fixes: 6dfd23d35e75 ("net: dsa: sja1105: delete vlan delta save/restore logic") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29Merge branch 'mctp'David S. Miller
Jeremy Kerr says: ==================== Add Management Component Transport Protocol support This series adds core MCTP support to the kernel. From the Kconfig description: Management Component Transport Protocol (MCTP) is an in-system protocol for communicating between management controllers and their managed devices (peripherals, host processors, etc.). The protocol is defined by DMTF specification DSP0236. This option enables core MCTP support. For communicating with other devices, you'll want to enable a driver for a specific hardware channel. This implementation allows a sockets-based API for sending and receiving MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack control is all via netlink, using existing RTM_* messages. The userspace ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHDR_ constants, a new sockaddr, and a new netlink attribute. For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy to alter this based on preferences here - an alternative would be the OpenBMC list (the main user of the MCTP interface), or we can create a new list entirely. We have a couple of interface drivers almost ready to go at the moment, but those can wait until the core code has some review. This is v4 of the series; v1 and v2 were both RFC. selinux folks: CCing 01/15 due to the new PF_MCTP protocol family. linux-doc folks: CCing 15/15 for the new MCTP overview document. Review, comments, questions etc. are most welcome. Cheers, Jeremy v2: - change to match spec terminology: controller -> component - require specific capabilities for bind() & sendmsg() - add address and tag defintions to uapi - add selinux AF_MCTP table definitions - remove strict cflags; warnings are present in common headers v3: - require caps for MCTP bind() & send() - comment typo fixes - switch to an array for local EIDs - fix addrinfo dump iteration & error path - add RTM_DELADDR - remove GENMASK() and BIT() from uapi v4: - drop tun patch; that can be submitted separately - keep nipa happy: add maintainer CCs, including doc and selinux - net-next rebase - Include AF_MCTP in af_family_slock_keys and pf_family_names - Introduce MODULE_ definitions earlier - upstream change: set_link_af no longer called with RTNL held - add kdoc for net_device.mctp_ptr - don't inline mctp_rt_match_eid - require rtm_type == RTN_UNICAST in route management handlers - remove unused RTAX policy table - fix mctp_sock->keys rcu annotations - fix spurious rcu_read_unlock in route input ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add MCTP overview documentJeremy Kerr
This change adds a brief document about the sockets API provided for sending and receiving MCTP messages from userspace. This is roughly based on the OpenBMC design document, at: https://github.com/openbmc/docs/blob/master/designs/mctp/mctp-kernel.md Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Allow per-netns default networksMatt Johnston
Currently we have a compile-time default network (MCTP_INITIAL_DEFAULT_NET). This change introduces a default_net field on the net namespace, allowing future configuration for new interfaces. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add dest neighbour lladdr to route outputMatt Johnston
Now that we have a neighbour implementation, hook it up to the output path to set the dest hardware address for outgoing packets. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Implement message fragmentation & reassemblyJeremy Kerr
This change implements MCTP fragmentation (based on route & device MTU), and corresponding reassembly. The MCTP specification only allows for fragmentation on the originating message endpoint, and reassembly on the destination endpoint - intermediate nodes do not need to reassemble/refragment. Consequently, we only fragment in the local transmit path, and reassemble locally-bound packets. Messages are required to be in-order, so we simply cancel reassembly on out-of-order or missing packets. In the fragmentation path, we just break up the message into MTU-sized fragments; the skb structure is a simple copy for now, which we can later improve with a shared data implementation. For reassembly, we keep track of incoming message fragments using the existing tag infrastructure, allocating a key on the (src,dest,tag) tuple, and reassembles matching fragments into a skb->frag_list. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Populate socket implementationJeremy Kerr
Start filling-out the socket syscalls: bind, sendmsg & recvmsg. This requires an input route implementation, so we add to mctp_route_input, allowing lookups on binds & message tags. This just handles single-packet messages at present, we will add fragmentation in a future change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add neighbour netlink interfaceMatt Johnston
This change adds the netlink interfaces for manipulating the MCTP neighbour table. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add neighbour implementationMatt Johnston
Add an initial neighbour table implementation, to be used in the route output path. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add netlink route managementMatt Johnston
This change adds RTM_GETROUTE, RTM_NEWROUTE & RTM_DELROUTE handlers, allowing management of the MCTP route table. Includes changes from Jeremy Kerr <jk@codeconstruct.com.au>. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add initial routing frameworkJeremy Kerr
Add a simple routing table, and a couple of route output handlers, and the mctp packet_type & handler. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add device handling and netlink interfaceJeremy Kerr
This change adds the infrastructure for managing MCTP netdevices; we add a pointer to the AF_MCTP-specific data to struct netdevice, and hook up the rtnetlink operations for adding and removing addresses. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add initial driver infrastructureJeremy Kerr
Add an empty drivers/net/mctp/, for future interface drivers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add sockaddr_mctp to uapiJeremy Kerr
This change introduces the user-visible MCTP header, containing the protocol-specific addressing definitions. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add base packet definitionsJeremy Kerr
Simple packet header format as defined by DMTF DSP0236. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add base socket/protocol definitionsJeremy Kerr
Add an empty socket implementation, plus initialisation/destruction handlers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add MCTP baseJeremy Kerr
Add basic Kconfig, an initial (empty) af_mctp source object, and {AF,PF}_MCTP definitions, and the required definitions for a new protocol type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>