Age | Commit message (Collapse) | Author |
|
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged and refactor the related code accordingly:
$ pahole -C group_filter net/ipv4/ip_sockglue.o
struct group_filter {
union {
struct {
__u32 gf_interface_aux; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct __kernel_sockaddr_storage gf_group_aux; /* 8 128 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
__u32 gf_fmode_aux; /* 136 4 */
__u32 gf_numsrc_aux; /* 140 4 */
struct __kernel_sockaddr_storage gf_slist[1]; /* 144 128 */
}; /* 0 272 */
struct {
__u32 gf_interface; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct __kernel_sockaddr_storage gf_group; /* 8 128 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
__u32 gf_fmode; /* 136 4 */
__u32 gf_numsrc; /* 140 4 */
struct __kernel_sockaddr_storage gf_slist_flex[0]; /* 144 0 */
}; /* 0 144 */
}; /* 0 272 */
/* size: 272, cachelines: 5, members: 1 */
/* last cacheline: 16 bytes */
};
$ pahole -C compat_group_filter net/ipv4/ip_sockglue.o
struct compat_group_filter {
union {
struct {
__u32 gf_interface_aux; /* 0 4 */
struct __kernel_sockaddr_storage gf_group_aux __attribute__((__aligned__(4))); /* 4 128 */
/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
__u32 gf_fmode_aux; /* 132 4 */
__u32 gf_numsrc_aux; /* 136 4 */
struct __kernel_sockaddr_storage gf_slist[1] __attribute__((__aligned__(4))); /* 140 128 */
} __attribute__((__packed__)) __attribute__((__aligned__(4))); /* 0 268 */
struct {
__u32 gf_interface; /* 0 4 */
struct __kernel_sockaddr_storage gf_group __attribute__((__aligned__(4))); /* 4 128 */
/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
__u32 gf_fmode; /* 132 4 */
__u32 gf_numsrc; /* 136 4 */
struct __kernel_sockaddr_storage gf_slist_flex[0] __attribute__((__aligned__(4))); /* 140 0 */
} __attribute__((__packed__)) __attribute__((__aligned__(4))); /* 0 140 */
} __attribute__((__aligned__(1))); /* 0 268 */
/* size: 268, cachelines: 5, members: 1 */
/* forced alignments: 1 */
/* last cacheline: 12 bytes */
} __attribute__((__packed__));
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Nikolay Aleksandrov says:
====================
net: bridge: fix recent ioctl changes
These are three fixes for the recent bridge removal of ndo_do_ioctl
done by commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of
.ndo_do_ioctl"). Patch 01 fixes a deadlock of the new bridge ioctl
hook lock and rtnl by taking a netdev reference and always taking the
bridge ioctl lock first then rtnl from within the bridge hook.
Patch 02 fixes old_deviceless() bridge calls device name argument, and
patch 03 checks in dev_ifsioc()'s SIOCBRADD/DELIF cases if the netdevice is
actually a bridge before interpreting its private ptr as net_bridge.
Patch 01 was tested by running old bridge-utils commands with lockdep
enabled. Patch 02 was tested again by using bridge-utils and using the
respective ioctl calls on a "up" bridge device. Patch 03 was tested by
using the addif ioctl on a non-bridge device (e.g. loopback).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
changed SIOCBRADD/DELIF to use bridge's ioctl hook (br_ioctl_hook)
without checking if the target netdevice is actually a bridge which can
cause crashes and generally interpreting other devices' private pointers
as net_bridge pointers.
Crash example (lo - loopback):
$ brctl addif lo ens16
BUG: kernel NULL pointer dereference, address: 000000000000059898
#PF: supervisor read access in kernel modede
#PF: error_code(0x0000) - not-present pagege
PGD 0 P4D 0 ^Ac
Oops: 0000 [#1] SMP NOPTI
CPU: 2 PID: 1376 Comm: brctl Kdump: loaded Tainted: G W 5.14.0-rc3+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
RIP: 0010:add_del_if+0x1f/0x7c [bridge]
Code: 80 bf 1b a0 41 5c e9 c0 3c 03 e1 0f 1f 44 00 00 41 55 41 54 41 89 f4 be 0c 00 00 00 55 48 89 fd 53 48 8b 87 88 00 00 00 89 d3 <4c> 8b a8 98 05 00 00 49 8b bd d0 00 00 00 e8 17 d7 f3 e0 84 c0 74
RSP: 0018:ffff888109d97cb0 EFLAGS: 00010202^Ac
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff888101239bc0
RBP: ffff888101239bc0 R08: 0000000000000001 R09: 0000000000000000
R10: ffff888109d97cd8 R11: 00000000000000a3 R12: 0000000000000012
R13: 0000000000000000 R14: ffff888101239bc0 R15: ffff888109d97e10
FS: 00007fc1e365b540(0000) GS:ffff88822be80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000598 CR3: 0000000106506000 CR4: 00000000000006e0
Call Trace:
br_ioctl_stub+0x7c/0x441 [bridge]
br_ioctl_call+0x6d/0x8a
dev_ifsioc+0x325/0x4e8
dev_ioctl+0x46b/0x4e1
sock_do_ioctl+0x7b/0xad
sock_ioctl+0x2de/0x2f2
vfs_ioctl+0x1e/0x2b
__do_sys_ioctl+0x63/0x86
do_syscall_64+0xcb/0xf2
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc1e3589427
Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc8d501d38 EFLAGS: 00000202 ORIG_RAX: 000000000000001010
RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fc1e3589427
RDX: 00007ffc8d501d60 RSI: 00000000000089a3 RDI: 0000000000000003
RBP: 00007ffc8d501d60 R08: 0000000000000000 R09: fefefeff77686d74
R10: fffffffffffff8f9 R11: 0000000000000202 R12: 00007ffc8d502e06
R13: 00007ffc8d502e06 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: bridge stp llc bonding ipv6 virtio_net [last unloaded: llc]^Ac
CR2: 0000000000000598
Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
changed the source of the argument copy in bridge's old_deviceless() from
args[1] (user ptr to device name) to uarg (ptr to ioctl arguments) causing
wrong device name to be used.
Example (broken, bridge exists but is up):
$ brctl delbr bridge
bridge bridge doesn't exist; can't delete it
Example (working):
$ brctl delbr bridge
bridge bridge is still up; can't delete it
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of
.ndo_do_ioctl") the bridge ioctl calls were divided in two parts:
one was deviceless called by sock_ioctl and didn't expect rtnl to be held,
the other was with a device called by dev_ifsioc() and expected rtnl to be
held. After the commit above they were united in a single ioctl stub, but
it didn't take care of the locking expectations.
For sock_ioctl now we acquire (1) br_ioctl_mutex, (2) rtnl
and for dev_ifsioc we acquire (1) rtnl, (2) br_ioctl_mutex
The fix is to get a refcnt on the netdev for dev_ifsioc calls and drop rtnl
then to reacquire it in the bridge ioctl stub after br_ioctl_mutex has
been acquired. That will avoid playing locking games and make the rules
straight-forward: we always take br_ioctl_mutex first, and then rtnl.
Reported-by: syzbot+34fe5894623c4ab1b379@syzkaller.appspotmail.com
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Revert the use of structr_size() and stay with IP_MSFILTER_SIZE() for
now, as in this case, the size of struct ip_msfilter didn't change with
the addition of the flexible array imsf_slist_flex[]. So, if we use
struct_size() we will be allocating and calculating the size of
struct ip_msfilter with one too many items for imsf_slist_flex[].
We might use struct_size() in the future, but for now let's stay
with IP_MSFILTER_SIZE().
Fixes: 2d3e5caf96b9 ("net/ipv4: Replace one-element array with flexible-array member")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock
reference") introduces a serious regression at the GRO layer setting
the wrong truesize for stolen-head skbs.
Restore the correct truesize: SKB_DATA_ALIGN(...) instead of
SKB_TRUESIZE(...)
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alex Elder says:
====================
net: ipa: more work toward runtime PM
The first two patches in this series are basically bug fixes, but in
practice I don't think we've seen the problems they might cause.
The third patch moves clock and interconnect related error messages
around a bit, reporting better information and doing so in the
functions where they are enabled or disabled (rather than those
functions' callers).
The last three patches move power-related code into "ipa_clock.c",
as a step toward generalizing the purpose of that source file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ipa->flags field is only ever used in "ipa_clock.c", related to
suspend/resume activity.
Move the definition of the ipa_flag enumerated type to "ipa_clock.c".
And move the flags field from the ipa structure and to the ipa_clock
structure. Rename the type and its values to include "power" or
"POWER" in the name.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move ipa_suspend_handler() into "ipa_clock.c" from "ipa_main.c", to
group with the reset of the suspend/resume code. This IPA interrupt
is triggered if an IPA RX endpoint is suspended but has a packet to
be delivered.
Introduce ipa_power_setup() and ipa_power_teardown() to add and
remove the handler for the IPA SUSPEND interrupt at the same place
as before, while allowing the handler to remain private.
The "power" naming convention will be adopted elsewhere in this
file as well (soon).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move ipa_suspend() and ipa_resume(), as well as the definition of
the ipa_pm_ops structure into "ipa_clock.c". Make ipa_pm_ops public
and declare it as extern in "ipa_clock.h".
This is part of centralizing IPA power management functionality into
"ipa_clock.c" (the file will eventually get a name change).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rearrange messages reported when errors occur in the IPA clock code,
so that the specific interconnect is identified when an error occurs
enabling or disabling it, or the core clock is indicated when an
error occurs enabling it.
Have ipa_interconnect_disable() return zero or the negative error
value returned by the first interconnect that produced an error
when disabled. For now, the callers ignore the returned value.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Assign the ipa->modem_netdev and endpoint->netdev pointers *before*
registering the network device. As soon as the device is
registered it can be opened, and by that time we'll want those
pointers valid.
Similarly, don't make those pointers NULL until *after* the modem
network device is unregistered in ipa_modem_stop().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The modem network device is set up by ipa_modem_start(). But its
TX queue is not actually started and endpoints enabled until it is
opened.
So avoid stopping the modem network device TX queue and disabling
endpoints on suspend or stop unless the netdev is marked UP. And
skip attempting to resume unless it is UP.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
NXP SJA1105 driver support for "H" switch topologies
Changes in v3:
Preserve the behavior of dsa_tree_setup_default_cpu() which is to pick
the first CPU port and not the last.
Changes in v2:
Send as non-RFC, drop the patches for discarding DSA-tagged packets on
user ports and DSA-untagged packets on DSA and CPU ports for now.
NXP builds boards like the Bluebox 3 where there are multiple SJA1110
switches connected to an LX2160A, but they are also connected to each
other. I call this topology an "H" tree because of the lateral
connection between switches. A piece extracted from a non-upstream
device tree looks like this:
&spi_bridge {
/* SW1 */
ethernet-switch@0 {
compatible = "nxp,sja1110a";
reg = <0>;
dsa,member = <0 0>;
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
/* SW1_P1 */
port@1 {
reg = <1>;
label = "con_2x20";
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@2 {
reg = <2>;
ethernet = <&dpmac17>;
phy-mode = "rgmii-id";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@3 {
reg = <3>;
label = "1ge_p1";
phy-mode = "rgmii-id";
phy-handle = <&sw1_mii3_phy>;
};
sw1p4: port@4 {
reg = <4>;
link = <&sw2p1>;
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@5 {
reg = <5>;
label = "trx1";
phy-mode = "internal";
phy-handle = <&sw1_port5_base_t1_phy>;
};
port@6 {
reg = <6>;
label = "trx2";
phy-mode = "internal";
phy-handle = <&sw1_port6_base_t1_phy>;
};
port@7 {
reg = <7>;
label = "trx3";
phy-mode = "internal";
phy-handle = <&sw1_port7_base_t1_phy>;
};
port@8 {
reg = <8>;
label = "trx4";
phy-mode = "internal";
phy-handle = <&sw1_port8_base_t1_phy>;
};
port@9 {
reg = <9>;
label = "trx5";
phy-mode = "internal";
phy-handle = <&sw1_port9_base_t1_phy>;
};
port@a {
reg = <10>;
label = "trx6";
phy-mode = "internal";
phy-handle = <&sw1_port10_base_t1_phy>;
};
};
};
/* SW2 */
ethernet-switch@2 {
compatible = "nxp,sja1110a";
reg = <2>;
dsa,member = <0 1>;
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
sw2p1: port@1 {
reg = <1>;
link = <&sw1p4>;
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@2 {
reg = <2>;
ethernet = <&dpmac18>;
phy-mode = "rgmii-id";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@3 {
reg = <3>;
label = "1ge_p2";
phy-mode = "rgmii-id";
phy-handle = <&sw2_mii3_phy>;
};
port@4 {
reg = <4>;
label = "to_sw3";
phy-mode = "2500base-x";
fixed-link {
speed = <2500>;
full-duplex;
};
};
port@5 {
reg = <5>;
label = "trx7";
phy-mode = "internal";
phy-handle = <&sw2_port5_base_t1_phy>;
};
port@6 {
reg = <6>;
label = "trx8";
phy-mode = "internal";
phy-handle = <&sw2_port6_base_t1_phy>;
};
port@7 {
reg = <7>;
label = "trx9";
phy-mode = "internal";
phy-handle = <&sw2_port7_base_t1_phy>;
};
port@8 {
reg = <8>;
label = "trx10";
phy-mode = "internal";
phy-handle = <&sw2_port8_base_t1_phy>;
};
port@9 {
reg = <9>;
label = "trx11";
phy-mode = "internal";
phy-handle = <&sw2_port9_base_t1_phy>;
};
port@a {
reg = <10>;
label = "trx12";
phy-mode = "internal";
phy-handle = <&sw2_port10_base_t1_phy>;
};
};
};
};
Basically it is a single DSA tree with 2 "ethernet" properties, i.e. a
multi-CPU-port system. There is also a DSA link between the switches,
but it is not a daisy chain topology, i.e. there is no "upstream" and
"downstream" switch, the DSA link is only to be used for the bridge data
plane (autonomous forwarding between switches, between the RJ-45 ports
and the automotive Ethernet ports), otherwise all traffic that should
reach the host should do so through the dedicated CPU port of the switch.
Of course, plain forwarding in this topology is bound to create packet
loops. I have thought long and hard about strategies to cut forwarding
in such a way as to prevent loops but also not impede normal operation
of the network on such a system, and I believe I have found a solution
that does work as expected. This relies heavily on DSA's recent ability
to perform RX filtering towards the host by installing MAC addresses as
static FDB entries. Since we have 2 distinct DSA masters, we have 2
distinct MAC addresses, and if the bridge is configured to have its own
MAC address that makes it 3 distinct MAC addresses. The bridge core,
plus the switchdev_handle_fdb_add_to_device() extension, handle each MAC
address by replicating it to each port of the DSA switch tree. So the
end result is that both switch 1 and switch 2 will have static FDB
entries towards their respective CPU ports for the 3 MAC addresses
corresponding to the DSA masters and to the bridge net device (and of
course, towards any station learned on a foreign interface).
So I think the basic design works, and it is basically just as fragile
as any other multi-CPU-port system is bound to be in terms of reliance
on static FDB entries towards the host (if hardware address learning on
the CPU port is to be used, MAC addresses would randomly bounce between
one CPU port and the other otherwise). In fact, I think it is even
better to start DSA's support of multi-CPU-port systems with something
small like the NXP Bluebox 3, because we allow some time for the code
paths like dsa_switch_host_address_match(), which were specifically
designed for it, to break in, and this board needs no user space
configuration of CPU ports, like static assignments between user and CPU
ports, or bonding between the CPU ports/DSA masters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Right now, address learning is disabled on DSA ports, which means that a
packet received over a DSA port from a cross-chip switch will be flooded
to unrelated ports.
It is desirable to eliminate that, but for that we need a breakdown of
the possibilities for the sja1105 driver. A DSA port can be:
- a downstream-facing cascade port. This is simple because it will
always receive packets from a downstream switch, and there should be
no other route to reach that downstream switch in the first place,
which means it should be safe to learn that MAC address towards that
switch.
- an upstream-facing cascade port. This receives packets either:
* autonomously forwarded by an upstream switch (and therefore these
packets belong to the data plane of a bridge, so address learning
should be ok), or
* injected from the CPU. This deserves further discussion, as normally,
an upstream-facing cascade port is no different than the CPU port
itself. But with "H" topologies (a DSA link towards a switch that
has its own CPU port), these are more "laterally-facing" cascade
ports than they are "upstream-facing". Here, there is a risk that
the port might learn the host addresses on the wrong port (on the
DSA port instead of on its own CPU port), but this is solved by
DSA's RX filtering infrastructure, which installs the host addresses
as static FDB entries on the CPU port of all switches in a "H" tree.
So even if there will be an attempt from the switch to migrate the
FDB entry from the CPU port to the laterally-facing cascade port, it
will fail to do that, because the FDB entry that already exists is
static and cannot migrate. So address learning should be safe for
this configuration too.
Ok, so what about other MAC addresses coming from the host, not
necessarily the bridge local FDB entries? What about MAC addresses
dynamically learned on foreign interfaces, isn't there a risk that
cascade ports will learn these entries dynamically when they are
supposed to be delivered towards the CPU port? Well, that is correct,
and this is why we also need to enable the assisted learning feature, to
snoop for these addresses and write them to hardware as static FDB
entries towards the CPU, to make the switch's learning process on the
cascade ports ineffective for them. With assisted learning enabled, the
hardware learning on the CPU port must be disabled.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
H topologies like this one have a problem:
eth0 eth1
| |
CPU port CPU port
| DSA link |
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 -------- sw1p4 sw1p3 sw1p2 sw1p1 sw1p0
| | | | | |
user user user user user user
port port port port port port
Basically any packet sent by the eth0 DSA master can be flooded on the
interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
eth1 DSA master too. Basically we are talking to ourselves.
In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains.
Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
which _can_ be decoded by the tagger running on eth1, so it will attempt
to reinject that packet into the network stack (the bridge, if there is
any port under eth1 that is under a bridge). In the case where the ports
under eth1 are under the same cross-chip bridge as the ports under eth0,
the TX packets will even be learned as RX packets. The only thing that
will prevent loops with the software bridging path, and therefore
disaster, is that the source port and the destination port are in the
same hardware domain, and the bridge will receive packets from the
driver with skb->offload_fwd_mark = true and will not forward between
the two.
The proper solution to this problem is to detect H topologies and
enforce that all packets are received through the local switch and we do
not attempt to receive packets on our CPU port from switches that have
their own. This is a viable solution which works thanks to the fact that
MAC addresses which should be filtered towards the host are installed by
DSA as static MAC addresses towards the CPU port of each switch.
TX from a CPU port towards the DSA port continues to be allowed, this is
because sja1105 supports bridge TX forwarding offload, and the skb->dev
used initially for xmit does not have any direct correlation with where
the station that will respond to that packet is connected. It may very
well happen that when we send a ping through a br0 interface that spans
all switch ports, the xmit packet will exit the system through a DSA
switch interface under eth1 (say sw1p2), but the destination station is
connected to a switch port under eth0, like sw0p0. So the switch under
eth1 needs to communicate on TX with the switch under eth0. The
response, however, will not follow the same path, but instead, this
patch enforces that the response is sent by the first switch directly to
its DSA master which is eth0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since all packets are transmitted as VLAN-tagged over a DSA link (this
VLAN tag represents the tag_8021q header), we need to increase the MTU
of these interfaces to account for the possibility that we are already
transporting a user-visible VLAN header.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit ed040abca4c1 ("net: dsa: sja1105: use 4095 as the private
VLAN for untagged traffic"), this driver uses a reserved value as pvid
for the host port (DSA CPU port). Control packets which are sent as
untagged get classified to this VLAN, and all ports are members of it
(this is to be expected for control packets).
Manage all cascade ports in the same way and allow control packets to
egress everywhere.
Also, all VLANs need to be sent as egress-tagged on all cascade ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Manage DSA links towards other switches, be they host ports or cascade
ports, the same as the CPU port, i.e. allow forwarding and flooding
unconditionally from all user ports.
We send packets as always VLAN-tagged on a DSA port, and we rely on the
cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
port only on the proper remote ports of another switch (the ports that
are in the same bridging domain). So if there is no cross-chip bridging
in the system, the flooded packets will be sent on the DSA ports too,
but they will be dropped by the remote switches due to either
(a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
(b) a lack of valid destinations for those packets, due to a lack of the
RX VLAN on the user ports of the switch
Note that switches which only transport packets in a cross-chip bridge,
but have no user ports of their own as part of that bridge, such as
switch 1 in this case:
DSA link DSA link
sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3
ip link set sw0p0 master br0
ip link set sw2p3 master br0
will still work, because the tag_8021q cross-chip notifiers keep the RX
VLANs installed on all DSA ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The sja1105 switch family has a feature called "cascade ports" which can
be used in topologies where multiple SJA1105/SJA1110 switches are daisy
chained. Upstream switches set this bit for the DSA link towards the
downstream switches. This is used when the upstream switch receives a
control packet (PTP, STP) from a downstream switch, because if the
source port for a control packet is marked as a cascade port, then the
source port, switch ID and RX timestamp will not be taken again on the
upstream switch, it is assumed that this has already been done by the
downstream switch (the leaf port in the tree) and that the CPU has
everything it needs to decode the information from this packet.
We need to distinguish between an upstream-facing DSA link and a
downstream-facing DSA link, because the upstream-facing DSA links are
"host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
DSA links are "cascade ports".
Note that SJA1105 supports a single cascade port, so only daisy chain
topologies work. With SJA1110, there can be more complex topologies such
as:
eth0
|
host port
|
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4
| | | |
cascade cascade user user
port port port port
| |
| |
| |
| host
| port
| |
| sw1p0 sw1p1 sw1p2 sw1p3 sw1p4
| | | | |
| user user user user
host port port port port
port
|
sw2p0 sw2p1 sw2p2 sw2p3 sw2p4
| | | |
user user user user
port port port port
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Be there an "H" switch topology, where there are 2 switches connected as
follows:
eth0 eth1
| |
CPU port CPU port
| DSA link |
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 -------- sw1p4 sw1p3 sw1p2 sw1p1 sw1p0
| | | | | |
user user user user user user
port port port port port port
basically one where each switch has its own CPU port for termination,
but there is also a DSA link in case packets need to be forwarded in
hardware between one switch and another.
DSA insists to see this as a daisy chain topology, basically registering
all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and disregarding
eth1 as a valid DSA master.
This is only half the story, since when asked using dsa_port_is_cpu(),
DSA will respond that sw1p1 is a CPU port, however one which has no
dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used.
Furthermore, be there a driver for switches which support only one
upstream port. This driver iterates through its ports and checks using
dsa_is_upstream_port() whether the current port is an upstream one.
For switch 1, two ports pass the "is upstream port" checks:
- sw1p4 is an upstream port because it is a routing port towards the
dedicated CPU port assigned using dsa_tree_setup_default_cpu()
- sw1p1 is also an upstream port because it is a CPU port, albeit one
that is disabled. This is because dsa_upstream_port() returns:
if (!cpu_dp)
return port;
which means that if @dp does not have a ->cpu_dp pointer (which is a
characteristic of CPU ports themselves as well as unused ports), then
@dp is its own upstream port.
So the driver for switch 1 rightfully says: I have two upstream ports,
but I don't support multiple upstream ports! So let me error out, I
don't know which one to choose and what to do with the other one.
Generally I am against enforcing any default policy in the kernel in
terms of user to CPU port assignment (like round robin or such) but this
case is different. To solve the conundrum, one would have to:
- Disable sw1p1 in the device tree or mark it as "not a CPU port" in
order to comply with DSA's view of this topology as a daisy chain,
where the termination traffic from switch 1 must pass through switch 0.
This is counter-productive because it wastes 1Gbps of termination
throughput in switch 1.
- Disable the DSA link between sw0p4 and sw1p4 and do software
forwarding between switch 0 and 1, and basically treat the switches as
part of disjoint switch trees. This is counter-productive because it
wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1.
- Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could
work, but it makes cross-chip bridging impossible. In this setup we
would need to have 2 separate bridges, br0 spanning the ports of
switch 0, and br1 spanning the ports of switch 1, and the "DSA links
treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are
the gateway ports between one bridge and another. This is hard to
manage from a user's perspective, who wants to have a unified view of
the switching fabric and the ability to transparently add ports to the
same bridge. VLANs would also need to be explicitly managed by the
user on these gateway ports.
So it seems that the only reasonable thing to do is to make DSA prefer
CPU ports that are local to the switch. Meaning that by default, the
user and DSA ports of switch 0 will get assigned to the CPU port from
switch 0 (sw0p1) and the user and DSA ports of switch 1 will get
assigned to the CPU port from switch 1.
The way this solves the problem is that sw1p4 is no longer an upstream
port as far as switch 1 is concerned (it no longer views sw0p1 as its
dedicated CPU port).
So here we are, the first multi-CPU port that DSA supports is also
perhaps the most uneventful one: the individual switches don't support
multiple CPUs, however the DSA switch tree as a whole does have multiple
CPU ports. No user space assignment of user ports to CPU ports is
desirable, necessary, or possible.
Ports that do not have a local CPU port (say there was an extra switch
hanging off of sw0p0) default to the standard implementation of getting
assigned to the first CPU port of the DSA switch tree. Is that good
enough? Probably not (if the downstream switch was hanging off of switch
1, we would most certainly prefer its CPU port to be sw1p1), but in
order to support that use case too, we would need to traverse the
dst->rtable in search of an optimum dedicated CPU port, one that has the
smallest number of hops between dp->ds and dp->cpu_dp->ds. At the
moment, the DSA routing table structure does not keep the number of hops
between dl->dp and dl->link_dp, and while it is probably deducible,
there is zero justification to write that code now. Let's hope DSA will
never have to support that use case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is nothing specific to having a default CPU port to what
dsa_tree_teardown_default_cpu() does. Even with multiple CPU ports,
it would do the same thing: iterate through the ports of this switch
tree and reset the ->cpu_dp pointer to NULL. So rename it accordingly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Three interconnects are defined for IPA version 4.9, but there
should only be two. They should also use names that match what's
used for other platforms (and specified in the Device Tree binding).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pointer hdr is being initialized and also re-assigned with the
same value from the call to function mctp_hdr. Static analysis reports
that the initializated value is unused. The second assignment is
duplicated and can be removed.
Addresses-Coverity: ("Unused value").
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the netif_receive xmit_mode is set, a line is supposed to set
clone_skb to a default 0 value. This line is made redundant due to a
preceding line that checks if clone_skb is more than zero and returns
-ENOTSUPP.
Overriding clone_skb to 0 does not make any difference to the behavior
because if it was positive we return error. So it can be either 0 or
negative, and in both cases the behavior is the same.
Remove redundant line that sets clone_skb to zero.
Signed-off-by: Nick Richardson <richardsonnick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The OpenCompute timecard driver has additional functionality besides
a clock. Make the following resources available:
- The external timestamp channels (ts0/ts1)
- devlink support for flashing and health reporting
- GPS and MAC serial ports
- board serial number (obtained from i2c device)
Also add watchdog functionality for when GNSS goes into holdover.
The resources are collected under a timecard class directory:
[jlemon@timecard ~]$ ls -g /sys/class/timecard/ocp1/
total 0
-r--r--r--. 1 root 4096 Aug 3 19:49 available_clock_sources
-rw-r--r--. 1 root 4096 Aug 3 19:49 clock_source
lrwxrwxrwx. 1 root 0 Aug 3 19:49 device -> ../../../0000:04:00.0/
-r--r--r--. 1 root 4096 Aug 3 19:49 gps_sync
lrwxrwxrwx. 1 root 0 Aug 3 19:49 i2c -> ../../xiic-i2c.1024/i2c-2/
drwxr-xr-x. 2 root 0 Aug 3 19:49 power/
lrwxrwxrwx. 1 root 0 Aug 3 19:49 pps ->
../../../../../virtual/pps/pps1/
lrwxrwxrwx. 1 root 0 Aug 3 19:49 ptp -> ../../ptp/ptp2/
-r--r--r--. 1 root 4096 Aug 3 19:49 serialnum
lrwxrwxrwx. 1 root 0 Aug 3 19:49 subsystem ->
../../../../../../class/timecard/
lrwxrwxrwx. 1 root 0 Aug 3 19:49 ttyGPS -> ../../tty/ttyS7/
lrwxrwxrwx. 1 root 0 Aug 3 19:49 ttyMAC -> ../../tty/ttyS8/
-rw-r--r--. 1 root 4096 Aug 3 19:39 uevent
The labeling is needed at the minimum, in order to tell the serial
devices apart.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
tcp_sndbuf_expand()). If we've just created a new socket this adjustment
is enabled on it, but if one changes the socket buffer size by
setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
restore as it first needs to increase buffer sizes for packet queues
restore and second it needs to restore back original buffer sizes. So
after CRIU restore all sockets become non-auto-adjustable, which can
decrease network performance of restored applications significantly.
CRIU need to be able to restore sockets with enabled/disabled adjustment
to the same state it was before dump, so let's add special setsockopt
for it.
Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so
that using these interface one can reenable automatic socket buffer
adjustment on their sockets.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently we added multi-queue support to netdevsim in commit d4861fc6be58
("netdevsim: Add multi-queue support"); add a few control-plane selftests
for sch_mq using this new feature.
Use nsPlugin.py to avoid network interface name collisions.
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit b0e81817629a496854ff1799f6cbd89597db65fd. Explicit
driver dependency on the bridge is no longer needed since
switchdev_bridge_port_{,un}offload() is no longer implemented by the
bridge driver but by switchdev.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the introduction of explicit offloading API in switchdev in commit
2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit
b0e81817629a ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing. But is this enough? Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.
Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).
There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.
This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
functions called br_switchdev_port_{,un}offload. These functions
contain in fact the core of the logic that was previously in
switchdev_bridge_port_{,un}offload, just that now we go through an
extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2021-08-04
this is a pull request of 5 patches for net-next/master.
The first patch is by me and fixes a typo in a comment in the CAN
J1939 protocol.
The next 2 patches are by Oleksij Rempel and update the CAN J1939
protocol to send RX status updates via the error queue mechanism.
The next patch is by me and adds a missing variable initialization to
the flexcan driver (the problem was introduced in the current net-next
cycle).
The last patch is by Aswath Govindraju and adds power-domains to the
Bosch m_can DT binding documentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Document power-domains property for adding the Power domain provider.
Link: https://lore.kernel.org/r/20210802091822.16407-1-a-govindraju@ti.com
Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch adds the missing initialization of the "err" variable in
the flexcan_clks_enable() function.
Fixes: d9cead75b1c6 ("can: flexcan: add mcf5441x support")
Link: https://lore.kernel.org/r/20210728075428.1493568-1-mkl@pengutronix.de
Reported-by: kernel test robot <lkp@intel.com>
Cc: Angelo Dureghello <angelo@kernel-space.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
To be able to create applications with user friendly feedback, we need be
able to provide receive status information.
Typical ETP transfer may take seconds or even hours. To give user some
clue or show a progress bar, the stack should push status updates.
Same as for the TX information, the socket error queue will be used with
following new signals:
- J1939_EE_INFO_RX_RTS - received and accepted request to send signal.
- J1939_EE_INFO_RX_DPO - received data package offset signal
- J1939_EE_INFO_RX_ABORT - RX session was aborted
Instead of completion signal, user will get data package.
To activate this signals, application should set
SOF_TIMESTAMPING_RX_SOFTWARE to the SO_TIMESTAMPING socket option. This
will avoid unpredictable application behavior for the old software.
Link: https://lore.kernel.org/r/20210707094854.30781-3-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Prepare the world for the J1939_ERRQUEUE_RX_ version
Link: https://lore.kernel.org/r/20210707094854.30781-2-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
As presented last month in our "BIG TCP" talk at netdev 0x15,
we plan using IPv6 jumbograms.
One of the minor problem we talked about is the fact that
ip6_parse_tlv() is currently using tables to list known tlvs,
thus using potentially expensive indirect calls.
While we could mitigate this cost using macros from
indirect_call_wrapper.h, we also can get rid of the tables
and let the compiler emit optimized code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Justin Iurman <justin.iurman@uliege.be>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DENG Qingfang says:
====================
mt7530 software fallback bridging fix
DSA core has gained software fallback support since commit 2f5dc00f7a3e,
but it does not work properly on mt7530. This patch series fixes the
issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 7e777021780e ("mt7530 mt7530_fdb_write only set ivl
bit vid larger than 1").
Before this series, the default value of all ports' PVID is 1, which is
copied into the FDB entry, even if the ports are VLAN unaware. So
`bridge fdb show` will show entries like `dev swp0 vlan 1 self` even on
a VLAN-unaware bridge.
The blamed commit does not solve that issue completely, instead it may
cause a new issue that FDB is inaccessible in a VLAN-aware bridge with
PVID 1.
This series sets PVID to 0 on VLAN-unaware ports, so `bridge fdb show`
will no longer print `vlan 1` on VLAN-unaware bridges, and that special
case in fdb_write is not required anymore.
Set FDB entries' filter ID to 1 to match the VLAN table.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As filter ID 1 is the only one used for bridges, set STP state on it.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Consider the following bridge configuration, where bond0 is not
offloaded:
+-- br0 --+
/ / | \
/ / | \
/ | | bond0
/ | | / \
swp0 swp1 swp2 swp3 swp4
. . .
. . .
A B C
Ideally, when the switch receives a packet from swp3 or swp4, it should
forward the packet to the CPU, according to the port matrix and unknown
unicast flood settings.
But packet loss will happen if the destination address is at one of the
offloaded ports (swp0~2). For example, when client C sends a packet to
A, the FDB lookup will indicate that it should be forwarded to swp0, but
the port matrix of swp3 and swp4 is configured to only allow the CPU to
be its destination, so it is dropped.
However, this issue does not happen if the bridge is VLAN-aware. That is
because VLAN-aware bridges use independent VLAN learning, i.e. use VID
for FDB lookup, on offloaded ports. As swp3 and swp4 are not offloaded,
shared VLAN learning with default filter ID of 0 is used instead. So the
lookup for A with filter ID 0 never hits and the packet can be forwarded
to the CPU.
In the current code, only two combinations were used to toggle user
ports' VLAN awareness: one is PCR.PORT_VLAN set to port matrix mode with
PVC.VLAN_ATTR set to transparent port, the other is PCR.PORT_VLAN set to
security mode with PVC.VLAN_ATTR set to user port.
It turns out that only PVC.VLAN_ATTR contributes to VLAN awareness, and
port matrix mode just skips the VLAN table lookup. The reference manual
is somehow misleading when describing PORT_VLAN modes. It states that
PORT_MEM (VLAN port member) is used for destination if the VLAN table
lookup hits, but actually **PORT_MEM & PORT_MATRIX** (bitwise AND of
VLAN port member and port matrix) is used instead, which means we can
have two or more separate VLAN-aware bridges with the same PVID and
traffic won't leak between them.
Therefore, to solve this, enable independent VLAN learning with PVID 0
on VLAN-unaware bridges, by setting their PCR.PORT_VLAN to fallback
mode, while leaving standalone ports in port matrix mode. The CPU port
is always set to fallback mode to serve those bridges.
During testing, it is found that FDB lookup with filter ID of 0 will
also hit entries with VID 0 even with independent VLAN learning. To
avoid that, install all VLANs with filter ID of 1.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Consider the following bridge configuration, where bond0 is not
offloaded:
+-- br0 --+
/ / | \
/ / | \
/ | | bond0
/ | | / \
swp0 swp1 swp2 swp3 swp4
. . .
. . .
A B C
Address learning is enabled on offloaded ports (swp0~2) and the CPU
port, so when client A sends a packet to C, the following will happen:
1. The switch learns that client A can be reached at swp0.
2. The switch probably already knows that client C can be reached at the
CPU port, so it forwards the packet to the CPU.
3. The bridge core knows client C can be reached at bond0, so it
forwards the packet back to the switch.
4. The switch learns that client A can be reached at the CPU port.
5. The switch forwards the packet to either swp3 or swp4, according to
the packet's tag.
That makes client A's MAC address flap between swp0 and the CPU port. If
client B sends a packet to A, it is possible that the packet is
forwarded to the CPU. With offload_fwd_mark = 1, the bridge core won't
forward it back to the switch, resulting in packet loss.
As we have the assisted_learning_on_cpu_port in DSA core now, enable
that and disable hardware learning on the CPU port.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <oltean@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alex Elder says:
====================
net: ipa: prepare GSI interrupts for runtime PM
The last patch in this series arranges for GSI interrupts to be
disabled when the IPA hardware is suspended. This ensures the clock
is always operational when a GSI interrupt fires. Leading up to
that are patches that rearrange the code a bit to allow this to
be done.
The first two patches aren't *directly* related. They remove some
flag arguments to some GSI suspend/resume related functions, using
the version field now present in the GSI structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce new functions gsi_suspend() and gsi_resume(), which will
disable the GSI interrupt handler after all endpoints are suspended
and re-enable it before endpoints are resumed. This will ensure no
GSI interrupt handler will fire when the hardware is suspended.
Here's a little further explanation. There are seven GSI interrupt
types, and most are disabled except when needed.
- These two are not used (never enabled):
GSI_INTER_EE_CH_CTRL
GSI_INTER_EE_EV_CTRL
- These two are only used to implement channel and event ring
commands, and are only enabled while a command is underway:
GSI_CH_CTRL
GSI_EV_CTRL
- The IEOB interrupt signals I/O completion. It will not fire
when a channel is stopped (or "suspended").
GSI_IEOB
- This interrupt is used to allocate or halt modem channels,
and is only enabled while such a command is underway.
GSI_GLOB_EE
However it also is used to signal certain errors, and this could
occur at any time.
- The general interrupt signals general errors, and could occur at
any time.
GSI_GENERAL
The purpose for this change is to ensure no global or general
interrupts fire due to errors while the hardware is suspended.
We enable the clock on resume, and at that time we can "handle"
(at least report) these error conditions.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The GSI IRQ handler could be triggered as soon as it is registered
with request_irq(). The handler function, gsi_isr(), touches
hardware, meaning the IPA clock must be operational. The IPA clock
is not operating when the handler is registered (in gsi_irq_init()),
so this is a problem.
Move the call to request_irq() for the GSI interrupt handler into
gsi_irq_setup(), which is called when the IPA clock is known to be
operational (and furthermore, the GSI firmware will have been
loaded). Request the IRQ at the end of that function, after all
interrupt types have been disabled and masked.
Move the matching free_irq() call into gsi_irq_teardown(), and get
rid of the now empty gsi_irq_exit(),
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change gsi_irq_setup() so it returns an error value, and introduce
gsi_irq_teardown() as its inverse. Set the interrupt type (IRQ
rather than MSI) in gsi_irq_setup().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move gsi_irq_setup() and gsi_ring_setup() so they're defined right
above gsi_setup() where they're called. This is a trivial movement
of code to prepare for upcoming patches.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the Boolean flags passed to __gsi_channel_start() and
__gsi_channel_stop() so they represent whether the request is being
made to implement suspend (versus stop) or resume (versus start).
Then stop or start the channel for suspend/resume requests only if
the hardware version indicates it should be done.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|