aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-10-02net: mscc: ocelot: write full VLAN TCI in the injection headerVladimir Oltean
The VLAN TCI contains more than the VLAN ID, it also has the VLAN PCP and Drop Eligibility Indicator. If the ocelot driver is going to write the VLAN header inside the DSA tag, it could just as well write the entire TCI. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-02net: mscc: ocelot: support egress VLAN rewriting via VCAP ES0Vladimir Oltean
Currently the ocelot driver does support the 'vlan modify' action, but in the ingress chain, and it is offloaded to VCAP IS1. This action changes the classified VLAN before the packet enters the bridging service, and the bridging works with the classified VLAN modified by VCAP IS1. That is good for some use cases, but there are others where the VLAN must be modified at the stage of the egress port, after the packet has exited the bridging service. One example is simulating IEEE 802.1CB active stream identification filters ("active" means that not only the rule matches on a packet flow, but it is also able to change some headers). For example, a stream is replicated on two egress ports, but they must have different VLAN IDs on egress ports A and B. This seems like a task for the VCAP ES0, but that currently only supports pushing the ES0 tag A, which is specified in the rule. Pushing another VLAN header is not what we want, but rather overwriting the existing one. It looks like when we push the ES0 tag A, it is actually possible to not only take the ES0 tag A's value from the rule itself (VID_A_VAL), but derive it from the following formula: ES0_TAG_A = Classified VID + VID_A_VAL Otherwise said, ES0_TAG_A can be used to increment with a given value the VLAN ID that the packet was already classified to, and the packet will have this value as an outer VLAN tag. This new VLAN ID value then gets stripped on egress (or not) according to the value of the native VLAN from the bridging service. While the hardware will happily increment the classified VLAN ID for all packets that match the ES0 rule, in practice this would be rather insane, so we only allow this kind of ES0 action if the ES0 filter contains a VLAN ID too, so as to restrict the matching on a known classified VLAN. If we program VID_A_VAL with the delta between the desired final VLAN (ES0_TAG_A) and the classified VLAN, we obtain the desired behavior. It doesn't look like it is possible with the tc-vlan action to modify the VLAN ID but not the PCP. In hardware it is possible to leave the PCP to the classified value, but we unconditionally program it to overwrite it with the PCP value from the rule. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-01Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== bpf-next 2021-10-02 We've added 85 non-merge commits during the last 15 day(s) which contain a total of 132 files changed, 13779 insertions(+), 6724 deletions(-). The main changes are: 1) Massive update on test_bpf.ko coverage for JITs as preparatory work for an upcoming MIPS eBPF JIT, from Johan Almbladh. 2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool, with driver support for i40e and ice from Magnus Karlsson. 3) Add legacy uprobe support to libbpf to complement recently merged legacy kprobe support, from Andrii Nakryiko. 4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky. 5) Support saving the register state in verifier when spilling <8byte bounded scalar to the stack, from Martin Lau. 6) Add libbpf opt-in for stricter BPF program section name handling as part of libbpf 1.0 effort, from Andrii Nakryiko. 7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov. 8) Fix skel_internal.h to propagate errno if the loader indicates an internal error, from Kumar Kartikeya Dwivedi. 9) Fix build warnings with -Wcast-function-type so that the option can later be enabled by default for the kernel, from Kees Cook. 10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it otherwise errors out when encountering them, from Toke Høiland-Jørgensen. 11) Teach libbpf to recognize specialized maps (such as for perf RB) and internally remove BTF type IDs when creating them, from Hengqi Chen. 12) Various fixes and improvements to BPF selftests. ==================== Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01devlink: report maximum number of snapshots with regionsJacob Keller
Each region has an independently configurable number of maximum snapshots. This information is not reported to userspace, making it not very discoverable. Fix this by adding a new DEVLINK_ATTR_REGION_MAX_SNAPSHOST attribute which is used to report this maximum. Ex: $devlink region pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10 pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1 pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10 This information enables users to understand why a new region command may fail due to having too many existing snapshots. Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/phy/bcm7xxx.c d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations") f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c b193e15ac69d ("net: prevent user from passing illegal stab size") 69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30Merge tag 'net-5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
2021-09-30bpf, xdp, docs: Correct some English grammar and spellingKev Jackson
Header DOC on include/net/xdp.h contained a few English grammer and spelling errors. Signed-off-by: Kev Jackson <foamdino@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/YVVaWmKqA8l9Tm4J@kev-VirtualBox
2021-09-30af_unix: fix races in sk_peer_pid and sk_peer_cred accessesEric Dumazet
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: snmp: inline snmp_get_cpu_field()Eric Dumazet
This trivial function is called ~90,000 times on 256 cpus hosts, when reading /proc/net/netstat. And this number keeps inflating. Inlining it saves many cycles. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30tcp: adjust rcv_ssthresh according to sk_reserved_memWei Wang
When user sets SO_RESERVE_MEM socket option, in order to utilize the reserved memory when in memory pressure state, we adjust rcv_ssthresh according to the available reserved memory for the socket, instead of using 4 * advmss always. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30tcp: adjust sndbuf according to sk_reserved_memWei Wang
If user sets SO_RESERVE_MEM socket option, in order to fully utilize the reserved memory in memory pressure state on the tx path, we modify the logic in sk_stream_moderate_sndbuf() to set sk_sndbuf according to available reserved memory, instead of MIN_SOCK_SNDBUF, and adjust it when new data is acked. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: add new socket option SO_RESERVE_MEMWei Wang
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30net: introduce and use lock_sock_fast_nested()Paolo Abeni
Syzkaller reported a false positive deadlock involving the nl socket lock and the subflow socket lock: MPTCP: kernel_bind error, err=-98 ============================================ WARNING: possible recursive locking detected 5.15.0-rc1-syzkaller #0 Not tainted -------------------------------------------- syz-executor998/6520 is trying to acquire lock: ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 but task is already holding lock: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-sk_lock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor998/6520: #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 stack backtrace: CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline] check_deadlock kernel/locking/lockdep.c:2987 [inline] validate_chain kernel/locking/lockdep.c:3776 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_fast+0x36/0x100 net/core/sock.c:3229 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 __sock_release net/socket.c:649 [inline] sock_release+0x87/0x1b0 net/socket.c:677 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504 kernel_sendpage net/socket.c:3501 [inline] sock_sendpage+0xe5/0x140 net/socket.c:1003 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xd4/0x140 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x110/0x180 fs/splice.c:936 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891 do_splice_direct+0x1b3/0x280 fs/splice.c:979 do_sendfile+0xae9/0x1240 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1314 [inline] __se_sys_sendfile64 fs/read_write.c:1300 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f215cb69969 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08 R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 the problem originates from uncorrect lock annotation in the mptcp code and is only visible since commit 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations"), but is present since the port-based endpoint support initial implementation. This patch addresses the issue introducing a nested variant of lock_sock_fast() and using it in the relevant code path. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29Merge tag 'sound-5.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became a slightly large collection of changes, partly because I've been off in the last weeks. Most of changes are small and scattered while a bit big change is found in HD-audio Realtek codec driver; it's a very device-specific fix that has been long wanted, so I decided to pick up although it's in the middle RC. Some highlights: - A new guard ioctl for ALSA rawmidi API to avoid the misuse of the new timestamp framing mode; it's for a regression fix - HD-audio: a revert of the 5.15 change that might work badly, new quirks for Lenovo Legion & co, a follow-up fix for CS8409 - ASoC: lots of SOF-related fixes, fsl component fixes, corrections of mediatek drivers - USB-audio: fix for the PM resume - FireWire: oxfw and motu fixes" * tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits) ALSA: pcsp: Make hrtimer forwarding more robust ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION ALSA: firewire-motu: fix truncated bytes in message tracepoints ASoC: SOF: trace: Omit error print when waking up trace sleepers ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication ASoC: SOF: loader: release_firmware() on load failure to avoid batching ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types ASoC: SOF: Fix DSP oops stack dump output contents ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops. ALSA: usb-audio: Unify mixer resume and reset_resume procedure Revert "ALSA: hda: Drop workaround for a hang at shutdown again" ALSA: oxfw: fix transmission method for Loud models based on OXFW971 ASoC: mediatek: common: handle NULL case in suspend/resume function ASoC: fsl_xcvr: register platform component before registering cpu dai ASoC: fsl_spdif: register platform component before registering cpu dai ASoC: fsl_micfil: register platform component before registering cpu dai ...
2021-09-29mctp: Add tracepoints for tag/key handlingJeremy Kerr
The tag allocation, release and bind events are somewhat opaque outside the kernel; this change adds a few tracepoints to assist in instrumentation and debugging. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Implement a timeout for tagsJeremy Kerr
Currently, a MCTP (local-eid,remote-eid,tag) tuple is allocated to a socket on send, and only expires when the socket is closed. This change introduces a tag timeout, freeing the tuple after a fixed expiry - currently six seconds. This is greater than (but close to) the max response timeout in upper-layer bindings. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: Add refcounts to mctp_devJeremy Kerr
Currently, we tie the struct mctp_dev lifetime to the underlying struct net_device, and hold/put that device as a proxy for a separate mctp_dev refcount. This works because we're not holding any references to the mctp_dev that are different from the netdev lifetime. In a future change we'll break that assumption though, as we'll need to hold mctp_dev references in a workqueue, which might live past the netdev unregister notification. In order to support that, this change introduces a refcount on the mctp_dev, currently taken by the net_device->mctp_ptr reference, and released on netdev unregister events. We can then use this for future references that might outlast the net device. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29mctp: locking, lifetime and validity changes for sk_keysJeremy Kerr
We will want to invalidate sk_keys in a future change, which will require a boolean flag to mark invalidated items in the socket & net namespace lists. We'll also need to take a reference to keys, held over non-atomic contexts, so we need a refcount on keys also. This change adds a validity flag (currently always true) and refcount to struct mctp_sk_key. With a refcount on the keys, using RCU no longer makes much sense; we have exact indications on the lifetime of keys. So, we also change the RCU list traversal to a locked implementation. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: phy: micrel: Add support for LAN8804 PHYHoratiu Vultur
The LAN8804 PHY has same features as that of LAN8814 PHY except that it doesn't support 1588, SyncE or Q-USGMII. This PHY is found inside the LAN966X switches. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28bpf: Replace callers of BPF_CAST_CALL with proper function typedefKees Cook
In order to keep ahead of cases in the kernel where Control Flow Integrity (CFI) may trip over function call casts, enabling -Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes various warnings and is one of the last places in the kernel triggering this warning. For actual function calls, replace BPF_CAST_CALL() with a typedef, which captures the same details about the given function pointers. This change results in no object code difference. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://github.com/KSPP/linux/issues/20 Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org
2021-09-28bpf: Replace "want address" users of BPF_CAST_CALL with BPF_CALL_IMMKees Cook
In order to keep ahead of cases in the kernel where Control Flow Integrity (CFI) may trip over function call casts, enabling -Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes various warnings and is one of the last places in the kernel triggering this warning. Most places using BPF_CAST_CALL actually just want a void * to perform math on. It's not actually performing a call, so just use a different helper to get the void *, by way of the new BPF_CALL_IMM() helper, which can clean up a common copy/paste idiom as well. This change results in no object code difference. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://github.com/KSPP/linux/issues/20 Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com Link: https://lore.kernel.org/bpf/20210928230946.4062144-2-keescook@chromium.org
2021-09-28octeontx2-pf: Use hardware register for CQE countGeetha sowjanya
Current driver uses software CQ head pointer to poll on CQE header in memory to determine if CQE is valid. Software needs to make sure, that the reads of the CQE do not get re-ordered so much that it ends up with an inconsistent view of the CQE. To ensure that DMB barrier after read to first CQE cacheline and before reading of the rest of the CQE is needed. But having barrier for every CQE read will impact the performance, instead use hardware CQ head and tail pointers to find the valid number of CQEs. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-09-28 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 14 day(s) which contain a total of 11 files changed, 139 insertions(+), 53 deletions(-). The main changes are: 1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk. 2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh. 3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann. 4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi. 5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer. 6) Fix return value handling for struct_ops BPF programs, from Hou Tao. 7) Various fixes to BPF selftests, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> ,
2021-09-28net/tls: support SM4 CCM algorithmTianjia Zhang
The IV of CCM mode has special requirements, this patch supports CCM mode of SM4 algorithm. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28xsk: Optimize for aligned caseMagnus Karlsson
Optimize for the aligned case by precomputing the parameter values of the xdp_buff_xsk and xdp_buff structures in the heads array. We can do this as the heads array size is equal to the number of chunks in the umem for the aligned case. Then every entry in this array will reflect a certain chunk/frame and can therefore be prepopulated with the correct values and we can drop the use of the free_heads stack. Note that it is not possible to allocate more buffers than what has been allocated in the aligned case since each chunk can only contain a single buffer. We can unfortunately not do this in the unaligned case as one chunk might contain multiple buffers. In this case, we keep the old scheme of populating a heads entry every time it is used and using the free_heads stack. Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They were for some reason in xsk.c even though they are buffer pool operations. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com
2021-09-28xsk: Batched buffer allocation for the poolMagnus Karlsson
Add a new driver interface xsk_buff_alloc_batch() offering batched buffer allocations to improve performance. The new interface takes three arguments: the buffer pool to allocated from, a pointer to an array of struct xdp_buff pointers which will contain pointers to the allocated xdp_buffs, and an unsigned integer specifying the max number of buffers to allocate. The return value is the actual number of buffers that the allocator managed to allocate and it will be in the range 0 <= N <= max, where max is the third parameter to the function. u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max); A second driver interface is also introduced that need to be used in conjunction with xsk_buff_alloc_batch(). It is a helper that sets the size of struct xdp_buff and is used by the NIC Rx irq routine when receiving a packet. This helper sets the three struct members data, data_meta, and data_end. The two first ones is in the xsk_buff_alloc() case set in the allocation routine and data_end is set when a packet is received in the receive irq function. This unfortunately leads to worse performance since the xdp_buff is touched twice with a long time period in between leading to an extra cache miss. Instead, we fill out the xdp_buff with all 3 fields at one single point in time in the driver, when the size of the packet is known. Hence this helper. Note that the driver has to use this helper (or set all three fields itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as before and does not require this. void xsk_buff_set_size(struct xdp_buff *xdp, u32 size); Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
2021-09-28xsk: Get rid of unused entry in struct xdp_buff_xskMagnus Karlsson
Get rid of the unused entry "unaligned" in struct xdp_buff_xsk. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-2-magnus.karlsson@gmail.com
2021-09-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A bit late... I got sidetracked by back-from-vacation routines and conferences. But most of these patches are already a few weeks old and things look more calm on the mailing list than what this pull request would suggest. x86: - missing TLB flush - nested virtualization fixes for SMM (secure boot on nested hypervisor) and other nested SVM fixes - syscall fuzzing fixes - live migration fix for AMD SEV - mirror VMs now work for SEV-ES too - fixes for reset - possible out-of-bounds access in IOAPIC emulation - fix enlightened VMCS on Windows 2022 ARM: - Add missing FORCE target when building the EL2 object - Fix a PMU probe regression on some platforms Generic: - KCSAN fixes selftests: - random fixes, mostly for clang compilation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) selftests: KVM: Explicitly use movq to read xmm registers selftests: KVM: Call ucall_init when setting up in rseq_test KVM: Remove tlbs_dirty KVM: X86: Synchronize the shadow pagetable before link it KVM: X86: Fix missed remote tlb flush in rmap_write_protect() KVM: x86: nSVM: don't copy virt_ext from vmcb12 KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 KVM: x86: nSVM: restore int_vector in svm_clear_vintr kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode KVM: x86: reset pdptrs_from_userspace when exiting smm KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit KVM: nVMX: Filter out all unsupported controls when eVMCS was activated KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs KVM: Clean up benign vcpu->cpu data races when kicking vCPUs ...
2021-09-27Merge tag 'mac80211-for-net-2021-09-27' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes berg says: ==================== Some fixes: * potential use-after-free in CCMP/GCMP RX processing * potential use-after-free in TX A-MSDU processing * revert to low data rates for no-ack as the commit broke other things * limit VHT MCS/NSS in radiotap injection * drop frames with invalid addresses in IBSS mode * check rhashtable_init() return value in mesh * fix potentially unaligned access in mesh * fix late beacon hrtimer handling in hwsim (syzbot) * fix documentation for PTK0 rekeying ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27ptp: clockmatrix: use rsmu driver to access i2c/spi busMin Li
rsmu (Renesas Synchronization Management Unit ) driver is located in drivers/mfd and responsible for creating multiple devices including clockmatrix phc, which will then use the exposed regmap and mutex handle to access i2c/spi bus. Signed-off-by: Min Li <min.li.xe@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27mac80211: Fix Ptk0 rekey documentationAlexander Wetzel
@IEEE80211_KEY_FLAG_GENERATE_IV setting is irrelevant for RX. Move the requirement to the correct section in the PTK0 rekey documentation. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20210924200514.7936-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-26Merge tag 'x86-urgent-2021-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Prevent sending the wrong signal when protection keys are enabled and the kernel handles a fault in the vsyscall emulation. - Invoke early_reserve_memory() before invoking e820_memory_setup() which is required to make the Xen dom0 e820 hooks work correctly. - Use the correct data type for the SETZ operand in the EMQCMDS instruction wrapper. - Prevent undefined behaviour to the potential unaligned accesss in the instruction decoder library" * tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses x86/asm: Fix SETZ size enqcmds() build failure x86/setup: Call early_reserve_memory() earlier x86/fault: Fix wrong signal when vsyscall fails with pkey
2021-09-26Merge tag 'irq-urgent-2021-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for interrupt chip drivers: - Work around a bad GIC integration on a Renesas platform which can't handle byte-sized MMIO access - Plug a potential memory leak in the GICv4 driver - Fix a regression in the Armada 370-XP IPI code which was caused by issuing EOI instack of ACK. - A couple of small fixes here and there" * tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Work around broken Renesas integration irqchip/renesas-rza1: Use semicolons instead of commas irqchip/gic-v3-its: Fix potential VPE leak on error irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build irqchip/mbigen: Repair non-kernel-doc notation irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent irqchip/armada-370-xp: Fix ack/eoi breakage Documentation: Fix irq-domain.rst build warning
2021-09-26net: prevent user from passing illegal stab size王贇
We observed below report when playing with netlink sock: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10 shift exponent 249 is too large for 32-bit type CPU: 0 PID: 685 Comm: a.out Not tainted Call Trace: dump_stack_lvl+0x8d/0xcf ubsan_epilogue+0xa/0x4e __ubsan_handle_shift_out_of_bounds+0x161/0x182 __qdisc_calculate_pkt_len+0xf0/0x190 __dev_queue_xmit+0x2ed/0x15b0 it seems like kernel won't check the stab log value passing from user, and will use the insane value later to calculate pkt_len. This patch just add a check on the size/cell_log to avoid insane calculation. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "16 patches. Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts, lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache, debug, and pagemap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix uninitialized use in overcommit_policy_handler mm/memory_failure: fix the missing pte_unmap() call kasan: always respect CONFIG_KASAN_STACK sh: pgtable-3level: fix cast to pointer from integer of different size mm/debug: sync up latest migrate_reason to migrate_reason_names mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN mm: fs: invalidate bh_lrus for only cold path lib/zlib_inflate/inffast: check config in C to avoid unused function warning tools/vm/page-types: remove dependency on opt_file for idle page tracking scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error ocfs2: drop acl cache for directories too mm/shmem.c: fix judgment error in shmem_is_huge() xtensa: increase size of gcc stack frame check mm/damon: don't use strnlen() with known-bogus source length kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
2021-09-25Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Thirty-three fixes, I'm afraid. Essentially the build up from the last couple of weeks while I've been dealling with Linux Plumbers conference infrastructure issues. It's mostly the usual assortment of spelling fixes and minor corrections. The only core relevant changes are to the sd driver to reduce the spin up message spew and fix a small memory leak on the freeing path" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (33 commits) scsi: ses: Retry failed Send/Receive Diagnostic commands scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT" scsi: lpfc: Fix gcc -Wstringop-overread warning, again scsi: lpfc: Use correct scnprintf() limit scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn() scsi: core: Remove 'current_tag' scsi: acornscsi: Remove tagged queuing vestiges scsi: fas216: Kill scmd->tag scsi: qla2xxx: Restore initiator in dual mode scsi: ufs: core: Unbreak the reset handler scsi: sd_zbc: Support disks with more than 2**32 logical blocks scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling" scsi: bsg: Fix device unregistration scsi: sd: Make sd_spinup_disk() less noisy scsi: ufs: ufs-pci: Fix Intel LKF link stability scsi: mpt3sas: Clean up some inconsistent indenting scsi: megaraid: Clean up some inconsistent indenting scsi: sr: Fix spelling mistake "does'nt" -> "doesn't" scsi: Remove SCSI CDROM MAINTAINERS entry scsi: megaraid: Fix Coccinelle warning ...
2021-09-25Merge tag 'for-linus-5.15b-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Some minor cleanups and fixes of some theoretical bugs, as well as a fix of a bug introduced in 5.15-rc1" * tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/x86: fix PV trap handling on secondary processors xen/balloon: fix balloon kthread freezing swiotlb-xen: this is PV-only on x86 xen/pci-swiotlb: reduce visibility of symbols PCI: only build xen-pcifront in PV-enabled environments swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests Xen/gntdev: don't ignore kernel unmapping error xen/x86: drop redundant zeroing from cpu_initialize_context()
2021-09-25Merge tag 'erofs-for-5.15-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Two bugfixes to fix the 4KiB blockmap chunk format availability and a dangling pointer usage. There is also a trivial cleanup to clarify compacted_2b if compacted_4b_initial > totalidx. Summary: - fix the dangling pointer use in erofs_lookup tracepoint - fix unsupported chunk format check - zero out compacted_2b if compacted_4b_initial > totalidx" * tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clear compacted_2b if compacted_4b_initial > totalidx erofs: fix misbehavior of unsupported chunk format check erofs: fix up erofs_lookup tracepoint
2021-09-25Merge tag 'char-misc-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.15-rc3. Nothing huge in here, just fixes for a number of small issues that have been reported. These include: - habanalabs race conditions and other bugs fixed - binder driver fixes - fpga driver fixes - coresight build warning fix - nvmem driver fix - comedi memory leak fix - bcm-vk tty race fix - other tiny driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits) comedi: Fix memory leak in compat_insnlist() nvmem: NVMEM_NINTENDO_OTP should depend on WII misc: bcm-vk: fix tty registration race fpga: dfl: Avoid reads to AFU CSRs during enumeration fpga: machxo2-spi: Fix missing error code in machxo2_write_complete() fpga: machxo2-spi: Return an error on failure habanalabs: expose a single cs seq in staged submissions habanalabs: fix wait offset handling habanalabs: rate limit multi CS completion errors habanalabs/gaudi: fix LBW RR configuration habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK" habanalabs: fail collective wait when not supported habanalabs/gaudi: use direct MSI in single mode habanalabs: fix kernel OOPs related to staged cs habanalabs: fix potential race in interrupt wait ioctl mcb: fix error handling in mcb_alloc_bus() misc: genwqe: Fixes DMA mask setting coresight: syscfg: Fix compiler warning nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM binder: make sure fd closes complete ...
2021-09-25Merge tag 'usb-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some USB driver fixes and new device ids for 5.15-rc3. They include: - usb-storage quirk additions - usb-serial new device ids - usb-serial driver fixes - USB roothub registration bugfix to resolve a long-reported issue - usb gadget driver fixes for a large number of small things - dwc2 driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits) USB: serial: option: add device id for Foxconn T99W265 USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter USB: serial: cp210x: add part-number debug printk USB: serial: cp210x: fix dropped characters with CP2102 MAINTAINERS: usb, update Peter Korsgaard's entries usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned() usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c Re-enable UAS for LaCie Rugged USB3-FW with fk quirk USB: serial: option: remove duplicate USB device ID USB: serial: mos7840: remove duplicated 0xac24 device ID arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval usb: gadget: f_uac2: Add missing companion descriptor for feedback EP usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd() xhci: Set HCD flag to defer primary roothub registration usb: core: hcd: Add support for deferring roothub registration usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave usb: dwc3: core: balance phy init and exit Revert "USB: bcma: Add a check for devm_gpiod_get" ...
2021-09-24mm/debug: sync up latest migrate_reason to migrate_reason_namesWeizhao Ouyang
Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt. Link: https://lkml.kernel.org/r/20210921064553.293905-3-o451686892@gmail.com Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Xu <weixugc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24mm: fs: invalidate bh_lrus for only cold pathMinchan Kim
The kernel test robot reported the regression of fio.write_iops[1] with commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration"). Since lru_add_drain is called frequently, invalidate bh_lrus there could increase bh_lrus cache miss ratio, which needs more IO in the end. This patch moves the bh_lrus invalidation from the hot path( e.g., zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all, lru_cache_disable). Zhengjun Xing confirmed "I test the patch, the regression reduced to -2.9%" [1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/ [2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24Merge tag 'acpi-5.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a recent commit related to memory management that turned out to be problematic (Jia He)" * tag 'acpi-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: Add memory semantics to acpi_os_map_memory()"
2021-09-24tcp: tracking packets with CE marks in BW rate sampleYuchung Cheng
In order to track CE marks per rate sample (one round trip), TCP needs a per-skb header field to record the tp->delivered_ce count when the skb was sent. To make space, we replace the "last_in_flight" field which is used exclusively for NV congestion control. The stat needed by NV can be alternatively approximated by existing stats tcp_sock delivered and mss_cache. This patch counts the number of packets delivered which have CE marks in the rate sample, using similar approach of delivery accounting. Cc: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24net: phy: broadcom: Fix PHY_BRCM_IDDQ_SUSPEND definitionFlorian Fainelli
An extraneous number was added during the inclusion of that change, correct that such that we use a single bit as is expected by the PHY driver. Reported-by: Justin Chen <justinpopo6@gmail.com> Fixes: d6da08ed1425 ("net: phy: broadcom: Add IDDQ-SR mode") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24devlink: Delete not used port parameters APIsLeon Romanovsky
There is no in-kernel users for the devlink port parameters API, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24net: ipv4: Fix rtnexthop len when RTA_FLOW is presentXiao Liang
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24Merge tag 'irqchip-fixes-5.15-1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Work around a bad GIC integration on a Renesas platform, where the interconnect cannot deal with byte-sized MMIO accesses - Cleanup another Renesas driver abusing the comma operator - Fix a potential GICv4 memory leak on an error path - Make the type of 'size' consistent with the rest of the code in __irq_domain_add() - Fix a regression in the Armada 370-XP IPI path - Fix the build for the obviously unloved goldfish-pic - Some documentation fixes Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
2021-09-24Merge tag 'kvmarm-fixes-5.15-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.15, take #1 - Add missing FORCE target when building the EL2 object - Fix a PMU probe regression on some platforms
2021-09-23Revert "ACPI: Add memory semantics to acpi_os_map_memory()"Jia He
This reverts commit 437b38c51162f8b87beb28a833c4d5dc85fa864e. The memory semantics added in commit 437b38c51162 causes SystemMemory Operation region, whose address range is not described in the EFI memory map to be mapped as NormalNC memory on arm64 platforms (through acpi_os_map_memory() in acpi_ex_system_memory_space_handler()). This triggers the following abort on an ARM64 Ampere eMAG machine, because presumably the physical address range area backing the Opregion does not support NormalNC memory attributes driven on the bus. Internal error: synchronous external abort: 96000410 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462 Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [...snip...] Call trace: acpi_ex_system_memory_space_handler+0x26c/0x2c8 acpi_ev_address_space_dispatch+0x228/0x2c4 acpi_ex_access_region+0x114/0x268 acpi_ex_field_datum_io+0x128/0x1b8 acpi_ex_extract_from_field+0x14c/0x2ac acpi_ex_read_data_from_field+0x190/0x1b8 acpi_ex_resolve_node_to_value+0x1ec/0x288 acpi_ex_resolve_to_value+0x250/0x274 acpi_ds_evaluate_name_path+0xac/0x124 acpi_ds_exec_end_op+0x90/0x410 acpi_ps_parse_loop+0x4ac/0x5d8 acpi_ps_parse_aml+0xe0/0x2c8 acpi_ps_execute_method+0x19c/0x1ac acpi_ns_evaluate+0x1f8/0x26c acpi_ns_init_one_device+0x104/0x140 acpi_ns_walk_namespace+0x158/0x1d0 acpi_ns_initialize_devices+0x194/0x218 acpi_initialize_objects+0x48/0x50 acpi_init+0xe0/0x498 If the Opregion address range is not present in the EFI memory map there is no way for us to determine the memory attributes to use to map it - defaulting to NormalNC does not work (and it is not correct on a memory region that may have read side-effects) and therefore commit 437b38c51162 should be reverted, which means reverting back to the original behavior whereby address ranges that are mapped using acpi_os_map_memory() default to the safe devicenGnRnE attributes on ARM64 if the mapped address range is not defined in the EFI memory map. Fixes: 437b38c51162 ("ACPI: Add memory semantics to acpi_os_map_memory()") Signed-off-by: Jia He <justin.he@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>