aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-01-05kallsyms: Make module_kallsyms_on_each_symbol generally availableJiri Olsa
commit 73feb8d5fa3b755bb51077c0aabfb6aa556fd498 upstream. Making module_kallsyms_on_each_symbol generally available, so it can be used outside CONFIG_LIVEPATCH option in following changes. Rather than adding another ifdef option let's make the function generally available (when CONFIG_KALLSYMS and CONFIG_MODULES options are defined). Cc: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221025134148.3300700-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05device property: Allow const parameter to dev_fwnode()Andy Shevchenko
commit b295d484b97081feba72b071ffcb72fb4638ccfd upstream. It's not fully correct to take a const parameter pointer to a struct and return a non-const pointer to a member of that struct. Instead, introduce a const version of the dev_fwnode() API which takes and returns const pointers and use it where it's applicable. With this, convert dev_fwnode() to be a macro wrapper on top of const and non-const APIs that chooses one based on the type. Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Fixes: aade55c86033 ("device property: Add const qualifier to device_get_match_data() parameter") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lore.kernel.org/r/20221004092129.19412-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05spi: Constify spi parameters of chip select APIsGeert Uytterhoeven
commit d2f19eec510424caa55ea949f016ddabe2d8173a upstream. The "spi" parameters of spi_get_chipselect() and spi_get_csgpiod() can be const. Fixes: 303feb3cc06ac066 ("spi: Add APIs in spi core to set/get spi->chip_select and spi->cs_gpiod") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/b112de79e7a1e9095a3b6ff22b639f39e39d7748.1678704562.git.geert+renesas@glider.be Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05block: renumber QUEUE_FLAG_HW_WCChristoph Hellwig
[ Upstream commit 02d374f3418df577c850f0cd45c3da9245ead547 ] For the QUEUE_FLAG_HW_WC to actually work, it needs to have a separate number from QUEUE_FLAG_FUA, doh. Fixes: 43c9835b144c ("block: don't allow enabling a cache on devices that don't support it") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231226081524.180289-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05spi: Add APIs in spi core to set/get spi->chip_select and spi->cs_gpiodAmit Kumar Mahapatra
[ Upstream commit 303feb3cc06ac0665d0ee9c1414941200e60e8a3 ] Supporting multi-cs in spi core and spi controller drivers would require the chip_select & cs_gpiod members of struct spi_device to be an array. But changing the type of these members to array would break the spi driver functionality. To make the transition smoother introduced four new APIs to get/set the spi->chip_select & spi->cs_gpiod and replaced all spi->chip_select and spi->cs_gpiod references in spi core with the API calls. While adding multi-cs support in further patches the chip_select & cs_gpiod members of the spi_device structure would be converted to arrays & the "idx" parameter of the APIs would be used as array index i.e., spi->chip_select[idx] & spi->cs_gpiod[idx] respectively. Suggested-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Reviewed-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20230119185342.2093323-2-amit.kumar-mahapatra@amd.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: fc70d643a2f6 ("spi: atmel: Fix clock issue when using devices with different polarities") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05linux/export: Ensure natural alignment of kcrctab arrayHelge Deller
[ Upstream commit 753547de0daecbdbd1af3618987ddade325d9aaa ] The ___kcrctab section holds an array of 32-bit CRC values. Add a .balign 4 to tell the linker the correct memory alignment. Fixes: f3304ecd7f06 ("linux/export: use inline assembler to populate symbol CRCs") Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05spi: Introduce spi_get_device_match_data() helperAndy Shevchenko
[ Upstream commit aea672d054a21782ed8450c75febb6ba3c208ca4 ] The proposed spi_get_device_match_data() helper is for retrieving a driver data associated with the ID in an ID table. First, it tries to get driver data of the device enumerated by firmware interface (usually Device Tree or ACPI). If none is found it falls back to the SPI ID table matching. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221020195421.10482-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: ee4d79055aee ("iio: imu: adis16475: add spi_device_id table") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05ksmbd: fix racy issue from using ->d_parent and ->d_nameNamjae Jeon
[ Upstream commit 74d7970febf7e9005375aeda0df821d2edffc9f7 ] Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child() to lock stable parent while underlying rename racy. Introduce vfs_path_parent_lookup helper to avoid out of share access and export vfs functions like the following ones to use vfs_path_parent_lookup(). - rename __lookup_hash() to lookup_one_qstr_excl(). - export lookup_one_qstr_excl(). - export getname_kernel() and putname(). vfs_path_parent_lookup() is used for parent lookup of destination file using absolute pathname given from FILE_RENAME_INFORMATION request. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05fs: introduce lock_rename_child() helperAl Viro
[ Upstream commit 9bc37e04823b5280dd0f22b6680fc23fe81ca325 ] Pass the dentry of a source file and the dentry of a destination directory to lock parent inodes for rename. As soon as this function returns, ->d_parent of the source file dentry is stable and inodes are properly locked for calling vfs-rename. This helper is needed for ksmbd server. rename request of SMB protocol has to rename an opened file, no matter which directory it's in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01mm/damon/core: make damon_start() waits until kdamond_fn() startsSeongJae Park
commit 6376a824595607e99d032a39ba3394988b4fce96 upstream. The cleanup tasks of kdamond threads including reset of corresponding DAMON context's ->kdamond field and decrease of global nr_running_ctxs counter is supposed to be executed by kdamond_fn(). However, commit 0f91d13366a4 ("mm/damon: simplify stop mechanism") made neither damon_start() nor damon_stop() ensure the corresponding kdamond has started the execution of kdamond_fn(). As a result, the cleanup can be skipped if damon_stop() is called fast enough after the previous damon_start(). Especially the skipped reset of ->kdamond could cause a use-after-free. Fix it by waiting for start of kdamond_fn() execution from damon_start(). Link: https://lkml.kernel.org/r/20231208175018.63880-1-sj@kernel.org Fixes: 0f91d13366a4 ("mm/damon: simplify stop mechanism") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: Changbin Du <changbin.du@intel.com> Cc: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-01dm thin metadata: Fix ABBA deadlock by resetting dm_bufio_clientLi Lingfeng
[ Upstream commit d48300120627a1cb98914738fff38b424625b8ad ] As described in commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata"), ABBA deadlocks will be triggered because shrinker_rwsem currently needs to held by dm_pool_abort_metadata() as a side-effect of thin-pool metadata operation failure. The following three problem scenarios have been noticed: 1) Described by commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata") 2) shrinker_rwsem and throttle->lock P1(drop cache) P2(kworker) drop_caches_sysctl_handler drop_slab shrink_slab down_read(&shrinker_rwsem) - LOCK A do_shrink_slab super_cache_scan prune_icache_sb dispose_list evict ext4_evict_inode ext4_clear_inode ext4_discard_preallocations ext4_mb_load_buddy_gfp ext4_mb_init_cache ext4_wait_block_bitmap __ext4_error ext4_handle_error ext4_commit_super ... dm_submit_bio do_worker throttle_work_update down_write(&t->lock) -- LOCK B process_deferred_bios commit metadata_operation_failed dm_pool_abort_metadata dm_block_manager_create dm_bufio_client_create register_shrinker down_write(&shrinker_rwsem) -- LOCK A thin_map thin_bio_map thin_defer_bio_with_throttle throttle_lock down_read(&t->lock) - LOCK B 3) shrinker_rwsem and wait_on_buffer P1(drop cache) P2(kworker) drop_caches_sysctl_handler drop_slab shrink_slab down_read(&shrinker_rwsem) - LOCK A do_shrink_slab ... ext4_wait_block_bitmap __ext4_error ext4_handle_error jbd2_journal_abort jbd2_journal_update_sb_errno jbd2_write_superblock submit_bh // LOCK B // RELEASE B do_worker throttle_work_update down_write(&t->lock) - LOCK B process_deferred_bios process_bio commit metadata_operation_failed dm_pool_abort_metadata dm_block_manager_create dm_bufio_client_create register_shrinker register_shrinker_prepared down_write(&shrinker_rwsem) - LOCK A bio_endio wait_on_buffer __wait_on_buffer Fix these by resetting dm_bufio_client without holding shrinker_rwsem. Fixes: 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata") Cc: stable@vger.kernel.org Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiryDavid Howells
[ Upstream commit 39299bdd2546688d92ed9db4948f6219ca1b9542 ] If a key has an expiration time, then when that time passes, the key is left around for a certain amount of time before being collected (5 mins by default) so that EKEYEXPIRED can be returned instead of ENOKEY. This is a problem for DNS keys because we want to redo the DNS lookup immediately at that point. Fix this by allowing key types to be marked such that keys of that type don't have this extra period, but are reclaimed as soon as they expire and turn this on for dns_resolver-type keys. To make this easier to handle, key->expiry is changed to be permanent if TIME64_MAX rather than 0. Furthermore, give such new-style negative DNS results a 1s default expiry if no other expiry time is set rather than allowing it to stick around indefinitely. This shouldn't be zero as ls will follow a failing stat call immediately with a second with AT_SYMLINK_NOFOLLOW added. Fixes: 1a4240f4764a ("DNS: Separate out CIFS DNS Resolver code") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: Wang Lei <wang840925@gmail.com> cc: Jeff Layton <jlayton@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jarkko Sakkinen <jarkko@kernel.org> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: keyrings@vger.kernel.org cc: netdev@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01net/mlx5: Re-organize mlx5_cmd structShay Drory
[ Upstream commit 58db72869a9f8e01910844ca145efc2ea91bbbf9 ] Downstream patch will split mlx5_cmd_init() to probe and reload routines. As a preparation, organize mlx5_cmd struct so that any field that will be used in the reload routine are grouped at new nested struct. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Stable-dep-of: 8f5100da56b3 ("net/mlx5e: Fix a race in command alloc flow") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01net/mlx5: Prevent high-rate FW commands from populating all slotsTariq Toukan
[ Upstream commit 63fbae0a74c3e1df7c20c81e04353ced050d9887 ] Certain connection-based device-offload protocols (like TLS) use per-connection HW objects to track the state, maintain the context, and perform the offload properly. Some of these objects are created, modified, and destroyed via FW commands. Under high connection rate, this type of FW commands might continuously populate all slots of the FW command interface and throttle it, while starving other critical control FW commands. Limit these throttle commands to using only up to a portion (half) of the FW command interface slots. FW commands maximal rate is not hit, and the same high rate is still reached when applying this limitation. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Stable-dep-of: 8f5100da56b3 ("net/mlx5e: Fix a race in command alloc flow") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01bpf: Fix prog_array_map_poke_run map poke updateJiri Olsa
commit 4b7de801606e504e69689df71475d27e35336fb3 upstream. Lee pointed out issue found by syscaller [0] hitting BUG in prog array map poke update in prog_array_map_poke_run function due to error value returned from bpf_arch_text_poke function. There's race window where bpf_arch_text_poke can fail due to missing bpf program kallsym symbols, which is accounted for with check for -EINVAL in that BUG_ON call. The problem is that in such case we won't update the tail call jump and cause imbalance for the next tail call update check which will fail with -EBUSY in bpf_arch_text_poke. I'm hitting following race during the program load: CPU 0 CPU 1 bpf_prog_load bpf_check do_misc_fixups prog_array_map_poke_track map_update_elem bpf_fd_array_map_update_elem prog_array_map_poke_run bpf_arch_text_poke returns -EINVAL bpf_prog_kallsyms_add After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next poke update fails on expected jump instruction check in bpf_arch_text_poke with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run. Similar race exists on the program unload. Fixing this by moving the update to bpf_arch_poke_desc_update function which makes sure we call __bpf_arch_text_poke that skips the bpf address check. Each architecture has slightly different approach wrt looking up bpf address in bpf_arch_text_poke, so instead of splitting the function or adding new 'checkip' argument in previous version, it seems best to move the whole map_poke_run update as arch specific code. [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810 Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Lee Jones <lee@kernel.org> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-01kasan: disable kasan_non_canonical_hook() for HW tagsArnd Bergmann
commit 17c17567fe510857b18fe01b7a88027600e76ac6 upstream. On arm64, building with CONFIG_KASAN_HW_TAGS now causes a compile-time error: mm/kasan/report.c: In function 'kasan_non_canonical_hook': mm/kasan/report.c:637:20: error: 'KASAN_SHADOW_OFFSET' undeclared (first use in this function) 637 | if (addr < KASAN_SHADOW_OFFSET) | ^~~~~~~~~~~~~~~~~~~ mm/kasan/report.c:637:20: note: each undeclared identifier is reported only once for each function it appears in mm/kasan/report.c:640:77: error: expected expression before ';' token 640 | orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT; This was caused by removing the dependency on CONFIG_KASAN_INLINE that used to prevent this from happening. Use the more specific dependency on KASAN_SW_TAGS || KASAN_GENERIC to only ignore the function for hwasan mode. Link: https://lkml.kernel.org/r/20231016200925.984439-1-arnd@kernel.org Fixes: 12ec6a919b0f ("kasan: print the original fault addr when access invalid shadow") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-20mm/mglru: fix underprotected page cacheYu Zhao
commit 081488051d28d32569ebb7c7a23572778b2e7d57 upstream. Unmapped folios accessed through file descriptors can be underprotected. Those folios are added to the oldest generation based on: 1. The fact that they are less costly to reclaim (no need to walk the rmap and flush the TLB) and have less impact on performance (don't cause major PFs and can be non-blocking if needed again). 2. The observation that they are likely to be single-use. E.g., for client use cases like Android, its apps parse configuration files and store the data in heap (anon); for server use cases like MySQL, it reads from InnoDB files and holds the cached data for tables in buffer pools (anon). However, the oldest generation can be very short lived, and if so, it doesn't provide the PID controller with enough time to respond to a surge of refaults. (Note that the PID controller uses weighted refaults and those from evicted generations only take a half of the whole weight.) In other words, for a short lived generation, the moving average smooths out the spike quickly. To fix the problem: 1. For folios that are already on LRU, if they can be beyond the tracking range of tiers, i.e., five accesses through file descriptors, move them to the second oldest generation to give them more time to age. (Note that tiers are used by the PID controller to statistically determine whether folios accessed multiple times through file descriptors are worth protecting.) 2. When adding unmapped folios to LRU, adjust the placement of them so that they are not too close to the tail. The effect of this is similar to the above. On Android, launching 55 apps sequentially: Before After Change workingset_refault_anon 25641024 25598972 0% workingset_refault_file 115016834 106178438 -8% Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Tested-by: Kalesh Singh <kaleshsingh@google.com> Cc: T.J. Mercier <tjmercier@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-20cred: switch to using atomic_long_tJens Axboe
commit f8fa5d76925991976b3e7076f9d1052515ec1fca upstream. There are multiple ways to grab references to credentials, and the only protection we have against overflowing it is the memory required to do so. With memory sizes only moving in one direction, let's bump the reference count to 64-bit and move it outside the realm of feasibly overflowing. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-20net: vlan: introduce skb_vlan_eth_hdr()Vladimir Oltean
[ Upstream commit 1f5020acb33f926030f62563c86dffca35c7b701 ] Similar to skb_eth_hdr() introduced in commit 96cc4b69581d ("macvlan: do not assume mac_header is set in macvlan_broadcast()"), let's introduce a skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get to the VLAN header based on skb->data rather than based on the skb_mac_header(skb). We also consolidate the drivers that dereference skb->data to go through this helper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 9fc95fe95c3e ("net: fec: correct queue selection") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20r8152: add vendor/device ID pair for ASUS USB-C2500Kelly Kane
[ Upstream commit 7037d95a047cd89b1f680eed253c6ab586bef1ed ] The ASUS USB-C2500 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Kelly Kane <kelly@hawknetworks.com> Link: https://lore.kernel.org/r/20231203011712.6314-1-kelly@hawknetworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20r8152: add vendor/device ID pair for D-Link DUB-E250Antonio Napolitano
[ Upstream commit 72f93a3136ee18fd59fa6579f84c07e93424681e ] The D-Link DUB-E250 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Antonio Napolitano <anton@polit.no> Link: https://lore.kernel.org/r/CV200KJEEUPC.WPKAHXCQJ05I@mercurius Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 7037d95a047c ("r8152: add vendor/device ID pair for ASUS USB-C2500") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13hugetlb: fix null-ptr-deref in hugetlb_vma_lock_writeMike Kravetz
commit 187da0f8250aa94bd96266096aef6f694e0b4cd2 upstream. The routine __vma_private_lock tests for the existence of a reserve map associated with a private hugetlb mapping. A pointer to the reserve map is in vma->vm_private_data. __vma_private_lock was checking the pointer for NULL. However, it is possible that the low bits of the pointer could be used as flags. In such instances, vm_private_data is not NULL and not a valid pointer. This results in the null-ptr-deref reported by syzbot: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef] CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g88 8cf78c29e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 1 0/09/2023 RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004 ... Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5753 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718 down_write+0x93/0x200 kernel/locking/rwsem.c:1573 hugetlb_vma_lock_write mm/hugetlb.c:300 [inline] hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291 __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447 hugetlb_zap_begin include/linux/hugetlb.h:258 [inline] unmap_vmas+0x2f4/0x470 mm/memory.c:1733 exit_mmap+0x1ad/0xa60 mm/mmap.c:3230 __mmput+0x12a/0x4d0 kernel/fork.c:1349 mmput+0x62/0x70 kernel/fork.c:1371 exit_mm kernel/exit.c:567 [inline] do_exit+0x9ad/0x2a20 kernel/exit.c:861 __do_sys_exit kernel/exit.c:991 [inline] __se_sys_exit kernel/exit.c:989 [inline] __x64_sys_exit+0x42/0x50 kernel/exit.c:989 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Mask off low bit flags before checking for NULL pointer. In addition, the reserve map only 'belongs' to the OWNER (parent in parent/child relationships) so also check for the OWNER flag. Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com Reported-by: syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/ Fixes: bf4916922c60 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Rik van Riel <riel@surriel.com> Cc: Edward Adam Davis <eadavis@qq.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13kprobes: consistent rcu api usage for kretprobe holderJP Kobryn
commit d839a656d0f3caca9f96e9bf912fd394ac6a11bc upstream. It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is RCU-managed, based on the (non-rethook) implementation of get_kretprobe(). The thought behind this patch is to make use of the RCU API where possible when accessing this pointer so that the needed barriers are always in place and to self-document the code. The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes done to the "rp" pointer are changed to make use of the RCU macro for assignment. For the single read, the implementation of get_kretprobe() is simplified by making use of an RCU macro which accomplishes the same, but note that the log warning text will be more generic. I did find that there is a difference in assembly generated between the usage of the RCU macros vs without. For example, on arm64, when using rcu_assign_pointer(), the corresponding store instruction is a store-release (STLR) which has an implicit barrier. When normal assignment is done, a regular store (STR) is found. In the macro case, this seems to be a result of rcu_assign_pointer() using smp_store_release() when the value to write is not NULL. Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/ Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash") Cc: stable@vger.kernel.org Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13rethook: Use __rcu pointer for rethook::handlerMasami Hiramatsu (Google)
commit a1461f1fd6cfdc4b8917c9d4a91e92605d1f28dc upstream. Since the rethook::handler is an RCU-maganged pointer so that it will notice readers the rethook is stopped (unregistered) or not, it should be an __rcu pointer and use appropriate functions to be accessed. This will use appropriate memory barrier when accessing it. OTOH, rethook::data is never changed, so we don't need to check it in get_kretprobe(). NOTE: To avoid sparse warning, rethook::handler is defined by a raw function pointer type with __rcu instead of rethook_handler_t. Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/ Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/ Tested-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13iommu: Avoid more races around device probeRobin Murphy
commit a2e7e59a94269484a83386972ca07c22fd188854 upstream. It turns out there are more subtle races beyond just the main part of __iommu_probe_device() itself running in parallel - the dev_iommu_free() on the way out of an unsuccessful probe can still manage to trip up concurrent accesses to a device's fwspec. Thus, extend the scope of iommu_probe_device_lock() to also serialise fwspec creation and initial retrieval. Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/ Fixes: 01657bc14a39 ("iommu: Avoid races around device probe") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: André Draszik <andre.draszik@linaro.org> Tested-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/r/16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.murphy@arm.com Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13net: stmmac: fix FPE events losingJianheng Zhang
[ Upstream commit 37e4b8df27bc68340f3fc80dbb27e3549c7f881c ] The status bits of register MAC_FPE_CTRL_STS are clear on read. Using 32-bit read for MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket() clear the status bits. Then the stmmac interrupt handler missing FPE event status and leads to FPE handshaking failure and retries. To avoid clear status bits of MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket(), add fpe_csr to stmmac_fpe_cfg structure to cache the control bits of MAC_FPE_CTRL_STS and to avoid reading MAC_FPE_CTRL_STS in those methods. Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure") Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jianheng Zhang <Jianheng.Zhang@synopsys.com> Link: https://lore.kernel.org/r/CY5PR12MB637225A7CF529D5BE0FBE59CBF81A@CY5PR12MB6372.namprd12.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13hrtimers: Push pending hrtimers away from outgoing CPU earlierThomas Gleixner
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ] 2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") solved the straight forward CPU hotplug deadlock vs. the scheduler bandwidth timer. Yu discovered a more involved variant where a task which has a bandwidth timer started on the outgoing CPU holds a lock and then gets throttled. If the lock required by one of the CPU hotplug callbacks the hotplug operation deadlocks because the unthrottling timer event is not handled on the dying CPU and can only be recovered once the control CPU reaches the hotplug state which pulls the pending hrtimers from the dead CPU. Solve this by pushing the hrtimers away from the dying CPU in the dying callbacks. Nothing can queue a hrtimer on the dying CPU at that point because all other CPUs spin in stop_machine() with interrupts disabled and once the operation is finished the CPU is marked offline. Reported-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liu Tie <liutie4@huawei.com> Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08x86/apic/msi: Fix misconfigured non-maskable MSI quirkKoichiro Den
commit b56ebe7c896dc78b5865ec2c4b1dae3c93537517 upstream. commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted"), reworked the code so that the x86 specific quirk for affinity setting of non-maskable PCI/MSI interrupts is not longer activated if necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows ot remove the NOMASK quirk bit completely as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the momentary non-existing case that the reservation mode would be not set for a maskable MSI interrupt this would not cause any harm as it just would cause msi_set_affinity() to go needlessly through the functionaly equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08mmc: core: add helpers mmc_regulator_enable/disable_vqmmcHeiner Kallweit
[ Upstream commit 8d91f3f8ae57e6292142ca89f322e90fa0d6ac02 ] There's a number of drivers (e.g. dw_mmc, meson-gx, mmci, sunxi) using the same mechanism and a private flag vqmmc_enabled to deal with enabling/disabling the vqmmc regulator. Move this to the core and create new helpers mmc_regulator_enable_vqmmc and mmc_regulator_disable_vqmmc. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/71586432-360f-9b92-17f6-b05a8a971bc2@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Stable-dep-of: 477865af60b2 ("mmc: sdhci-sprd: Fix vqmmc not shutting down after the card was pulled") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08spi: Fix null dereference on suspendMark Hasemeyer
[ Upstream commit bef4a48f4ef798c4feddf045d49e53c8a97d5e37 ] A race condition exists where a synchronous (noqueue) transfer can be active during a system suspend. This can cause a null pointer dereference exception to occur when the system resumes. Example order of events leading to the exception: 1. spi_sync() calls __spi_transfer_message_noqueue() which sets ctlr->cur_msg 2. Spi transfer begins via spi_transfer_one_message() 3. System is suspended interrupting the transfer context 4. System is resumed 6. spi_controller_resume() calls spi_start_queue() which resets cur_msg to NULL 7. Spi transfer context resumes and spi_finalize_current_message() is called which dereferences cur_msg (which is now NULL) Wait for synchronous transfers to complete before suspending by acquiring the bus mutex and setting/checking a suspend flag. Signed-off-by: Mark Hasemeyer <markhas@chromium.org> Link: https://lore.kernel.org/r/20231107144743.v1.1.I7987f05f61901f567f7661763646cb7d7919b528@changeid Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08dma-buf: fix check in dma_resv_add_fenceChristian König
commit 95ba893c9f4feb836ddce627efd0bb6af6667031 upstream. It's valid to add the same fence multiple times to a dma-resv object and we shouldn't need one extra slot for each. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: a3f7c10a269d5 ("dma-buf/dma-resv: check if the new fence is really later") Cc: stable@vger.kernel.org # v5.19+ Link: https://patchwork.freedesktop.org/patch/msgid/20231115093035.1889-1-christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-03HID: fix HID device resource race between HID core and debugging supportCharles Yi
[ Upstream commit fc43e9c857b7aa55efba9398419b14d9e35dcc7d ] hid_debug_events_release releases resources bound to the HID device instance. hid_device_release releases the underlying HID device instance potentially before hid_debug_events_release has completed releasing debug resources bound to the same HID device instance. Reference count to prevent the HID device instance from being torn down preemptively when HID debugging support is used. When count reaches zero, release core resources of HID device instance using hiddev_free. The crash: [ 120.728477][ T4396] kernel BUG at lib/list_debug.c:53! [ 120.728505][ T4396] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 120.739806][ T4396] Modules linked in: bcmdhd dhd_static_buf 8822cu pcie_mhi r8168 [ 120.747386][ T4396] CPU: 1 PID: 4396 Comm: hidt_bridge Not tainted 5.10.110 #257 [ 120.754771][ T4396] Hardware name: Rockchip RK3588 EVB4 LP4 V10 Board (DT) [ 120.761643][ T4396] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--) [ 120.768338][ T4396] pc : __list_del_entry_valid+0x98/0xac [ 120.773730][ T4396] lr : __list_del_entry_valid+0x98/0xac [ 120.779120][ T4396] sp : ffffffc01e62bb60 [ 120.783126][ T4396] x29: ffffffc01e62bb60 x28: ffffff818ce3a200 [ 120.789126][ T4396] x27: 0000000000000009 x26: 0000000000980000 [ 120.795126][ T4396] x25: ffffffc012431000 x24: ffffff802c6d4e00 [ 120.801125][ T4396] x23: ffffff8005c66f00 x22: ffffffc01183b5b8 [ 120.807125][ T4396] x21: ffffff819df2f100 x20: 0000000000000000 [ 120.813124][ T4396] x19: ffffff802c3f0700 x18: ffffffc01d2cd058 [ 120.819124][ T4396] x17: 0000000000000000 x16: 0000000000000000 [ 120.825124][ T4396] x15: 0000000000000004 x14: 0000000000003fff [ 120.831123][ T4396] x13: ffffffc012085588 x12: 0000000000000003 [ 120.837123][ T4396] x11: 00000000ffffbfff x10: 0000000000000003 [ 120.843123][ T4396] x9 : 455103d46b329300 x8 : 455103d46b329300 [ 120.849124][ T4396] x7 : 74707572726f6320 x6 : ffffffc0124b8cb5 [ 120.855124][ T4396] x5 : ffffffffffffffff x4 : 0000000000000000 [ 120.861123][ T4396] x3 : ffffffc011cf4f90 x2 : ffffff81fee7b948 [ 120.867122][ T4396] x1 : ffffffc011cf4f90 x0 : 0000000000000054 [ 120.873122][ T4396] Call trace: [ 120.876259][ T4396] __list_del_entry_valid+0x98/0xac [ 120.881304][ T4396] hid_debug_events_release+0x48/0x12c [ 120.886617][ T4396] full_proxy_release+0x50/0xbc [ 120.891323][ T4396] __fput+0xdc/0x238 [ 120.895075][ T4396] ____fput+0x14/0x24 [ 120.898911][ T4396] task_work_run+0x90/0x148 [ 120.903268][ T4396] do_exit+0x1bc/0x8a4 [ 120.907193][ T4396] do_group_exit+0x8c/0xa4 [ 120.911458][ T4396] get_signal+0x468/0x744 [ 120.915643][ T4396] do_signal+0x84/0x280 [ 120.919650][ T4396] do_notify_resume+0xd0/0x218 [ 120.924262][ T4396] work_pending+0xc/0x3f0 [ Rahul Rameshbabu <sergeantsagara@protonmail.com>: rework changelog ] Fixes: cd667ce24796 ("HID: use debugfs for events/reports dumping") Signed-off-by: Charles Yi <be286@163.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28tracing: Have trace_event_file have ref countersSteven Rostedt (Google)
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream. The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28net: ethtool: Fix documentation of ethtool_sprintf()Andrew Lunn
commit f55d8e60f10909dbc5524e261041e1d28d7d20d8 upstream. This function takes a pointer to a pointer, unlike sprintf() which is passed a plain pointer. Fix up the documentation to make this clear. Fixes: 7888fe53b706 ("ethtool: Add common function for filling out strings") Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Justin Stitt <justinstitt@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20231028192511.100001-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28lsm: fix default return value for inode_getsecctxOndrej Mosnacek
commit b36995b8609a5a8fe5cf259a1ee768fcaed919f8 upstream. -EOPNOTSUPP is the return value that implements a "no-op" hook, not 0. Without this fix having only the BPF LSM enabled (with no programs attached) can cause uninitialized variable reads in nfsd4_encode_fattr(), because the BPF hook returns 0 without touching the 'ctxlen' variable and the corresponding 'contextlen' variable in nfsd4_encode_fattr() remains uninitialized, yet being treated as valid based on the 0 return value. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28lsm: fix default return value for vm_enough_memoryOndrej Mosnacek
commit 866d648059d5faf53f1cd960b43fe8365ad93ea7 upstream. 1 is the return value that implements a "no-op" hook, not 0. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28fs: add ctime accessors infrastructureJeff Layton
commit 9b6304c1d53745c300b86f202d0dcff395e2d2db upstream. struct timespec64 has unused bits in the tv_nsec field that can be used for other purposes. In future patches, we're going to change how the inode->i_ctime is accessed in certain inodes in order to make use of them. In order to do that safely though, we'll need to eradicate raw accesses of the inode->i_ctime field from the kernel. Add new accessor functions for the ctime that we use to replace them. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Message-Id: <20230705185812.579118-2-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28mmc: Add quirk MMC_QUIRK_BROKEN_CACHE_FLUSH for Micron eMMC Q2J54ABean Huo
commit ed9009ad300c0f15a3ecfe9613547b1962bde02c upstream. Micron MTFC4GACAJCN eMMC supports cache but requires that flush cache operation be allowed only after a write has occurred. Otherwise, the cache flush command or subsequent commands will time out. Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Rafael Beims <rafael.beims@toradex.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231030224809.59245-1-beanhuo@iokpp.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28mm/damon: implement a function for max nr_accesses safe calculationSeongJae Park
commit 35f5d94187a6a3a8df2cba54beccca1c2379edb8 upstream. Patch series "avoid divide-by-zero due to max_nr_accesses overflow". The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Avoid the divide-by-zero by implementing a function that handles the corner case (first patch), and replaces the vulnerable direct max nr_accesses calculations (remaining patches). Note that the patches for the replacements are divided for broken commits, to make backporting on required tres easier. Especially, the last patch is for a patch that not yet merged into the mainline but in mm tree. This patch (of 4): The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Implement a function that handles the corner case. Note that this commit is not fixing the real issue since this is only introducing the safe function that will replaces the problematic divisions. The replacements will be made by followup commits, to make backporting on stable series easier. Link: https://lkml.kernel.org/r/20231019194924.100347-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231019194924.100347-2-sj@kernel.org Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28proc: sysctl: prevent aliased sysctls from getting passed to initKrister Johansen
commit 8001f49394e353f035306a45bcf504f06fca6355 upstream. The code that checks for unknown boot options is unaware of the sysctl alias facility, which maps bootparams to sysctl values. If a user sets an old value that has a valid alias, a message about an invalid parameter will be printed during boot, and the parameter will get passed to init. Fix by checking for the existence of aliased parameters in the unknown boot parameter code. If an alias exists, don't return an error or pass the value to init. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28SUNRPC: Fix RPC client cleaned up the freed pipefs dentriesfelix
[ Upstream commit bfca5fb4e97c46503ddfc582335917b0cc228264 ] RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir() workqueue,which takes care about pipefs superblock locking. In some special scenarios, when kernel frees the pipefs sb of the current client and immediately alloctes a new pipefs sb, rpc_remove_pipedir function would misjudge the existence of pipefs sb which is not the one it used to hold. As a result, the rpc_remove_pipedir would clean the released freed pipefs dentries. To fix this issue, rpc_remove_pipedir should check whether the current pipefs sb is consistent with the original pipefs sb. This error can be catched by KASAN: ========================================================= [ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200 [ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503 [ 250.500549] Workqueue: events rpc_free_client_work [ 250.501001] Call Trace: [ 250.502880] kasan_report+0xb6/0xf0 [ 250.503209] ? dget_parent+0x195/0x200 [ 250.503561] dget_parent+0x195/0x200 [ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10 [ 250.504384] rpc_rmdir_depopulate+0x1b/0x90 [ 250.504781] rpc_remove_client_dir+0xf5/0x150 [ 250.505195] rpc_free_client_work+0xe4/0x230 [ 250.505598] process_one_work+0x8ee/0x13b0 ... [ 22.039056] Allocated by task 244: [ 22.039390] kasan_save_stack+0x22/0x50 [ 22.039758] kasan_set_track+0x25/0x30 [ 22.040109] __kasan_slab_alloc+0x59/0x70 [ 22.040487] kmem_cache_alloc_lru+0xf0/0x240 [ 22.040889] __d_alloc+0x31/0x8e0 [ 22.041207] d_alloc+0x44/0x1f0 [ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140 [ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110 [ 22.042459] rpc_create_client_dir+0x34/0x150 [ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0 [ 22.043284] rpc_client_register+0x136/0x4e0 [ 22.043689] rpc_new_client+0x911/0x1020 [ 22.044057] rpc_create_xprt+0xcb/0x370 [ 22.044417] rpc_create+0x36b/0x6c0 ... [ 22.049524] Freed by task 0: [ 22.049803] kasan_save_stack+0x22/0x50 [ 22.050165] kasan_set_track+0x25/0x30 [ 22.050520] kasan_save_free_info+0x2b/0x50 [ 22.050921] __kasan_slab_free+0x10e/0x1a0 [ 22.051306] kmem_cache_free+0xa5/0x390 [ 22.051667] rcu_core+0x62c/0x1930 [ 22.051995] __do_softirq+0x165/0x52a [ 22.052347] [ 22.052503] Last potentially related work creation: [ 22.052952] kasan_save_stack+0x22/0x50 [ 22.053313] __kasan_record_aux_stack+0x8e/0xa0 [ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0 [ 22.054209] dentry_free+0xb2/0x140 [ 22.054540] __dentry_kill+0x3be/0x540 [ 22.054900] shrink_dentry_list+0x199/0x510 [ 22.055293] shrink_dcache_parent+0x190/0x240 [ 22.055703] do_one_tree+0x11/0x40 [ 22.056028] shrink_dcache_for_umount+0x61/0x140 [ 22.056461] generic_shutdown_super+0x70/0x590 [ 22.056879] kill_anon_super+0x3a/0x60 [ 22.057234] rpc_kill_sb+0x121/0x200 Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines") Signed-off-by: felix <fuzhen5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28sched/core: Optimize in_task() and in_interrupt() a bitFinn Thain
[ Upstream commit 87c3a5893e865739ce78aa7192d36011022e0af7 ] Except on x86, preempt_count is always accessed with READ_ONCE(). Repeated invocations in macros like irq_count() produce repeated loads. These redundant instructions appear in various fast paths. In the one shown below, for example, irq_count() is evaluated during kernel entry if !tick_nohz_full_cpu(smp_processor_id()). 0001ed0a <irq_enter_rcu>: 1ed0a: 4e56 0000 linkw %fp,#0 1ed0e: 200f movel %sp,%d0 1ed10: 0280 ffff e000 andil #-8192,%d0 1ed16: 2040 moveal %d0,%a0 1ed18: 2028 0008 movel %a0@(8),%d0 1ed1c: 0680 0001 0000 addil #65536,%d0 1ed22: 2140 0008 movel %d0,%a0@(8) 1ed26: 082a 0001 000f btst #1,%a2@(15) 1ed2c: 670c beqs 1ed3a <irq_enter_rcu+0x30> 1ed2e: 2028 0008 movel %a0@(8),%d0 1ed32: 2028 0008 movel %a0@(8),%d0 1ed36: 2028 0008 movel %a0@(8),%d0 1ed3a: 4e5e unlk %fp 1ed3c: 4e75 rts This patch doesn't prevent the pointless btst and beqs instructions above, but it does eliminate 2 of the 3 pointless move instructions here and elsewhere. On x86, preempt_count is per-cpu data and the problem does not arise presumably because the compiler is free to optimize more effectively. This patch was tested on m68k and x86. I was expecting no changes to object code for x86 and mostly that's what I saw. However, there were a few places where code generation was perturbed for some reason. The performance issue addressed here is minor on uniprocessor m68k. I got a 0.01% improvement from this patch for a simple "find /sys -false" benchmark. For architectures and workloads susceptible to cache line bounce the improvement is expected to be larger. The only SMP architecture I have is x86, and as x86 unaffected I have not done any further measurements. Fixes: 15115830c887 ("preempt: Cleanup the macro maze a bit") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28pwm: Fix double shift bugDan Carpenter
[ Upstream commit d27abbfd4888d79dd24baf50e774631046ac4732 ] These enums are passed to set/test_bit(). The set/test_bit() functions take a bit number instead of a shifted value. Passing a shifted value is a double shift bug like doing BIT(BIT(1)). The double shift bug doesn't cause a problem here because we are only checking 0 and 1 but if the value was 5 or above then it can lead to a buffer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28string.h: add array-wrappers for (v)memdup_user()Philipp Stanner
[ Upstream commit 313ebe47d75558511aa1237b6e35c663b5c0ec6f ] Currently, user array duplications are sometimes done without an overflow check. Sometimes the checks are done manually; sometimes the array size is calculated with array_size() and sometimes by calculating n * size directly in code. Introduce wrappers for arrays for memdup_user() and vmemdup_user() to provide a standardized and safe way for duplicating user arrays. This is both for new code as well as replacing usage of (v)memdup_user() in existing code that uses, e.g., n * size to calculate array sizes. Suggested-by: David Airlie <airlied@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Zack Rusin <zackr@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230920123612.16914-3-pstanner@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28vsock: read from socket's error queueArseniy Krasnov
[ Upstream commit 49dbe25adac42d3e06f65d1420946bec65896222 ] This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28workqueue: Provide one lock class key per work_on_cpu() callsiteFrederic Weisbecker
[ Upstream commit 265f3ed077036f053981f5eea0b5b43e7c5b39ff ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28lib/generic-radix-tree.c: Don't overflow in peek()Kent Overstreet
[ Upstream commit 9492261ff2460252cf2d8de89cdf854c7e2b28a0 ] When we started spreading new inode numbers throughout most of the 64 bit inode space, that triggered some corner case bugs, in particular some integer overflows related to the radix tree code. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20virtio/vsock: replace virtio_vsock_pkt with sk_buffBobby Eshleman
[ Upstream commit 71dc9ec9ac7d3eee785cdc986c3daeb821381e20 ] This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput. This patch introduces the following performance changes: Tool: Uperf Env: Phys Host + L1 Guest Payload: 64k Threads: 16 Test Runs: 10 Type: SOCK_STREAM Before: commit b7bfaa761d760 ("Linux 6.2-rc3") Before ------ g2h: 16.77Gb/s h2g: 10.56Gb/s After ----- g2h: 21.04Gb/s h2g: 10.76Gb/s Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 3a5cc90a4d17 ("vsock/virtio: remove socket from connected/bound list on shutdown") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20Fix termination state for idr_for_each_entry_ul()NeilBrown
[ Upstream commit e8ae8ad479e2d037daa33756e5e72850a7bd37a9 ] The comment for idr_for_each_entry_ul() states after normal termination @entry is left with the value NULL This is not correct in the case where UINT_MAX has an entry in the idr. In that case @entry will be non-NULL after termination. No current code depends on the documentation being correct, but to save future code we should fix it. Also fix idr_for_each_entry_continue_ul(). While this is not documented as leaving @entry as NULL, the mellanox driver appears to depend on it doing so. So make that explicit in the documentation as well as in the code. Fixes: e33d2b74d805 ("idr: fix overflow case for idr_for_each_entry_ul()") Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mi <chrism@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20filemap: add filemap_get_folios_tag()Vishal Moola (Oracle)
[ Upstream commit 247f9e1feef4e57911510c8f82348efb4491ea0e ] This is the equivalent of find_get_pages_range_tag(), except for folios instead of pages. One noteable difference is filemap_get_folios_tag() does not take in a maximum pages argument. It instead tries to fill a folio batch and stops either once full (15 folios) or reaching the end of the search range. The new function supports large folios, the initial function did not since all callers don't use large folios. Link: https://lkml.kernel.org/r/20230104211448.4804-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcow (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: c5d3f9b7649a ("f2fs: compress: fix deadloop in f2fs_write_cache_pages()") Signed-off-by: Sasha Levin <sashal@kernel.org>