linux.git - Linux kernel

Age	Commit message (Collapse)	Author
2023-10-25	i2c: mux: Avoid potential false error message in i2c_mux_add_adapter	Heiner Kallweit
	[ Upstream commit b13e59e74ff71a1004e0508107e91e9a84fd7388 ] I2C_CLASS_DEPRECATED is a flag and not an actual class. There's nothing speaking against both, parent and child, having I2C_CLASS_DEPRECATED set. Therefore exclude it from the check. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	btrfs: initialize start_slot in btrfs_log_prealloc_extents	Josef Bacik
	[ Upstream commit b4c639f699349880b7918b861e1bd360442ec450 ] Jens reported a compiler warning when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’: fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used uninitialized [-Wmaybe-uninitialized] 4828 \| ret = copy_items(trans, inode, dst_path, path, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4829 \| start_slot, ins_nr, 1, 0); \| ~~~~~~~~~~~~~~~~~~~~~~~~~ fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here 4725 \| int start_slot; \| ^~~~~~~~~~ The compiler is incorrect, as we only use this code when ins_len > 0, and when ins_len > 0 we have start_slot properly initialized. However we generally find the -Wmaybe-uninitialized warnings valuable, so initialize start_slot to get rid of the warning. Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1	Filipe Manana
	[ Upstream commit 1bf76df3fee56d6637718e267f7c34ed70d0c7dc ] When running a delayed tree reference, if we find a ref count different from 1, we return -EIO. This isn't an IO error, as it indicates either a bug in the delayed refs code or a memory corruption, so change the error code from -EIO to -EUCLEAN. Also tag the branch as 'unlikely' as this is not expected to ever happen, and change the error message to print the tree block's bytenr without the parenthesis (and there was a missing space between the 'block' word and the opening parenthesis), for consistency as that's the style we used everywhere else. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	btrfs: prevent transaction block reserve underflow when starting transaction	Filipe Manana
	[ Upstream commit a7ddeeb079505961355cf0106154da0110f1fdff ] When starting a transaction, with a non-zero number of items, we reserve metadata space for that number of items and for delayed refs by doing a call to btrfs_block_rsv_add(), with the transaction block reserve passed as the block reserve argument. This reserves metadata space and adds it to the transaction block reserve. Later we migrate the space we reserved for delayed references from the transaction block reserve into the delayed refs block reserve, by calling btrfs_migrate_to_delayed_refs_rsv(). btrfs_migrate_to_delayed_refs_rsv() decrements the number of bytes to migrate from the source block reserve, and this however may result in an underflow in case the space added to the transaction block reserve ended up being used by another task that has not reserved enough space for its own use - examples are tasks doing reflinks or hole punching because they end up calling btrfs_replace_file_extents() -> btrfs_drop_extents() and may need to modify/COW a variable number of leaves/paths, so they keep trying to use space from the transaction block reserve when they need to COW an extent buffer, and may end up trying to use more space then they have reserved (1 unit/path only for removing file extent items). This can be avoided by simply reserving space first without adding it to the transaction block reserve, then add the space for delayed refs to the delayed refs block reserve and finally add the remaining reserved space to the transaction block reserve. This also makes the code a bit shorter and simpler. So just do that. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	fs-writeback: do not requeue a clean inode having skipped pages	Chunhai Guo
	[ Upstream commit be049c3a088d512187407b7fd036cecfab46d565 ] When writing back an inode and performing an fsync on it concurrently, a deadlock issue may arise as shown below. In each writeback iteration, a clean inode is requeued to the wb->b_dirty queue due to non-zero pages_skipped, without anything actually being written. This causes an infinite loop and prevents the plug from being flushed, resulting in a deadlock. We now avoid requeuing the clean inode to prevent this issue. wb_writeback fsync (inode-Y) blk_start_plug(&plug) for (;;) { iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback filemap_fdatawrite __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL do_writepages f2fs_write_data_pages __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL f2fs_write_cache_pages f2fs_write_single_data_page f2fs_do_write_data_page f2fs_outplace_write_data f2fs_update_data_blkaddr f2fs_wait_on_page_writeback wait_on_page_writeback // wait for f2fs node page-X iter i: progress = __writeback_inodes_wb(wb, work) . writeback_sb_inodes . __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE . . do_writepages . . f2fs_write_data_pages . . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0) . . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1 . if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC \| I_SYNC_QUEUED . total_wrote++; // total_wrote = 1 . requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped if (progress) // progress = 1 continue; iter i+1: queue_io // similar process with iter i, infinite for-loop ! } blk_finish_plug(&plug) // flush plug won't be called Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230916045131.957929-1-guochunhai@vivo.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	ARM: dts: ti: omap: Fix noisy serial with overrun-throttle-ms for mapphone	Tony Lindgren
	[ Upstream commit 5ad37b5e30433afa7a5513e3eb61f69fa0976785 ] On mapphone devices we may get lots of noise on the micro-USB port in debug uart mode until the phy-cpcap-usb driver probes. Let's limit the noise by using overrun-throttle-ms. Note that there is also a related separate issue where the charger cable connected may cause random sysrq requests until phy-cpcap-usb probes that still remains. Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Cc: Carl Philipp Klemm <philipp@uvos.xyz> Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Pavel Machek <pavel@ucw.cz> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	pwr-mlxbf: extend Kconfig to include gpio-mlxbf3 dependency	David Thompson
	[ Upstream commit 82f07f1acf417b81e793145c167dd5e156024de4 ] The BlueField power handling driver (pwr-mlxbf.c) provides functionality for both BlueField-2 and BlueField-3 based platforms. This driver also depends on the SoC-specific BlueField GPIO driver, whether gpio-mlxbf2 or gpio-mlxbf3. This patch extends the Kconfig definition to include the dependency on the gpio-mlxbf3 driver, if applicable. Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20230823133743.31275-1-davthompson@nvidia.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	iio: adc: ad7192: Correct reference voltage	Alisa-Dariana Roman
	[ Upstream commit 7e7dcab620cd6d34939f615cac63fc0ef7e81c72 ] The avdd and the reference voltage are two different sources but the reference voltage was assigned according to the avdd supply. Add vref regulator structure and set the reference voltage according to the vref supply from the devicetree. In case vref supply is missing, reference voltage is set according to the avdd supply for compatibility with old devicetrees. Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging") Signed-off-by: Alisa-Dariana Roman <alisa.roman@analog.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230924152149.41884-1-alisadariana@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	iio: adc: ad7192: Simplify using devm_regulator_get_enable()	Matti Vaittinen
	[ Upstream commit 1ccef2e6e9205e209ad958d2e591bcca60981007 ] Use devm_regulator_get_enable() instead of open coded get, enable, add-action-to-disable-at-detach - pattern. Also drop the seemingly unused struct member 'dvdd'. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Link: https://lore.kernel.org/r/9719c445c095d3d308e2fc9f4f93294f5806c41c.1660934107.git.mazziesaccount@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: 7e7dcab620cd ("iio: adc: ad7192: Correct reference voltage") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	iio: cros_ec: fix an use-after-free in cros_ec_sensors_push_data()	Tzung-Bi Shih
	[ Upstream commit 7771c8c80d62ad065637ef74ed2962983f6c5f6d ] cros_ec_sensors_push_data() reads `indio_dev->active_scan_mask` and calls iio_push_to_buffers_with_timestamp() without making sure the `indio_dev` stays in buffer mode. There is a race if `indio_dev` exits buffer mode right before cros_ec_sensors_push_data() accesses them. An use-after-free on `indio_dev->active_scan_mask` was observed. The call trace: [...] _find_next_bit cros_ec_sensors_push_data cros_ec_sensorhub_event blocking_notifier_call_chain cros_ec_irq_thread It was caused by a race condition: one thread just freed `active_scan_mask` at [1]; while another thread tried to access the memory at [2]. Fix it by calling iio_device_claim_buffer_mode() to ensure the `indio_dev` can't exit buffer mode during cros_ec_sensors_push_data(). [1]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/industrialio-buffer.c#L1189 [2]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c#L198 Cc: stable@vger.kernel.org Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO") Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20230829030622.1571852-1-tzungbi@kernel.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	iio: core: introduce iio_device_{claim\|release}_buffer_mode() APIs	Nuno Sá
	[ Upstream commit 0a8565425afd8ba0e1a0ea73e21da119ee6dacea ] These APIs are analogous to iio_device_claim_direct_mode() and iio_device_release_direct_mode() but, as the name suggests, with the logic flipped. While this looks odd enough, it will have at least two users (in following changes) and it will be important to move the IIO mlock to the private struct. Signed-off-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20221012151620.1725215-2-nuno.sa@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: 7771c8c80d62 ("iio: cros_ec: fix an use-after-free in cros_ec_sensors_push_data()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	serial: 8250_omap: Fix errors with no_console_suspend	Tony Lindgren
	[ Upstream commit 560706eff7c8e5621b0d63afe0866e0e1906e87e ] We now get errors on system suspend if no_console_suspend is set as reported by Thomas. The errors started with commit 20a41a62618d ("serial: 8250_omap: Use force_suspend and resume for system suspend"). Let's fix the issue by checking for console_suspend_enabled in the system suspend and resume path. Note that with this fix the checks for console_suspend_enabled in omap8250_runtime_suspend() become useless. We now keep runtime PM usage count for an attached kernel console starting with commit bedb404e91bb ("serial: 8250_port: Don't use power management for kernel console"). Fixes: 20a41a62618d ("serial: 8250_omap: Use force_suspend and resume for system suspend") Cc: stable <stable@kernel.org> Cc: Udit Kumar <u-kumar1@ti.com> Reported-by: Thomas Richard <thomas.richard@bootlin.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Tested-by: Thomas Richard <thomas.richard@bootlin.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Link: https://lore.kernel.org/r/20230926061319.15140-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	serial: 8250: omap: Fix imprecise external abort for omap_8250_pm()	Tony Lindgren
	[ Upstream commit 398cecc24846e867b9f90a0bd22730e3df6b05be ] We must idle the uart only after serial8250_unregister_port(). Otherwise unbinding the uart via sysfs while doing cat on the port produces an imprecise external abort: mem_serial_in from omap_8250_pm+0x44/0xf4 omap_8250_pm from uart_hangup+0xe0/0x194 uart_hangup from __tty_hangup.part.0+0x37c/0x3a8 __tty_hangup.part.0 from uart_remove_one_port+0x9c/0x22c uart_remove_one_port from serial8250_unregister_port+0x60/0xe8 serial8250_unregister_port from omap8250_remove+0x6c/0xd0 omap8250_remove from platform_remove+0x28/0x54 Turns out the driver needs to have runtime PM functional before the driver probe calls serial8250_register_8250_port(). And it needs runtime PM after driver remove calls serial8250_unregister_port(). On probe, we need to read registers before registering the port in omap_serial_fill_features_erratas(). We do that with custom uart_read() already. On remove, after serial8250_unregister_port(), we need to write to the uart registers to idle the device. Let's add a custom uart_write() for that. Currently the uart register access depends on port->membase to be initialized, which won't work after serial8250_unregister_port(). Let's use priv->membase instead, and use it for runtime PM related functions to remove the dependency to port->membase for early and late register access. Note that during use, we need to check for a valid port in the runtime PM related functions. This is needed for the optional wakeup configuration. We now need to set the drvdata a bit earlier so it's available for the runtime PM functions. With the port checks in runtime PM functions, the old checks for priv in omap8250_runtime_suspend() and omap8250_runtime_resume() functions are no longer needed and are removed. Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230508082014.23083-3-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 560706eff7c8 ("serial: 8250_omap: Fix errors with no_console_suspend") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	serial: Reduce spinlocked portion of uart_rs485_config()	Lukas Wunner
	[ Upstream commit 8679328eb859d06a1984ab48d90ac35d11bbcaf1 ] Commit 44b27aec9d96 ("serial: core, 8250: set RS485 termination GPIO in serial core") enabled support for RS485 termination GPIOs behind i2c expanders by setting the GPIO outside of the critical section protected by the port spinlock. Access to the i2c expander may sleep, which caused a splat with the port spinlock held. Commit 7c7f9bc986e6 ("serial: Deassert Transmit Enable on probe in driver-specific way") erroneously regressed that by spinlocking the GPIO manipulation again. Fix by moving uart_rs485_config() (the function manipulating the GPIO) outside of the spinlocked section and acquiring the spinlock inside of uart_rs485_config() for the invocation of ->rs485_config() only. This gets us one step closer to pushing the spinlock down into the ->rs485_config() callbacks which actually need it. (Some callbacks do not want to be spinlocked because they perform sleepable register accesses, see e.g. sc16is7xx_config_rs485().) Stack trace for posterity: Voluntary context switch within RCU read-side critical section! WARNING: CPU: 0 PID: 56 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch Call trace: rcu_note_context_switch __schedule schedule schedule_timeout wait_for_completion_timeout bcm2835_i2c_xfer __i2c_transfer i2c_transfer i2c_transfer_buffer_flags regmap_i2c_write _regmap_raw_write_impl _regmap_bus_raw_write _regmap_write _regmap_update_bits regmap_update_bits_base pca953x_gpio_set_value gpiod_set_raw_value_commit gpiod_set_value_nocheck gpiod_set_value_cansleep uart_rs485_config uart_add_one_port pl011_register_port pl011_probe Fixes: 7c7f9bc986e6 ("serial: Deassert Transmit Enable on probe in driver-specific way") Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v6.1+ Link: https://lore.kernel.org/r/f3a35967c28b32f3c6432d0aa5936e6a9908282d.1695307688.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	serial: Rename uart_change_speed() to uart_change_line_settings()	Ilpo Järvinen
	[ Upstream commit 826736a6c7c8c3185bfb10e03c10d03d53d6cf94 ] uart_change_speed() changes more than just speed so rename it to more generic uart_change_line_settings(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230309080923.11778-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8679328eb859 ("serial: Reduce spinlocked portion of uart_rs485_config()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	serial: Move uart_change_speed() earlier	Ilpo Järvinen
	[ Upstream commit 8e90cf29aef77b59ed6a6f6466add2af79621f26 ] Move uart_change_speed() earlier to get rid of its forward declaration. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230309080923.11778-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8679328eb859 ("serial: Reduce spinlocked portion of uart_rs485_config()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub	Javier Carrasco
	[ Upstream commit e59e38158c61162f2e8beb4620df21a1585117df ] The USB2412 is a 2-Port USB 2.0 hub controller that provides a reset pin and a single 3v3 powre source, which makes it suitable to be controlled by the onboard_hub driver. This hub has the same reset timings as USB2514/2517 and the same onboard hub specific-data can be reused for USB2412. Signed-off-by: Javier Carrasco <javier.carrasco@wolfvision.net> Cc: stable <stable@kernel.org> Acked-by: Matthias Kaehlcke <mka@chromium.org> Link: https://lore.kernel.org/r/20230911-topic-2412_onboard_hub-v1-1-7704181ddfff@wolfvision.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	usb: misc: onboard_usb_hub: add Genesys Logic GL3523 hub support	Anand Moon
	[ Upstream commit d97b4b35adcecd4b747d3e1c262e10e4a093cefa ] Genesys Logic GL3523 is a 4-port USB 3.1 hub that has a reset pin to toggle and a 5.0V core supply exported though an integrated LDO is available for powering it. Add the support for this hub, for controlling the reset pin and the core power supply. Signed-off-by: Anand Moon <linux.amoon@gmail.com> [m.felsch@pengutronix.de: include review feedback & port to 6.4] Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Link: https://lore.kernel.org/r/20230623142228.4069084-2-m.felsch@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: e59e38158c61 ("usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	usb: misc: onboard_usb_hub: add Genesys Logic GL852G hub support	Anand Moon
	[ Upstream commit db7cab26c3d1382ec85d8cadf642f57250edea58 ] Genesys Logic GL852G is a 4-port USB 2.0 STT hub that has a reset pin to toggle and a 5.0V core supply exported though an integrated LDO is available for powering it. Add the support for this hub, for controlling the reset pin and the core power supply. Signed-off-by: Anand Moon <linux.amoon@gmail.com> Acked-by: Matthias Kaehlcke <mka@chromium.org> Link: https://lore.kernel.org/r/20230118044418.875-5-linux.amoon@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: e59e38158c61 ("usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	usb: misc: onboard_usb_hub: add Genesys Logic GL850G hub support	Icenowy Zheng
	[ Upstream commit 9bae996ffa28ac03b6d95382a2a082eb219e745a ] Genesys Logic GL850G is a 4-port USB 2.0 STT hub that has a reset pin to toggle and a 3.3V core supply exported (although an integrated LDO is available for powering it with 5V). Add the support for this hub, for controlling the reset pin and the core power supply. Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Acked-by: Matthias Kaehlcke <mka@chromium.org> Link: https://lore.kernel.org/r/20221206055228.306074-4-uwu@icenowy.me Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: e59e38158c61 ("usb: misc: onboard_hub: add support for Microchip USB2412 USB 2.0 hub") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and ↵	Juntong Deng
	hugetlb_reparenting_test.sh that may cause error [ Upstream commit bbe246f875d064ecfb872fe4f66152e743dfd22d ] According to the awk manual, the -e option does not need to be specified in front of 'program' (unless you need to mix program-file). The redundant -e option can cause error when users use awk tools other than gawk (for example, mawk does not support the -e option). Error Example: awk: not an option: -e Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25	net: check for altname conflicts when changing netdev's netns	Jakub Kicinski
	commit 7663d522099ecc464512164e660bc771b2ff7b64 upstream. It's currently possible to create an altname conflicting with an altname or real name of another device by creating it in another netns and moving it over: [ ~]$ ip link add dev eth0 type dummy [ ~]$ ip netns add test [ ~]$ ip -netns test link add dev ethX netns test type dummy [ ~]$ ip -netns test link property add dev ethX altname eth0 [ ~]$ ip -netns test link set dev ethX netns 1 [ ~]$ ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff ... 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff altname eth0 Create a macro for walking the altnames, this hopefully makes it clearer that the list we walk contains only altnames. Which is otherwise not entirely intuitive. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: fix ifname in netlink ntf during netns move	Jakub Kicinski
	commit 311cca40661f428b7aa114fb5af578cfdbe3e8b6 upstream. dev_get_valid_name() overwrites the netdev's name on success. This makes it hard to use in prepare-commit-like fashion, where we do validation first, and "commit" to the change later. Factor out a helper which lets us save the new name to a buffer. Use it to fix the problem of notification on netns move having incorrect name: 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff [ ~]# ip link set dev eth0 netns 1 name eth1 ip monitor inside netns: Deleted inet eth0 Deleted inet6 eth0 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7 Name is reported as eth1 in old netns for ifindex 5, already renamed. Fixes: d90310243fd7 ("net: device name allocation cleanups") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: avoid UAF on deleted altname	Jakub Kicinski
	commit 1a83f4a7c156fa6bbd6b530e89fa3270bf3d9d1b upstream. Altnames are accessed under RCU (dev_get_by_name_rcu()) but freed by kfree() with no synchronization point. Each node has one or two allocations (node and a variable-size name, sometimes the name is netdev->name). Adding rcu_heads here is a bit tedious. Besides most code which unlists the names already has rcu barriers - so take the simpler approach of adding synchronize_rcu(). Note that the one on the unregistration path (which matters more) is removed by the next fix. Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: pktgen: Fix interface flags printing	Gavrilov Ilia
	commit 1d30162f35c7a73fc2f8cdcdcdbd690bedb99d1a upstream. Device flags are displayed incorrectly: 1) The comparison (i == F_FLOW_SEQ) is always false, because F_FLOW_SEQ is equal to (1 << FLOW_SEQ_SHIFT) == 2048, and the maximum value of the 'i' variable is (NR_PKT_FLAG - 1) == 17. It should be compared with FLOW_SEQ_SHIFT. 2) Similarly to the F_IPSEC flag. 3) Also add spaces to the print end of the string literal "spi:%u" to prevent the output from merging with the flag that follows. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 99c6d3d20d62 ("pktgen: Remove brute-force printing of flags") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: phy: bcm7xxx: Add missing 16nm EPHY statistics	Florian Fainelli
	commit 6200e00e112ce2d17b066a20dd2476d9aecbefa6 upstream. The .probe() function would allocate the necessary space and ensure that the library call sizes the number of statistics but the callbacks necessary to fetch the name and values were not wired up. Reported-by: Justin Chen <justin.chen@broadcom.com> Fixes: f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231017205119.416392-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	netfilter: nf_tables: revert do not remove elements if set backend ↵	Pablo Neira Ayuso
	implements .abort commit f86fb94011aeb3b26337fc22204ca726aeb8bc24 upstream. nf_tables_abort_release() path calls nft_set_elem_destroy() for NFT_MSG_NEWSETELEM which releases the element, however, a reference to the element still remains in the working copy. Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	netfilter: nf_tables: do not remove elements if set backend implements .abort	Pablo Neira Ayuso
	commit ebd032fa881882fef2acb9da1bbde48d8233241d upstream. pipapo set backend maintains two copies of the datastructure, removing the elements from the copy that is going to be discarded slows down the abort path significantly, from several minutes to few seconds after this patch. Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	netlink: Correct offload_xstats size	Christoph Paasch
	commit 503930f8e113edc86f92b767efb4ea57bdffffb2 upstream. rtnl_offload_xstats_get_size_hw_s_info_one() conditionalizes the size-computation for IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED based on whether or not the device has offload_xstats enabled. However, rtnl_offload_xstats_fill_hw_s_info_one() is adding the u8 for that field uncondtionally. syzkaller triggered a WARNING in rtnl_stats_get due to this: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 754 at net/core/rtnetlink.c:5982 rtnl_stats_get+0x2f4/0x300 Modules linked in: CPU: 0 PID: 754 Comm: syz-executor148 Not tainted 6.6.0-rc2-g331b78eb12af #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:rtnl_stats_get+0x2f4/0x300 net/core/rtnetlink.c:5982 Code: ff ff 89 ee e8 7d 72 50 ff 83 fd a6 74 17 e8 33 6e 50 ff 4c 89 ef be 02 00 00 00 e8 86 00 fa ff e9 7b fe ff ff e8 1c 6e 50 ff <0f> 0b eb e5 e8 73 79 7b 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffc900006837c0 EFLAGS: 00010293 RAX: ffffffff81cf7f24 RBX: ffff8881015d9000 RCX: ffff888101815a00 RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6 RBP: 00000000ffffffa6 R08: ffffffff81cf7f03 R09: 0000000000000001 R10: ffff888101ba47b9 R11: ffff888101815a00 R12: ffff8881017dae00 R13: ffff8881017dad00 R14: ffffc90000683ab8 R15: ffffffff83c1f740 FS: 00007fbc22dbc740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000046 CR3: 000000010264e003 CR4: 0000000000170ef0 Call Trace: <TASK> rtnetlink_rcv_msg+0x677/0x710 net/core/rtnetlink.c:6480 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2545 netlink_unicast+0x430/0x500 net/netlink/af_netlink.c:1342 netlink_sendmsg+0x4fc/0x620 net/netlink/af_netlink.c:1910 sock_sendmsg+0xa8/0xd0 net/socket.c:730 ____sys_sendmsg+0x22a/0x320 net/socket.c:2541 ___sys_sendmsg+0x143/0x190 net/socket.c:2595 __x64_sys_sendmsg+0xd8/0x150 net/socket.c:2624 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7fbc22e8d6a9 Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc4320e778 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004007d0 RCX: 00007fbc22e8d6a9 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000004007d0 R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc4320e898 R13: 00007ffc4320e8a8 R14: 00000000004004a0 R15: 00007fbc22fa5a80 </TASK> ---[ end trace 0000000000000000 ]--- Which didn't happen prior to commit bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") as the skb always was large enough. Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20231013041448.8229-1-cpaasch@apple.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	netfilter: nft_set_rbtree: .deactivate fails if element has expired	Pablo Neira Ayuso
	commit d111692a59c1470ae530cbb39bcf0346c950ecc7 upstream. This allows to remove an expired element which is not possible in other existing set backends, this is more noticeable if gc-interval is high so expired elements remain in the tree. On-demand gc also does not help in this case, because this is delete element path. Return NULL if element has expired. Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	selftests: netfilter: Run nft_audit.sh in its own netns	Phil Sutter
	commit 2e2d9c7d4d37d74873583d7b0c94eac8b6869486 upstream. Don't mess with the host's firewall ruleset. Since audit logging is not per-netns, add an initial delay of a second so other selftests' netns cleanups have a chance to finish. Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	selftests: openvswitch: Catch cases where the tests are killed	Aaron Conole
	commit af846afad5ca1c1a24d320adf9e48255e97db84e upstream. In case of fatal signal, or early abort at least cleanup the current test case. Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section	Geert Uytterhoeven
	commit 2915240eddba96b37de4c7e9a3d0ac6f9548454b upstream. When CONFIG_IPV6=n, and building with W=1: In file included from include/trace/define_trace.h:102, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 \| struct in6_addr pin6; \| ^~~~ include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 402 \| { assign; } \ \| ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 \| PARAMS(assign), \ \| ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 \| TRACE_EVENT(neigh_create, \| ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 \| TP_fast_assign( \| ^~~~~~~~~~~~~~ In file included from include/trace/define_trace.h:103, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 \| struct in6_addr pin6; \| ^~~~ include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 51 \| { assign; } \ \| ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 \| PARAMS(assign), \ \| ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 \| TRACE_EVENT(neigh_create, \| ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 \| TP_fast_assign( \| ^~~~~~~~~~~~~~ Indeed, the variable pin6 is declared and initialized unconditionally, while it is only used and needlessly re-initialized when support for IPv6 is enabled. Fix this by dropping the unused variable initialization, and moving the variable declaration inside the existing section protected by a check for CONFIG_IPV6. Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve	Pedro Tammela
	commit a13b67c9a015c4e21601ef9aa4ec9c5d972df1b4 upstream. Christian Theune says: I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script, leaving me with a non-functional uplink on a remote router. A 'rt' curve cannot be used as a inner curve (parent class), but we were allowing such configurations since the qdisc was introduced. Such configurations would trigger a UAF as Budimir explains: The parent will have vttree_insert() called on it in init_vf(), but will not have vttree_remove() called on it in update_vf() because it does not have the HFSC_FSC flag set. The qdisc always assumes that inner classes have the HFSC_FSC flag set. This is by design as it doesn't make sense 'qdisc wise' for an 'rt' curve to be an inner curve. Budimir's original patch disallows users to add classes with a 'rt' parent, but this is too strict as it breaks users that have been using 'rt' as a inner class. Another approach, taken by this patch, is to upgrade the inner 'rt' into a 'sc', warning the user in the process. It avoids the UAF reported by Budimir while also being more permissive to bad scripts/users/code using 'rt' as a inner class. Users checking the `tc class ls [...]` or `tc class get [...]` dumps would observe the curve change and are potentially breaking with this change. v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/ - Correct 'Fixes' tag and merge with revert (Jakub) Cc: Christian Theune <ct@flyingcircus.io> Cc: Budimir Markovic <markovicbudimir@gmail.com> Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	bonding: Return pointer to data after pull on skb	Jiri Wiesner
	commit d93f3f992780af4a21e6c1ab86946b7c5602f1b9 upstream. Since 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers"), header offsets used to compute a hash in bond_xmit_hash() are relative to skb->data and not skb->head. If the tail of the header buffer of an skb really needs to be advanced and the operation is successful, the pointer to the data must be returned (and not a pointer to the head of the buffer). Fixes: 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers") Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: dsa: bcm_sf2: Fix possible memory leak in bcm_sf2_mdio_register()	Jinjie Ruan
	commit 61b40cefe51af005c72dbdcf975a3d166c6e6406 upstream. In bcm_sf2_mdio_register(), the class_find_device() will call get_device() to increment reference count for priv->master_mii_bus->dev if of_mdio_find_bus() succeeds. If mdiobus_alloc() or mdiobus_register() fails, it will call get_device() twice without decrement reference count for the device. And it is the same if bcm_sf2_mdio_register() succeeds but fails in bcm_sf2_sw_probe(), or if bcm_sf2_sw_probe() succeeds. If the reference count has not decremented to zero, the dev related resource will not be freed. So remove the get_device() in bcm_sf2_mdio_register(), and call put_device() if mdiobus_alloc() or mdiobus_register() fails and in bcm_sf2_mdio_unregister() to solve the issue. And as Simon suggested, unwind from errors for bcm_sf2_mdio_register() and just return 0 if it succeeds to make it cleaner. Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231011032419.2423290-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	i40e: prevent crash on probe if hw registers have invalid values	Michal Schmidt
	commit fc6f716a5069180c40a8c9b63631e97da34f64a3 upstream. The hardware provides the indexes of the first and the last available queue and VF. From the indexes, the driver calculates the numbers of queues and VFs. In theory, a faulty device might say the last index is smaller than the first index. In that case, the driver's calculation would underflow, it would attempt to write to non-existent registers outside of the ioremapped range and crash. I ran into this not by having a faulty device, but by an operator error. I accidentally ran a QE test meant for i40e devices on an ice device. The test used 'echo i40e > /sys/...ice PCI device.../driver_override', bound the driver to the device and crashed in one of the wr32 calls in i40e_clear_hw. Add checks to prevent underflows in the calculations of num_queues and num_vfs. With this fix, the wrong device probing reports errors and returns a failure without crashing. Fixes: 838d41d92a90 ("i40e: clear all queues and interrupts") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20231011233334.336092-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	octeon_ep: update BQL sent bytes before ringing doorbell	Shinas Rasheed
	commit a0ca6b9dfef0b3cc83aa8bb485ed61a018f84982 upstream. Sometimes Tx is completed immediately after doorbell is updated, which causes Tx completion routing to update completion bytes before the same packet bytes are updated in sent bytes in transmit function, hence hitting BUG_ON() in dql_completed(). To avoid this, update BQL sent bytes before ringing doorbell. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Link: https://lore.kernel.org/r/20231017105030.2310966-1-srasheed@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: usb: smsc95xx: Fix an error code in smsc95xx_reset()	Dan Carpenter
	commit c53647a5df9e66dd9fedf240198e1fe50d88c286 upstream. Return a negative error code instead of success. Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/147927f0-9ada-45cc-81ff-75a19dd30b76@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr	Eric Dumazet
	commit 195374d893681da43a39796e53b30ac4f20400c4 upstream. syzbot reported a data-race while accessing nh->nh_saddr_genid [1] Add annotations, but leave the code lazy as intended. [1] BUG: KCSAN: data-race in fib_select_path / fib_select_path write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1: fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline] fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline] fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0: fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline] fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x959d3217 -> 0x959d3218 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	tun: prevent negative ifindex	Eric Dumazet
	commit cbfbfe3aee718dc4c3c837f5d2463170ee59d78c upstream. After commit 956db0a13b47 ("net: warn about attempts to register negative ifindex") syzbot is able to trigger the following splat. Negative ifindex are not supported. WARNING: CPU: 1 PID: 6003 at net/core/dev.c:9596 dev_index_reserve+0x104/0x210 Modules linked in: CPU: 1 PID: 6003 Comm: syz-executor926 Not tainted 6.6.0-rc4-syzkaller-g19af4a4ed414 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : dev_index_reserve+0x104/0x210 lr : dev_index_reserve+0x100/0x210 sp : ffff800096a878e0 x29: ffff800096a87930 x28: ffff0000d04380d0 x27: ffff0000d04380f8 x26: ffff0000d04380f0 x25: 1ffff00012d50f20 x24: 1ffff00012d50f1c x23: dfff800000000000 x22: ffff8000929c21c0 x21: 00000000ffffffea x20: ffff0000d04380e0 x19: ffff800096a87900 x18: ffff800096a874c0 x17: ffff800084df5008 x16: ffff80008051f9c4 x15: 0000000000000001 x14: 1fffe0001a087198 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff0000d41c9bc0 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffff800091763d88 x4 : 0000000000000000 x3 : ffff800084e04748 x2 : 0000000000000001 x1 : 00000000fead71c7 x0 : 0000000000000000 Call trace: dev_index_reserve+0x104/0x210 register_netdevice+0x598/0x1074 net/core/dev.c:10084 tun_set_iff+0x630/0xb0c drivers/net/tun.c:2850 __tun_chr_ioctl+0x788/0x2af8 drivers/net/tun.c:3118 tun_chr_ioctl+0x38/0x4c drivers/net/tun.c:3403 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:857 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595 irq event stamp: 11348 hardirqs last enabled at (11347): [<ffff80008a716574>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] hardirqs last enabled at (11347): [<ffff80008a716574>] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194 hardirqs last disabled at (11348): [<ffff80008a627820>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:436 softirqs last enabled at (11138): [<ffff8000887ca53c>] spin_unlock_bh include/linux/spinlock.h:396 [inline] softirqs last enabled at (11138): [<ffff8000887ca53c>] release_sock+0x15c/0x1b0 net/core/sock.c:3531 softirqs last disabled at (11136): [<ffff8000887ca41c>] spin_lock_bh include/linux/spinlock.h:356 [inline] softirqs last disabled at (11136): [<ffff8000887ca41c>] release_sock+0x3c/0x1b0 net/core/sock.c:3518 Fixes: fb7589a16216 ("tun: Add ability to create tun device with given index") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20231016180851.3560092-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	tcp: Fix listen() warning with v4-mapped-v6 address.	Kuniyuki Iwashima
	commit 8702cf12e6ba91616a72d684e90357977972991b upstream. syzbot reported a warning [0] introduced by commit c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address."). After the cited commit, a v4 socket's address matches the corresponding v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa. During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses bhash and conflicts properly without checking bhash2 so that we need not check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in inet_bind2_bucket_match_addr(). However, the repro shows that we need to check that in a no-conflict case. The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls listen() for the first socket: from socket import * s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s1.bind(('127.0.0.1', 0)) s2 = socket(AF_INET6) s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1])) s1.listen() The second socket should belong to the first socket's tb2, but the second bind() creates another tb2 bucket because inet_bind2_bucket_find() returns NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the corresponding v4 address tb2. bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X) Then, listen() for the first socket calls inet_csk_get_port(), where the v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered. To avoid that, we need to check if v4-mapped-v6 sk address matches with the corresponding v4 address tb2 in inet_bind2_bucket_match(). The same checks are needed in inet_bind2_bucket_addr_match() too, so we can move all checks there and call it from inet_bind2_bucket_match(). Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr and not of sockets in the bucket. This could be refactored later by defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending ::ffff: when creating v4 tb2. [0]: WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Modules linked in: CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31 RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000 RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38 RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200 R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100 FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256 __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217 inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239 __sys_listen+0x194/0x270 net/socket.c:1866 __do_sys_listen net/socket.c:1875 [inline] __se_sys_listen net/socket.c:1873 [inline] __x64_sys_listen+0x53/0x80 net/socket.c:1873 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3a5bce3af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9 RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001 R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Fixes: c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.") Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb	Eric Dumazet
	commit f921a4a5bffa8a0005b190fb9421a7fc1fd716b6 upstream. In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()") we allowed to send an skb regardless of TSQ limits being hit if rtx queue was empty or had a single skb, in order to better fill the pipe when/if TX completions were slow. Then later, commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue") accidentally removed the special case for one skb in rtx queue. Stefan Wahren reported a regression in single TCP flow throughput using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt"). This last commit only made the regression more visible, because it locked the TCP flow on a particular behavior where TSQ prevented two skbs being pushed downstream, adding silences on the wire between each TSO packet. Many thanks to Stefan for his invaluable help ! Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/ Reported-by: Stefan Wahren <wahrenst@gmx.net> Tested-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	tcp: fix excessive TLP and RACK timeouts from HZ rounding	Neal Cardwell
	commit 1c2709cfff1dedbb9591e989e2f001484208d914 upstream. We discovered from packet traces of slow loss recovery on kernels with the default HZ=250 setting (and min_rtt < 1ms) that after reordering, when receiving a SACKed sequence range, the RACK reordering timer was firing after about 16ms rather than the desired value of roughly min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the exact same issue. This commit fixes the TLP transmit timer and RACK reordering timer floor calculation to more closely match the intended 2ms floor even on kernels with HZ=250. It does this by adding in a new TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies, instead of the current approach of converting to jiffies and then adding th TCP_TIMEOUT_MIN value of 2 jiffies. Our testing has verified that on kernels with HZ=1000, as expected, this does not produce significant changes in behavior, but on kernels with the default HZ=250 the latency improvement can be large. For example, our tests show that for HZ=250 kernels at low RTTs this fix roughly halves the latency for the RACK reorder timer: instead of mostly firing at 16ms it mostly fires at 8ms. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: rfkill: gpio: prevent value glitch during probe	Josua Mayer
	commit b2f750c3a80b285cd60c9346f8c96bd0a2a66cde upstream. When either reset- or shutdown-gpio have are initially deasserted, e.g. after a reboot - or when the hardware does not include pull-down, there will be a short toggle of both IOs to logical 0 and back to 1. It seems that the rfkill default is unblocked, so the driver should not glitch to output low during probe. It can lead e.g. to unexpected lte modem reconnect: [1] root@localhost:~# dmesg \| grep "usb 2-1" [ 2.136124] usb 2-1: new SuperSpeed USB device number 2 using xhci-hcd [ 21.215278] usb 2-1: USB disconnect, device number 2 [ 28.833977] usb 2-1: new SuperSpeed USB device number 3 using xhci-hcd The glitch has been discovered on an arm64 board, now that device-tree support for the rfkill-gpio driver has finally appeared :). Change the flags for devm_gpiod_get_optional from GPIOD_OUT_LOW to GPIOD_ASIS to avoid any glitches. The rfkill driver will set the intended value during rfkill_sync_work. Fixes: 7176ba23f8b5 ("net: rfkill: add generic gpio rfkill driver") Signed-off-by: Josua Mayer <josua@solid-run.com> Link: https://lore.kernel.org/r/20231004163928.14609-1-josua@solid-run.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: ipv6: fix return value check in esp_remove_trailer	Ma Ke
	commit dad4e491e30b20f4dc615c9da65d2142d703b5c2 upstream. In esp_remove_trailer(), to avoid an unexpected result returned by pskb_trim, we should check the return value of pskb_trim(). Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	net: ipv4: fix return value check in esp_remove_trailer	Ma Ke
	commit 513f61e2193350c7a345da98559b80f61aec4fa6 upstream. In esp_remove_trailer(), to avoid an unexpected result returned by pskb_trim, we should check the return value of pskb_trim(). Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	wifi: cfg80211: use system_unbound_wq for wiphy work	Johannes Berg
	commit 91d20ab9d9ca035527af503d00e1e30d6c375f2a upstream. Since wiphy work items can run pretty much arbitrary code in the stack/driver, it can take longer to run all of this, so we shouldn't be using system_wq via schedule_work(). Also, we lock the wiphy (which is the reason this exists), so use system_unbound_wq. Reported-and-tested-by: Kalle Valo <kvalo@kernel.org> Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	xfrm: interface: use DEV_STATS_INC()	Eric Dumazet
	commit f7c4e3e5d4f6609b4725a97451948ca2e425379a upstream. syzbot/KCSAN reported data-races in xfrm whenever dev->stats fields are updated. It appears all of these updates can happen from multiple cpus. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. BUG: KCSAN: data-race in xfrmi_xmit / xfrmi_xmit read-write to 0xffff88813726b160 of 8 bytes by task 23986 on cpu 1: xfrmi_xmit+0x74e/0xb20 net/xfrm/xfrm_interface_core.c:583 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1581 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0x74a/0x850 net/ipv4/ip_output.c:230 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:293 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:432 dst_output include/net/dst.h:458 [inline] ip_local_out net/ipv4/ip_output.c:127 [inline] ip_send_skb+0x72/0xe0 net/ipv4/ip_output.c:1487 udp_send_skb+0x6a4/0x990 net/ipv4/udp.c:963 udp_sendmsg+0x1249/0x12d0 net/ipv4/udp.c:1246 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:840 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2540 ___sys_sendmsg net/socket.c:2594 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2680 __do_sys_sendmmsg net/socket.c:2709 [inline] __se_sys_sendmmsg net/socket.c:2706 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2706 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff88813726b160 of 8 bytes by task 23987 on cpu 0: xfrmi_xmit+0x74e/0xb20 net/xfrm/xfrm_interface_core.c:583 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1581 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0x74a/0x850 net/ipv4/ip_output.c:230 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:293 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:432 dst_output include/net/dst.h:458 [inline] ip_local_out net/ipv4/ip_output.c:127 [inline] ip_send_skb+0x72/0xe0 net/ipv4/ip_output.c:1487 udp_send_skb+0x6a4/0x990 net/ipv4/udp.c:963 udp_sendmsg+0x1249/0x12d0 net/ipv4/udp.c:1246 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:840 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2540 ___sys_sendmsg net/socket.c:2594 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2680 __do_sys_sendmmsg net/socket.c:2709 [inline] __se_sys_sendmmsg net/socket.c:2706 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2706 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000000010d7 -> 0x00000000000010d8 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 23987 Comm: syz-executor.5 Not tainted 6.5.0-syzkaller-10885-g0468be89b3fa #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25	xfrm: fix a data-race in xfrm_gen_index()	Eric Dumazet
	commit 3e4bc23926b83c3c67e5f61ae8571602754131a6 upstream. xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock. This means we must use a per-netns idx_generator variable, instead of a static one. Alternative would be to use an atomic variable. syzbot reported: BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0: xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline] xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1: xfrm_sk_policy_insert+0x13e/0x640 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00006ad8 -> 0x00006b18 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: 1121994c803f ("netns xfrm: policy insertion in netns") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>