aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-07random: don't reset crng_init_cnt on urandom_read()Jann Horn
At the moment, urandom_read() (used for /dev/urandom) resets crng_init_cnt to zero when it is called at crng_init<2. This is inconsistent: We do it for /dev/urandom reads, but not for the equivalent getrandom(GRND_INSECURE). (And worse, as Jason pointed out, we're only doing this as long as maxwarn>0.) crng_init_cnt is only read in crng_fast_load(); it is relevant at crng_init==0 for determining when to switch to crng_init==1 (and where in the RNG state array to write). As far as I understand: - crng_init==0 means "we have nothing, we might just be returning the same exact numbers on every boot on every machine, we don't even have non-cryptographic randomness; we should shove every bit of entropy we can get into the RNG immediately" - crng_init==1 means "well we have something, it might not be cryptographic, but at least we're not gonna return the same data every time or whatever, it's probably good enough for TCP and ASLR and stuff; we now have time to build up actual cryptographic entropy in the input pool" - crng_init==2 means "this is supposed to be cryptographically secure now, but we'll keep adding more entropy just to be sure". The current code means that if someone is pulling data from /dev/urandom fast enough at crng_init==0, we'll keep resetting crng_init_cnt, and we'll never make forward progress to crng_init==1. It seems to be intended to prevent an attacker from bruteforcing the contents of small individual RNG inputs on the way from crng_init==0 to crng_init==1, but that's misguided; crng_init==1 isn't supposed to provide proper cryptographic security anyway, RNG users who care about getting secure RNG output have to wait until crng_init==2. This code was inconsistent, and it probably made things worse - just get rid of it. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: avoid superfluous call to RDRAND in CRNG extractionJason A. Donenfeld
RDRAND is not fast. RDRAND is actually quite slow. We've known this for a while, which is why functions like get_random_u{32,64} were converted to use batching of our ChaCha-based CRNG instead. Yet CRNG extraction still includes a call to RDRAND, in the hot path of every call to get_random_bytes(), /dev/urandom, and getrandom(2). This call to RDRAND here seems quite superfluous. CRNG is already extracting things based on a 256-bit key, based on good entropy, which is then reseeded periodically, updated, backtrack-mutated, and so forth. The CRNG extraction construction is something that we're already relying on to be secure and solid. If it's not, that's a serious problem, and it's unlikely that mixing in a measly 32 bits from RDRAND is going to alleviate things. And in the case where the CRNG doesn't have enough entropy yet, we're already initializing the ChaCha key row with RDRAND in crng_init_try_arch_early(). Removing the call to RDRAND improves performance on an i7-11850H by 370%. In other words, the vast majority of the work done by extract_crng() prior to this commit was devoted to fetching 32 bits of RDRAND. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: early initialization of ChaCha constantsDominik Brodowski
Previously, the ChaCha constants for the primary pool were only initialized in crng_initialize_primary(), called by rand_initialize(). However, some randomness is actually extracted from the primary pool beforehand, e.g. by kmem_cache_create(). Therefore, statically initialize the ChaCha constants for the primary pool. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: <linux-crypto@vger.kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefsJason A. Donenfeld
Rather than an awkward combination of ifdefs and __maybe_unused, we can ensure more source gets parsed, regardless of the configuration, by using IS_ENABLED for the CONFIG_NUMA conditional code. This makes things cleaner and easier to follow. I've confirmed that on !CONFIG_NUMA, we don't wind up with excess code by accident; the generated object file is the same. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: harmonize "crng init done" messagesDominik Brodowski
We print out "crng init done" for !TRUST_CPU, so we should also print out the same for TRUST_CPU. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: mix bootloader randomness into poolJason A. Donenfeld
If we're trusting bootloader randomness, crng_fast_load() is called by add_hwgenerator_randomness(), which sets us to crng_init==1. However, usually it is only called once for an initial 64-byte push, so bootloader entropy will not mix any bytes into the input pool. So it's conceivable that crng_init==1 when crng_initialize_primary() is called later, but then the input pool is empty. When that happens, the crng state key will be overwritten with extracted output from the empty input pool. That's bad. In contrast, if we're not trusting bootloader randomness, we call crng_slow_load() *and* we call mix_pool_bytes(), so that later crng_initialize_primary() isn't drawing on nothing. In order to prevent crng_initialize_primary() from extracting an empty pool, have the trusted bootloader case mirror that of the untrusted bootloader case, mixing the input into the pool. [linux@dominikbrodowski.net: rewrite commit message] Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: do not throw away excess input to crng_fast_loadJason A. Donenfeld
When crng_fast_load() is called by add_hwgenerator_randomness(), we currently will advance to crng_init==1 once we've acquired 64 bytes, and then throw away the rest of the buffer. Usually, that is not a problem: When add_hwgenerator_randomness() gets called via EFI or DT during setup_arch(), there won't be any IRQ randomness. Therefore, the 64 bytes passed by EFI exactly matches what is needed to advance to crng_init==1. Usually, DT seems to pass 64 bytes as well -- with one notable exception being kexec, which hands over 128 bytes of entropy to the kexec'd kernel. In that case, we'll advance to crng_init==1 once 64 of those bytes are consumed by crng_fast_load(), but won't continue onward feeding in bytes to progress to crng_init==2. This commit fixes the issue by feeding any leftover bytes into the next phase in add_hwgenerator_randomness(). [linux@dominikbrodowski.net: rewrite commit message] Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: do not re-init if crng_reseed completes before primary initJason A. Donenfeld
If the bootloader supplies sufficient material and crng_reseed() is called very early on, but not too early that wqs aren't available yet, then we might transition to crng_init==2 before rand_initialize()'s call to crng_initialize_primary() made. Then, when crng_initialize_primary() is called, if we're trusting the CPU's RDRAND instructions, we'll needlessly reinitialize the RNG and emit a message about it. This is mostly harmless, as numa_crng_init() will allocate and then free what it just allocated, and excessive calls to invalidate_batched_entropy() aren't so harmful. But it is funky and the extra message is confusing, so avoid the re-initialization all together by checking for crng_init < 2 in crng_initialize_primary(), just as we already do in crng_reseed(). Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: fix crash on multiple early calls to add_bootloader_randomness()Dominik Brodowski
Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, multiple calls to add_bootloader_randomness() are broken and can cause a NULL pointer dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical problem, as qemu on arm64 may provide bootloader entropy via EFI and via devicetree. On the first call to add_hwgenerator_randomness(), crng_fast_load() is executed, and if the seed is long enough, crng_init will be set to 1. On subsequent calls to add_bootloader_randomness() and then to add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead, wait_event_interruptible() and then credit_entropy_bits() will be called. If the entropy count for that second seed is large enough, that proceeds to crng_reseed(). However, both wait_event_interruptible() and crng_reseed() depends (at least in numa_crng_init()) on workqueues. Therefore, test whether system_wq is already initialized, which is a sufficient indicator that workqueue_init_early() has progressed far enough. If we wind up hitting the !system_wq case, we later want to do what would have been done there when wqs are up, so set a flag, and do that work later from the rand_initialize() call. Reported-by: Ivan T. Ivanov <iivanov@suse.de> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness") Cc: stable@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> [Jason: added crng_need_done state and related logic.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: do not sign extend bytes for rotation when mixingJason A. Donenfeld
By using `char` instead of `unsigned char`, certain platforms will sign extend the byte when `w = rol32(*bytes++, input_rotate)` is called, meaning that bit 7 is overrepresented when mixing. This isn't a real problem (unless the mixer itself is already broken) since it's still invertible, but it's not quite correct either. Fix this by using an explicit unsigned type. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: use BLAKE2s instead of SHA1 in extractionJason A. Donenfeld
This commit addresses one of the lower hanging fruits of the RNG: its usage of SHA1. BLAKE2s is generally faster, and certainly more secure, than SHA1, which has [1] been [2] really [3] very [4] broken [5]. Additionally, the current construction in the RNG doesn't use the full SHA1 function, as specified, and allows overwriting the IV with RDRAND output in an undocumented way, even in the case when RDRAND isn't set to "trusted", which means potential malicious IV choices. And its short length means that keeping only half of it secret when feeding back into the mixer gives us only 2^80 bits of forward secrecy. In other words, not only is the choice of hash function dated, but the use of it isn't really great either. This commit aims to fix both of these issues while also keeping the general structure and semantics as close to the original as possible. Specifically: a) Rather than overwriting the hash IV with RDRAND, we put it into BLAKE2's documented "salt" and "personal" fields, which were specifically created for this type of usage. b) Since this function feeds the full hash result back into the entropy collector, we only return from it half the length of the hash, just as it was done before. This increases the construction's forward secrecy from 2^80 to a much more comfortable 2^128. c) Rather than using the raw "sha1_transform" function alone, we instead use the full proper BLAKE2s function, with finalization. This also has the advantage of supplying 16 bytes at a time rather than SHA1's 10 bytes, which, in addition to having a faster compression function to begin with, means faster extraction in general. On an Intel i7-11850H, this commit makes initial seeding around 131% faster. BLAKE2s itself has the nice property of internally being based on the ChaCha permutation, which the RNG is already using for expansion, so there shouldn't be any issue with newness, funkiness, or surprising CPU behavior, since it's based on something already in use. [1] https://eprint.iacr.org/2005/010.pdf [2] https://www.iacr.org/archive/crypto2005/36210017/36210017.pdf [3] https://eprint.iacr.org/2015/967.pdf [4] https://shattered.io/static/shattered.pdf [5] https://www.usenix.org/system/files/sec20-leurent.pdf Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07lib/crypto: blake2s: include as built-inJason A. Donenfeld
In preparation for using blake2s in the RNG, we change the way that it is wired-in to the build system. Instead of using ifdefs to select the right symbol, we use weak symbols. And because ARM doesn't need the generic implementation, we make the generic one default only if an arch library doesn't need it already, and then have arch libraries that do need it opt-in. So that the arch libraries can remain tristate rather than bool, we then split the shash part from the glue code. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: fix data race on crng init timeEric Biggers
_extract_crng() does plain loads of crng->init_time and crng_global_init_time, which causes undefined behavior if crng_reseed() and RNDRESEEDCRNG modify these corrently. Use READ_ONCE() and WRITE_ONCE() to make the behavior defined. Don't fix the race on crng->init_time by protecting it with crng->lock, since it's not a problem for duplicate reseedings to occur. I.e., the lockless access with READ_ONCE() is fine. Fixes: d848e5f8e1eb ("random: add new ioctl RNDRESEEDCRNG") Fixes: e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: fix data race on crng_node_poolEric Biggers
extract_crng() and crng_backtrack_protect() load crng_node_pool with a plain load, which causes undefined behavior if do_numa_crng_init() modifies it concurrently. Fix this by using READ_ONCE(). Note: as per the previous discussion https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u, READ_ONCE() is believed to be sufficient here, and it was requested that it be used here instead of smp_load_acquire(). Also change do_numa_crng_init() to set crng_node_pool using cmpxchg_release() instead of mb() + cmpxchg(), as the former is sufficient here but is more lightweight. Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07irq: remove unused flags argument from __handle_irq_event_percpu()Sebastian Andrzej Siewior
The __IRQF_TIMER bit from the flags argument was used in add_interrupt_randomness() to distinguish the timer interrupt from other interrupts. This is no longer the case. Remove the flags argument from __handle_irq_event_percpu(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: remove unused irq_flags argument from add_interrupt_randomness()Sebastian Andrzej Siewior
Since commit ee3e00e9e7101 ("random: use registers from interrupted code for CPU's w/o a cycle counter") the irq_flags argument is no longer used. Remove unused irq_flags. Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Liu <wei.liu@kernel.org> Cc: linux-hyperv@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07random: document add_hwgenerator_randomness() with other input functionsMark Brown
The section at the top of random.c which documents the input functions available does not document add_hwgenerator_randomness() which might lead a reader to overlook it. Add a brief note about it. Signed-off-by: Mark Brown <broonie@kernel.org> [Jason: reorganize position of function in doc comment and also document add_bootloader_randomness() while we're at it.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07MAINTAINERS: add git tree for random.cJason A. Donenfeld
This is handy not just for humans, but also so that the 0-day bot can automatically test posted mailing list patches against the right tree. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-05Merge tag 'net-5.16-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski" "Networking fixes, including fixes from bpf, and WiFi. One last pull request, turns out some of the recent fixes did more harm than good. Current release - regressions: - Revert "xsk: Do not sleep in poll() when need_wakeup set", made the problem worse - Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register", broke EPROBE_DEFER handling - Revert "net: usb: r8152: Add MAC pass-through support for more Lenovo Docks", broke setups without a Lenovo dock Current release - new code bugs: - selftests: set amt.sh executable Previous releases - regressions: - batman-adv: mcast: don't send link-local multicast to mcast routers Previous releases - always broken: - ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY - sctp: hold endpoint before calling cb in sctp_transport_lookup_process - mac80211: mesh: embed mesh_paths and mpp_paths into ieee80211_if_mesh to avoid complicated handling of sub-object allocation failures - seg6: fix traceroute in the presence of SRv6 - tipc: fix a kernel-infoleak in __tipc_sendmsg()" * tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) selftests: set amt.sh executable Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks" sfc: The RX page_ring is optional iavf: Fix limit of total number of queues to active queues of VF i40e: Fix incorrect netdev's real number of RX/TX queues i40e: Fix for displaying message regarding NVM version i40e: fix use-after-free in i40e_sync_filters_subtask() i40e: Fix to not show opcode msg on unsuccessful VF MAC change ieee802154: atusb: fix uninit value in atusb_set_extended_addr mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh mac80211: initialize variable have_higher_than_11mbit sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc netrom: fix copying in user data in nr_setsockopt udp6: Use Segment Routing Header for dest address if present icmp: ICMPV6: Examine invoking packet for Segment Route Headers. seg6: export get_srh() for ICMP handling Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register" ipv6: Do cleanup if attribute validation fails in multipath route ipv6: Continue processing multipath route even if gateway attribute is invalid net/fsl: Remove leftover definition in xgmac_mdio ...
2022-01-05selftests: set amt.sh executableTaehee Yoo
amt.sh test script will not work because it doesn't have execution permission. So, it adds execution permission. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: c08e8baea78e ("selftests: add amt interface selftest script") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220105144436.13415-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"Aaron Ma
This reverts commit f77b83b5bbab53d2be339184838b19ed2c62c0a5. This change breaks multiple usb to ethernet dongles attached on Lenovo USB hub. Fixes: f77b83b5bbab ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Link: https://lore.kernel.org/r/20220105155102.8557-1-aaron.ma@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05Merge tag 'gpio-fixes-for-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "Here are two last fixes for this release cycle from the GPIO subsystem: - fix irq offset calculation in gpio-aspeed-sgpio - update the MAINTAINERS entry for gpio-brcmstb" * tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: update gpio-brcmstb maintainers gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
2022-01-05Merge tag 'ieee802154-for-net-2022-01-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2022-01-05 Below I have a last minute fix for the atusb driver. Pavel fixes a KASAN uninit report for the driver. This version is the minimal impact fix to ease backporting. A bigger rework of the driver to avoid potential similar problems is ongoing and will come through net-next when ready. * tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: ieee802154: atusb: fix uninit value in atusb_set_extended_addr ==================== Link: https://lore.kernel.org/r/20220105153914.512305-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-05Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-01-04 This series contains updates to i40e and iavf drivers. Mateusz adjusts displaying of failed VF MAC message when the failure is expected as well as modifying an NVM info message to not confuse the user for i40e. Di Zhu fixes a use-after-free issue MAC filters for i40e. Jedrzej fixes an issue with misreporting of Rx and Tx queues during reinitialization for i40e. Karen correct checking of channel queue configuration to occur against active queues for iavf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04sfc: The RX page_ring is optionalMartin Habets
The RX page_ring is an optional feature that improves performance. When allocation fails the driver can still function, but possibly with a lower bandwidth. Guard against dereferencing a NULL page_ring. Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reported-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Link: https://lore.kernel.org/r/164111288276.5798.10330502993729113868.stgit@palantir17.mph.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04iavf: Fix limit of total number of queues to active queues of VFKaren Sornek
In the absence of this validation, if the user requests to configure queues more than the enabled queues, it results in sending the requested number of queues to the kernel stack (due to the asynchronous nature of VF response), in which case the stack might pick a queue to transmit that is not enabled and result in Tx hang. Fix this bug by limiting the total number of queues allocated for VF to active queues of VF. Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf") Signed-off-by: Ashwin Vijayavel <ashwin.vijayavel@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04i40e: Fix incorrect netdev's real number of RX/TX queuesJedrzej Jagielski
There was a wrong queues representation in sysfs during driver's reinitialization in case of online cpus number is less than combined queues. It was caused by stopped NetworkManager, which is responsible for calling vsi_open function during driver's initialization. In specific situation (ex. 12 cpus online) there were 16 queues in /sys/class/net/<iface>/queues. In case of modifying queues with value higher, than number of online cpus, then it caused write errors and other errors. Add updating of sysfs's queues representation during driver initialization. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Lukasz Cieplicki <lukaszx.cieplicki@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04i40e: Fix for displaying message regarding NVM versionMateusz Palczewski
When loading the i40e driver, it prints a message like: 'The driver for the device detected a newer version of the NVM image v1.x than expected v1.y. Please install the most recent version of the network driver.' This is misleading as the driver is working as expected. Fix that by removing the second part of message and changing it from dev_info to dev_dbg. Fixes: 4fb29bddb57f ("i40e: The driver now prints the API version in error message") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04i40e: fix use-after-free in i40e_sync_filters_subtask()Di Zhu
Using ifconfig command to delete the ipv6 address will cause the i40e network card driver to delete its internal mac_filter and i40e_service_task kernel thread will concurrently access the mac_filter. These two processes are not protected by lock so causing the following use-after-free problems. print_address_description+0x70/0x360 ? vprintk_func+0x5e/0xf0 kasan_report+0x1b2/0x330 i40e_sync_vsi_filters+0x4f0/0x1850 [i40e] i40e_sync_filters_subtask+0xe3/0x130 [i40e] i40e_service_task+0x195/0x24c0 [i40e] process_one_work+0x3f5/0x7d0 worker_thread+0x61/0x6c0 ? process_one_work+0x7d0/0x7d0 kthread+0x1c3/0x1f0 ? kthread_park+0xc0/0xc0 ret_from_fork+0x35/0x40 Allocated by task 2279810: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_trace+0xf3/0x1e0 i40e_add_filter+0x127/0x2b0 [i40e] i40e_add_mac_filter+0x156/0x190 [i40e] i40e_addr_sync+0x2d/0x40 [i40e] __hw_addr_sync_dev+0x154/0x210 i40e_set_rx_mode+0x6d/0xf0 [i40e] __dev_set_rx_mode+0xfb/0x1f0 __dev_mc_add+0x6c/0x90 igmp6_group_added+0x214/0x230 __ipv6_dev_mc_inc+0x338/0x4f0 addrconf_join_solict.part.7+0xa2/0xd0 addrconf_dad_work+0x500/0x980 process_one_work+0x3f5/0x7d0 worker_thread+0x61/0x6c0 kthread+0x1c3/0x1f0 ret_from_fork+0x35/0x40 Freed by task 2547073: __kasan_slab_free+0x130/0x180 kfree+0x90/0x1b0 __i40e_del_filter+0xa3/0xf0 [i40e] i40e_del_mac_filter+0xf3/0x130 [i40e] i40e_addr_unsync+0x85/0xa0 [i40e] __hw_addr_sync_dev+0x9d/0x210 i40e_set_rx_mode+0x6d/0xf0 [i40e] __dev_set_rx_mode+0xfb/0x1f0 __dev_mc_del+0x69/0x80 igmp6_group_dropped+0x279/0x510 __ipv6_dev_mc_dec+0x174/0x220 addrconf_leave_solict.part.8+0xa2/0xd0 __ipv6_ifa_notify+0x4cd/0x570 ipv6_ifa_notify+0x58/0x80 ipv6_del_addr+0x259/0x4a0 inet6_addr_del+0x188/0x260 addrconf_del_ifaddr+0xcc/0x130 inet6_ioctl+0x152/0x190 sock_do_ioctl+0xd8/0x2b0 sock_ioctl+0x2e5/0x4c0 do_vfs_ioctl+0x14e/0xa80 ksys_ioctl+0x7c/0xa0 __x64_sys_ioctl+0x42/0x50 do_syscall_64+0x98/0x2c0 entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Di Zhu <zhudi2@huawei.com> Signed-off-by: Rui Zhang <zhangrui182@huawei.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04i40e: Fix to not show opcode msg on unsuccessful VF MAC changeMateusz Palczewski
Hide i40e opcode information sent during response to VF in case when untrusted VF tried to change MAC on the VF interface. This is implemented by adding an additional parameter 'hide' to the response sent to VF function that hides the display of error information, but forwards the error code to VF. Previously it was not possible to send response with some error code to VF without displaying opcode information. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Reviewed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-04ieee802154: atusb: fix uninit value in atusb_set_extended_addrPavel Skripkin
Alexander reported a use of uninitialized value in atusb_set_extended_addr(), that is caused by reading 0 bytes via usb_control_msg(). Fix it by validating if the number of bytes transferred is actually correct, since usb_control_msg() may read less bytes, than was requested by caller. Fail log: BUG: KASAN: uninit-cmp in ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline] BUG: KASAN: uninit-cmp in atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline] BUG: KASAN: uninit-cmp in atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056 Uninit value used in comparison: 311daa649a2003bd stack handle: 000000009a2003bd ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline] atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline] atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056 usb_probe_interface+0x314/0x7f0 drivers/usb/core/driver.c:396 Fixes: 7490b008d123 ("ieee802154: add support for atusb transceiver") Reported-by: Alexander Potapenko <glider@google.com> Acked-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20220104182806.7188-1-paskripkin@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-01-04Merge tag 'mac80211-for-net-2022-01-04' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more changes: - mac80211: initialize a variable to avoid using it uninitialized - mac80211 mesh: put some data structures into the container to fix bugs with and not have to deal with allocation failures * tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh mac80211: initialize variable have_higher_than_11mbit ==================== Link: https://lore.kernel.org/r/20220104144449.64937-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-04mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_meshPavel Skripkin
Syzbot hit NULL deref in rhashtable_free_and_destroy(). The problem was in mesh_paths and mpp_paths being NULL. mesh_pathtbl_init() could fail in case of memory allocation failure, but nobody cared, since ieee80211_mesh_init_sdata() returns void. It led to leaving 2 pointers as NULL. Syzbot has found null deref on exit path, but it could happen anywhere else, because code assumes these pointers are valid. Since all ieee80211_*_setup_sdata functions are void and do not fail, let's embedd mesh_paths and mpp_paths into parent struct to avoid adding error handling on higher levels and follow the pattern of others setup_sdata functions Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable") Reported-and-tested-by: syzbot+860268315ba86ea6b96b@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20211230195547.23977-1-paskripkin@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04mac80211: initialize variable have_higher_than_11mbitTom Rix
Clang static analysis reports this warnings mlme.c:5332:7: warning: Branch condition evaluates to a garbage value have_higher_than_11mbit) ^~~~~~~~~~~~~~~~~~~~~~~ have_higher_than_11mbit is only set to true some of the time in ieee80211_get_rates() but is checked all of the time. So have_higher_than_11mbit needs to be initialized to false. Fixes: 5d6a1b069b7f ("mac80211: set basic rates earlier") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211223162848.3243702-1-trix@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-01-04sch_qfq: prevent shift-out-of-bounds in qfq_init_qdiscEric Dumazet
tx_queue_len can be set to ~0U, we need to be more careful about overflows. __fls(0) is undefined, as this report shows: UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24 shift exponent 51770272 is too large for 32-bit type 'int' CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330 qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430 qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253 tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x5b9/0x910 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmsg+0x280/0x370 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04netrom: fix copying in user data in nr_setsockoptChristoph Hellwig
This code used to copy in an unsigned long worth of data before the sockptr_t conversion, so restore that. Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04Merge branch 'srv6-traceroute'David S. Miller
Andrew Lunn says: ==================== Fix traceroute in the presence of SRv6 When using SRv6 the destination IP address in the IPv6 header is not always the true destination, it can be a router along the path that SRv6 is using. When ICMP reports an error, e.g, time exceeded, which is what traceroute uses, it included the packet which invoked the error into the ICMP message body. Upon receiving such an ICMP packet, the invoking packet is examined and an attempt is made to find the socket which sent the packet, so the error can be reported. Lookup is performed using the source and destination address. If the intermediary router IP address from the IP header is used, the lookup fails. It is necessary to dig into the header and find the true destination address in the Segment Router header, SRH. v2: Play games with the skb->network_header rather than clone the skb v3: Move helpers into seg6.c v4: Move short helper into header file. Rework getting SRH destination address v5: Fix comment to describe function, not caller Patch 1 exports a helper which can find the SRH in a packet Patch 2 does the actual examination of the invoking packet Patch 3 makes use of the results when trying to find the socket. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04udp6: Use Segment Routing Header for dest address if presentAndrew Lunn
When finding the socket to report an error on, if the invoking packet is using Segment Routing, the IPv6 destination address is that of an intermediate router, not the end destination. Extract the ultimate destination address from the segment address. This change allows traceroute to function in the presence of Segment Routing. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04icmp: ICMPV6: Examine invoking packet for Segment Route Headers.Andrew Lunn
RFC8754 says: ICMP error packets generated within the SR domain are sent to source nodes within the SR domain. The invoking packet in the ICMP error message may contain an SRH. Since the destination address of a packet with an SRH changes as each segment is processed, it may not be the destination used by the socket or application that generated the invoking packet. For the source of an invoking packet to process the ICMP error message, the ultimate destination address of the IPv6 header may be required. The following logic is used to determine the destination address for use by protocol-error handlers. * Walk all extension headers of the invoking IPv6 packet to the routing extension header preceding the upper-layer header. - If routing header is type 4 Segment Routing Header (SRH) o The SID at Segment List[0] may be used as the destination address of the invoking packet. Mangle the skb so the network header points to the invoking packet inside the ICMP packet. The seg6 helpers can then be used on the skb to find any segment routing headers. If found, mark this fact in the IPv6 control block of the skb, and store the offset into the packet of the SRH. Then restore the skb back to its old state. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04seg6: export get_srh() for ICMP handlingAndrew Lunn
An ICMP error message can contain in its message body part of an IPv6 packet which invoked the error. Such a packet might contain a segment router header. Export get_srh() so the ICMP code can make use of it. Since his changes the scope of the function from local to global, add the seg6_ prefix to keep the namespace clean. And move it into seg6.c so it is always available, not just when IPV6_SEG6_LWTUNNEL is enabled. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-03Merge tag 'batadv-net-pullrequest-20220103' of ↵Jakub Kicinski
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - avoid sending link-local multicast to multicast routers, by Linus Lüssing * tag 'batadv-net-pullrequest-20220103' of git://git.open-mesh.org/linux-merge: batman-adv: mcast: don't send link-local multicast to mcast routers ==================== Link: https://lore.kernel.org/r/20220103171203.1124980-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-03Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in ↵Florian Fainelli
__fixed_phy_register" This reverts commit b45396afa4177f2b1ddfeff7185da733fade1dc3 ("net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register") since it prevents any system that uses a fixed PHY without a GPIO descriptor from properly working: [ 5.971952] brcm-systemport 9300000.ethernet: failed to register fixed PHY [ 5.978854] brcm-systemport: probe of 9300000.ethernet failed with error -22 [ 5.986047] brcm-systemport 9400000.ethernet: failed to register fixed PHY [ 5.992947] brcm-systemport: probe of 9400000.ethernet failed with error -22 Fixes: b45396afa417 ("net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220103193453.1214961-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-03ipv6: Do cleanup if attribute validation fails in multipath routeDavid Ahern
As Nicolas noted, if gateway validation fails walking the multipath attribute the code should jump to the cleanup to free previously allocated memory. Fixes: 1ff15a710a86 ("ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route") Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220103170555.94638-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-03ipv6: Continue processing multipath route even if gateway attribute is invalidDavid Ahern
ip6_route_multipath_del loop continues processing the multipath attribute even if delete of a nexthop path fails. For consistency, do the same if the gateway attribute is invalid. Fixes: 1ff15a710a86 ("ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route") Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220103171911.94739-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-03MAINTAINERS: update gpio-brcmstb maintainersGregory Fong
Add Doug and Florian as maintainers for gpio-brcmstb, and remove myself. Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-01-03gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handlerSteven Lee
Each aspeed sgpio bank has 64 gpio pins(32 input pins and 32 output pins). The hwirq base for each sgpio bank should be multiples of 64 rather than multiples of 32. Signed-off-by: Steven Lee <steven_lee@aspeedtech.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-01-02Linux 5.16-rc8Linus Torvalds
2022-01-02Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix TUI exit screen refresh race condition in 'perf top'. - Fix parsing of Intel PT VM time correlation arguments. - Honour CPU filtering command line request of a script's switch events in 'perf script'. - Fix printing of switch events in Intel PT python script. - Fix duplicate alias events list printing in 'perf list', noticed on heterogeneous arm64 systems. - Fix return value of ids__new(), users expect NULL for failure, not ERR_PTR(-ENOMEM). * tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf top: Fix TUI exit screen refresh race condition perf pmu: Fix alias events list perf scripts python: intel-pt-events.py: Fix printing of switch events perf script: Fix CPU filtering of a script's switch events perf intel-pt: Fix parsing of VM time correlation arguments perf expr: Fix return value of ids__new()
2022-01-02net/fsl: Remove leftover definition in xgmac_mdioMarkus Koch
commit 26eee0210ad7 ("net/fsl: fix a bug in xgmac_mdio") fixed a bug in the QorIQ mdio driver but left the (now unused) incorrect bit definition for MDIO_DATA_BSY in the code. This commit removes it. Signed-off-by: Markus Koch <markus@notsyncing.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-02Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Better input validation for compat ioctls and a documentation bugfix for 5.16" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Docs: Fixes link to I2C specification i2c: validate user data in compat ioctl