path: root/drivers
Age | Commit message | Author
2022-05-18 | random: use proper jiffies comparison macro | Jason A. Donenfeld
This expands to exactly the same code that it replaces, but makes things consistent by using the same macro for jiffy comparisons throughout. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
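As a rough illustration of why a dedicated comparison macro is used for jiffies, here is a minimal userspace sketch of a wraparound-safe "is after" test. The type `tick_t` and the helpers `tick_after()` and `tick_before()` are hypothetical names chosen for this example; they only mirror the idea behind the kernel's jiffies comparison macros and are not copied from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-running tick counter such as jiffies. */
typedef uint32_t tick_t;

/*
 * Wraparound-safe "a is after b": subtract in unsigned arithmetic, then
 * interpret the difference as a signed value. A plain `a > b` gives the
 * wrong answer once the counter has wrapped past its maximum.
 */
static inline bool tick_after(tick_t a, tick_t b)
{
	return (int32_t)(tick_t)(b - a) < 0;
}

/* "a is before b" is the same test with the arguments swapped. */
static inline bool tick_before(tick_t a, tick_t b)
{
	return tick_after(b, a);
}
```

With this helper, a tick value of 3 counts as "after" a tick value just below the 32-bit maximum, which is what a caller usually wants when the counter has just wrapped.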
2022-05-18 | random: remove ratelimiting for in-kernel unseeded randomness | Jason A. Donenfeld
The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the kernel warns about all unseeded randomness or just the first instance. There's some complicated rate limiting and comparison to the previous caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled, developers still don't see all the messages or even an accurate count of how many were missed. This is the result of basically parallel mechanisms aimed at accomplishing more or less the same thing, added at different points in random.c history, which sort of compete with the first-instance-only limiting we have now. It turns out, however, that nobody cares about the first unseeded randomness instance of in-kernel users. The same first user has been there for ages now, and nobody is doing anything about it. It isn't even clear that anybody _can_ do anything about it. Most places that can do something about it have switched over to using get_random_bytes_wait() or wait_for_random_bytes(), which is the right thing to do, but there is still much code that needs randomness sometimes during init, and as a general rule, if you're not using one of the _wait functions or the readiness notifier callback, you're bound to be doing it wrong just based on that fact alone. So warning about this same first user that can't easily change is simply not an effective mechanism for anything at all. Users can't do anything about it, as the Kconfig text points out -- the problem isn't in userspace code -- and kernel developers don't or more often can't react to it. Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM is set, so that developers can debug things if need be, or if it isn't set, don't show a warning at all. At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting random.ratelimit_disable=1 by default, since if you care about one you probably care about the other too. 
And we can clean up usage around the related urandom_warning ratelimiter as well (whose behavior isn't changing), so that it properly counts missed messages after the 10 message threshold is reached. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 | random: move initialization out of reseeding hot path | Jason A. Donenfeld
Initialization happens once -- by way of credit_init_bits() -- and then it never happens again. Therefore, it doesn't need to be in crng_reseed(), which is a hot path that is called multiple times. It also doesn't make sense to have it there, as initialization activity is better associated with initialization routines. After the prior commit, crng_reseed() now won't be called by multiple concurrent callers, which means that we can safely move the "finalize_init" logic into credit_init_bits() unconditionally. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 | random: avoid initializing twice in credit race | Jason A. Donenfeld
Since all changes of crng_init now go through credit_init_bits(), we can fix a long standing race in which two concurrent callers of credit_init_bits() have the new bit count >= some threshold, but are doing so with crng_init as a lower threshold, checked outside of a lock, resulting in crng_reseed() or similar being called twice. In order to fix this, we can use the original cmpxchg value of the bit count, and only change crng_init when the bit count transitions from below a threshold to meeting the threshold. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
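The fix described above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: `init_bits`, `INIT_BITS_THRESHOLD` and `credit_bits_crossed()` are hypothetical names, and a single threshold is modeled rather than the several used by the real code. The point is that the value observed by the successful compare-and-exchange decides which caller performed the transition, so at most one caller sees it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define INIT_BITS_THRESHOLD 256u /* hypothetical threshold for this sketch */

static _Atomic unsigned int init_bits; /* hypothetical shared bit counter */

/*
 * Add `add` bits to the counter (capped at the threshold) and report
 * whether this particular call moved the counter from below the threshold
 * to meeting it. Only the caller whose compare-and-exchange succeeds with
 * an original value below the threshold gets `true`.
 */
static bool credit_bits_crossed(unsigned int add)
{
	unsigned int orig = atomic_load(&init_bits);
	unsigned int next;

	do {
		next = orig + add;
		if (next > INIT_BITS_THRESHOLD)
			next = INIT_BITS_THRESHOLD;
		/* On failure, `orig` is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(&init_bits, &orig, next));

	return orig < INIT_BITS_THRESHOLD && next >= INIT_BITS_THRESHOLD;
}
```

A caller would run its one-time initialization step only when the function returns true, instead of re-checking a separate state variable outside of any lock.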
2022-05-18 | random: use symbolic constants for crng_init states | Jason A. Donenfeld
crng_init represents a state machine, with three states, and various rules for transitions. For the longest time, we've been managing these with "0", "1", and "2", and expecting people to figure it out. To make the code more obvious, replace these with proper enum values representing the transition, and then redocument what each of these states means. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
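A small sketch of what replacing "0", "1" and "2" with symbolic constants might look like. The enumerator names and the forward-only transition rule below are assumptions made for illustration; the commit message does not quote the identifiers it introduces or spell out the transition rules.

```c
#include <stdbool.h>

/* Hypothetical names for the three states formerly written as 0, 1 and 2. */
enum crng_state {
	CRNG_STATE_EMPTY = 0, /* nothing credited yet */
	CRNG_STATE_EARLY = 1, /* partially seeded ("fast init") */
	CRNG_STATE_READY = 2, /* fully initialized */
};

/* Readable replacement for a bare `state == 2` comparison. */
static inline bool crng_state_is_ready(enum crng_state s)
{
	return s >= CRNG_STATE_READY;
}

/*
 * Assumed rule for this sketch: the state only ever moves forward, so a
 * request to go back to an earlier state leaves the current state intact.
 */
static inline enum crng_state crng_state_advance(enum crng_state cur,
						 enum crng_state want)
{
	return want > cur ? want : cur;
}
```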
2022-05-18 | siphash: use one source of truth for siphash permutations | Jason A. Donenfeld
The SipHash family of permutations is currently used in three places:
- siphash.c itself, used in the ordinary way it was intended.
- random32.c, in a construction from an anonymous contributor.
- random.c, as part of its fast_mix function.
Each one of these places reinvents the wheel with the same C code, same rotation constants, and same symmetry-breaking constants. This commit tidies things up a bit by placing macros for the permutations and constants into siphash.h, where each of the three .c users can access them. It also leaves a note dissuading more users of them from emerging. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
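To make the "one source of truth" idea concrete, here is a self-contained sketch of a single macro for the 64-bit SipHash round together with the standard SipHash initialization constants. The macro and type names (`SIP_PERMUTATION`, `SIP_CONST_*`, `struct sip_state`, `sip_round()`) are hypothetical and need not match what siphash.h actually defines; the rotation amounts and constants are those of the published SipHash design.

```c
#include <stdint.h>

/* Rotate left; r must be in the range 1..63. */
static inline uint64_t rol64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* One SipHash round over four 64-bit words, written once and shared. */
#define SIP_PERMUTATION(a, b, c, d) ( \
	(a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \
	(c) += (d), (d) = rol64((d), 16), (d) ^= (c), \
	(a) += (d), (d) = rol64((d), 21), (d) ^= (a), \
	(c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32))

/* Symmetry-breaking constants from the SipHash specification. */
#define SIP_CONST_0 0x736f6d6570736575ULL
#define SIP_CONST_1 0x646f72616e646f6dULL
#define SIP_CONST_2 0x6c7967656e657261ULL
#define SIP_CONST_3 0x7465646279746573ULL

struct sip_state {
	uint64_t v[4];
};

/* Every user calls the same round instead of open-coding it. */
static inline void sip_round(struct sip_state *s)
{
	SIP_PERMUTATION(s->v[0], s->v[1], s->v[2], s->v[3]);
}
```

The all-zero state is a fixed point of the round (additions, rotations and XORs of zero stay zero), which is one reason the symmetry-breaking constants exist.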
2022-05-18 | random: help compiler out with fast_mix() by using simpler arguments | Jason A. Donenfeld
Now that fast_mix() has more than one caller, gcc no longer inlines it. That's fine. But it also doesn't handle the compound literal argument we pass it very efficiently, nor does it handle the loop as well as it could. So just expand the code to spell out this function so that it generates the same code as it did before. Performance-wise, this now behaves as it did before the last commit. The difference in actual code size on x86 is 45 bytes, which is less than a cache line. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 | random: do not use input pool from hard IRQs | Jason A. Donenfeld
Years ago, a separate fast pool was added for interrupts, so that the cost associated with taking the input pool spinlocks and mixing into it would be avoided in places where latency is critical. However, one oversight was that add_input_randomness() and add_disk_randomness() still sometimes are called directly from the interrupt handler, rather than being deferred to a thread. This means that some unlucky interrupts will be caught doing a blake2s_compress() call and potentially spinning on input_pool.lock, which can also be taken by unprivileged users by writing into /dev/urandom. In order to fix this, add_timer_randomness() now checks whether it is being called from a hard IRQ and if so, just mixes into the per-cpu IRQ fast pool using fast_mix(), which is much faster and can be done lock-free. A nice consequence of this, as well, is that it means hard IRQ context FPU support is likely no longer useful. The entropy estimation algorithm used by add_timer_randomness() is also somewhat different than the one used for add_interrupt_randomness(). The former looks at deltas of deltas of deltas, while the latter just waits for 64 interrupts for one bit or for one second since the last bit. In order to bridge these, and since add_interrupt_randomness() runs after an add_timer_randomness() that's called from hard IRQ, we add to the fast pool credit the related amount, and then subtract one to account for add_interrupt_randomness()'s contribution. A downside of this, however, is that the num argument is potentially attacker controlled, which puts a bit more pressure on the fast_mix() sponge to do more than it's really intended to do. As a mitigating factor, the first 96 bits of input aren't attacker controlled (a cycle counter followed by zeros), which means it's essentially two rounds of siphash rather than one, which is somewhat better. It's also not that much different from add_interrupt_randomness()'s use of the irq stack instruction pointer register. 
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Filipe Manana <fdmanana@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-16 | random: order timer entropy functions below interrupt functions | Jason A. Donenfeld
There are no code changes here; this is just a reordering of functions, so that in subsequent commits, the timer entropy functions can call into the interrupt ones. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-15 | random: do not pretend to handle premature next security model | Jason A. Donenfeld
Per the thread linked below, "premature next" is not considered to be a realistic threat model, and leads to more serious security problems. "Premature next" is the scenario in which: - Attacker compromises the current state of a fully initialized RNG via some kind of infoleak. - New bits of entropy are added directly to the key used to generate the /dev/urandom stream, without any buffering or pooling. - Attacker then, somehow having read access to /dev/urandom, samples RNG output and brute forces the individual new bits that were added. - Result: the RNG never "recovers" from the initial compromise, a so-called violation of what academics term "post-compromise security". The usual solutions to this involve some form of delaying when entropy gets mixed into the crng. With Fortuna, this involves multiple input buckets. With what the Linux RNG was trying to do prior, this involves entropy estimation. However, by delaying when entropy gets mixed in, it also means that RNG compromises are extremely dangerous during the window of time before the RNG has gathered enough entropy, during which time nonces may become predictable (or repeated), ephemeral keys may not be secret, and so forth. Moreover, it's unclear how realistic "premature next" is from an attack perspective, if these attacks even make sense in practice. Put together -- and discussed in more detail in the thread below -- these constitute grounds for just doing away with the current code that pretends to handle premature next. I say "pretends" because it wasn't doing an especially great job at it either; should we change our mind about this direction, we would probably implement Fortuna to "fix" the "problem", in which case, removing the pretend solution still makes sense. This also reduces the crng reseed period from 5 minutes down to 1 minute. The rationale from the thread might lead us toward reducing that even further in the future (or even eliminating it), but that remains a topic of a future commit. 
At a high level, this patch changes semantics from: Before: Seed for the first time after 256 "bits" of estimated entropy have been accumulated since the system booted. Thereafter, reseed once every five minutes, but only if 256 new "bits" have been accumulated since the last reseeding. After: Seed for the first time after 256 "bits" of estimated entropy have been accumulated since the system booted. Thereafter, reseed once every minute. Most of this patch is renaming and removing: POOL_MIN_BITS becomes POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(), crng_reseed() loses its "force" parameter since it's now always true, the drain_entropy() function no longer has any use so it's removed, entropy estimation is skipped if we've already init'd, the various notifiers for "low on entropy" are now only active prior to init, and finally, some documentation comments are cleaned up here and there. Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ Cc: Theodore Ts'o <tytso@mit.edu> Cc: Nadia Heninger <nadiah@cs.ucsd.edu> Cc: Tom Ristenpart <ristenpart@cornell.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
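The before/after semantics described above can be written as two small predicates. This is a sketch of the reseeding policy after the first seeding only; the function names and the interval macros are hypothetical, while the 256-bit figure and the five-minute and one-minute periods come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define POOL_INIT_BITS          256u /* figure quoted in the commit message */
#define OLD_RESEED_INTERVAL_SEC 300u /* five minutes */
#define NEW_RESEED_INTERVAL_SEC  60u /* one minute */

/* Before: reseed every five minutes, but only with 256 fresh "bits". */
static bool should_reseed_before(uint64_t secs_since_reseed,
				 unsigned int bits_since_reseed)
{
	return secs_since_reseed >= OLD_RESEED_INTERVAL_SEC &&
	       bits_since_reseed >= POOL_INIT_BITS;
}

/* After: reseed every minute, with no entropy-estimate condition. */
static bool should_reseed_after(uint64_t secs_since_reseed)
{
	return secs_since_reseed >= NEW_RESEED_INTERVAL_SEC;
}
```

Under the old rule a system that gathered few credited bits could go arbitrarily long without reseeding; under the new rule the elapsed time alone decides.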
2022-05-13 | random: use first 128 bits of input as fast init | Jason A. Donenfeld
Before, the first 64 bytes of input, regardless of how entropic it was, would be used to mutate the crng base key directly, and none of those bytes would be credited as having entropy. Then 256 bits of credited input would be accumulated, and only then would the rng transition from the earlier "fast init" phase into being actually initialized. The thinking was that by mixing and matching fast init and real init, an attacker who compromised the fast init state, considered easy to do given how little entropy might be in those first 64 bytes, would then be able to bruteforce bits from the actual initialization. By keeping these separate, bruteforcing became impossible. However, by not crediting potentially creditable bits from those first 64 bytes of input, we delay initialization, and actually make the problem worse, because it means the user is drawing worse random numbers for a longer period of time. Instead, we can take the first 128 bits as fast init, and allow them to be credited, and then hold off on the next 128 bits until they've accumulated. This is still a wide enough margin to prevent bruteforcing the rng state, while still initializing much faster. Then, rather than trying to piecemeal inject into the base crng key at various points, instead just extract from the pool when we need it, for the crng_init==0 phase. Performance may even be better for the various inputs here, since there are likely more calls to mix_pool_bytes() than there are to get_random_bytes() during this phase of system execution. Since the preinit injection code is gone, bootloader randomness can then do something significantly more straightforward, removing the weird system_wq hack in hwgenerator randomness. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13 | random: do not use batches when !crng_ready() | Jason A. Donenfeld
It's too hard to keep the batches synchronized, and pointless anyway, since in !crng_ready(), we're updating the base_crng key really often, where batching only hurts. So instead, if the crng isn't ready, just call into get_random_bytes(). At this stage nothing is performance critical anyhow. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13 | random: mix in timestamps and reseed on system restore | Jason A. Donenfeld
Since the RNG loses freshness with system suspend/hibernation, when we resume, immediately reseed using whatever data we can, which for this particular case is the various timestamps regarding system suspend time, in addition to more generally the RDSEED/RDRAND/RDTSC values that happen whenever the crng reseeds. On systems that suspend and resume automatically all the time -- such as Android -- we skip the reseeding on suspend resumption, since that could wind up being far too busy. This is the same trade-off made in WireGuard. In addition to reseeding upon resumption, always mix into the pool these various stamps on every power notification event. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13 | random: vary jitter iterations based on cycle counter speed | Jason A. Donenfeld
Currently, we do the jitter dance if two consecutive reads to the cycle counter return different values. If they do, then we consider the cycle counter to be fast enough that one trip through the scheduler will yield one "bit" of credited entropy. If those two reads return the same value, then we assume the cycle counter is too slow to show meaningful differences. This methodology is flawed for a variety of reasons, one of which Eric posted a patch to fix in [1]. The issue that patch solves is that on a system with a slow counter, you might be [un]lucky and read the counter _just_ before it changes, so that the second cycle counter you read differs from the first, even though there's usually quite a large period of time in between the two. For example:
| real time | cycle counter |
| --------- | ------------- |
| 3         | 5             |
| 4         | 5             |
| 5         | 5             |
| 6         | 5             |
| 7         | 5             | <--- a
| 8         | 6             | <--- b
| 9         | 6             | <--- c
If we read the counter at (a) and compare it to (b), we might be fooled into thinking that it's a fast counter, when in reality it is not. The solution in [1] is to also compare counter (b) to counter (c), on the theory that if the counter is _actually_ slow, and (a)!=(b), then certainly (b)==(c). This helps solve this particular issue, in one sense, but in another sense, it mostly functions to disallow jitter entropy on these systems, rather than simply taking more samples in that case. Instead, this patch takes a different approach. Right now we assume that a difference in one set of consecutive samples means one "bit" of credited entropy per scheduler trip. We can extend this so that a difference in two sets of consecutive samples means one "bit" of credited entropy per /two/ scheduler trips, and three for three, and four for four. In other words, we can increase the amount of jitter "work" we require for each "bit", depending on how slow the cycle counter is. So this patch takes a whole bunch of samples, sees how many of them are different, and divides to find the amount of work required per "bit", and also requires that at least some minimum of them are different in order to attempt any jitter entropy. Note that this approach is still far from perfect. It's not a real statistical estimate on how much these samples vary; it's not a real-time analysis of the relevant input data. That remains a project for another time. However, it makes the same (partly flawed) assumptions as the code that's there now, so it's probably not worse than the status quo, and it handles the issue Eric mentioned in [1]. But, again, it's probably a far cry from whatever a really robust version of this would be.
[1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/ https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/
Cc: Eric Biggers <ebiggers@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
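The "take a bunch of samples, count how many differ, and divide" step can be sketched as a pure function over an array of counter readings. Everything here is a simplified assumption for illustration: `samples_per_bit()` and its `min_different` parameter are hypothetical, and the real code reads the cycle counter directly and applies its own limits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given n consecutive counter readings, return how many samples of jitter
 * "work" to require per credited "bit", or 0 if fewer than min_different
 * consecutive pairs differ (counter too slow to attempt jitter entropy).
 */
static size_t samples_per_bit(const uint64_t *samples, size_t n,
			      size_t min_different)
{
	size_t different = 0;

	for (size_t i = 1; i < n; i++) {
		if (samples[i] != samples[i - 1])
			different++;
	}

	if (different == 0 || different < min_different)
		return 0;

	/* Number of comparisons divided by number of changes, rounded up. */
	return ((n - 1) + different - 1) / different;
}
```

For the readings in the table above (5, 5, 5, 5, 5, 6, 6) only one of the six comparisons differs, so this sketch asks for six samples per "bit", whereas a counter that changes on every read needs one.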
2022-05-13 | random: insist on random_get_entropy() existing in order to simplify | Jason A. Donenfeld
All platforms are now guaranteed to provide some value for random_get_entropy(). In case some bug leads to this not being so, we print a warning, because that indicates that something is really very wrong (and likely other things are impacted too). This should never be hit, but it's a good and cheap way of finding out if something ever is problematic. Since we now have viable fallback code for random_get_entropy() on all platforms, which is, in the worst case, not worse than jiffies, we can count on getting the best possible value out of it. That means there's no longer a use for using jiffies as entropy input. It also means we no longer have a reason for doing the round-robin register flow in the IRQ handler, which was always of fairly dubious value. Instead we can greatly simplify the IRQ handler inputs and also unify the construction between 64-bits and 32-bits. We now collect the cycle counter and the return address, since those are the two things that matter. Because the return address and the irq number are likely related, to the extent we mix in the irq number, we can just xor it into the top unchanging bytes of the return address, rather than the bottom changing bytes of the cycle counter as before. Then, we can do a fixed 2 rounds of SipHash/HSipHash. Finally, we use the same construction of hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're not actually discarding any entropy, since that entropy is carried through until the next time. And more importantly, it lets us do the same sponge-like construction everywhere. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-08 | Merge tag 'sound-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound | Linus Torvalds
Pull sound fixes from Takashi Iwai: "This became slightly larger as I've been off in the last weeks. The majority of changes here is about ASoC, fixes for dmaengine and for addressing issues reported by CI, as well as other device-specific small fixes. Also, fixes for FireWire core stack and the usual HD-audio quirks are included"
* tag 'sound-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
  ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback
  ASoC: ops: Validate input values in snd_soc_put_volsw_range()
  ASoC: dmaengine: Restore NULL prepare_slave_config() callback
  ASoC: atmel: mchp-pdmc: set prepare_slave_config
  ASoC: max98090: Generate notifications on changes for custom control
  ASoC: max98090: Reject invalid values in custom control put()
  ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes
  ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers
  firewire: core: extend card->lock in fw_core_handle_bus_reset
  firewire: remove check of list iterator against head past the loop body
  firewire: fix potential uaf in outbound_phy_packet_callback()
  ASoC: rt9120: Correct the reg 0x09 size to one byte
  ALSA: hda/realtek: Enable mute/micmute LEDs support for HP Laptops
  ALSA: hda/realtek: Fix mute led issue on thinkpad with cs35l41 s-codec
  ASoC: meson: axg-card: Fix nonatomic links
  ASoC: meson: axg-tdm-interface: Fix formatters in trigger"
  ASoC: soc-ops: fix error handling
  ASoC: meson: Fix event generation for G12A tohdmi mux
  ASoC: meson: Fix event generation for AUI CODEC mux
  ASoC: meson: Fix event generation for AUI ACODEC mux
  ...
2022-05-08 | ataflop: use a statically allocated error counters | Willy Tarreau
This is the last driver making use of fd_request->error_count, which is easy to get wrong as was shown in floppy.c. We don't need to keep it there, it can be moved to the atari_floppy_struct instead, so let's do this. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Minh Yuan <yuanmingbuaa@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08 | floppy: use a statically allocated error counter | Willy Tarreau
Interrupt handler bad_flp_intr() may cause a UAF on the recently freed request just to increment the error count. There's no point keeping that one in the request anyway, and since the interrupt handler uses a static pointer to the error which cannot be kept in sync with the pending request, better make it use a static error counter that's reset for each new request. This reset now happens when entering redo_fd_request() for a new request via set_next_request(). One initial concern about a single error counter was that errors on one floppy drive could be reported on another one, but this problem is not real given that the driver uses a single drive at a time, and PC-compatible controllers also have this limitation because they use shared signals. As such the error count is always for the "current" drive. Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Tested-by: Denis Efremov <efremov@linux.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
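The shape of this fix can be shown with a toy model: the error count lives in a static variable rather than in the request, so the error path never dereferences a request that may already have been freed. All names below (`mock_request`, `drive_errors`, `mock_set_next_request()`, `mock_bad_interrupt()`) are hypothetical and only loosely follow the functions named in the commit message.

```c
#include <stddef.h>

/* Toy request; in this sketch it no longer carries an error count. */
struct mock_request {
	int id;
};

static struct mock_request *current_req; /* request being serviced */
static int drive_errors;                 /* static counter, one drive at a time */

/* Picking up a new request resets the counter for that request. */
static void mock_set_next_request(struct mock_request *req)
{
	current_req = req;
	drive_errors = 0;
}

/*
 * Error path of the interrupt handler: it touches only the static
 * counter, never the (possibly already freed) request structure.
 */
static int mock_bad_interrupt(void)
{
	return ++drive_errors;
}
```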
2022-05-07 | Merge tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux | Linus Torvalds
Pull gpio fixes from Bartosz Golaszewski:
- fix the bounds check for the 'gpio-reserved-ranges' device property in gpiolib-of
- drop the assignment of the pwm base number in gpio-mvebu (this was missed by the patch doing it globally for all pwm drivers)
- fix the fwnode assignment (use own fwnode, not the parent's one) for the GPIO irqchip in gpio-visconti
- update the irq_stat field before checking the trigger field in gpio-pca953x
- update GPIO entry in MAINTAINERS
* tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
  gpio: visconti: Fix fwnode of GPIO IRQ
  MAINTAINERS: update the GPIO git tree entry
  gpio: mvebu: drop pwm base assignment
  gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
2022-05-07 | Merge tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block fixes from Jens Axboe: "A single revert for a change that isn't needed in 5.18, and a small series for s390/dasd"
* tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
  s390/dasd: Use kzalloc instead of kmalloc/memset
  s390/dasd: Fix read inconsistency for ESE DASD devices
  s390/dasd: Fix read for ESE with blksize < 4k
  s390/dasd: prevent double format of tracks for ESE devices
  s390/dasd: fix data corruption for ESE devices
  Revert "block: release rq qos structures for queue without disk"
2022-05-06 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A few recent regressions in rxe's multicast code, and some old driver bugs:
- Error case unwind bug in rxe for rkeys
- Do not call netdev functions under a spinlock in rxe multicast code
- Use the proper BH lock type in rxe multicast code
- Fix irdma deadlock and crash
- Add a missing flush to drain irdma QPs when in error
- Fix high userspace latency in irdma during destroy due to synchronize_rcu()
- Rare race in siw MPA processing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Change mcg_lock to a _bh lock
  RDMA/rxe: Do not call dev_mc_add/del() under a spinlock
  RDMA/siw: Fix a condition race issue in MPA request processing
  RDMA/irdma: Fix possible crash due to NULL netdev in notifier
  RDMA/irdma: Reduce iWARP QP destroy time
  RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
  RDMA/rxe: Recheck the MR in when generating a READ reply
  RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()
  RDMA/rxe: Fix "Replace mr by rkey in responder resources"
2022-05-06 | Merge tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc | Linus Torvalds
Pull mmc fixes from Ulf Hansson: "MMC core:
- Fix initialization for eMMC's HS200/HS400 mode
MMC host:
- sdhci-msm: Reset GCC_SDCC_BCR register to prevent timeout issues
- sunxi-mmc: Fix DMA descriptors allocated above 32 bits"
* tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
  mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
  mmc: core: Set HS clock speed before sending HS CMD13
2022-05-06 | Merge tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm | Linus Torvalds
Pull drm fixes from Dave Airlie: "A pretty quiet week, one fbdev, msm, kconfig, and two amdgpu fixes, about what I'd expect for rc6.
fbdev:
- hotunplugging fix
amdgpu:
- Fix a xen dom0 regression on APUs
- Fix a potential array overflow if a receiver were to send an erroneous audio channel count
msm:
- lockdep fix.
it6505:
- kconfig fix"
* tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
  drm/amdgpu: do not use passthrough mode in Xen dom0
  drm/bridge: ite-it6505: add missing Kconfig option select
  fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
  drm/msm/dp: remove fail safe mode related code
2022-05-06 | gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) | Puyou Lu
When one port's input state gets inverted (e.g. from low to high) after pca953x_irq_setup but before setting irq_mask (by some other driver such as "gpio-keys"), the next inversion of this port (e.g. from high to low) will not be triggered any more (because irq_stat is not updated at the first time). This commit should fix the issue. Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability") Signed-off-by: Puyou Lu <puyou.lu@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
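The ordering problem can be reproduced with a heavily simplified model of a change-detecting interrupt check. The structure and function below (`mock_chip`, `mock_irq_pending()`) are hypothetical; the real driver also handles edge types and pin direction. The sketch only shows why the remembered input state has to be refreshed even when no unmasked line changed.

```c
#include <stdint.h>

struct mock_chip {
	uint8_t irq_stat; /* last input levels seen by the handler */
	uint8_t irq_mask; /* lines with interrupts enabled */
};

/*
 * Return the set of enabled lines whose level changed since the last
 * call. The remembered state is updated unconditionally, before the
 * caller inspects the result, so a change seen while a line was still
 * masked does not hide the next change on that line.
 */
static uint8_t mock_irq_pending(struct mock_chip *chip, uint8_t cur_stat)
{
	uint8_t changed = cur_stat ^ chip->irq_stat;
	uint8_t trigger = changed & chip->irq_mask;

	chip->irq_stat = cur_stat;
	return trigger;
}
```

If the update were skipped whenever `trigger` is zero, a low-to-high change seen while the line was masked would leave `irq_stat` stale, and the later high-to-low change would compare equal to the stale value and never fire.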
2022-05-05 | s390/dasd: Use kzalloc instead of kmalloc/memset | Haowen Bai
Use kzalloc rather than duplicating its implementation, which makes code simple and easy to understand. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20220505141733.1989450-6-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
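A userspace analogue of this cleanup, using calloc() in place of a malloc() plus memset() pair. kzalloc() and kmalloc() are kernel-only, so the standard C library functions stand in for them here; `struct record` and the two helper functions are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct record {
	int id;
	char name[32];
};

/* Two-step pattern: allocate, then clear by hand. */
static struct record *record_alloc_two_step(void)
{
	struct record *r = malloc(sizeof(*r));

	if (r)
		memset(r, 0, sizeof(*r));
	return r;
}

/* One-step pattern: the allocator hands back zeroed memory. */
static struct record *record_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct record));
}
```

Both helpers return memory with identical contents; the second is shorter and removes the chance of clearing with the wrong size.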
2022-05-05 | s390/dasd: Fix read inconsistency for ESE DASD devices | Jan Höppner
Read requests that return with NRF error are partially completed in dasd_eckd_ese_read(). The function keeps track of the amount of processed bytes and the driver will eventually return this information back to the block layer for further processing via __dasd_cleanup_cqr() when the request is in the final stage of processing (from the driver's perspective). For this, blk_update_request() is used which requires the number of bytes to complete the request. As per documentation the nr_bytes parameter is described as follows: "number of bytes to complete for @req". This was mistakenly interpreted as "number of bytes _left_ for @req" leading to new requests with incorrect data length. The consequences are inconsistent and completely wrong read requests, as data from random memory areas is read back. Fix this by correctly specifying the amount of bytes that should be used to complete the request. Fixes: 5e6bdd37c552 ("s390/dasd: fix data corruption for thin provisioned devices") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20220505141733.1989450-5-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
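The misreading of nr_bytes is easy to see with numbers. The toy function below follows the contract quoted in the message ("number of bytes to complete"); it is a hypothetical stand-in written for this example and is not the block layer's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_request {
	size_t data_len; /* bytes still outstanding */
};

/*
 * Complete nr_bytes of the request. Returns true if bytes remain
 * outstanding afterwards, false if the request is now fully completed.
 */
static bool mock_update_request(struct mock_request *rq, size_t nr_bytes)
{
	if (nr_bytes >= rq->data_len) {
		rq->data_len = 0;
		return false;
	}
	rq->data_len -= nr_bytes;
	return true;
}
```

For a 4096-byte request of which 1024 bytes were processed, passing 1024 leaves 3072 bytes outstanding, which is correct. Passing the bytes left (3072) instead completes too much and leaves a follow-up request of only 1024 bytes, the kind of wrong data length the commit describes.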
2022-05-05 | s390/dasd: Fix read for ESE with blksize < 4k | Jan Höppner
When reading unformatted tracks on ESE devices, the corresponding memory areas are simply set to zero for each segment. This is done incorrectly for blocksizes < 4096. There are two problems. First, the increment of dst is done using the counter of the loop (off), which is increased by blksize every iteration. This leads to a much bigger increment for dst than actually intended. Second, the increment of dst is done before the memory area is set to 0, skipping a significant amount of bytes of memory. This leads to illegal overwriting of memory and ultimately to a kernel panic. This is not a problem with 4k blocksize because blk_queue_max_segment_size is set to PAGE_SIZE, always resulting in a single iteration for the inner segment loop (bv.bv_len == blksize). The incorrectly used 'off' value to increment dst is 0 and the correct memory area is used. In order to fix this for blksize < 4k, increment dst correctly using the blksize and only do it at the end of the loop. Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20220505141733.1989450-4-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
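A minimal sketch of the corrected loop shape: clear one block, then advance the destination pointer by exactly one block size at the end of the iteration. The function name `zero_segment()` and its parameters are hypothetical, and the sketch assumes the segment length is a multiple of the block size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero a segment of seg_len bytes in blksize-sized steps.
 * Assumes seg_len is a multiple of blksize.
 */
static void zero_segment(unsigned char *dst, size_t seg_len, size_t blksize)
{
	for (size_t off = 0; off < seg_len; off += blksize) {
		memset(dst, 0, blksize); /* clear the current block first */
		dst += blksize;          /* then step by one block, not by off */
	}
}
```

Advancing `dst` by `off` instead would step by 0, 512, 1024, ... bytes for a 512-byte block size, running past the segment, and advancing before the memset() would skip the first block entirely.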
2022-05-05 | s390/dasd: prevent double format of tracks for ESE devices | Stefan Haberland
For ESE devices we get an error for write operations on an unformatted track. Afterwards the track will be formatted and the IO operation restarted. When using alias devices a track might be accessed by multiple requests simultaneously and there is a race window that a track gets formatted twice resulting in data loss. Prevent this by remembering the amount of formatted tracks when starting a request and comparing this number before actually formatting a track on the fly. If the number has changed there is a chance that the current track was finally formatted in between. As a result do not format the track and restart the current IO to check. The number of formatted tracks does not match the overall number of formatted tracks on the device and it might wrap around but this is no problem. It is only needed to recognize that a track has been formatted at all in between. Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Link: https://lore.kernel.org/r/20220505141733.1989450-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
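The counter comparison described above can be modeled in a few lines. The structures and helpers here (`mock_device`, `mock_io`, `mock_io_start()`, `mock_may_format()`, `mock_format_track()`) are hypothetical names for this sketch; the point is that only equality of the two counter values matters, so wraparound of the unsigned counter is harmless.

```c
#include <stdbool.h>

struct mock_device {
	unsigned int format_count; /* bumped on every on-the-fly format */
};

struct mock_io {
	unsigned int start_count; /* device counter when the request started */
};

/* Remember the device's format counter when a request starts. */
static void mock_io_start(const struct mock_device *dev, struct mock_io *io)
{
	io->start_count = dev->format_count;
}

/*
 * Formatting is allowed only if nothing was formatted since the request
 * started; otherwise the request is restarted to re-check the track.
 */
static bool mock_may_format(const struct mock_device *dev,
			    const struct mock_io *io)
{
	return dev->format_count == io->start_count;
}

/* Formatting a track bumps the counter (unsigned, so it may wrap). */
static void mock_format_track(struct mock_device *dev)
{
	dev->format_count++;
}
```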
2022-05-05s390/dasd: fix data corruption for ESE devicesStefan Haberland
For ESE devices we get an error when accessing an unformatted track. The handling of this error will return zero data for read requests and format the track on demand before writing to it. To do this the code needs to distinguish between read and write requests. This is done with data from the blocklayer request. A pointer to the blocklayer request is stored in the CQR. If there is an error on the device an ERP request is built to do error recovery. While the ERP request is mostly a copy of the original CQR the pointer to the blocklayer request is not copied to not accidentally pass it back to the blocklayer without cleanup. This leads to the error that during ESE handling after an ERP request was built it is not possible to determine the IO direction. This leads to the formatting of a track for read requests which might in turn lead to data corruption. Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Link: https://lore.kernel.org/r/20220505141733.1989450-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-06Merge tag 'drm-msm-fixes-2022-04-30' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes single lockdep fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtkzqzxDLp82OaKXVrWd7nWZtkxKsuOK1wOGCDz7qF-dA@mail.gmail.com
2022-05-06Merge tag 'drm-misc-fixes-2022-05-05' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.18-rc6: - Small fix for hot-unplugging fb devices. - Kconfig fix for it6505. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/69e51773-8c6f-4ff7-9a06-5c2922a43999@linux.intel.com
2022-05-06Merge tag 'amd-drm-fixes-5.18-2022-05-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.18-2022-05-04: amdgpu: - Fix a xen dom0 regression on APUs - Fix a potential array overflow if a receiver were to send an erroneous audio channel count Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220504190439.5723-1-alexander.deucher@amd.com
2022-05-05Merge tag 'net-5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, rxrpc and wireguard. Previous releases - regressions: - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() - rds: acquire netns refcount on TCP sockets - rxrpc: enable IPv6 checksums on transport socket - nic: hinic: fix bug of wq out of bound access - nic: thunder: don't use pci_irq_vector() in atomic context - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS flag - nic: mlx5e: - lag, fix use-after-free in fib event handler - fix deadlock in sync reset flow Previous releases - always broken: - tcp: fix insufficient TCP source port randomness - can: grcan: grcan_close(): fix deadlock - nfc: reorder destructive operations in to avoid bugs Misc: - wireguard: improve selftests reliability" * tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) NFC: netlink: fix sleep in atomic bug when firmware download timeout selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer tcp: drop the hash_32() part from the index calculation tcp: increase source port perturb table to 2^16 tcp: dynamically allocate the perturb table used by source ports tcp: add small random increments to the source port tcp: resalt the secret every 10 seconds tcp: use different parts of the port_offset for index and offset secure_seq: use the 64 bits of the siphash for port offset calculation wireguard: selftests: set panic_on_warn=1 from cmdline wireguard: selftests: bump package deps wireguard: selftests: restore support for ccache wireguard: selftests: use newer toolchains to fill out architectures wireguard: selftests: limit parallelism to $(nproc) tests at once wireguard: selftests: make routing loop test non-fatal net/mlx5: Fix matching on inner TTC net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix 
deadlock in sync reset flow net/mlx5e: Fix trust state reset in reload net/mlx5e: Avoid checking offload capability in post_parse action ...
2022-05-05gpio: visconti: Fix fwnode of GPIO IRQNobuhiro Iwamatsu
The fwnode of GPIO IRQ must be set to its own fwnode, not the fwnode of the parent IRQ. Therefore, set the GPIO IRQ's fwnode to its own fwnode instead of the parent IRQ's fwnode. Fixes: 2ad74f40dacc ("gpio: visconti: Add Toshiba Visconti GPIO support") Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-04RDMA/rxe: Change mcg_lock to a _bh lockBob Pearson
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while rxe_recv.c uses _bh spinlocks for the same lock. As there is no case where the mcg_lock can be taken from an IRQ, change these all to bh locks so we don't have confusing mismatched lock types on the same spinlock. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/rxe: Do not call dev_mc_add/del() under a spinlockBob Pearson
These routines were not intended to be called under a spinlock and will throw debugging warnings: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50 CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:warn_bogus_irq_restore+0x2f/0x50 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x75/0x80 rxe_attach_mcast+0x304/0x480 [rdma_rxe] ib_attach_mcast+0x88/0xa0 [ib_core] ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs] ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs] ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs] do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Move them out of the spinlock; it is OK if there are some races setting up the MC reception at the ethernet layer with rbtree lookups. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/siw: Fix a condition race issue in MPA request processingCheng Xu
Calling siw_cm_upcall and detaching new_cep from its listen_cep should be atomic. Otherwise siw_reject may be called in a temporary state, e.g. siw_cm_upcall has been called but new_cep->listen_cep has not been cleared. This fixes a WARN: WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: iw_cm_wq cm_work_handler [iw_cm] RIP: 0010:siw_cep_put+0x125/0x130 [siw] Call Trace: <TASK> siw_reject+0xac/0x180 [siw] iw_cm_reject+0x68/0xc0 [iw_cm] cm_work_handler+0x59d/0xe20 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04Merge tag 'iomm-fixes-v5.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "IOMMU core: - Fix for a regression which could cause NULL-ptr dereferences Arm SMMU: - Fix off-by-one in SMMUv3 SVA TLB invalidation - Disable large mappings to workaround nvidia erratum Intel VT-d: - Handle PCI stop marker messages in IOMMU driver to meet the requirement of I/O page fault handling framework. - Calculate a feasible mask for non-aligned page-selective IOTLB invalidation. Apple DART IOMMU: - Fix potential NULL-ptr dereference - Set module owner" * tag 'iomm-fixes-v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Make sysfs robust for non-API groups iommu/dart: Add missing module owner to ops structure iommu/dart: check return value after calling platform_get_resource() iommu/vt-d: Drop stop marker messages iommu/vt-d: Calculate mask for non-aligned flushes iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
2022-05-04Merge tag 'for-linus-5.17-2' of https://github.com/cminyard/linux-ipmiLinus Torvalds
Pull IPMI fixes from Corey Minyard: "Fix some issues that were reported. This has been in for-next for a bit (longer than the times would indicate, I had to rebase to add some text to the headers) and these are fixes that need to go in" * tag 'for-linus-5.17-2' of https://github.com/cminyard/linux-ipmi: ipmi:ipmi_ipmb: Fix null-ptr-deref in ipmi_unregister_smi() ipmi: When handling send message responses, don't process the message
2022-05-04drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNTHarry Wentland
A faulty receiver might report an erroneous channel count. We should guard against reading beyond AUDIO_CHANNELS_COUNT as that would overflow the dpcd_pattern_period array. Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
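A guard of this kind can be sketched as a simple clamp. The function name `copy_pattern_periods`, its signature, and the value 8 used for `AUDIO_CHANNELS_COUNT` here are assumptions made for the sketch, not taken from the amdgpu source.

```c
#include <assert.h>
#include <stddef.h>

#define AUDIO_CHANNELS_COUNT 8	/* assumed value for this sketch */

/*
 * Hypothetical helper: copy per-channel pattern periods, never reading
 * past AUDIO_CHANNELS_COUNT entries no matter what count was reported.
 * Returns the number of entries actually copied.
 */
size_t copy_pattern_periods(unsigned int *dst,
			    const unsigned int period[AUDIO_CHANNELS_COUNT],
			    size_t reported_channels)
{
	size_t n = reported_channels;
	size_t i;

	assert(dst && period);

	/* Clamp an erroneous count from a faulty receiver. */
	if (n > AUDIO_CHANNELS_COUNT)
		n = AUDIO_CHANNELS_COUNT;

	for (i = 0; i < n; i++)
		dst[i] = period[i];

	return n;
}
```

A reported count of 200 is treated as 8 in this sketch, so the loop stays inside the fixed-size array.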
2022-05-04drm/amdgpu: do not use passthrough mode in Xen dom0Marek Marczykowski-Górecki
While technically Xen dom0 is a virtual machine too, it does have access to most of the hardware so it doesn't need to be considered a "passthrough". Commit b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough") changed how FB is accessed based on passthrough mode. This breaks amdgpu in Xen dom0 with a message like this: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 While the reason for this failure is unclear, the passthrough mode is not really necessary in Xen dom0 anyway. So, to unbreak booting affected kernels, disable passthrough mode in this case. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985 Fixes: b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough") Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-04iommu: Make sysfs robust for non-API groupsRobin Murphy
Groups created by VFIO backends outside the core IOMMU API should never be passed directly into the API itself, however they still expose their standard sysfs attributes, so we can still stumble across them that way. Take care to consider those cases before jumping into our normal assumptions of a fully-initialised core API group. Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/86ada41986988511a8424e84746dfe9ba7f87573.1651667683.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-04mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHCShaik Sajida Bhanu
Reset GCC_SDCC_BCR register before every fresh initialization. This will reset the whole SDHC-msm controller, clear the previous power control states and avoid software reset timeout issues as below. [ 5.458061][ T262] mmc1: Reset 0x1 never completed. [ 5.462454][ T262] mmc1: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5.469065][ T262] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00007202 [ 5.475688][ T262] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5.482315][ T262] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5.488927][ T262] mmc1: sdhci: Present: 0x01f800f0 | Host ctl: 0x00000000 [ 5.495539][ T262] mmc1: sdhci: Power: 0x00000000 | Blk gap: 0x00000000 [ 5.502162][ T262] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000003 [ 5.508768][ T262] mmc1: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5.515381][ T262] mmc1: sdhci: Int enab: 0x00000000 | Sig enab: 0x00000000 [ 5.521996][ T262] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5.528607][ T262] mmc1: sdhci: Caps: 0x362dc8b2 | Caps_1: 0x0000808f [ 5.535227][ T262] mmc1: sdhci: Cmd: 0x00000000 | Max curr: 0x00000000 [ 5.541841][ T262] mmc1: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000 [ 5.548454][ T262] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000 [ 5.555079][ T262] mmc1: sdhci: Host ctl2: 0x00000000 [ 5.559651][ T262] mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP----------- [ 5.566621][ T262] mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x6000642c | DLL cfg2: 0x0020a000 [ 5.575465][ T262] mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00010800 | DDR cfg: 0x80040873 [ 5.584658][ T262] mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88218a8 Vndr func3: 0x02626040 Fixes: 0eb0d9f4de34 ("mmc: sdhci-msm: Initial support for Qualcomm chipsets") Signed-off-by: Shaik Sajida Bhanu <quic_c_sbhanu@quicinc.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Konrad Dybcio <konrad.dybcio@somainline.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1650816153-23797-1-git-send-email-quic_c_sbhanu@quicinc.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-05-04mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bitsSamuel Holland
Newer variants of the MMC controller support a 34-bit physical address space by using word addresses instead of byte addresses. However, the code truncates the DMA descriptor address to 32 bits before applying the shift. This breaks DMA for descriptors allocated above the 32-bit limit. Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220424231751.32053-1-samuel@sholland.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
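The difference between truncating before and after the shift can be shown with plain integer arithmetic. The function names and the explicit `shift` parameter below are hypothetical; the sketch only illustrates the ordering issue and is not the sunxi-mmc code.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy order: narrow to 32 bits first, then shift. Bits 32 and up are lost. */
uint32_t desc_word_addr_truncating(uint64_t dma_addr, unsigned int shift)
{
	assert(shift < 32);
	return (uint32_t)dma_addr >> shift;
}

/* Fixed order: shift the full-width address, then narrow the result. */
uint32_t desc_word_addr(uint64_t dma_addr, unsigned int shift)
{
	assert(shift < 32);
	return (uint32_t)(dma_addr >> shift);
}
```

For a descriptor at byte address 0x100000000 (just above the 32-bit limit) and a shift of 2 for word addressing, the fixed version yields 0x40000000 while the truncating version yields 0.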
2022-05-04iommu/dart: Add missing module owner to ops structureHector Martin
This is required to make loading this as a module work. Signed-off-by: Hector Martin <marcan@marcan.st> Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver") Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20220502092238.30486-1-marcan@marcan.st Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-04drm/bridge: ite-it6505: add missing Kconfig option selectFabien Parent
The IT6505 is using functions provided by the DRM_DP_HELPER driver. In order to avoid having the bridge enabled but the helper disabled, let's add a select in order to be sure that the DP helper functions are always available. Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver") Signed-off-by: Fabien Parent <fparent@baylibre.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220426141536.274727-1-fparent@baylibre.com
2022-05-04net/mlx5: Fix matching on inner TTCMark Bloch
The cited commits didn't use proper matching on inner TTC; as a result, distribution of encapsulated packets wasn't symmetric between the physical ports. Fixes: 4c71ce50d2fe ("net/mlx5: Support partial TTC rules") Fixes: 8e25a2bc6687 ("net/mlx5: Lag, add support to create TTC tables for LAG port selection") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5: Avoid double clear or set of sync reset requestedMoshe Shemesh
Double clear of reset requested state can lead to a NULL pointer dereference as it will try to delete the timer twice. This can happen for example on a race between abort from FW and pci error or reset. Avoid this by using test_and_clear_bit() so that the reset requested state clear flow runs only once. Similarly, use test_and_set_bit() so that the reset requested state set flow runs only once. Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
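The run-once pattern can be sketched in userspace with C11 atomics standing in for the kernel's test_and_set_bit() and test_and_clear_bit(). The helpers `flag_test_and_set`, `flag_test_and_clear`, and `clear_reset_requested`, and the counter standing in for the timer deletion, are all hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue: atomically set the bit, return its previous value. */
bool flag_test_and_set(atomic_ulong *flags, unsigned int bit)
{
	unsigned long mask;

	assert(bit < 8 * sizeof(unsigned long));
	mask = 1UL << bit;
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Userspace analogue: atomically clear the bit, return its previous value. */
bool flag_test_and_clear(atomic_ulong *flags, unsigned int bit)
{
	unsigned long mask;

	assert(bit < 8 * sizeof(unsigned long));
	mask = 1UL << bit;
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/*
 * Hypothetical clear flow: only the caller that actually cleared the
 * bit goes on to delete the timer (modelled here as a counter).
 */
int clear_reset_requested(atomic_ulong *flags, int *timer_deletes)
{
	if (!flag_test_and_clear(flags, 0))
		return -1;	/* another path already cleared it */

	(*timer_deletes)++;
	return 0;
}
```

If two paths race to clear the state, exactly one of them observes the bit as set and performs the teardown; the other returns early.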
2022-05-04net/mlx5: Fix deadlock in sync reset flowMoshe Shemesh
The sync reset flow can lead to the following deadlock when poll_sync_reset() is called by timer softirq and waiting on del_timer_sync() for the same timer. Fix that by moving the part of the flow that waits for the timer to reset_reload_work. It fixes the following kernel Trace: RIP: 0010:del_timer_sync+0x32/0x40 ... Call Trace: <IRQ> mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core] poll_sync_reset.cold+0x36/0x52 [mlx5_core] call_timer_fn+0x32/0x130 __run_timers.part.0+0x180/0x280 ? tick_sched_handle+0x33/0x60 ? tick_sched_timer+0x3d/0x80 ? ktime_get+0x3e/0xa0 run_timer_softirq+0x2a/0x50 __do_softirq+0xe1/0x2d6 ? hrtimer_interrupt+0x136/0x220 irq_exit+0xae/0xb0 smp_apic_timer_interrupt+0x7b/0x140 apic_timer_interrupt+0xf/0x20 </IRQ> Fixes: 3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5e: Fix trust state reset in reloadMoshe Tal
Setting dscp2prio during the driver reload can leave the dcb ieee app list non-empty after the reload finishes, resulting in a conflict between the priority trust state reported by the app and the state in the device register. Reset the dcb ieee app list on initialization in case this is conflicting with the register status. Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support") Signed-off-by: Moshe Tal <moshet@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>