Age | Commit message (Collapse) | Author |
|
Fix little inconsistencies in Documentation: make case and spacing
match surrounding text.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
The following commit:
commit 9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR
interface support")
added several defines related to extended attributes to ext4.h. They were
added within an #ifndef FS_IOC_FSGETXATTR block with the comment:
/* Until the uapi changes get merged for project quota... */
Those uapi changes were merged by this commit:
commit 334e580a6f97 ("fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR
promotion")
so all the definitions needed by ext4 are available in
include/uapi/linux/fs.h. Remove the duplicates from ext4.h.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
This helper, in the spirit of ext4_should_dioread_nolock() et al., replaces
the complex conditional in ext4_set_inode_flags().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
We prevent DAX from being used on inodes which are using ext4's built in
encryption via a check in ext4_set_inode_flags(). We do have what appears
to be an unsafe transition of S_DAX in ext4_set_context(), though, where
S_DAX can get disabled without us doing a proper writeback + invalidate.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:
https://www.spinics.net/lists/linux-xfs/msg09859.html
I actually think we are safe in this case because of the following:
1) You can't encrypt an existing file. Encryption can only be set on an
empty directory, with new inodes in that directory being created with
encryption turned on, so I don't think it's possible to turn encryption on
for a file that has open DAX mmaps or outstanding I/Os.
2) There is no way to turn encryption off on a given file. Once an inode
is encrypted, it stays encrypted for the life of that inode, so we don't
have to worry about the case where we turn encryption off and S_DAX
suddenly turns on.
3) The only way we end up in ext4_set_context() to turn on encryption is
when we are creating a new file in the encrypted directory. This happens
as part of ext4_create() before the inode has been allowed to do any I/O.
Here's the call tree:
ext4_create()
__ext4_new_inode()
ext4_set_inode_flags() // sets S_DAX
fscrypt_inherit_context()
fscrypt_get_encryption_info();
ext4_set_context() // sets EXT4_INODE_ENCRYPT, clears S_DAX
So, I actually think it's safe to transition S_DAX in ext4_set_context()
without any locking, writebacks or invalidations. I've added a
WARN_ON_ONCE() sanity check to make sure that we are notified if we ever
encounter a case where we are encrypting an inode that already has data,
in which case we need to add code to safely transition S_DAX.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
The current code has the potential for data corruption when changing an
inode's journaling mode, as that can result in a subsequent unsafe change
in S_DAX.
I've captured an instance of this data corruption in the following fstest:
https://patchwork.kernel.org/patch/9948377/
Prevent this data corruption from happening by disallowing changes to the
journaling mode if the '-o dax' mount option was used. This means that for
a given filesystem we could have a mix of inodes using either DAX or
data journaling, but whatever state the inodes are in will be held for the
duration of the mount.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
|
|
If an inode has inline data it is currently prevented from using DAX by a
check in ext4_set_inode_flags(). When the inode grows inline data via
ext4_create_inline_data() or removes its inline data via
ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
Currently these changes are unsafe because we don't hold off page faults
and I/O, write back dirty radix tree entries and invalidate all mappings.
There are also issues with mm-level races when changing the value of S_DAX,
as well as issues with the VM_MIXEDMAP flag:
https://www.spinics.net/lists/linux-xfs/msg09859.html
The unsafe transition of S_DAX can reliably cause data corruption, as shown
by the following fstest:
https://patchwork.kernel.org/patch/9948381/
Fix this issue by preventing the DAX mount option from being used on
filesystems that were created to support inline data. Inline data is an
option given to mkfs.ext4.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org
|
|
If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).
If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:
Inode 12, end of extent exceeds allowed value
(logical block 33, physical block 33441, len 7)
This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size. Since this situation is
quite rare, and the window in which the crash must take place is
typically < 30 seconds, in practice this condition will rarely happen.
Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
|
|
Now that we no longer try to reserve metadata blocks for delayed
allocations (which tended to overestimate the required number of
blocks significantly), we really don't need retry allocations when the
disk is very full as aggressively any more.
The only time when it makes sense to retry an allocation is if we have
freshly deleted blocks that will only become available after a
transaction commit. And if we lose that race, it's not worth it to
try more than once.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Switch to the iomap_seek_hole and iomap_seek_data helpers for
implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that
isn't needed any more.
Note that with this patch ext4 will now always depend on the iomap code
instead of only when CONFIG_DAX is enabled, and it requires adding a
call into the extent status tree for iomap_begin as well to properly
deal with delalloc extents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[More fixes and cleanups by Andreas]
|
|
Report inline data as a IOMAP_F_DATA_INLINE mapping. This allows to use
iomap_seek_hole and iomap_seek_data in ext4_llseek and makes switching
to iomap_fiemap in ext4_fiemap easier.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
Add a new IOMAP_F_DATA_INLINE flag to indicate that a mapping is in a
disk area that contains data as well as metadata. In iomap_fiemap, map
this flag to FIEMAP_EXTENT_DATA_INLINE.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
Replace iomap->blkno, the sector number, with iomap->addr, the disk
offset in bytes. For invalid disk offsets, use the special value
IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK.
This allows to use iomap for mappings which are not block aligned, such
as inline data on ext4.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> # iomap, xfs
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"This contains the following fixes and improvements:
- Avoid dereferencing an unprotected VMA pointer in the fault signal
generation code
- Fix inline asm call constraints for GCC 4.4
- Use existing register variable to retrieve the stack pointer
instead of forcing the compiler to create another indirect access
which results in excessive extra 'mov %rsp, %<dst>' instructions
- Disable branch profiling for the memory encryption code to prevent
an early boot crash
- Fix a sparse warning caused by casting the __user annotation in
__get_user_asm_u64() away
- Fix an off by one error in the loop termination of the error patch
in the x86 sysfs init code
- Add missing CPU IDs to various Intel specific drivers to enable the
functionality on recent hardware
- More (init) constification in the numachip code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use register variable to get stack pointer value
x86/mm: Disable branch profiling in mem_encrypt.c
x86/asm: Fix inline asm call constraints for GCC 4.4
perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
perf/x86/intel/rapl: Add missing CPU IDs
perf/x86/msr: Add missing CPU IDs
perf/x86/intel/cstate: Add missing CPU IDs
x86: Don't cast away the __user in __get_user_asm_u64()
x86/sysfs: Fix off-by-one error in loop termination
x86/mm: Fix fault error path using unsafe vma pointer
x86/numachip: Add const and __initconst to numachip2_clockevent
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"This adds a new timer wheel function which is required for the
conversion of the timer callback function from the 'unsigned long
data' argument to 'struct timer_list *timer'. This conversion has two
benefits:
1) It makes struct timer_list smaller
2) Many callers hand in a pointer to the timer or to the structure
containing the timer, which happens via type casting both at setup
and in the callback. This change gets rid of the typecasts.
Once the conversion is complete, which is planned for 4.15, the old
setup function and the intermediate typecast in the new setup function
go away along with the data field in struct timer_list.
Merging this now into mainline allows a smooth queueing of the actual
conversion in the affected maintainer trees without creating
dependencies"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
um/time: Fixup namespace collision
timer: Prepare to change timer callback argument type
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp/hotplug fixes from Thomas Gleixner:
"This addresses the fallout of the new lockdep mechanism which covers
completions in the CPU hotplug code.
The lockdep splats are false positives, but there is no way to
annotate that reliably. The solution is to split the completions for
CPU up and down, which requires some reshuffling of the failure
rollback handling as well"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Hotplug state fail injection
smp/hotplug: Differentiate the AP completion between up and down
smp/hotplug: Differentiate the AP-work lockdep class between up and down
smp/hotplug: Callback vs state-machine consistency
smp/hotplug: Rewrite AP state machine core
smp/hotplug: Allow external multi-instance rollback
smp/hotplug: Add state diagram
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"The scheduler pull request comes with the following updates:
- Prevent a divide by zero issue by validating the input value of
sysctl_sched_time_avg
- Make task state printing consistent all over the place and have
explicit state characters for IDLE and PARKED so they wont be
displayed as 'D' state which confuses tools"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/sysctl: Check user input value of sysctl_sched_time_avg
sched/debug: Add explicit TASK_PARKED printing
sched/debug: Ignore TASK_IDLE for SysRq-W
sched/debug: Add explicit TASK_IDLE printing
sched/tracing: Use common task-state helpers
sched/tracing: Fix trace_sched_switch task-state printing
sched/debug: Remove unused variable
sched/debug: Convert TASK_state to hex
sched/debug: Implement consistent task-state printing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
- Prevent a division by zero in the perf aux buffer handling
- Sync kernel headers with perf tool headers
- Fix a build failure in the syscalltbl code
- Make the debug messages of perf report --call-graph work correctly
- Make sure that all required perf files are in the MANIFEST for
container builds
- Fix the atrr.exclude kernel handling so it respects the
perf_event_paranoid and the user permissions
- Make perf test on s390x work correctly
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Only update ->aux_wakeup in non-overwrite mode
perf test: Fix vmlinux failure on s390x part 2
perf test: Fix vmlinux failure on s390x
perf tools: Fix syscalltbl build failure
perf report: Fix debug messages with --call-graph option
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
tools include: Sync kernel ABI headers with tooling headers
perf tools: Get all of tools/{arch,include}/ in the MANIFEST
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking:
- Plug a hole the pi_stat->owner serialization which was changed
recently and failed to fixup two usage sites.
- Prevent reordering of the rwsem_has_spinner() check vs the
decrement of rwsem count in up_write() which causes a missed
wakeup"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
futex: Fix pi_state->owner serialization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Add a missing NULL pointer check in free_irq()
- Fix a memory leak/memory corruption in the generic irq chip
- Add missing rcu annotations for radix tree access
- Use ffs instead of fls when extracting data from a chip register in
the MIPS GIC irq driver
- Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
end up at the target CPU and not at CPU0
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/generic-chip: Don't replace domain's name
irqdomain: Add __rcu annotations to radix tree accessors
irqchip/mips-gic: Use effective affinity to unmask
irqchip/mips-gic: Fix shifts to extract register fields
genirq: Check __free_irq() return value for NULL
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Thomas Gleixner:
"Two small fixes for objtool:
- Support frame pointer setup via 'lea (%rsp), %rbp' which was not
yet supported and caused build warnings
- Disable unreacahble warnings for GCC4.4 and older to avoid false
positives caused by the compiler itself"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Support unoptimized frame pointer setup
objtool: Skip unreachable warnings for GCC 4.4 and older
|
|
Pull mtd fixes from Boris Brezillon:
- Fix partition alignment check in mtdcore.c
- Fix a buffer overflow in the Atmel NAND driver
* tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtd:
mtd: nand: atmel: fix buffer overflow in atmel_pmecc_user
mtd: Fix partition alignment check on multi-erasesize devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight mostly minor fixes for recently discovered issues in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ILLEGAL REQUEST + ASC==27 => target failure
scsi: aacraid: Add a small delay after IOP reset
scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add()
scsi: scsi_transport_fc: set scsi_target_id upon rescan
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
scsi: aacraid: error: testing array offset 'bus' after use
scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function
scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform drivers fix from Darren Hart:
"Newly discovered species of fujitsu laptops break some assumptions
about ACPI device pairings.
fujitsu-laptop: Don't oops when FUJ02E3 is not present"
* tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: fujitsu-laptop: Don't oops when FUJ02E3 is not presnt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED fixes from Jacek Anaszewski:
"Four fixes for the as3645a LED flash controller and one update to
MAINTAINERS"
* tag 'led_fixes-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: Add entry for MediaTek PMIC LED driver
as3645a: Unregister indicator LED on device unbind
as3645a: Use integer numbers for parsing LEDs
dt: bindings: as3645a: Use LED number to refer to LEDs
as3645a: Use ams,input-max-microamp as documented in DT bindings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull waitid fix from Al Viro:
"Fix infoleak in waitid()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix infoleak in waitid(2)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We've collected a bunch of isolated fixes, for crashes, user-visible
behaviour or missing bits from other subsystem cleanups from the past.
The overall number is not small but I was not able to make it
significantly smaller. Most of the patches are supposed to go to
stable"
* 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: log csums for all modified extents
Btrfs: fix unexpected result when dio reading corrupted blocks
btrfs: Report error on removing qgroup if del_qgroup_item fails
Btrfs: skip checksum when reading compressed data if some IO have failed
Btrfs: fix kernel oops while reading compressed data
Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
Btrfs: do not backup tree roots when fsync
btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
btrfs: propagate error to btrfs_cmp_data_prepare caller
btrfs: prevent to set invalid default subvolid
Btrfs: send: fix error number for unknown inode types
btrfs: fix NULL pointer dereference from free_reloc_roots()
btrfs: finish ordered extent cleaning if no progress is found
btrfs: clear ordered flag on cleaning up ordered extents
Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO
Btrfs: do not reset bio->bi_ops while writing bio
Btrfs: use the new helper wbc_to_write_flags
|
|
Pull MD fixes from Shaohua Li:
"A few fixes for MD. Mainly fix a problem introduced in 4.13, which we
retry bio for some code paths but not all in some situations"
* tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid5: cap worker count
dm-raid: fix a race condition in request handling
md: fix a race condition for flush request handling
md: separate request handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert
Uytterhoeven)
- fix a race in sysfs driver_override store/show (Nicolai Stange)
* tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix race condition with driver_override
PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Regular fixes pull, some amdkfd, amdgpu, etnaviv, sun4i, qxl, tegra
fixes.
I've got an outstanding pull for i915 but it wasn't on an rc2 base so
I wanted to ship these out first, I might get to it before rc3 or I
might not"
* tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/tegra: trace: Fix path to include
qxl: fix framebuffer unpinning
drm/sun4i: cec: Enable back CEC-pin framework
drm/amdkfd: Print event limit messages only once per process
drm/amdkfd: Fix kernel-queue wrapping bugs
drm/amdkfd: Fix incorrect destroy_mqd parameter
drm/radeon: disable hard reset in hibernate for APUs
drm/amdgpu: revert tile table update for oland
etnaviv: fix gem object list corruption
etnaviv: fix submit error path
qxl: fix primary surface handling
drm/amdkfd: check for null dev to avoid a null pointer dereference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- A comment fix for 'struct iommu_ops'
- Format string fixes for AMD IOMMU, unfortunatly I missed that during
review.
- Limit mediatek physical addresses to 32 bit for v7s to fix a warning
triggered in io-page-table code.
- Fix dma-sync in io-pgtable-arm-v7s code
* tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix comment for iommu_ops.map_sg
iommu/amd: pr_err() strings should end with newlines
iommu/mediatek: Limit the physical address in 32bit for v7s
iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- SPsel register initialisation on reset as the architecture defines
its state as unknown
- Use READ_ONCE when dereferencing pmd_t pointers to avoid race
conditions in page_vma_mapped_walk() (or fast GUP) with concurrent
modifications of the page table
- Avoid invoking the mm fault handling code for kernel addresses (check
against TASK_SIZE) which would otherwise result in calling
might_sleep() in atomic context
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fault: Route pte translation faults via do_translation_fault
arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
arm64: Make sure SPsel is always set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- avoid a warning when compiling with clang
- consider read-only bits in xen-pciback when writing to a BAR
- fix a boot crash of pv-domains
* tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping
xen-pciback: relax BAR sizing write value check
x86/xen: clean up clang build warning
|
|
Pull kvm fixes from Paolo Bonzini:
"Mixed bugfixes. Perhaps the most interesting one is a latent bug that
was finally triggered by PCID support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/x86: Handle async PF in RCU read-side critical sections
KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
KVM: VMX: use cmpxchg64
KVM: VMX: simplify and fix vmx_vcpu_pi_load
KVM: VMX: avoid double list add with VT-d posted interrupts
KVM: VMX: extract __pi_post_block
KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception
KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
|
|
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first
case and waitid(2) rusage should've been copied out exactly in that case, *not*
whenever kernel_waitid() has not returned an error. Compat variant shares that
braino; none of kernel_wait4() callers do, so the below ought to fix it.
Reported-and-tested-by: Alexander Potapenko <glider@google.com>
Fixes: ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland")
Cc: stable@vger.kernel.org # v4.13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Currently we use current_stack_pointer() function to get the value
of the stack pointer register. Since commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
... we have a stack register variable declared. It can be used instead of
current_stack_pointer() function which allows to optimize away some
excessive "mov %rsp, %<dst>" instructions:
-mov %rsp,%rdx
-sub %rdx,%rax
-cmp $0x3fff,%rax
-ja ffffffff810722fd <ist_begin_non_atomic+0x2d>
+sub %rsp,%rax
+cmp $0x3fff,%rax
+ja ffffffff810722fa <ist_begin_non_atomic+0x2a>
Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer
and use it instead of the removed function.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some routines in mem_encrypt.c are called very early in the boot process,
e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined
the resulting branch profiling associated with the check to see if SME is
active results in a kernel crash. Disable branch profiling for
mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any
header files.
Reported-by: kernel test robot <lkp@01.org>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix syscalltbl build failure (Akemi Yagi)
- Fix attr.exclude_kernel setting for default cycles:p, this time for
!root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo)
- Sync kernel ABI headers with tooling headers (Ingo Molnar)
- Remove misleading debug messages with --call-graph option (Mengting Zhang)
- Revert vmlinux symbol resolution patches for s390x (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:
"Notable here is a rewrite of big_key crypto by Jason Donenfeld to
address some issues in the original code.
From Jason's commit log:
"This started out as just replacing the use of crypto/rng with
get_random_bytes_wait, so that we wouldn't use bad randomness at
boot time. But, upon looking further, it appears that there were
even deeper underlying cryptographic problems, and that this seems
to have been committed with very little crypto review. So, I rewrote
the whole thing, trying to keep to the conventions introduced by the
previous author, to fix these cryptographic flaws."
There has been positive review of the new code by Eric Biggers and
Herbert Xu, and it passes basic testing via the keyutils test suite.
Eric also manually tested it.
Generally speaking, we likely need to improve the amount of crypto
review for kernel crypto users including keys (I'll post a note
separately to ksummit-discuss)"
* 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security/keys: rewrite all of big_key crypto
security/keys: properly zero out sensitive key material in big_key
KEYS: use kmemdup() in request_key_auth_new()
KEYS: restrict /proc/keys by credentials at open time
KEYS: reset parent each time before searching key_user_tree
KEYS: prevent KEYCTL_READ on negative key
KEYS: prevent creating a different user's keyrings
KEYS: fix writing past end of user-supplied buffer in keyring_read()
KEYS: fix key refcount leak in keyctl_read_key()
KEYS: fix key refcount leak in keyctl_assume_authority()
KEYS: don't revoke uninstantiated key in request_key_auth_new()
KEYS: fix cred refcount leak in request_key_auth_new()
|
|
We currently route pte translation faults via do_page_fault, which elides
the address check against TASK_SIZE before invoking the mm fault handling
code. However, this can cause issues with the path walking code in
conjunction with our word-at-a-time implementation because
load_unaligned_zeropad can end up faulting in kernel space if it reads
across a page boundary and runs into a page fault (e.g. by attempting to
read from a guard region).
In the case of such a fault, load_unaligned_zeropad has registered a
fixup to shift the valid data and pad with zeroes, however the abort is
reported as a level 3 translation fault and we dispatch it straight to
do_page_fault, despite it being a kernel address. This results in calling
a sleeping function from atomic context:
BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
in_atomic(): 0, irqs_disabled(): 0, pid: 10290
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
[<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
[<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
[<ffffff8e016977f0>] do_page_fault+0x140/0x330
[<ffffff8e01681328>] do_mem_abort+0x54/0xb0
Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
[...]
[<ffffff8e016844fc>] el1_da+0x18/0x78
[<ffffff8e017f399c>] path_parentat+0x44/0x88
[<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
[<ffffff8e017f5044>] filename_create+0x4c/0x128
[<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
[<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
Code: 36380080 d5384100 f9400800 9402566d (d4210000)
---[ end trace 2d01889f2bca9b9f ]---
Fix this by dispatching all translation faults to do_translation_faults,
which avoids invoking the page fault logic for faults on kernel addresses.
Cc: <stable@vger.kernel.org>
Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
On kernels built with support for transparent huge pages, different CPUs
can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
and they must take care to use READ_ONCE to avoid value tearing or caching
of stale values by the compiler. Unfortunately, these functions call into
our pgtable macros, which don't use READ_ONCE, and compiler caching has
been observed to cause the following crash during ext4 writeback:
PC is at check_pte+0x20/0x170
LR is at page_vma_mapped_walk+0x2e0/0x540
[...]
Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
Call trace:
[<ffff000008233328>] check_pte+0x20/0x170
[<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
[<ffff000008234adc>] page_mkclean_one+0xac/0x278
[<ffff000008234d98>] rmap_walk_file+0xf0/0x238
[<ffff000008236e74>] rmap_walk+0x64/0xa0
[<ffff0000082370c8>] page_mkclean+0x90/0xa8
[<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
[<ffff00000832f984>] mpage_submit_page+0x34/0x98
[<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
[<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
[<ffff00000833530c>] ext4_writepages+0x484/0xe30
[<ffff0000081f6ab4>] do_writepages+0x44/0xe8
[<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
[<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
[<ffff000008324310>] ext4_sync_file+0x80/0x4b8
[<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
[<ffff0000082332b4>] SyS_msync+0x194/0x1e8
This is because page_vma_mapped_walk loads the PMD twice before calling
pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
(where it sees a valid table pointer due to a concurrent pmd_populate).
However, the compiler inlines everything and caches the first value in
a register, which is subsequently used in pte_offset_phys which returns
a junk pointer that is later dereferenced when attempting to access the
relevant pte.
This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
that a stale value is not used. Whilst this is a point fix for a known
failure (and simple to backport), a full fix moving all of our page table
accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
page_vma_mapped_walk is in the works for a future kernel release.
Cc: Jon Masters <jcm@redhat.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: <stable@vger.kernel.org>
Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Sasha Levin reported a WARNING:
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
...
| CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
| 1.10.1-1ubuntu1 04/01/2014
| Call Trace:
...
| RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
| RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
| RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
| RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
| RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
| R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
| R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
| __schedule+0x201/0x2240 kernel/sched/core.c:3292
| schedule+0x113/0x460 kernel/sched/core.c:3421
| kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
| do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
| async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
| RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
| RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
| RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
| RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
| RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
| R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
| R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
| vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
| sprintf+0xbe/0xf0 lib/vsprintf.c:2386
| proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
| get_link fs/namei.c:1047 [inline]
| link_path_walk+0x1041/0x1490 fs/namei.c:2127
...
This happened when the host hit a page fault, and delivered it as in an
async page fault, while the guest was in an RCU read-side critical
section. The guest then tries to reschedule in kvm_async_pf_task_wait(),
but rcu_preempt_note_context_switch() would treat the reschedule as a
sleep in RCU read-side critical section, which is not allowed (even in
preemptible RCU). Thus the WARN.
To cure this, make kvm_async_pf_task_wait() go to the halt path if the
PF happens in a RCU read-side critical section.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
------------[ cut here ]------------
WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G W OE 4.13.0+ #17
RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
Call Trace:
? emulator_read_emulated+0x15/0x20 [kvm]
? segmented_read+0xae/0xf0 [kvm]
vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
x86_emulate_instruction+0x733/0x810 [kvm]
vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
? kvm_arch_vcpu_load+0x62/0x230 [kvm]
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
A nested #PF is triggered during L0 emulating instruction for L2. However, it
doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes
it by queuing the #PF exception instead ,requesting an immediate VM exit from
L2 and keeping the exception for L1 pending for a subsequent nested VM exit.
This should actually work all the time, making vmx_inject_page_fault_nested
totally unnecessary. However, that's not working yet, so this patch can work
around the issue in the meanwhile.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
System will hang if user set sysctl_sched_time_avg to 0:
[root@XXX ~]# sysctl kernel.sched_time_avg_ms=0
Stack traceback for pid 0
0xffff883f6406c600 0 0 1 3 R 0xffff883f6406cf50 *swapper/3
ffff883f7ccc3ae8 0000000000000018 ffffffff810c4dd0 0000000000000000
0000000000017800 ffff883f7ccc3d78 0000000000000003 ffff883f7ccc3bf8
ffffffff810c4fc9 ffff883f7ccc3c08 00000000810c5043 ffff883f7ccc3c08
Call Trace:
<IRQ> [<ffffffff810c4dd0>] ? update_group_capacity+0x110/0x200
[<ffffffff810c4fc9>] ? update_sd_lb_stats+0x109/0x600
[<ffffffff810c5507>] ? find_busiest_group+0x47/0x530
[<ffffffff810c5b84>] ? load_balance+0x194/0x900
[<ffffffff810ad5ca>] ? update_rq_clock.part.83+0x1a/0xe0
[<ffffffff810c6d42>] ? rebalance_domains+0x152/0x290
[<ffffffff810c6f5c>] ? run_rebalance_domains+0xdc/0x1d0
[<ffffffff8108a75b>] ? __do_softirq+0xfb/0x320
[<ffffffff8108ac85>] ? irq_exit+0x125/0x130
[<ffffffff810b3a17>] ? scheduler_ipi+0x97/0x160
[<ffffffff81052709>] ? smp_reschedule_interrupt+0x29/0x30
[<ffffffff8173a1be>] ? reschedule_interrupt+0x6e/0x80
<EOI> [<ffffffff815bc83c>] ? cpuidle_enter_state+0xcc/0x230
[<ffffffff815bc80c>] ? cpuidle_enter_state+0x9c/0x230
[<ffffffff815bc9d7>] ? cpuidle_enter+0x17/0x20
[<ffffffff810cd6dc>] ? cpu_startup_entry+0x38c/0x420
[<ffffffff81053373>] ? start_secondary+0x173/0x1e0
Because divide-by-zero error happens in function:
update_group_capacity()
update_cpu_capacity()
scale_rt_capacity()
{
...
total = sched_avg_period() + delta;
used = div_u64(avg, total);
...
}
To fix this issue, check user input value of sysctl_sched_time_avg, keep
it unchanged when hitting invalid input, and set the minimum limit of
sysctl_sched_time_avg to 1 ms.
Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: ethan.kernel@gmail.com
Cc: keescook@chromium.org
Cc: mcgrof@kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1504504774-18253-1-git-send-email-ethan.zhao@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The kernel test bot (run by Xiaolong Ye) reported that the following commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
is causing double faults in a kernel compiled with GCC 4.4.
Linus subsequently diagnosed the crash pattern and the buggy commit and found that
the issue is with this code:
register unsigned int __asm_call_sp asm("esp");
#define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
Even on a 64-bit kernel, it's using ESP instead of RSP. That causes GCC
to produce the following bogus code:
ffffffff8147461d: 89 e0 mov %esp,%eax
ffffffff8147461f: 4c 89 f7 mov %r14,%rdi
ffffffff81474622: 4c 89 fe mov %r15,%rsi
ffffffff81474625: ba 20 00 00 00 mov $0x20,%edx
ffffffff8147462a: 89 c4 mov %eax,%esp
ffffffff8147462c: e8 bf 52 05 00 callq ffffffff814c98f0 <copy_user_generic_unrolled>
Despite the absurdity of it backing up and restoring the stack pointer
for no reason, the bug is actually the fact that it's only backing up
and restoring the lower 32 bits of the stack pointer. The upper 32 bits
are getting cleared out, corrupting the stack pointer.
So change the '__asm_call_sp' register variable to be associated with
the actual full-size stack pointer.
This also requires changing the __ASM_SEL() macro to be based on the
actual compiled arch size, rather than the CONFIG value, because
CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
Otherwise Clang fails to build the kernel because it complains about the
use of a 64-bit register (RSP) in a 32-bit file.
Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com>
Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it
its own print state because it will not in fact get woken by regular
wakeups and is a long-term state.
This requires moving TASK_PARKED into the TASK_REPORT mask, and since
that latter needs to be a contiguous bitmask, we need to shuffle the
bits around a bit.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Markus reported that tasks in TASK_IDLE state are reported by SysRq-W,
which results in undesirable clutter.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Markus reported that kthreads that idle using TASK_IDLE instead of
TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things
like htop mark those red.
This is undesirable, so add an explicit state for TASK_IDLE.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|