aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-21x86/platform/UV: Add adjustable set memory block size functionmike.travis@hpe.com
Add a new function to "adjust" the current fixed UV memory block size of 2GB so it can be changed to a different physical boundary. This is out of necessity so arch dependent code can accommodate specific BIOS requirements which can align these new PMEM modules at less than the default boundaries. A "set order" type of function was used to insure that the memory block size will be a power of two value without requiring a validity check. 64GB was chosen as the upper limit for memory block size values to accommodate upcoming 4PB systems which have 6 more bits of physical address space (46 becoming 52). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Andrew Banman <andrew.banman@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: jgross@suse.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/build: Remove unnecessary preparation for purgatoryMasahiro Yamada
kexec-purgatory.c is properly generated when Kbuild descend into the arch/x86/purgatory/. Thus the 'archprepare' target is redundant. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-3-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Revert "kexec/purgatory: Add clean-up for purgatory directory"Masahiro Yamada
Reverts the following commit: b0108f9e93d0 ("kexec: purgatory: add clean-up for purgatory directory") ... which incorrectly stated that the kexec-purgatory.c and purgatory.ro files were not removed after 'make mrproper'. In fact, they are. You can confirm it after reverting it. $ make mrproper $ touch arch/x86/purgatory/kexec-purgatory.c $ touch arch/x86/purgatory/purgatory.ro $ make mrproper CLEAN arch/x86/purgatory $ ls arch/x86/purgatory/ entry64.S Makefile purgatory.c setup-x86_64.S stack.S string.c This is obvious from the build system point of view. arch/x86/Makefile adds 'arch/x86' to core-y. Hence 'make clean' descends like this: arch/x86/Kbuild -> arch/x86/purgatory/Makefile Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/1529401422-28838-2-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/xen: Add call of speculative_store_bypass_ht_init() to PV pathsJuergen Gross
Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too. Reported-by: Brian Woods <brian.woods@amd.com> Tested-by: Brian Woods <brian.woods@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-20x86: Call fixup_exception() before notify_die() in math_error()Siarhei Liakh
fpu__drop() has an explicit fwait which under some conditions can trigger a fixable FPU exception while in kernel. Thus, we should attempt to fixup the exception first, and only call notify_die() if the fixup failed just like in do_general_protection(). The original call sequence incorrectly triggers KDB entry on debug kernels under particular FPU-intensive workloads. Andy noted, that this makes the whole conditional irq enable thing even more inconsistent, but fixing that it outside the scope of this. Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Borislav Petkov" <bpetkov@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com
2018-06-09x86/intel_rdt: Enable CMT and MBM on new Skylake steppingTony Luck
New stepping of Skylake has fixes for cache occupancy and memory bandwidth monitoring. Update the code to enable these by default on newer steppings. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org # v4.14 Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com
2018-06-06x86/apic/vector: Print APIC control bits in debugfsThomas Gleixner
Extend the debugability of the vector management by adding the state bits to the debugfs output. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.908136099@linutronix.de
2018-06-06genirq/affinity: Defer affinity setting if irq chip is busyThomas Gleixner
The case that interrupt affinity setting fails with -EBUSY can be handled in the kernel completely by using the already available generic pending infrastructure. If a irq_chip::set_affinity() fails with -EBUSY, handle it like the interrupts for which irq_chip::set_affinity() can only be invoked from interrupt context. Copy the new affinity mask to irq_desc::pending_mask and set the affinity pending bit. The next raised interrupt for the affected irq will check the pending bit and try to set the new affinity from the handler. This avoids that -EBUSY is returned when an affinity change is requested from user space and the previous change has not been cleaned up. The new affinity will take effect when the next interrupt is raised from the device. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
2018-06-06x86/platform/uv: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special uv_ack_apic() implementation which is merily a wrapper around ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de
2018-06-06x86/ioapic: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of directly invoking ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
2018-06-06irq_remapping: Use apic_ack_irq()Thomas Gleixner
To address the EBUSY fail of interrupt affinity settings in case that the previous setting has not been cleaned up yet, use the new apic_ack_irq() function instead of the special ir_ack_apic_edge() implementation which is merily a wrapper around ack_APIC_irq(). Preparatory change for the real fix Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de
2018-06-06x86/apic: Provide apic_ack_irq()Thomas Gleixner
apic_ack_edge() is explicitely for handling interrupt affinity cleanup when interrupt remapping is not available or disable. Remapped interrupts and also some of the platform specific special interrupts, e.g. UV, invoke ack_APIC_irq() directly. To address the issue of failing an affinity update with -EBUSY the delayed affinity mechanism can be reused, but ack_APIC_irq() does not handle that. Adding this to ack_APIC_irq() is not possible, because that function is also used for exceptions and directly handled interrupts like IPIs. Create a new function, which just contains the conditional invocation of irq_move_irq() and the final ack_APIC_irq(). Reuse the new function in apic_ack_edge(). Preparatory change for the real fix. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06genirq/migration: Avoid out of line call if pending is not setThomas Gleixner
The upcoming fix for the -EBUSY return from affinity settings requires to use the irq_move_irq() functionality even on irq remapped interrupts. To avoid the out of line call, move the check for the pending bit into an inline helper. Preparatory change for the real fix. No functional change. Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
2018-06-06genirq/generic_pending: Do not lose pending affinity updateThomas Gleixner
The generic pending interrupt mechanism moves interrupts from the interrupt handler on the original target CPU to the new destination CPU. This is required for x86 and ia64 due to the way the interrupt delivery and acknowledge works if the interrupts are not remapped. However that update can fail for various reasons. Some of them are valid reasons to discard the pending update, but the case, when the previous move has not been fully cleaned up is not a legit reason to fail. Check the return value of irq_do_set_affinity() for -EBUSY, which indicates a pending cleanup, and rearm the pending move in the irq dexcriptor so it's tried again when the next interrupt arrives. Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
2018-06-06x86/apic/vector: Prevent hlist corruption and leaksThomas Gleixner
Several people observed the WARN_ON() in irq_matrix_free() which triggers when the caller tries to free an vector which is not in the allocation range. Song provided the trace information which allowed to decode the root cause. The rework of the vector allocation mechanism failed to preserve a sanity check, which prevents setting a new target vector/CPU when the previous affinity change has not fully completed. As a result a half finished affinity change can be overwritten, which can cause the leak of a irq descriptor pointer on the previous target CPU and double enqueue of the hlist head into the cleanup lists of two or more CPUs. After one CPU cleaned up its vector the next CPU will invoke the cleanup handler with vector 0, which triggers the out of range warning in the matrix allocator. Prevent this by checking the apic_data of the interrupt whether the move_in_progress flag is false and the hlist node is not hashed. Return -EBUSY if not. This prevents the damage and restores the behaviour before the vector allocation rework, but due to other changes in that area it also widens the chance that user space can observe -EBUSY. In theory this should be fine, but actually not all user space tools handle -EBUSY correctly. Addressing that is not part of this fix, but will be addressed in follow up patches. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
2018-06-06x86/vector: Fix the args of vector_alloc tracepointDou Liyang
The vector_alloc tracepont reversed the reserved and ret aggs, that made the trace print wrong. Exchange them. Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management") Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/idt: Simplify the idt_setup_apic_and_irq_gates()Dou Liyang
The idt_setup_apic_and_irq_gates() sets the gates from FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR first. then secondly, from FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n into account. But for APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, all vectors has been set at the first step. Simplify the second step, make it just work for APIC=y. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180523023555.2933-1-douly.fnst@cn.fujitsu.com
2018-06-06x86/platform/uv: Remove extra parenthesesVarsha Rao
Remove extra parentheses to fix the extraneous parentheses clang warning. Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180520080012.8215-1-rvarsha016@gmail.com
2018-06-06x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SMEKirill A. Shutemov
AMD SME claims one bit from physical address to indicate whether the page is encrypted or not. To achieve that we clear out the bit from __PHYSICAL_MASK. The capability to adjust __PHYSICAL_MASK is required beyond AMD SME. For instance for upcoming Intel Multi-Key Total Memory Encryption. Factor it out into a separate feature with own Kconfig handle. It also helps with overhead of AMD SME. It saves more than 3k in .text on defconfig + AMD_MEM_ENCRYPT: add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564) We would need to return to this once we have infrastructure to patch constants in code. That's good candidate for it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-mm@kvack.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
2018-06-06x86: Mark native_set_p4d() as __always_inlineArnd Bergmann
When CONFIG_OPTIMIZE_INLINING is enabled, the function native_set_p4d() may not be fully inlined into the caller, resulting in a false-positive warning about an access to the __pgtable_l5_enabled variable from a non-__init function, despite the original caller being an __init function: WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_set_p4d() to the variable .init.data:__pgtable_l5_enabled WARNING: vmlinux.o(.text.unlikely+0x1429): Section mismatch in reference from the function native_p4d_clear() to the variable .init.data:__pgtable_l5_enabled The function native_set_p4d() references the variable __initdata __pgtable_l5_enabled. This is often because native_set_p4d lacks a __initdata annotation or the annotation of __pgtable_l5_enabled is wrong. Marking the native_set_p4d function and its caller native_p4d_clear() avoids this problem. I did not bisect the original cause, but I assume this is related to the recent rework that turned pgtable_l5_enabled() into an inline function, which in turn caused the compiler to make different inlining decisions. Fixes: ad3fe525b950 ("x86/mm: Unify pgtable_l5_enabled usage in early boot code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Link: https://lkml.kernel.org/r/20180605113715.1133726-1-arnd@arndb.de
2018-06-04Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into ↵Ingo Molnar
x86/urgent Merge these small and simple 1-2 commit branches into the urgent branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-03Linux 4.17Linus Torvalds
2018-06-03Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro. - fix io_destroy()/aio_complete() race - the vfs_open() change to get rid of open_check_o_direct() boilerplate was nice, but buggy. Al has a patch avoiding a revert, but that's definitely not a last-day fodder, so for now revert it is... * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Revert "fs: fold open_check_o_direct into do_dentry_open" fix io_destroy()/aio_complete() race
2018-06-03Revert "fs: fold open_check_o_direct into do_dentry_open"Al Viro
This reverts commit cab64df194667dc5d9d786f0a895f647f5501c0d. Having vfs_open() in some cases drop the reference to struct file combined with error = vfs_open(path, f, cred); if (error) { put_filp(f); return ERR_PTR(error); } return f; is flat-out wrong. It used to be error = vfs_open(path, f, cred); if (!error) { /* from now on we need fput() to dispose of f */ error = open_check_o_direct(f); if (error) { fput(f); f = ERR_PTR(error); } } else { put_filp(f); f = ERR_PTR(error); } and sure, having that open_check_o_direct() boilerplate gotten rid of is nice, but not that way... Worse, another call chain (via finish_open()) is FUBAR now wrt FILE_OPENED handling - in that case we get error returned, with file already hit by fput() *AND* FILE_OPENED not set. Guess what happens in path_openat(), when it hits if (!(opened & FILE_OPENED)) { BUG_ON(!error); put_filp(file); } The root cause of all that crap is that the callers of do_dentry_open() have no way to tell which way did it fail; while that could be fixed up (by passing something like int *opened to do_dentry_open() and have it marked if we'd called ->open()), it's probably much too late in the cycle to do so right now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-03Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: - two patches addressing the problem that the scheduler allows under certain conditions user space tasks to be scheduled on CPUs which are not yet fully booted which causes a few subtle and hard to debug issue - add a missing runqueue clock update in the deadline scheduler which triggers a warning under certain circumstances - fix a silly typo in the scheduler header file * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/headers: Fix typo sched/deadline: Fix missing clock update sched/core: Require cpu_active() in select_task_rq(), for user tasks sched/core: Fix rules for running on online && !active CPUs
2018-06-03Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes from Thomas Gleixner: - fix 'perf test Session topology' segfault on s390 (Thomas Richter) - fix NULL return handling in bpf__prepare_load() (YueHaibing) - fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier) - fix perf.data format description of NRCPUS header (Arnaldo Carvalho de Melo) - update perf.data documentation section on cpu topology - handle uncore event aliases in small groups properly (Kan Liang) - add missing perf_sample.addr into python sample dictionary (Leo Yan) * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix perf.data format description of NRCPUS header perf script python: Add addr into perf sample dict perf data: Update documentation section on cpu topology perf cs-etm: Fix indexing for decoder packet queue perf bpf: Fix NULL return handling in bpf__prepare_load() perf test: "Session topology" dumps core on s390 perf parse-events: Handle uncore event aliases in small groups properly
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Infinite loop in _decode_session6(), from Eric Dumazet. 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric Dumazet. 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux. 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki Makita. 5) Incorrect idr release in cls_flower, from Paul Blakey. 6) Probe error handling fix in davinci_emac, from Dan Carpenter. 7) Memory leak in XPS configuration, from Alexander Duyck. 8) Use after free with cloned sockets in kcm, from Kirill Tkhai. 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas Dichtel. 10) Fix UAPI hole in bpf data structure for 32-bit compat applications, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) bpf: fix uapi hole for 32 bit compat applications net: usb: cdc_mbim: add flag FLAG_SEND_ZLP ip6_tunnel: remove magic mtu value 0xFFF8 ip_tunnel: restore binding to ifaces with a large mtu net: dsa: b53: Add BCM5389 support kcm: Fix use-after-free caused by clonned sockets net-sysfs: Fix memory leak in XPS configuration ixgbe: fix parsing of TC actions for HW offload net: ethernet: davinci_emac: fix error handling in probe() net/ncsi: Fix array size in dumpit handler cls_flower: Fix incorrect idr release when failing to modify rule net/sonic: Use dma_mapping_error() xfrm Fix potential error pointer dereference in xfrm_bundle_create. vhost_net: flush batched heads before trying to busy polling tun: Fix NULL pointer dereference in XDP redirect be2net: Fix error detection logic for BE3 net: qmi_wwan: Add Netgear Aircard 779S mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG atm: zatm: fix memcmp casting iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs ...
2018-06-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Eve of merge window fix: The original code was so bogus as to be casting the wrong generic device to an rport and proceeding to take actions based on the bogus values it found. Fortunately it seems the location that is dereferenced always exists, so the code hasn't oopsed yet, but it certainly annoys the memory checkers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: scsi_transport_srp: Fix shost to rport translation
2018-06-02Merge tag 'drm-fixes-for-v4.17-rc8' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A few final fixes: i915: - fix for potential Spectre vector in the new query uAPI - fix NULL pointer deref (FDO #106559) - DMI fix to hide LVDS for Radiant P845 (FDO #105468) amdgpu: - suspend/resume DC regression fix - underscan flicker fix on fiji - gamma setting fix after dpms omap: - fix oops regression core: - fix PSR timing dw-hdmi: - fix oops regression" * tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux: drm/amd/display: Update color props when modeset is required drm/amd/display: Make atomic-check validate underscan changes drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense drm/amd/display: Fix BUG_ON during CRTC atomic check update drm/i915/query: nospec expects no more than an unsigned long drm/i915/query: Protect tainted function pointer lookup drm/i915/lvds: Move acpi lid notification registration to registration phase drm/i915: Disable LVDS on Radiant P845 drm/omap: fix NULL deref crash with SDI displays drm/psr: Fix missed entry in PSR setup time table.
2018-06-03Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Two last minute DC fixes for 4.17. A fix for underscan on fiji and a fix for gamma settings getting after dpms. * 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux: drm/amd/display: Update color props when modeset is required drm/amd/display: Make atomic-check validate underscan changes
2018-06-02Merge tag 'mips_fixes_4.17_3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from James Hogan: "A final few MIPS fixes for 4.17: - drop Lantiq gphy reboot/remove reset (4.14) - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0) - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)" * tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests MIPS: lantiq: gphy: Drop reboot/remove reset asserts
2018-06-02Merge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fix from Alex Williamson: "Revert a pfn page mapping optimization identified as introducing a bad page state regression (Alex Williamson)" * tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio: Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"
2018-06-02Merge tag 'char-misc-4.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are four small bugfixes for some char/misc drivers. Well, really three fixes and one fix for one of those fixes due to problems found by 0-day. This resolves some reported issues with the hwtracing drivers, and a reported regression for the thunderbolt subsystem. All of these have been in linux-next for a while now with no reported problems" * tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: hwtracing: stm: fix build error on some arches intel_th: Use correct device when freeing buffers stm class: Use vmalloc for the master map thunderbolt: Handle NULL boot ACL entries properly
2018-06-02Merge tag 'staging-4.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Greg KH: "Here are some old IIO driver fixes that were sitting in my tree for a few weeks. Sorry about not getting them to you sooner. They fix a number of small IIO driver issues that have been reported. All of these have been in linux-next for a while with no reported problems" * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: select buffer for at91-sama5d2_adc iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels iio:kfifo_buf: check for uint overflow iio:buffer: make length types match kfifo types iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock iio: adc: stm32-dfsdm: fix successive oversampling settings iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
2018-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Just three small last minute regressions that were found in the last week. The Broadcom fix is a bit big for rc7, but since it is fixing driver crash regressions that were merged via netdev into rc1, I am sending it. - bnxt netdev changes merged this cycle caused the bnxt RDMA driver to crash under certain situations - Arnd found (several, unfortunately) kconfig problems with the patches adding INFINIBAND_ADDR_TRANS. Reverting this last part, will fix it more fully outside -rc. - Subtle change in error code for a uapi function caused breakage in userspace. This was bug was subtly introduced cycle" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/core: Fix error code for invalid GID entry IB: Revert "remove redundant INFINIBAND kconfig dependencies" RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
2018-06-02Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A documentation bugfix and a MAINTAINERS addition" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: update HDL sources URL i2c: xlp9xx: Add MAINTAINERS entry
2018-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge two fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix the NULL mapping case in __isolate_lru_page() mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
2018-06-02mm: fix the NULL mapping case in __isolate_lru_page()Hugh Dickins
George Boole would have noticed a slight error in 4.16 commit 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page"). Fix it, to match both the comment above it, and the original behaviour. Although anonymous pages are not marked PageDirty at first, we have an old habit of calling SetPageDirty when a page is removed from swap cache: so there's a category of ex-swap pages that are easily migratable, but were inadvertently excluded from compaction's async migration in 4.16. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils Fixes: 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Ivan Kalvachev <ikalvachev@gmail.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()Hugh Dickins
Swapping load on huge=always tmpfs (with khugepaged tuned up to be very eager, but I'm not sure that is relevant) soon hung uninterruptibly, waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most often when "cp -a" was trying to write to a smallish file. Debug showed that the page in question was not locked, and page->mapping NULL by now, but page->index consistent with having been in a huge page before. Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added in; but took hours to reproduce on a 4.17 kernel (no idea why). The culprit proved to be the __ClearPageDirty() on tails beyond i_size in __split_huge_page(): the non-atomic __bitoperation may have been safe when 4.8's baa355fd3314 ("thp: file pages support for split_huge_page()") introduced it, but liable to erase PageWaiters after 4.10's 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit"). Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-02Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"Alex Williamson
Bisection by Amadeusz Sławiński implicates this commit leading to bad page state issues after VM shutdown, likely due to unbalanced page references. The original commit was intended only as a performance improvement, therefore revert for offline rework. Link: https://lkml.org/lkml/2018/6/2/97 Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping") Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com> Reported-by: Amadeusz Sławiński <amade@asmblr.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-02 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in order to fix offsets on 32 bit archs. This will have a minor merge conflict with net-next which has the __u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this location. Resolution is to use the gpl_compatible member. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01bpf: fix uapi hole for 32 bit compat applicationsDaniel Borkmann
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the case of struct bpf_map_info but also struct bpf_prog_info. In net-next commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info") added a bitfield into it to expose some flags related to programs. Thus, add an unnamed __u32 bitfield for both so that alignment keeps the same in both 32 and 64 bit cases, and can be naturally extended from there as in b85fab0e67b. Before: # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ __u64 netns_dev; /* 44 8 */ __u64 netns_ino; /* 52 8 */ /* size: 64, cachelines: 1, members: 10 */ /* padding: 4 */ }; After (same as on 64 bit): # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ __u64 netns_dev; /* 48 8 */ __u64 netns_ino; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 10 */ /* sum members: 60, holes: 1, sum holes: 4 */ }; Reported-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps") Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-01net: usb: cdc_mbim: add flag FLAG_SEND_ZLPDaniele Palmas
Testing Telit LM940 with ICMP packets > 14552 bytes revealed that the modem needs FLAG_SEND_ZLP to properly work, otherwise the cdc mbim data interface won't be anymore responsive. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01Merge branch 'tunnel-mtus'David S. Miller
Nicolas Dichtel says: ==================== ip[6] tunnels: fix mtu calculations The first patch restores the possibility to bind an ip4 tunnel to an interface whith a large mtu. The second patch was spotted after the first fix. I also target it to net because it fixes the max mtu value that can be used for ipv6 tunnels. v2: remove the 0xfff8 in ip_tunnel_newlink() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01ip6_tunnel: remove magic mtu value 0xFFF8Nicolas Dichtel
I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater. Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01ip_tunnel: restore binding to ifaces with a large mtuNicolas Dichtel
After commit f6cc9c054e77, the following conf is broken (note that the default loopback mtu is 65536, ie IP_MAX_MTU + 1): $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo add tunnel "gre0" failed: Invalid argument $ ip l a type dummy $ ip l s dummy1 up $ ip l s dummy1 mtu 65535 $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1 add tunnel "gre0" failed: Invalid argument dev_set_mtu() doesn't allow to set a mtu which is too large. First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove the magic value 0xFFF8 and use IP_MAX_MTU instead. 0xFFF8 seems to be there for ages, I don't know why this value was used. With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU: $ ip l s dummy1 mtu 66000 After that patch, it's also possible to bind an ip tunnel on that kind of interface. CC: Petr Machata <petrm@mellanox.com> CC: Ido Schimmel <idosch@mellanox.com> Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-05-31 1) Avoid possible overflow of the offset variable in _decode_session6(), this fixes an infinite lookp there. From Eric Dumazet. 2) We may use an error pointer in the error path of xfrm_bundle_create(). Fix this by returning this pointer directly to the caller. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: dsa: b53: Add BCM5389 supportDamien Thébault
This patch adds support for the BCM5389 switch connected through MDIO. Signed-off-by: Damien Thébault <damien.thebault@vitec.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01kcm: Fix use-after-free caused by clonned socketsKirill Tkhai
(resend for properly queueing in patchwork) kcm_clone() creates kernel socket, which does not take net counter. Thus, the net may die before the socket is completely destructed, i.e. kcm_exit_net() is executed before kcm_done(). Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net-sysfs: Fix memory leak in XPS configurationAlexander Duyck
This patch reorders the error cases in showing the XPS configuration so that we hold off on memory allocation until after we have verified that we can support XPS on a given ring. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>