aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-17module: Add retpoline tag to VERMAGICAndi Kleen
Add a marker for retpoline to the module VERMAGIC. This catches the case when a non RETPOLINE compiled module gets loaded into a retpoline kernel, making it insecure. It doesn't handle the case when retpoline has been runtime disabled. Even in this case the match of the retcompile status will be enforced. This implies that even with retpoline run time disabled all modules loaded need to be recompiled. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: rusty@rustcorp.com.au Cc: arjan.van.de.ven@intel.com Cc: jeyu@kernel.org Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
2018-01-17x86/cpufeature: Move processor tracing out of scattered featuresPaolo Bonzini
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX), so do not duplicate it in the scattered features word. Besides being more tidy, this will be useful for KVM when it presents processor tracing to the guests. KVM selects host features that are supported by both the host kernel (depending on command line options, CPU errata, or whatever) and KVM. Whenever a full feature word exists, KVM's code is written in the expectation that the CPUID bit number matches the X86_FEATURE_* bit number, but this is not the case for X86_FEATURE_INTEL_PT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luwei Kang <luwei.kang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16objtool: Improve error message for bad file argumentJosh Poimboeuf
If a nonexistent file is supplied to objtool, it complains with a non-helpful error: open: No such file or directory Improve it to: objtool: Can't open 'foo': No such file or directory Reported-by: Markus <M4rkusXXL@web.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16objtool: Fix seg fault with gold linkerJosh Poimboeuf
Objtool segfaults when the gold linker is used with CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y. With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before being passed to objtool. The gold linker seems to strip unused ELF symbols by default, which confuses objtool and causes the seg fault when it's trying to generate ORC metadata. Objtool should really be running immediately after GCC anyway, without a linker call in between. Change the makefile ordering so that objtool is called before the linker. Reported-and-tested-by: Markus <M4rkusXXL@web.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-15x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macrosTom Lendacky
The PAUSE instruction is currently used in the retpoline and RSB filling macros as a speculation trap. The use of PAUSE was originally suggested because it showed a very, very small difference in the amount of cycles/time used to execute the retpoline as compared to LFENCE. On AMD, the PAUSE instruction is not a serializing instruction, so the pause/jmp loop will use excess power as it is speculated over waiting for return to mispredict to the correct target. The RSB filling macro is applicable to AMD, and, if software is unable to verify that LFENCE is serializing on AMD (possible when running under a hypervisor), the generic retpoline support will be used and, so, is also applicable to AMD. Keep the current usage of PAUSE for Intel, but add an LFENCE instruction to the speculation trap for AMD. The same sequence has been adopted by GCC for the GCC generated retpolines. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Kees Cook <keescook@google.com> Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net
2018-01-15x86/retpoline: Fill RSB on context switch for affected CPUsDavid Woodhouse
On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an *underflow* of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as *filling* the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch. [ tglx: Added missing vendor check and slighty massaged comments and changelog ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
2018-01-15x86/kasan: Panic if there is not enough memory to bootAndrey Ryabinin
Currently KASAN doesn't panic in case it don't have enough memory to boot. Instead, it crashes in some random place: kernel BUG at arch/x86/mm/physaddr.c:27! RIP: 0010:__phys_addr+0x268/0x276 Call Trace: kasan_populate_shadow+0x3f2/0x497 kasan_init+0x12e/0x2b2 setup_arch+0x2825/0x2a2c start_kernel+0xc8/0x15f4 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x72/0x75 secondary_startup_64+0xa5/0xb0 Use memblock_virt_alloc_try_nid() for allocations without failure fallback. It will panic with an out of memory message. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: lkp@01.org Link: https://lkml.kernel.org/r/20180110153602.18919-1-aryabinin@virtuozzo.com
2018-01-14x86/retpoline: Remove compile time warningThomas Gleixner
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler does not have retpoline support. Linus rationale for this is: It's wrong because it will just make people turn off RETPOLINE, and the asm updates - and return stack clearing - that are independent of the compiler are likely the most important parts because they are likely the ones easiest to target. And it's annoying because most people won't be able to do anything about it. The number of people building their own compiler? Very small. So if their distro hasn't got a compiler yet (and pretty much nobody does), the warning is just annoying crap. It is already properly reported as part of the sysfs interface. The compile-time warning only encourages bad things. Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support") Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
2018-01-14x86,perf: Disable intel_bts when PTIPeter Zijlstra
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through the cpu_entry_area but instead uses the memory allocated for the perf AUX buffer. This obviously comes apart when using PTI because then the kernel mapping; which includes that AUX buffer memory; disappears. Fixing this requires to expose a mapping which is visible in all context and that's not trivial. As a quick fix disable this driver when PTI is enabled to prevent malfunction. Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Robert Święcki <robert@swiecki.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: greg@kroah.com Cc: hughd@google.com Cc: luto@amacapital.net Cc: Vince Weaver <vince@deater.net> Cc: torvalds@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
2018-01-14security/Kconfig: Correct the Documentation reference for PTIW. Trevor King
When the config option for PTI was added a reference to documentation was added as well. But the documentation did not exist at that point. The final documentation has a different file name. Fix it up to point to the proper file. Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig") Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-security-module@vger.kernel.org Cc: James Morris <james.l.morris@oracle.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
2018-01-14x86/pti: Fix !PCID and sanitize definesThomas Gleixner
The switch to the user space page tables in the low level ASM code sets unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base address of the page directory to the user part, bit 11 is switching the PCID to the PCID associated with the user page tables. This fails on a machine which lacks PCID support because bit 11 is set in CR3. Bit 11 is reserved when PCID is inactive. While the Intel SDM claims that the reserved bits are ignored when PCID is disabled, the AMD APM states that they should be cleared. This went unnoticed as the AMD APM was not checked when the code was developed and reviewed and test systems with Intel CPUs never failed to boot. The report is against a Centos 6 host where the guest fails to boot, so it's not yet clear whether this is a virt issue or can happen on real hardware too, but thats irrelevant as the AMD APM clearly ask for clearing the reserved bits. Make sure that on non PCID machines bit 11 is not set by the page table switching code. Andy suggested to rename the related bits and masks so they are clearly describing what they should be used for, which is done as well for clarity. That split could have been done with alternatives but the macro hell is horrible and ugly. This can be done on top if someone cares to remove the extra orq. For now it's a straight forward fix. Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
2018-01-13selftests/x86: Add test_vsyscallAndy Lutomirski
This tests that the vsyscall entries do what they're expected to do. It also confirms that attempts to read the vsyscall page behave as expected. If changes are made to the vsyscall code or its memory map handling, running this test in all three of vsyscall=none, vsyscall=emulate, and vsyscall=native are helpful. (Because it's easy, this also compares the vsyscall results to their vDSO equivalents.) Note to KAISER backporters: please test this under all three vsyscall modes. Also, in the emulate and native modes, make sure that test_vsyscall_64 agrees with the command line or config option as to which mode you're in. It's quite easy to mess up the kernel such that native mode accidentally emulates or vice versa. Greg, etc: please backport this to all your Meltdown-patched kernels. It'll help make sure the patches didn't regress vsyscalls. CSigned-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-12x86/retpoline: Fill return stack buffer on vmexitDavid Woodhouse
In accordance with the Intel and AMD documentation, we need to overwrite all entries in the RSB on exiting a guest, to prevent malicious branch target predictions from affecting the host kernel. This is needed both for retpoline and for IBRS. [ak: numbers again for the RSB stuffing labels] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/irq32: Convert assembler indirect jumpsAndi Kleen
Convert all indirect jumps in 32bit irq inline asm code to use non speculative sequences. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-12-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/checksum32: Convert assembler indirect jumpsDavid Woodhouse
Convert all indirect jumps in 32bit checksum assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-11-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/xen: Convert Xen hypercall indirect jumpsDavid Woodhouse
Convert indirect call in Xen hypercall to use non-speculative sequence, when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-10-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/hyperv: Convert assembler indirect jumpsDavid Woodhouse
Convert all indirect jumps in hyperv inline asm code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-9-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/ftrace: Convert ftrace assembler indirect jumpsDavid Woodhouse
Convert all indirect jumps in ftrace assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-8-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/entry: Convert entry assembler indirect jumpsDavid Woodhouse
Convert indirect jumps in core 32/64bit entry assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return address after the 'call' instruction must be *precisely* at the .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work, and the use of alternatives will mess that up unless we play horrid games to prepend with NOPs and make the variants the same length. It's not worth it; in the case where we ALTERNATIVE out the retpoline, the first instruction at __x86.indirect_thunk.rax is going to be a bare jmp *%rax anyway. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline/crypto: Convert crypto assembler indirect jumpsDavid Woodhouse
Convert all indirect jumps in crypto assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-6-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/spectre: Add boot time option to select Spectre v2 mitigationDavid Woodhouse
Add a spectre_v2= option to select the mitigation used for the indirect branch speculation vulnerability. Currently, the only option available is retpoline, in its various forms. This will be expanded to cover the new IBRS/IBPB microcode features. The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a serializing instruction, which is indicated by the LFENCE_RDTSC feature. [ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS integration becomes simple ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
2018-01-12x86/retpoline: Add initial retpoline supportDavid Woodhouse
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide the corresponding thunks. Provide assembler macros for invoking the thunks in the same way that GCC does, from native and inline assembler. This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In some circumstances, IBRS microcode features may be used instead, and the retpoline can be disabled. On AMD CPUs if lfence is serialising, the retpoline can be dramatically simplified to a simple "lfence; jmp *\reg". A future patch, after it has been verified that lfence really is serialising in all circumstances, can enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition to X86_FEATURE_RETPOLINE. Do not align the retpoline in the altinstr section, because there is no guarantee that it stays aligned when it's copied over the oldinstr during alternative patching. [ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks] [ tglx: Put actual function CALL/JMP in front of the macros, convert to symbolic labels ] [ dwmw2: Convert back to numeric labels, merge objtool fixes ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
2018-01-12objtool: Allow alternatives to be ignoredJosh Poimboeuf
Getting objtool to understand retpolines is going to be a bit of a challenge. For now, take advantage of the fact that retpolines are patched in with alternatives. Just read the original (sane) non-alternative instruction, and ignore the patched-in retpoline. This allows objtool to understand the control flow *around* the retpoline, even if it can't yet follow what's inside. This means the ORC unwinder will fail to unwind from inside a retpoline, but will work fine otherwise. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.uk
2018-01-12objtool: Detect jumps to retpoline thunksJosh Poimboeuf
A direct jump to a retpoline thunk is really an indirect jump in disguise. Change the objtool instruction type accordingly. Objtool needs to know where indirect branches are so it can detect switch statement jump tables. This fixes a bunch of warnings with CONFIG_RETPOLINE like: arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame ... Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-2-git-send-email-dwmw@amazon.co.uk
2018-01-11x86/pti: Make unpoison of pgd for trusted boot work for realDave Hansen
The inital fix for trusted boot and PTI potentially misses the pgd clearing if pud_alloc() sets a PGD. It probably works in *practice* because for two adjacent calls to map_tboot_page() that share a PGD entry, the first will clear NX, *then* allocate and set the PGD (without NX clear). The second call will *not* allocate but will clear the NX bit. Defer the NX clearing to a point after it is known that all top-level allocations have occurred. Add a comment to clarify why. [ tglx: Massaged changelog ] Fixes: 262b6b30087 ("x86/tboot: Unbreak tboot with PTI enabled") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: "Tim Chen" <tim.c.chen@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: peterz@infradead.org Cc: ning.sun@intel.com Cc: tboot-devel@lists.sourceforge.net Cc: andi@firstfloor.org Cc: luto@kernel.org Cc: law@redhat.com Cc: pbonzini@redhat.com Cc: torvalds@linux-foundation.org Cc: gregkh@linux-foundation.org Cc: dwmw@amazon.co.uk Cc: nickc@redhat.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com
2018-01-10x86/alternatives: Fix optimize_nops() checkingBorislav Petkov
The alternatives code checks only the first byte whether it is a NOP, but with NOPs in front of the payload and having actual instructions after it breaks the "optimized' test. Make sure to scan all bytes before deciding to optimize the NOPs in there. Reported-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
2018-01-09sysfs/cpu: Fix typos in vulnerability documentationDavid Woodhouse
Fixes: 87590ce6e ("sysfs/cpu: Add vulnerability folder") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-01-09x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSCTom Lendacky
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference to MFENCE_RDTSC. However, since the kernel could be running under a hypervisor that does not support writing that MSR, read the MSR back and verify that the bit has been set successfully. If the MSR can be read and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the MFENCE_RDTSC feature. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
2018-01-09x86/cpu/AMD: Make LFENCE a serializing instructionTom Lendacky
To aid in speculation control, make LFENCE a serializing instruction since it has less overhead than MFENCE. This is done by setting bit 1 of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not have this MSR. For these families, the LFENCE instruction is already serializing. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
2018-01-08x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()Jike Song
The following code contains dead logic: 162 if (pgd_none(*pgd)) { 163 unsigned long new_p4d_page = __get_free_page(gfp); 164 if (!new_p4d_page) 165 return NULL; 166 167 if (pgd_none(*pgd)) { 168 set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page))); 169 new_p4d_page = 0; 170 } 171 if (new_p4d_page) 172 free_page(new_p4d_page); 173 } There can't be any difference between two pgd_none(*pgd) at L162 and L167, so it's always false at L171. Dave Hansen explained: Yes, the double-test was part of an optimization where we attempted to avoid using a global spinlock in the fork() path. We would check for unallocated mid-level page tables without the lock. The lock was only taken when we needed to *make* an entry to avoid collisions. Now that it is all single-threaded, there is no chance of a collision, no need for a lock, and no need for the re-check. As all these functions are only called during init, mark them __init as well. Fixes: 03f4424f348e ("x86/mm/pti: Add functions to clone kernel PMDs") Signed-off-by: Jike Song <albcamus@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jiri Koshina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kees Cook <keescook@google.com> Cc: Andi Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg KH <gregkh@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Turner <pjt@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180108160341.3461-1-albcamus@gmail.com
2018-01-08x86/tboot: Unbreak tboot with PTI enabledDave Hansen
This is another case similar to what EFI does: create a new set of page tables, map some code at a low address, and jump to it. PTI mistakes this low address for userspace and mistakenly marks it non-executable in an effort to make it unusable for userspace. Undo the poison to allow execution. Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jeff Law <law@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David" <dwmw@amazon.co.uk> Cc: Nick Clifton <nickc@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
2018-01-08x86/cpu: Implement CPU vulnerabilites sysfs functionsThomas Gleixner
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and spectre_v2. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
2018-01-08sysfs/cpu: Add vulnerability folderThomas Gleixner
As the meltdown/spectre problem affects several CPU architectures, it makes sense to have common way to express whether a system is affected by a particular vulnerability or not. If affected the way to express the mitigation should be common as well. Create /sys/devices/system/cpu/vulnerabilities folder and files for meltdown, spectre_v1 and spectre_v2. Allow architectures to override the show function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
2018-01-06x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]David Woodhouse
Add the bug bits for spectre v1/2 and force them unconditionally for all cpus. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.uk
2018-01-06x86/Documentation: Add PTI descriptionDave Hansen
Add some details about how PTI works, what some of the downsides are, and how to debug it when things go wrong. Also document the kernel parameter: 'pti/nopti'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com
2018-01-06x86/pti: Unbreak EFI old_memmapJiri Kosina
EFI_OLD_MEMMAP's efi_call_phys_prolog() calls set_pgd() with swapper PGD that has PAGE_USER set, which makes PTI set NX on it, and therefore EFI can't execute it's code. Fix that by forcefully clearing _PAGE_NX from the PGD (this can't be done by the pgprot API). _PAGE_NX will be automatically reintroduced in efi_call_phys_epilog(), as _set_pgd() will again notice that this is _PAGE_USER, and set _PAGE_NX on it. Tested-by: Dimitri Sivanich <sivanich@hpe.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1801052215460.11852@cbobk.fhfr.pm
2018-01-05x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWNThomas Gleixner
Use the name associated with the particular attack which needs page table isolation for mitigation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jiri Koshina <jikos@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Lutomirski <luto@amacapital.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Greg KH <gregkh@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
2018-01-05x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asmDavid Woodhouse
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: ak@linux.intel.com Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
2018-01-05x86/tlb: Drop the _GPL from the cpu_tlbstate exportThomas Gleixner
The recent changes for PTI touch cpu_tlbstate from various tlb_flush inlines. cpu_tlbstate is exported as GPL symbol, so this causes a regression when building out of tree drivers for certain graphics cards. Aside of that the export was wrong since it was introduced as it should have been EXPORT_PER_CPU_SYMBOL_GPL(). Use the correct PER_CPU export and drop the _GPL to restore the previous state which allows users to utilize the cards they payed for. As always I'm really thrilled to make this kind of change to support the #friends (or however the hot hashtag of today is spelled) from that closet sauce graphics corp. Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org
2018-01-05x86/events/intel/ds: Use the proper cache flush method for mapping ds buffersPeter Zijlstra
Thomas reported the following warning: BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498 caller is native_flush_tlb_single+0x57/0xc0 native_flush_tlb_single+0x57/0xc0 __set_pte_vaddr+0x2d/0x40 set_pte_vaddr+0x2f/0x40 cea_set_pte+0x30/0x40 ds_update_cea.constprop.4+0x4d/0x70 reserve_ds_buffers+0x159/0x410 x86_reserve_hardware+0x150/0x160 x86_pmu_event_init+0x3e/0x1f0 perf_try_init_event+0x69/0x80 perf_event_alloc+0x652/0x740 SyS_perf_event_open+0x3f6/0xd60 do_syscall_64+0x5c/0x190 set_pte_vaddr is used to map the ds buffers into the cpu entry area, but there are two problems with that: 1) The resulting flush is not supposed to be called in preemptible context 2) The cpu entry area is supposed to be per CPU, but the debug store buffers are mapped for all CPUs so these mappings need to be flushed globally. Add the necessary preemption protection across the mapping code and flush TLBs globally. Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area") Reported-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
2018-01-05x86/kaslr: Fix the vaddr_end messThomas Gleixner
vaddr_end for KASLR is only documented in the KASLR code itself and is adjusted depending on config options. So it's not surprising that a change of the memory layout causes KASLR to have the wrong vaddr_end. This can map arbitrary stuff into other areas causing hard to understand problems. Remove the whole ifdef magic and define the start of the cpu_entry_area to be the end of the KASLR vaddr range. Add documentation to that effect. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com>, Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04x86/mm: Map cpu_entry_area at the same place on 4/5 levelThomas Gleixner
There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com>, Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04x86/mm: Set MODULES_END to 0xffffffffff000000Andrey Ryabinin
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary. So passing page unaligned address to kasan_populate_zero_shadow() have two possible effects: 1) It may leave one page hole in supposed to be populated area. After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that hole happens to be in the shadow covering fixmap area and leads to crash: BUG: unable to handle kernel paging request at fffffbffffe8ee04 RIP: 0010:check_memory_region+0x5c/0x190 Call Trace: <NMI> memcpy+0x1f/0x50 ghes_copy_tofrom_phys+0xab/0x180 ghes_read_estatus+0xfb/0x280 ghes_notify_nmi+0x2b2/0x410 nmi_handle+0x115/0x2c0 default_do_nmi+0x57/0x110 do_nmi+0xf8/0x150 end_repeat_nmi+0x1a/0x1e Note, the crash likely disappeared after commit 92a0f81d8957, which changed kasan_populate_zero_shadow() call the way it was before commit 21506525fb8d. 2) Attempt to load module near MODULES_END will fail, because __vmalloc_node_range() called from kasan_module_alloc() will hit the WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error. To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned which means that MODULES_END should be 8*PAGE_SIZE aligned. The whole point of commit f06bdd4001c2 was to move MODULES_END down if NR_CPUS is big, so the cpu_entry_area takes a lot of space. But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") the cpu_entry_area is no longer in fixmap, so we could just set MODULES_END to a fixed 8*PAGE_SIZE aligned address. Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
2018-01-03x86/process: Define cpu_tss_rw in same section as declarationNick Desaulniers
cpu_tss_rw is declared with DECLARE_PER_CPU_PAGE_ALIGNED but then defined with DEFINE_PER_CPU_SHARED_ALIGNED leading to section mismatch warnings. Use DEFINE_PER_CPU_PAGE_ALIGNED consistently. This is necessary because it's mapped to the cpu entry area and must be page aligned. [ tglx: Massaged changelog a bit ] Fixes: 1a935bc3d4ea ("x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: thomas.lendacky@amd.com Cc: Borislav Petkov <bpetkov@suse.de> Cc: tklauser@distanz.ch Cc: minipli@googlemail.com Cc: me@kylehuey.com Cc: namit@vmware.com Cc: luto@kernel.org Cc: jpoimboe@redhat.com Cc: tj@kernel.org Cc: cl@linux.com Cc: bp@suse.de Cc: thgarnie@google.com Cc: kirill.shutemov@linux.intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180103203954.183360-1-ndesaulniers@google.com
2018-01-03x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()Thomas Gleixner
The preparation for PTI which added CR3 switching to the entry code misplaced the CR3 switch in entry_SYSCALL_compat(). With PTI enabled the entry code tries to access a per cpu variable after switching to kernel GS. This fails because that variable is not mapped to user space. This results in a double fault and in the worst case a kernel crash. Move the switch ahead of the access and clobber RSP which has been saved already. Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching") Reported-by: Lars Wendler <wendler.lars@web.de> Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org>, Cc: Dave Hansen <dave.hansen@linux.intel.com>, Cc: Peter Zijlstra <peterz@infradead.org>, Cc: Greg KH <gregkh@linuxfoundation.org>, , Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>, Cc: Juergen Gross <jgross@suse.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
2018-01-03x86/dumpstack: Print registers for first stack frameJosh Poimboeuf
In the stack dump code, if the frame after the starting pt_regs is also a regs frame, the registers don't get printed. Fix that. Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Alexander Tsoy <alexander@tsoy.me> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@vger.kernel.org Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/396f84491d2f0ef64eda4217a2165f5712f6a115.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03x86/dumpstack: Fix partial register dumpsJosh Poimboeuf
The show_regs_safe() logic is wrong. When there's an iret stack frame, it prints the entire pt_regs -- most of which is random stack data -- instead of just the five registers at the end. show_regs_safe() is also poorly named: the on_stack() checks aren't for safety. Rename the function to show_regs_if_on_stack() and add a comment to explain why the checks are needed. These issues were introduced with the "partial register dump" feature of the following commit: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") That patch had gone through a few iterations of development, and the above issues were artifacts from a previous iteration of the patch where 'regs' pointed directly to the iret frame rather than to the (partially empty) pt_regs. Tested-by: Alexander Tsoy <alexander@tsoy.me> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@vger.kernel.org Fixes: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") Link: http://lkml.kernel.org/r/5b05b8b344f59db2d3d50dbdeba92d60f2304c54.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03x86/pti: Make sure the user/kernel PTEs matchThomas Gleixner
Meelis reported that his K8 Athlon64 emits MCE warnings when PTI is enabled: [Hardware Error]: Error Addr: 0x0000ffff81e000e0 [Hardware Error]: MC1 Error: L1 TLB multimatch. [Hardware Error]: cache level: L1, tx: INSN The address is in the entry area, which is mapped into kernel _AND_ user space. That's special because we switch CR3 while we are executing there. User mapping: 0xffffffff81e00000-0xffffffff82000000 2M ro PSE GLB x pmd Kernel mapping: 0xffffffff81000000-0xffffffff82000000 16M ro PSE x pmd So the K8 is complaining that the TLB entries differ. They differ in the GLB bit. Drop the GLB bit when installing the user shared mapping. Fixes: 6dc72c3cbca0 ("x86/mm/pti: Share entry text PMD") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Meelis Roos <mroos@linux.ee> Cc: Borislav Petkov <bp@alien8.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031407180.1957@nanos
2018-01-03x86/cpu, x86/pti: Do not enable PTI on AMD processorsTom Lendacky
AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault. Disable page table isolation by default on AMD processors by not setting the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI is set. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20171227054354.20369.94587.stgit@tlendack-t1.amdoffice.net
2018-01-03x86/pti: Enable PTI by defaultThomas Gleixner
This really want's to be enabled by default. Users who know what they are doing can disable it either in the config or on the kernel command line. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org