aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/entry
AgeCommit message (Collapse)Author
2018-02-28x86/entry/64: Add instruction suffixJan Beulich
Omitting suffixes from instructions in AT&T mode is bad practice when operand size cannot be determined by the assembler from register operands, and is likely going to be warned about by upstream gas in the future (mine does already). Add the single missing suffix here. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/5A93F96902000078001ABAC8@prv-mh.provo.novell.com
2018-02-21x86/entry/64: Simplify ENCODE_FRAME_POINTERJosh Poimboeuf
On 64-bit, the stack pointer is always aligned on interrupt, so instead of setting the LSB of the pt_regs address, we can just add 1 to it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180221024214.lhl5jfgw33c4vz3m@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Open-code switch_to_thread_stack()Dominik Brodowski
Open-code the two instances which called switch_to_thread_stack(). This allows us to remove the wrapper around DO_SWITCH_TO_THREAD_STACK. While at it, update the UNWIND hint to reflect where the IRET frame is, and update the commentary to reflect what we are actually doing here. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-7-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move ASM_CLAC to interrupt_entry()Dominik Brodowski
Moving ASM_CLAC to interrupt_entry means two instructions (addq / pushq and call interrupt_entry) are not covered by it. However, it offers a noticeable size reduction (-.2k): text data bss dec hex filename 16882 0 0 16882 41f2 entry_64.o-orig 16623 0 0 16623 40ef entry_64.o Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Remove 'interrupt' macroDominik Brodowski
It is now trivial to call interrupt_entry() and then the actual worker. Therefore, remove the interrupt macro and open code it all. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()Dominik Brodowski
We can also move the CLD, SWAPGS, and the switch_to_thread_stack() call to the interrupt_entry() helper function. As we do not want call depths of two, convert switch_to_thread_stack() to a macro. However, switch_to_thread_stack() has another user in entry_64_compat.S, which currently expects it to be a function. To keep the code changes in this patch minimal, create a wrapper function. The switch to a macro means that there is some binary code duplication if CONFIG_IA32_EMULATION=y is enabled. Therefore, the size reduction differs whether CONFIG_IA32_EMULATION is enabled or not: CONFIG_IA32_EMULATION=y (-0.13k): text data bss dec hex filename 17158 0 0 17158 4306 entry_64.o-orig 17028 0 0 17028 4284 entry_64.o CONFIG_IA32_EMULATION=n (-0.27k): text data bss dec hex filename 17158 0 0 17158 4306 entry_64.o-orig 16882 0 0 16882 41f2 entry_64.o Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entryDominik Brodowski
Moving the switch to IRQ stack from the interrupt macro to the helper function requires some trickery: All ENTER_IRQ_STACK really cares about is where the "original" stack -- meaning the GP registers etc. -- is stored. Therefore, we need to offset the stored RSP value by 8 whenever ENTER_IRQ_STACK is called from within a function. In such cases, and after switching to the IRQ stack, we need to push the "original" return address (i.e. the return address from the call to the interrupt entry function) to the IRQ stack. This trickery allows us to carve another .85k from the text size (it would be more except for the additional unwind hints): text data bss dec hex filename 18006 0 0 18006 4656 entry_64.o-orig 17158 0 0 17158 4306 entry_64.o Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper functionDominik Brodowski
The PUSH_AND_CLEAR_REGS macro is able to insert the GP registers "above" the original return address. This allows us to move a sizeable part of the interrupt entry macro to an interrupt entry helper function: text data bss dec hex filename 21088 0 0 21088 5260 entry_64.o-orig 18006 0 0 18006 4656 entry_64.o Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20Revert "x86/retpoline: Simplify vmexit_fill_RSB()"David Woodhouse
This reverts commit 1dde7415e99933bb7293d6b2843752cbdb43ec11. By putting the RSB filling out of line and calling it, we waste one RSB slot for returning from the function itself, which means one fewer actual function call we can make if we're doing the Skylake abomination of call-depth counting. It also changed the number of RSB stuffings we do on vmexit from 32, which was correct, to 16. Let's just stop with the bikeshedding; it didn't actually *fix* anything anyway. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arjan.van.de.ven@intel.com Cc: bp@alien8.de Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Link: http://lkml.kernel.org/r/1519037457-7643-4-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/entry/64: Use 'xorl' for faster register clearingDominik Brodowski
On some x86 CPU microarchitectures using 'xorq' to clear general-purpose registers is slower than 'xorl'. As 'xorl' is sufficient to clear all 64 bits of these registers due to zero-extension [*], switch the x86 64-bit entry code to use 'xorl'. No change in functionality and no change in code size. [*] According to Intel 64 and IA-32 Architecture Software Developer's Manual, section 3.4.1.1, the result of 32-bit operands are "zero- extended to a 64-bit result in the destination general-purpose register." The AMD64 Architecture Programmer’s Manual Volume 3, Appendix B.1, describes the same behaviour. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net [ Improved on the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/entry: Reduce the code footprint of the 'idtentry' macroDominik Brodowski
Play a little trick in the generic PUSH_AND_CLEAR_REGS macro to insert the GP registers "above" the original return address. This allows us to (re-)insert the macro in error_entry() and paranoid_entry() and to remove it from the idtentry macro. This reduces the static footprint significantly: text data bss dec hex filename 24307 0 0 24307 5ef3 entry_64.o-orig 20987 0 0 20987 51fb entry_64.o Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net [ Small tweaks to comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar: "Here's the latest set of Spectre and PTI related fixes and updates: Spectre: - Add entry code register clearing to reduce the Spectre attack surface - Update the Spectre microcode blacklist - Inline the KVM Spectre helpers to get close to v4.14 performance again. - Fix indirect_branch_prediction_barrier() - Fix/improve Spectre related kernel messages - Fix array_index_nospec_mask() asm constraint - KVM: fix two MSR handling bugs PTI: - Fix a paranoid entry PTI CR3 handling bug - Fix comments objtool: - Fix paranoid_entry() frame pointer warning - Annotate WARN()-related UD2 as reachable - Various fixes - Add Add Peter Zijlstra as objtool co-maintainer Misc: - Various x86 entry code self-test fixes - Improve/simplify entry code stack frame generation and handling after recent heavy-handed PTI and Spectre changes. (There's two more WIP improvements expected here.) - Type fix for cache entries There's also some low risk non-fix changes I've included in this branch to reduce backporting conflicts: - rename a confusing x86_cpu field name - de-obfuscate the naming of single-TLB flushing primitives" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) x86/entry/64: Fix CR3 restore in paranoid_exit() x86/cpu: Change type of x86_cache_size variable to unsigned int x86/spectre: Fix an error message x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping selftests/x86/mpx: Fix incorrect bounds with old _sigfault x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]() x86/speculation: Add <asm/msr-index.h> dependency nospec: Move array_index_nospec() parameter checking into separate macro x86/speculation: Fix up array_index_nospec_mask() asm constraint x86/debug: Use UD2 for WARN() x86/debug, objtool: Annotate WARN()-related UD2 as reachable objtool: Fix segfault in ignore_unreachable_insn() selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory selftests/x86/pkeys: Remove unused functions selftests/x86: Clean up and document sscanf() usage selftests/x86: Fix vDSO selftest segfault for vsyscall=none x86/entry/64: Remove the unused 'icebp' macro ...
2018-02-15x86/entry/64: Fix CR3 restore in paranoid_exit()Ingo Molnar
Josh Poimboeuf noticed the following bug: "The paranoid exit code only restores the saved CR3 when it switches back to the user GS. However, even in the kernel GS case, it's possible that it needs to restore a user CR3, if for example, the paranoid exception occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and SWAPGS." Josh also confirmed via targeted testing that it's possible to hit this bug. Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch. The reason we haven't seen this bug reported by users yet is probably because "paranoid" entry points are limited to the following cases: idtentry double_fault do_double_fault has_error_code=1 paranoid=2 idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK idtentry machine_check do_mce has_error_code=0 paranoid=1 Amongst those entry points only machine_check is one that will interrupt an IRQS-off critical section asynchronously - and machine check events are rare. The other main asynchronous entries are NMI entries, which can be very high-freq with perf profiling, but they are special: they don't use the 'idtentry' macro but are open coded and restore user CR3 unconditionally so don't have this bug. Reported-and-tested-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180214073910.boevmg65upbk3vqb@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Remove the unused 'icebp' macroBorislav Petkov
That macro was touched around 2.5.8 times, judging by the full history linux repo, but it was unused even then. Get rid of it already. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux@dominikbrodowski.net Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Fix paranoid_entry() frame pointer warningJosh Poimboeuf
With the following commit: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros") ... one of my suggested improvements triggered a frame pointer warning: arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup The warning is correct for the build-time code, but it's actually not relevant at runtime because of paravirt patching. The paravirt swapgs call gets replaced with either a SWAPGS instruction or NOPs at runtime. Go back to the previous behavior by removing the ELF function annotation for paranoid_entry() and adding an unwind hint, which effectively silences the warning. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros") Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Indent PUSH_AND_CLEAR_REGS and POP_REGS properlyDominik Brodowski
... same as the other macros in arch/x86/entry/calling.h Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and ↵Dominik Brodowski
SAVE_AND_CLEAR_REGS macros Previously, error_entry() and paranoid_entry() saved the GP registers onto stack space previously allocated by its callers. Combine these two steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro for that. This adds a significant amount ot text size. However, Ingo Molnar points out that: "these numbers also _very_ significantly over-represent the extra footprint. The assumptions that resulted in us compressing the IRQ entry code have changed very significantly with the new x86 IRQ allocation code we introduced in the last year: - IRQ vectors are usually populated in tightly clustered groups. With our new vector allocator code the typical per CPU allocation percentage on x86 systems is ~3 device vectors and ~10 fixed vectors out of ~220 vectors - i.e. a very low ~6% utilization (!). [...] The days where we allocated a lot of vectors on every CPU and the compression of the IRQ entry code text mattered are over. - Another issue is that only a small minority of vectors is frequent enough to actually matter to cache utilization in practice: 3-4 key IPIs and 1-2 device IRQs at most - and those vectors tend to be tightly clustered as well into about two groups, and are probably already on 2-3 cache lines in practice. For the common case of 'cache cold' IRQs it's the depth of the call chain and the fragmentation of the resulting I$ that should be the main performance limit - not the overall size of it. - The CPU side cost of IRQ delivery is still very expensive even in the best, most cached case, as in 'over a thousand cycles'. So much stuff is done that maybe contemporary x86 IRQ entry microcode already prefetches the IDT entry and its expected call target address."[*] [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need modification. Previously, %rsp was manually decreased by 15*8; with this patch, %rsp is decreased by 15 pushq instructions. [jpoimboe@redhat.com: unwind hint improvements] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Use PUSH_AND_CLEAN_REGS in more casesDominik Brodowski
entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use PUSH_AND_CLEAN_REGS instead of opencoded variants thereof. Due to the interleaving, the additional XOR-based clearing of R8 and R9 in entry_SYSCALL_64_after_hwframe() should not have any noticeable negative implications. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macroDominik Brodowski
Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS. This macro uses PUSH instead of MOV and should therefore be faster, at least on newer CPUs. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Interleave XOR register clearing with PUSH instructionsDominik Brodowski
Same as is done for syscalls, interleave XOR with PUSH instructions for exceptions/interrupts, in order to minimize the cost of the additional instructions required for register clearing. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Merge the POP_C_REGS and POP_EXTRA_REGS macros into a single ↵Dominik Brodowski
POP_REGS macro The two special, opencoded cases for POP_C_REGS can be handled by ASM macros. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Merge SAVE_C_REGS and SAVE_EXTRA_REGS, remove unused extensionsDominik Brodowski
All current code paths call SAVE_C_REGS and then immediately SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV sequeneces properly. While at it, remove the macros to save all except specific registers, as these macros have been unused for a long time. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-10Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...
2018-02-10x86/mm/pti: Fix PTI comment in entry_SYSCALL_64()Nadav Amit
The comment is confusing since the path is taken when CONFIG_PAGE_TABLE_ISOLATION=y is disabled (while the comment says it is not taken). Signed-off-by: Nadav Amit <namit@vmware.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: nadav.amit@gmail.com Link: http://lkml.kernel.org/r/20180209170638.15161-1-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06Merge branch 'linus' into sched/urgent, to resolve conflictsIngo Molnar
Conflicts: arch/arm64/kernel/entry.S arch/x86/Kconfig include/linux/sched/mm.h kernel/fork.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06x86/entry/64/compat: Clear registers for compat syscalls, to reduce ↵Dan Williams
speculation attack surface At entry userspace may have populated registers with values that could otherwise be useful in a speculative execution attack. Clear them to minimize the kernel's attack surface. Originally-From: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/151787989697.7847.4083702787288600552.stgit@dwillia2-desk3.amr.corp.intel.com [ Made small improvements to the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06x86/entry/64: Clear registers for exceptions/interrupts, to reduce ↵Dan Williams
speculation attack surface Clear the 'extra' registers on entering the 64-bit kernel for exceptions and interrupts. The common registers are not cleared since they are likely clobbered well before they can be exploited in a speculative execution attack. Originally-From: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/151787989146.7847.15749181712358213254.stgit@dwillia2-desk3.amr.corp.intel.com [ Made small improvements to the changelog and the code comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06x86/entry/64: Clear extra registers beyond syscall arguments, to reduce ↵Dan Williams
speculation attack surface At entry userspace may have (maliciously) populated the extra registers outside the syscall calling convention with arbitrary values that could be useful in a speculative execution (Spectre style) attack. Clear these registers to minimize the kernel's attack surface. Note, this only clears the extra registers and not the unused registers for syscalls less than 6 arguments, since those registers are likely to be clobbered well before their values could be put to use under speculation. Note, Linus found that the XOR instructions can be executed with minimized cost if interleaved with the PUSH instructions, and Ingo's analysis found that R10 and R11 should be included in the register clearing beyond the typical 'extra' syscall calling convention registers. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/151787988577.7847.16733592218894189003.stgit@dwillia2-desk3.amr.corp.intel.com [ Made small improvements to the changelog and the code comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier/x86: Provide core serializing commandMathieu Desnoyers
There are two places where core serialization is needed by membarrier: 1) When returning from the membarrier IPI, 2) After scheduler updates curr to a thread with a different mm, before going back to user-space, since the curr->mm is used by membarrier to check whether it needs to send an IPI to that CPU. x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go back to user-space. The IRET instruction is core serializing, but not SYSEXIT. x86-64 uses IRET as return from interrupt, which takes care of the IPI. However, it can return to user-space through either SYSRETL (compat code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing, we rely instead on write_cr3() performed by switch_mm() to provide core serialization after changing the current mm, and deal with the special case of kthread -> uthread (temporarily keeping current mm into active_mm) by adding a sync_core() in that specific case. Use the new sync_core_before_usermode() to guarantee this. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-04Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull spectre/meltdown updates from Thomas Gleixner: "The next round of updates related to melted spectrum: - The initial set of spectre V1 mitigations: - Array index speculation blocker and its usage for syscall, fdtable and the n180211 driver. - Speculation barrier and its usage in user access functions - Make indirect calls in KVM speculation safe - Blacklisting of known to be broken microcodes so IPBP/IBSR are not touched. - The initial IBPB support and its usage in context switch - The exposure of the new speculation MSRs to KVM guests. - A fix for a regression in x86/32 related to the cpu entry area - Proper whitelisting for known to be safe CPUs from the mitigations. - objtool fixes to deal proper with retpolines and alternatives - Exclude __init functions from retpolines which speeds up the boot process. - Removal of the syscall64 fast path and related cleanups and simplifications - Removal of the unpatched paravirt mode which is yet another source of indirect unproteced calls. - A new and undisputed version of the module mismatch warning - A couple of cleanup and correctness fixes all over the place Yet another step towards full mitigation. There are a few things still missing like the RBS underflow mitigation for Skylake and other small details, but that's being worked on. That said, I'm taking a belated christmas vacation for a week and hope that everything is magically solved when I'm back on Feb 12th" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM/x86: Add IBPB support KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL x86/pti: Mark constant arrays as __initconst x86/spectre: Simplify spectre_v2 command line parsing x86/retpoline: Avoid retpolines for built-in __init functions x86/kvm: Update spectre-v1 mitigation KVM: VMX: make MSR bitmaps per-VCPU x86/paravirt: Remove 'noreplace-paravirt' cmdline option x86/speculation: Use Indirect Branch Prediction Barrier in context switch x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" x86/spectre: Report get_user mitigation for spectre_v1 nl80211: Sanitize array index in parse_txq_params vfs, fdtable: Prevent bounds-check bypass via speculative execution x86/syscall: Sanitize syscall table de-references under speculation x86/get_user: Use pointer masking to limit speculation ...
2018-01-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - handle 'infinitely'-long sleeping tasks, from Miroslav Benes - remove 'immediate' feature, as it turns out it doesn't provide the originally expected semantics, and brings more issues than value * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add locking to force and signal functions livepatch: Remove immediate feature livepatch: force transition to finish livepatch: send a fake signal to all blocking tasks
2018-01-30x86/hyperv: Reenlightenment notifications supportVitaly Kuznetsov
Hyper-V supports Live Migration notification. This is supposed to be used in conjunction with TSC emulation: when a VM is migrated to a host with different TSC frequency for some short period the host emulates the accesses to TSC and sends an interrupt to notify about the event. When the guest is done updating everything it can disable TSC emulation and everything will start working fast again. These notifications weren't required until now as Hyper-V guests are not supposed to use TSC as a clocksource: in Linux the TSC is even marked as unstable on boot. Guests normally use 'tsc page' clocksource and host updates its values on migrations automatically. Things change when with nested virtualization: even when the PV clocksources (kvm-clock or tsc page) are passed through to the nested guests the TSC frequency and frequency changes need to be know.. Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that the fix in on the way. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
2018-01-30x86/syscall: Sanitize syscall table de-references under speculationDan Williams
The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Andy Lutomirski <luto@kernel.org> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/asm: Move 'status' from thread_struct to thread_infoAndy Lutomirski
The TS_COMPAT bit is very hot and is accessed from code paths that mostly also touch thread_info::flags. Move it into struct thread_info to improve cache locality. The only reason it was in thread_struct is that there was a brief period during which arch-specific fields were not allowed in struct thread_info. Linus suggested further changing: ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); to: if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED))) ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); on the theory that frequently dirtying the cacheline even in pure 64-bit code that never needs to modify status hurts performance. That could be a reasonable followup patch, but I suspect it matters less on top of this patch. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
2018-01-30x86/entry/64: Push extra regs right awayAndy Lutomirski
With the fast path removed there is no point in splitting the push of the normal and the extra register set. Just push the extra regs right away. [ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
2018-01-30x86/entry/64: Remove the SYSCALL64 fast pathAndy Lutomirski
The SYCALLL64 fast path was a nice, if small, optimization back in the good old days when syscalls were actually reasonably fast. Now there is PTI to slow everything down, and indirect branches are verboten, making everything messier. The retpoline code in the fast path is particularly nasty. Just get rid of the fast path. The slow path is barely slower. [ tglx: Split out the 'push all extra regs' part ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
2018-01-30Merge tag 'v4.15' into x86/pti, to be able to merge dependent changesIngo Molnar
Time has come to switch PTI development over to a v4.15 base - we'll still try to make sure that all PTI fixes backport cleanly to v4.14 and earlier. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-29Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti updates from Thomas Gleixner: "Another set of melted spectrum related changes: - Code simplifications and cleanups for RSB and retpolines. - Make the indirect calls in KVM speculation safe. - Whitelist CPUs which are known not to speculate from Meltdown and prepare for the new CPUID flag which tells the kernel that a CPU is not affected. - A less rigorous variant of the module retpoline check which merily warns when a non-retpoline protected module is loaded and reflects that fact in the sysfs file. - Prepare for Indirect Branch Prediction Barrier support. - Prepare for exposure of the Speculation Control MSRs to guests, so guest OSes which depend on those "features" can use them. Includes a blacklist of the broken microcodes. The actual exposure of the MSRs through KVM is still being worked on" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Simplify indirect_branch_prediction_barrier() x86/retpoline: Simplify vmexit_fill_RSB() x86/cpufeatures: Clean up Spectre v2 related CPUID flags x86/cpu/bugs: Make retpoline module warning conditional x86/bugs: Drop one "mitigation" from dmesg x86/nospec: Fix header guards names x86/alternative: Print unadorned pointers x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown x86/msr: Add definitions for new speculation control MSRs x86/cpufeatures: Add AMD feature bits for Speculation Control x86/cpufeatures: Add Intel feature bits for Speculation Control x86/cpufeatures: Add CPUID_7_EDX CPUID leaf module/retpoline: Warn about missing retpoline in module KVM: VMX: Make indirect call speculation safe KVM: x86: Make indirect calls in emulator speculation safe
2018-01-27x86/retpoline: Simplify vmexit_fill_RSB()Borislav Petkov
Simplify it to call an asm-function instead of pasting 41 insn bytes at every call site. Also, add alignment to the macro as suggested here: https://support.google.com/faqs/answer/7625886 [dwmw2: Clean up comments, let it clobber %ebx and just tell the compiler] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ak@linux.intel.com Cc: dave.hansen@intel.com Cc: karahmed@amazon.de Cc: arjan@linux.intel.com Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: pbonzini@redhat.com Cc: tim.c.chen@linux.intel.com Cc: gregkh@linux-foundation.org Link: https://lkml.kernel.org/r/1517070274-12128-3-git-send-email-dwmw@amazon.co.uk
2018-01-21Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti fixes from Thomas Gleixner: "A small set of fixes for the meltdown/spectre mitigations: - Make kprobes aware of retpolines to prevent probes in the retpoline thunks. - Make the machine check exception speculation protected. MCE used to issue an indirect call directly from the ASM entry code. Convert that to a direct call into a C-function and issue the indirect call from there so the compiler can add the retpoline protection, - Make the vmexit_fill_RSB() assembly less stupid - Fix a typo in the PTI documentation" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Optimize inline assembler for vmexit_fill_RSB x86/pti: Document fix wrong index kprobes/x86: Disable optimizing on the function jumps to indirect thunk kprobes/x86: Blacklist indirect thunk functions for kprobes retpoline: Introduce start/end markers of indirect thunk x86/mce: Make machine check speculation protected
2018-01-19x86/mce: Make machine check speculation protectedThomas Gleixner
The machine check idtentry uses an indirect branch directly from the low level code. This evades the speculation protection. Replace it by a direct call into C code and issue the indirect call there so the compiler can apply the proper speculation protection. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by:Borislav Petkov <bp@alien8.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Niced-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
2018-01-17Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti bits and fixes from Thomas Gleixner: "This last update contains: - An objtool fix to prevent a segfault with the gold linker by changing the invocation order. That's not just for gold, it's a general robustness improvement. - An improved error message for objtool which spares tearing hairs. - Make KASAN fail loudly if there is not enough memory instead of oopsing at some random place later - RSB fill on context switch to prevent RSB underflow and speculation through other units. - Make the retpoline/RSB functionality work reliably for both Intel and AMD - Add retpoline to the module version magic so mismatch can be detected - A small (non-fix) update for cpufeatures which prevents cpu feature clashing for the upcoming extra mitigation bits to ease backporting" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: module: Add retpoline tag to VERMAGIC x86/cpufeature: Move processor tracing out of scattered features objtool: Improve error message for bad file argument objtool: Fix seg fault with gold linker x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros x86/retpoline: Fill RSB on context switch for affected CPUs x86/kasan: Panic if there is not enough memory to boot
2018-01-15x86/retpoline: Fill RSB on context switch for affected CPUsDavid Woodhouse
On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an *underflow* of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as *filling* the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch. [ tglx: Added missing vendor check and slighty massaged comments and changelog ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
2018-01-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "This contains: - a PTI bugfix to avoid setting reserved CR3 bits when PCID is disabled. This seems to cause issues on a virtual machine at least and is incorrect according to the AMD manual. - a PTI bugfix which disables the perf BTS facility if PTI is enabled. The BTS AUX buffer is not globally visible and causes the CPU to fault when the mapping disappears on switching CR3 to user space. A full fix which restores BTS on PTI is non trivial and will be worked on. - PTI bugfixes for EFI and trusted boot which make sure that the user space visible page table entries have the NX bit cleared - removal of dead code in the PTI pagetable setup functions - add PTI documentation - add a selftest for vsyscall to verify that the kernel actually implements what it advertises. - a sysfs interface to expose vulnerability and mitigation information so there is a coherent way for users to retrieve the status. - the initial spectre_v2 mitigations, aka retpoline: + The necessary ASM thunk and compiler support + The ASM variants of retpoline and the conversion of affected ASM code + Make LFENCE serializing on AMD so it can be used as speculation trap + The RSB fill after vmexit - initial objtool support for retpoline As I said in the status mail this is the most of the set of patches which should go into 4.15 except two straight forward patches still on hold: - the retpoline add on of LFENCE which waits for ACKs - the RSB fill after context switch Both should be ready to go early next week and with that we'll have covered the major holes of spectre_v2 and go back to normality" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86,perf: Disable intel_bts when PTI security/Kconfig: Correct the Documentation reference for PTI x86/pti: Fix !PCID and sanitize defines selftests/x86: Add test_vsyscall x86/retpoline: Fill return stack buffer on vmexit x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline: Add initial retpoline support objtool: Allow alternatives to be ignored objtool: Detect jumps to retpoline thunks x86/pti: Make unpoison of pgd for trusted boot work for real x86/alternatives: Fix optimize_nops() checking sysfs/cpu: Fix typos in vulnerability documentation x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC ...
2018-01-14x86/pti: Fix !PCID and sanitize definesThomas Gleixner
The switch to the user space page tables in the low level ASM code sets unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base address of the page directory to the user part, bit 11 is switching the PCID to the PCID associated with the user page tables. This fails on a machine which lacks PCID support because bit 11 is set in CR3. Bit 11 is reserved when PCID is inactive. While the Intel SDM claims that the reserved bits are ignored when PCID is disabled, the AMD APM states that they should be cleared. This went unnoticed as the AMD APM was not checked when the code was developed and reviewed and test systems with Intel CPUs never failed to boot. The report is against a Centos 6 host where the guest fails to boot, so it's not yet clear whether this is a virt issue or can happen on real hardware too, but thats irrelevant as the AMD APM clearly ask for clearing the reserved bits. Make sure that on non PCID machines bit 11 is not set by the page table switching code. Andy suggested to rename the related bits and masks so they are clearly describing what they should be used for, which is done as well for clarity. That split could have been done with alternatives but the macro hell is horrible and ugly. This can be done on top if someone cares to remove the extra orq. For now it's a straight forward fix. Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
2018-01-12x86/retpoline/entry: Convert entry assembler indirect jumpsDavid Woodhouse
Convert indirect jumps in core 32/64bit entry assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return address after the 'call' instruction must be *precisely* at the .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work, and the use of alternatives will mess that up unless we play horrid games to prepend with NOPs and make the variants the same length. It's not worth it; in the case where we ALTERNATIVE out the retpoline, the first instruction at __x86.indirect_thunk.rax is going to be a bare jmp *%rax anyway. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
2018-01-03Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 page table isolation fixes from Thomas Gleixner: "A couple of urgent fixes for PTI: - Fix a PTE mismatch between user and kernel visible mapping of the cpu entry area (differs vs. the GLB bit) and causes a TLB mismatch MCE on older AMD K8 machines - Fix the misplaced CR3 switch in the SYSCALL compat entry code which causes access to unmapped kernel memory resulting in double faults. - Fix the section mismatch of the cpu_tss_rw percpu storage caused by using a different mechanism for declaration and definition. - Two fixes for dumpstack which help to decode entry stack issues better - Enable PTI by default in Kconfig. We should have done that earlier, but it slipped through the cracks. - Exclude AMD from the PTI enforcement. Not necessarily a fix, but if AMD is so confident that they are not affected, then we should not burden users with the overhead" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/process: Define cpu_tss_rw in same section as declaration x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat() x86/dumpstack: Print registers for first stack frame x86/dumpstack: Fix partial register dumps x86/pti: Make sure the user/kernel PTEs match x86/cpu, x86/pti: Do not enable PTI on AMD processors x86/pti: Enable PTI by default
2018-01-03x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()Thomas Gleixner
The preparation for PTI which added CR3 switching to the entry code misplaced the CR3 switch in entry_SYSCALL_compat(). With PTI enabled the entry code tries to access a per cpu variable after switching to kernel GS. This fails because that variable is not mapped to user space. This results in a double fault and in the worst case a kernel crash. Move the switch ahead of the access and clobber RSP which has been saved already. Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching") Reported-by: Lars Wendler <wendler.lars@web.de> Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org>, Cc: Dave Hansen <dave.hansen@linux.intel.com>, Cc: Peter Zijlstra <peterz@infradead.org>, Cc: Greg KH <gregkh@linuxfoundation.org>, , Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>, Cc: Juergen Gross <jgross@suse.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
2017-12-29Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 page table isolation updates from Thomas Gleixner: "This is the final set of enabling page table isolation on x86: - Infrastructure patches for handling the extra page tables. - Patches which map the various bits and pieces which are required to get in and out of user space into the user space visible page tables. - The required changes to have CR3 switching in the entry/exit code. - Optimizations for the CR3 switching along with documentation how the ASID/PCID mechanism works. - Updates to dump pagetables to cover the user space page tables for W+X scans and extra debugfs files to analyze both the kernel and the user space visible page tables The whole functionality is compile time controlled via a config switch and can be turned on/off on the command line as well" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/ldt: Make the LDT mapping RO x86/mm/dump_pagetables: Allow dumping current pagetables x86/mm/dump_pagetables: Check user space page table for WX pages x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy x86/mm/pti: Add Kconfig x86/dumpstack: Indicate in Oops whether PTI is configured and enabled x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming x86/mm: Use INVPCID for __native_flush_tlb_single() x86/mm: Optimize RESTORE_CR3 x86/mm: Use/Fix PCID to optimize user/kernel switches x86/mm: Abstract switching CR3 x86/mm: Allow flushing for future ASID switches x86/pti: Map the vsyscall page if needed x86/pti: Put the LDT in its own PGD if PTI is on x86/mm/64: Make a full PGD-entry size hole in the memory map x86/events/intel/ds: Map debug buffers in cpu_entry_area x86/cpu_entry_area: Add debugstore entries to cpu_entry_area x86/mm/pti: Map ESPFIX into user space x86/mm/pti: Share entry text PMD x86/entry: Align entry text section to PMD boundary ...
2017-12-23x86/mm: Optimize RESTORE_CR3Peter Zijlstra
Most NMI/paranoid exceptions will not in fact change pagetables and would thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3 writes. Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the kernel mappings and now that we track which user PCIDs need flushing we can avoid those too when possible. This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both sites have plenty available. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>