aboutsummaryrefslogtreecommitdiff
path: root/Documentation/x86/x86_64
AgeCommit message (Collapse)Author
2021-10-05ABI: sysfs-mce: add a new ABI fileMauro Carvalho Chehab
Reduce the gap of missing ABIs for Intel servers with MCE by adding a new ABI file. The contents of this file comes from: Documentation/x86/x86_64/machinecheck.rst Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/801a26985e32589eb78ba4b728d3e19fdea18f04.1632994837.git.mchehab+huawei@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-08Merge tag 'docs-5.15-2' of git://git.lwn.net/linuxLinus Torvalds
Pull more documentation updates from Jonathan Corbet: "Another collection of documentation patches, mostly fixes but also includes another set of traditional Chinese translations" * tag 'docs-5.15-2' of git://git.lwn.net/linux: docs: pdfdocs: Fix typo in CJK-language specific font settings docs: kernel-hacking: Remove inappropriate text docs/zh_TW: add translations for zh_TW/filesystems docs/zh_TW: add translations for zh_TW/cpu-freq docs/zh_TW: add translations for zh_TW/arm64 docs/zh_CN: Modify the translator tag and fix the wrong word Documentation/features/vm: correct huge-vmap APIs Documentation: block: blk-mq: Fix small typo in multi-queue docs Documentation: in_irq() cleanup Documentation: arm: marvell: Add 88F6825 model into list Documentation/process/maintainer-pgp-guide: Replace broken link to PGP path finder Documentation: locking: fix references Documentation: Update details of The Linux Kernel Module Programming Guide docs: x86: Remove obsolete information about x86_64 vmalloc() faulting Documentation/process/applying-patches: Activate linux-next man hyperlink
2021-08-20docs: x86: Remove obsolete information about x86_64 vmalloc() faultingPeilin Ye
x86_64 vmalloc() mappings are no longer "synchronized" among page tables via faulting since commit 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area"), since the corresponding P4D or PUD pages are now preallocated at boot, by preallocate_vmalloc_pages(). Drop the "lazily synchronized" description for less confusion. While this file is x86_64-specific, it is worth noting that things are different for x86_32, where vmalloc()-related changes to `init_mm.pgd` are synchronized to all page tables in the system during runtime, via arch_sync_kernel_mappings(). Unfortunately, this synchronization is subject to race condition, which is further handled via faulting, see vmalloc_fault(). See commit 4819e15f740e ("x86/mm/32: Bring back vmalloc faulting on x86_32") for more details. Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20210818220123.2623-1-yepeilin.cs@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-08-12x86/reboot: Document how to override DMI platform quirksPaul Gortmaker
commit 5955633e91bf ("x86/reboot: Skip DMI checks if reboot set by user") made it so that it's not required to recompile the kernel in order to bypass broken reboot quirks compiled into an image: * This variable is used privately to keep track of whether or not * reboot_type is still set to its default value (i.e., reboot= hasn't * been set on the command line). This is needed so that we can * suppress DMI scanning for reboot quirks. Without it, it's * impossible to override a faulty reboot quirk without recompiling. However, at the time it was not eally documented outside the source code, and so this information isn't really available to the average user out there. The change is a little white lie and invented "reboot=default" since it is easy to remember, and documents well. The truth is that any random string that is *not* a currently accepted string will work. Since that doesn't document well for non-coders, and since it's unknown what the future additions might be, lay claim on "default" since that is exactly what it achieves. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210530162447.996461-3-paul.gortmaker@windriver.com
2021-08-12x86/reboot: Document the "reboot=pci" optionPaul Gortmaker
It is mentioned in the top level non-arch specific file but it was overlooked here for x86. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210530162447.996461-2-paul.gortmaker@windriver.com
2021-06-10doc: Remove references to IBM CalgaryHubert Jasudowicz
The Calgary IOMMU driver has been removed in 90dc392fc445 ("x86: Remove the calgary IOMMU driver") Clean up stale docs that refer to it. Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1bd2b57dd1db53df09e520b8170ff61418805de4.1623274832.git.hubert.jasudowicz@gmail.com
2021-04-27docs: Fix typo in Documentation/x86/x86_64/5level-paging.rstbilbao@vt.edu
fix two typos in the documentation (Documentation/x86/x86_64/5level-paging.rst), changing 'paing' for 'paging' and using the right verbal form for plural on 'some vendors offer'. Signed-off-by: Carlos Bilbao <bilbao@vt.edu> Link: https://lore.kernel.org/r/2599991.mvXUDI8C0e@iron-maiden Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-10-23Merge tag 'docs-5.10-2' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation fixes from Jonathan Corbet: "A handful of late-arriving documentation fixes" * tag 'docs-5.10-2' of git://git.lwn.net/linux: docs: Add two missing entries in vm sysctl index docs/vm: trivial fixes to several spelling mistakes docs: submitting-patches: describe preserving review/test tags Documentation: Chinese translation of Documentation/arm64/hugetlbpage.rst Documentation: x86: fix a missing word in x86_64/mm.rst. docs: driver-api: remove a duplicated index entry docs: lkdtm: Modernize and improve details docs: deprecated.rst: Expand str*cpy() replacement notes docs/cpu-load: format the example code.
2020-10-21Documentation: x86: fix a missing word in x86_64/mm.rst.Wei Lin Chang
This patch adds a missing word in x86/x86_64/mm.rst, without which the note reads awkwardly. Signed-off-by: Wei Lin Chang <r09922117@csie.ntu.edu.tw> Link: https://lore.kernel.org/r/20201015062242.26296-1-r09922117@csie.ntu.edu.tw Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-10-13x86/numa: add 'nohmat' optionDan Williams
Disable parsing of the HMAT for debug, to workaround broken platform instances, or cases where it is otherwise not wanted. [rdunlap@infradead.org: fix build when CONFIG_ACPI is not set] Link: https://lkml.kernel.org/r/70e5ee34-9809-a997-7b49-499e4be61307@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643095540.4062302.732962081968036212.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-04Merge tag 'docs-5.9' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...
2020-07-13Documentation: x86: machinecheck: drop doubled wordsRandy Dunlap
Drop the doubled word "see". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20200703213107.30758-3-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-18Documentation/x86/64: Add documentation for GS/FS addressing modeThomas Gleixner
Explain how the GS/FS based addressing can be utilized in user space applications along with the differences between the generic prctl() based GS/FS base control and the FSGSBASE version available on newer CPUs. Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200528201402.1708239-15-sashal@kernel.org
2020-04-28Documentation: x86: fix space instead of tab in uefi docFlavio Suligoi
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> Link: https://lore.kernel.org/r/1588080745-21999-1-git-send-email-f.suligoi@asem.it Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-12-30Doc: x86: Fix a typo in mm.rstMasanari Iida
Fix a spelling typo in mm.rst. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Link: https://lore.kernel.org/r/20191226162138.17601-1-standby24x7@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-09-03dma-mapping: fix filename referencesAndy Shevchenko
After commit cf65a0f6f6ff ("dma-mapping: move all DMA mapping code to kernel/dma") some of the files are referring to outdated information, i.e. old file names of DMA mapping sources. Fix it here. Note, the lines with "Glue code for..." have been removed completely. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-07-15docs: cgroup-v1: add it to the admin-guide bookMauro Carvalho Chehab
Those files belong to the admin guide, so add them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-09Merge tag 'docs-5.3' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-06-14docs: cgroup-v1: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
Convert the cgroup-v1 files to ReST format, in order to allow a later addition to the admin-guide. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-06-08docs: fix broken documentation linksMauro Carvalho Chehab
Mostly due to x86 and acpi conversion, several documentation links are still pointing to the old file. Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-10Merge tag 'docs-5.2a' of git://git.lwn.net/linuxLinus Torvalds
Pull more documentation updates from Jonathan Corbet: "Some late arriving documentation changes. In particular, this contains the conversion of the x86 docs to RST, which has been in the works for some time but needed a couple of final tweaks" * tag 'docs-5.2a' of git://git.lwn.net/linux: (29 commits) Documentation: x86: convert x86_64/machinecheck to reST Documentation: x86: convert x86_64/cpu-hotplug-spec to reST Documentation: x86: convert x86_64/fake-numa-for-cpusets to reST Documentation: x86: convert x86_64/5level-paging.txt to reST Documentation: x86: convert x86_64/mm.txt to reST Documentation: x86: convert x86_64/uefi.txt to reST Documentation: x86: convert x86_64/boot-options.txt to reST Documentation: x86: convert i386/IO-APIC.txt to reST Documentation: x86: convert usb-legacy-support.txt to reST Documentation: x86: convert orc-unwinder.txt to reST Documentation: x86: convert resctrl_ui.txt to reST Documentation: x86: convert microcode.txt to reST Documentation: x86: convert pti.txt to reST Documentation: x86: convert amd-memory-encryption.txt to reST Documentation: x86: convert intel_mpx.txt to reST Documentation: x86: convert protection-keys.txt to reST Documentation: x86: convert pat.txt to reST Documentation: x86: convert mtrr.txt to reST Documentation: x86: convert tlb.txt to reST Documentation: x86: convert zero-page.txt to reST ...
2019-05-08Documentation: x86: convert x86_64/machinecheck to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/cpu-hotplug-spec to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/fake-numa-for-cpusets to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/5level-paging.txt to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/mm.txt to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/uefi.txt to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-08Documentation: x86: convert x86_64/boot-options.txt to reSTChangbin Du
This converts the plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change. Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-04-16x86/mm: Fix the 56-bit addresses memory map in Documentation/x86/x86_64/mm.txtStephen Kitt
This fixes a PT typo, and the following 56-bit address-space addresses: * the hole extends from 0100000000000000 to feffffffffffffff * the KASAN shadow memory area stops at fffffbffffffffff (see kasan.h) Signed-off-by: Stephen Kitt <steve@sk2.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alex.popov@linux.com Cc: bhe@redhat.com Cc: corbet@lwn.net Cc: kirill.shutemov@linux.intel.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20190415150853.10354-1-steve@sk2.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11x86/dma/amd-gart: Stop resizing dma_debug_entry poolRobin Murphy
dma-debug is now capable of adding new entries to its pool on-demand if the initial preallocation was insufficient, so the IOMMU_LEAK logic no longer needs to explicitly change the pool size. This does lose it the ability to save a couple of megabytes of RAM by reducing the pool size below its default, but it seems unlikely that that is a realistic concern these days (or indeed that anyone is actively debugging AGP drivers' DMA usage any more). Getting rid of dma_debug_resize_entries() will make room for further streamlining in the dma-debug code itself. Removing the call reveals quite a lot of cruft which has been useless for nearly a decade since commit 19c1a6f5764d ("x86 gart: reimplement IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire 'iommu=leak' parameter, which controlled nothing except whether dma_debug_resize_entries() was called or not. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Qian Cai <cai@lca.pw> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-11-06x86/mm: Move LDT remap out of KASLR region on 5-level pagingKirill A. Shutemov
On 5-level paging the LDT remap area is placed in the middle of the KASLR randomization region and it can overlap with the direct mapping, the vmalloc or the vmap area. The LDT mapping is per mm, so it cannot be moved into the P4D page table next to the CPU_ENTRY_AREA without complicating PGD table allocation for 5-level paging. The 4 PGD slot gap just before the direct mapping is reserved for hypervisors, so it cannot be used. Move the direct mapping one slot deeper and use the resulting gap for the LDT remap area. The resulting layout is the same for 4 and 5 level paging. [ tglx: Massaged changelog ] Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: willy@infradead.org Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com
2018-11-01Merge tag 'stackleak-v4.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull stackleak gcc plugin from Kees Cook: "Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin was ported from grsecurity by Alexander Popov. It provides efficient stack content poisoning at syscall exit. This creates a defense against at least two classes of flaws: - Uninitialized stack usage. (We continue to work on improving the compiler to do this in other ways: e.g. unconditional zero init was proposed to GCC and Clang, and more plugin work has started too). - Stack content exposure. By greatly reducing the lifetime of valid stack contents, exposures via either direct read bugs or unknown cache side-channels become much more difficult to exploit. This complements the existing buddy and heap poisoning options, but provides the coverage for stacks. The x86 hooks are included in this series (which have been reviewed by Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already been merged through the arm64 tree (written by Laura Abbott and reviewed by Mark Rutland and Will Deacon). With VLAs having been removed this release, there is no need for alloca() protection, so it has been removed from the plugin" * tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: Drop unneeded stackleak_check_alloca() stackleak: Allow runtime disabling of kernel stack erasing doc: self-protection: Add information about STACKLEAK feature fs/proc: Show STACKLEAK metrics in the /proc file system lkdtm: Add a test for STACKLEAK gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
2018-10-24Merge tag 'docs-4.20' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This is a fairly typical cycle for documentation. There's some welcome readability improvements for the formatted output, some LICENSES updates including the addition of the ISC license, the removal of the unloved and unmaintained 00-INDEX files, the deprecated APIs document from Kees, more MM docs from Mike Rapoport, and the usual pile of typo fixes and corrections" * tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits) docs: Fix typos in histogram.rst docs: Introduce deprecated APIs list kernel-doc: fix declaration type determination doc: fix a typo in adding-syscalls.rst docs/admin-guide: memory-hotplug: remove table of contents doc: printk-formats: Remove bogus kobject references for device nodes Documentation: preempt-locking: Use better example dm flakey: Document "error_writes" feature docs/completion.txt: Fix a couple of punctuation nits LICENSES: Add ISC license text LICENSES: Add note to CDDL-1.0 license that it should not be used docs/core-api: memory-hotplug: add some details about locking internals docs/core-api: rename memory-hotplug-notifier to memory-hotplug docs: improve readability for people with poorer eyesight yama: clarify ptrace_scope=2 in Yama documentation docs/vm: split memory hotplug notifier description to Documentation/core-api docs: move memory hotplug description into admin-guide/mm doc: Fix acronym "FEKEK" in ecryptfs docs: fix some broken documentation references iommu: Fix passthrough option documentation ...
2018-10-06x86/mm/doc: Enhance the x86-64 virtual memory layout descriptionsIngo Molnar
After the cleanups from Baoquan He, make it even more readable: - Remove the 'bits' area size column: it's pretty pointless and was even wrong for some of the entries. Given that MB, GB, TB, PT are 10, 20, 30 and 40 bits, a "8 TB" size description makes it obvious that it's 43 bits. - Introduce an "offset" column: -------------------------------------------------------------------------------- start addr | offset | end addr | size | VM area description -----------------|------------|------------------|---------|-------------------- ... ffff880000000000 | -120 TB | ffffc7ffffffffff | 64 TB | direct mapping of all physical memory (page_offset_base), this is what limits max physical memory supported. The -120 TB notation makes it obvious where this particular virtual memory region starts: 120 TB down from the top of the 64-bit virtual memory space. Especially the layout of the kernel mappings is a *lot* more obvious when written this way, plus it's much easier to compare it with the size column and understand/check/validate and modify the kernel's layout in the future. - Mark the part from where the 47-bit and 56-bit kernel layouts are 100% identical, this starts at the -512 GB offset and the EFI region. - Re-shuffle the size desciptions to be continous blocks of sizes, instead of the often mixed size. I.e. write "0.5 TB" instead of "512 GB" if we are still in the TB-granular region of the map. - Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording, and only differ where there's a material difference. This makes it easy to compare the two tables side by side by switching between two terminal tabs. - Plus enhance a lot of other stylistic/typographical details: make the tables explicitly tabular, add headers, enhance certain entries, etc. etc. Note that there are some apparent errors in the tables as well, but I'll fix them in a separate patch to make it easier to review/validate. Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06x86/mm/doc: Clean up the x86-64 virtual memory layout descriptionsBaoquan He
In Documentation/x86/x86_64/mm.txt, the description of the x86-64 virtual memory layout has become a confusing hodgepodge of inconsistencies: - there's a hard to read mixture of 'TB' and 'bits' notation - the entries sometimes mention a size in the description and sometimes not - sometimes they list holes by address, sometimes only as an 'unused hole' line So make it all a coherent, readable, well organized description. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20181006084327.27467-3-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-09Drop all 00-INDEX files from Documentation/Henrik Austad
This is a respin with a wider audience (all that get_maintainer returned) and I know this spams a *lot* of people. Not sure what would be the correct way, so my apologies for ruining your inbox. The 00-INDEX files are supposed to give a summary of all files present in a directory, but these files are horribly out of date and their usefulness is brought into question. Often a simple "ls" would reveal the same information as the filenames are generally quite descriptive as a short introduction to what the file covers (it should not surprise anyone what Documentation/sched/sched-design-CFS.txt covers) A few years back it was mentioned that these files were no longer really needed, and they have since then grown further out of date, so perhaps it is time to just throw them out. A short status yields the following _outdated_ 00-INDEX files, first counter is files listed in 00-INDEX but missing in the directory, last is files present but not listed in 00-INDEX. List of outdated 00-INDEX: Documentation: (4/10) Documentation/sysctl: (0/1) Documentation/timers: (1/0) Documentation/blockdev: (3/1) Documentation/w1/slaves: (0/1) Documentation/locking: (0/1) Documentation/devicetree: (0/5) Documentation/power: (1/1) Documentation/powerpc: (0/5) Documentation/arm: (1/0) Documentation/x86: (0/9) Documentation/x86/x86_64: (1/1) Documentation/scsi: (4/4) Documentation/filesystems: (2/9) Documentation/filesystems/nfs: (0/2) Documentation/cgroup-v1: (0/2) Documentation/kbuild: (0/4) Documentation/spi: (1/0) Documentation/virtual/kvm: (1/0) Documentation/scheduler: (0/2) Documentation/fb: (0/1) Documentation/block: (0/1) Documentation/networking: (6/37) Documentation/vm: (1/3) Then there are 364 subdirectories in Documentation/ with several files that are missing 00-INDEX alltogether (and another 120 with a single file and no 00-INDEX). I don't really have an opinion to whether or not we /should/ have 00-INDEX, but the above 00-INDEX should either be removed or be kept up to date. If we should keep the files, I can try to keep them updated, but I rather not if we just want to delete them anyway. As a starting point, remove all index-files and references to 00-INDEX and see where the discussion is going. Signed-off-by: Henrik Austad <henrik@austad.us> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Just-do-it-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: [Almost everybody else] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-09-04x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscallsAlexander Popov
The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-13Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Thomas Gleixner: "Early TSC based time stamping to allow better boot time analysis. This comes with a general cleanup of the TSC calibration code which grew warts and duct taping over the years and removes 250 lines of code. Initiated and mostly implemented by Pavel with help from various folks" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/kvmclock: Mark kvm_get_preset_lpj() as __init x86/tsc: Consolidate init code sched/clock: Disable interrupts when calling generic_sched_clock_init() timekeeping: Prevent false warning when persistent clock is not available sched/clock: Close a hole in sched_clock_init() x86/tsc: Make use of tsc_calibrate_cpu_early() x86/tsc: Split native_calibrate_cpu() into early and late parts sched/clock: Use static key for sched_clock_running sched/clock: Enable sched clock early sched/clock: Move sched clock initialization and merge with generic clock x86/tsc: Use TSC as sched clock early x86/tsc: Initialize cyc2ns when tsc frequency is determined x86/tsc: Calibrate tsc only once ARM/time: Remove read_boot_clock64() s390/time: Remove read_boot_clock64() timekeeping: Default boot time offset to local_clock() timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() s390/time: Add read_persistent_wall_and_boot_offset() x86/xen/time: Output xen sched_clock time from 0 x86/xen/time: Initialize pv xen time in init_hypervisor_platform() ...
2018-07-20x86/tsc: Redefine notsc to behave as tsc=unstablePavel Tatashin
Currently, the notsc kernel parameter disables the use of the TSC by sched_clock(). However, this parameter does not prevent the kernel from accessing tsc in other places. The only rationale to boot with notsc is to avoid timing discrepancies on multi-socket systems where TSC are not properly synchronized, and thus exclude TSC from being used for time keeping. But that prevents using TSC as sched_clock() as well, which is not necessary as the core sched_clock() implementation can handle non synchronized TSC based sched clocks just fine. However, there is another method to solve the above problem: booting with tsc=unstable parameter. This parameter allows sched_clock() to use TSC and just excludes it from timekeeping. So there is no real reason to keep notsc, but for compatibility reasons the parameter has to stay. Make it behave like 'tsc=unstable' instead. [ tglx: Massaged changelog ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
2018-07-06x86/numa_emulation: Introduce uniform split capabilityDan Williams
The current NUMA emulation capabilities for splitting System RAM by a fixed size or by a set number of nodes may result in some nodes being larger than others. The implementation prioritizes establishing a minimum usable memory size over satisfying the requested number of NUMA nodes. Introduce a uniform split capability that evenly partitions each physical NUMA node into N emulated nodes. For example numa=fake=3U creates 6 emulated nodes total on a system that has 2 physical nodes. This capability is useful for debugging and evaluating platform memory-side-cache capabilities as described by the ACPI HMAT (see 5.2.27.5 Memory Side Cache Information Structure in ACPI 6.2a) Compare numa=fake=6 that results in only 5 nodes being created against numa=fake=3U which takes the 2 physical nodes and evenly divides them. numa=fake=6 available: 5 nodes (0-4) node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 node 0 size: 2648 MB node 0 free: 2443 MB node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 1 size: 2672 MB node 1 free: 2442 MB node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 node 2 size: 5291 MB node 2 free: 5278 MB node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 3 size: 2677 MB node 3 free: 2665 MB node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 4 size: 2676 MB node 4 free: 2663 MB node distances: node 0 1 2 3 4 0: 10 20 10 20 20 1: 20 10 20 10 10 2: 10 20 10 20 20 3: 20 10 20 10 10 4: 20 10 20 10 10 numa=fake=3U available: 6 nodes (0-5) node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 node 0 size: 2900 MB node 0 free: 2637 MB node 1 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 node 1 size: 3023 MB node 1 free: 3012 MB node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 node 2 size: 2015 MB node 2 free: 2004 MB node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 3 size: 2704 MB node 3 free: 2522 MB node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 4 size: 2709 MB node 4 free: 2698 MB node 5 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 node 5 size: 2612 MB node 5 free: 2601 MB node distances: node 0 1 2 3 4 5 0: 10 10 10 20 20 20 1: 10 10 10 20 20 20 2: 10 10 10 20 20 20 3: 20 20 20 10 10 10 4: 20 20 20 10 10 10 5: 20 20 20 10 10 10 Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/153089328617.27680.14930758266174305832.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-28x86/pci-dma: remove the explicit nodac and allowdac optionChristoph Hellwig
This is something drivers should decide (modulo chipset quirks like for VIA), which as far as I can tell is how things have been handled for the last 15 years. Note that we keep the usedac option for now, as it is used in the wild to override the too generic VIA quirk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28x86/pci-dma: remove the experimental forcesac boot optionChristoph Hellwig
Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying the huge overhead of an IOMMU is rather pointless, and this seriously gets in the way of dma mapping work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28Documentation/x86: remove a stray reference to pci-nommu.cChristoph Hellwig
This is just the minimal workaround. The file is mostly either stale and/or duplicative of Documentation/admin-guide/kernel-parameters.txt, but that is much more work than I'm willing to do right now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-04-12Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is readyIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-03x86/mm: Fix documentation of module mapping range with 4-level pagingKirill A. Shutemov
Commit: f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000") changed MODULES_END back to a fixed value, but didn't update the documentation of memory layout for 4-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000") Link: http://lkml.kernel.org/r/20180402121025.10244-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=yKirill A. Shutemov
All pieces of the puzzle are in place and we can now allow to boot with CONFIG_X86_5LEVEL=y on a machine without LA57 support. Kernel will detect that LA57 is missing and fold p4d at runtime. Update the documentation and the Kconfig option description to reflect the change. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-05x86/kaslr: Fix the vaddr_end messThomas Gleixner
vaddr_end for KASLR is only documented in the KASLR code itself and is adjusted depending on config options. So it's not surprising that a change of the memory layout causes KASLR to have the wrong vaddr_end. This can map arbitrary stuff into other areas causing hard to understand problems. Remove the whole ifdef magic and define the start of the cpu_entry_area to be the end of the KASLR vaddr range. Add documentation to that effect. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com>, Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04x86/mm: Map cpu_entry_area at the same place on 4/5 levelThomas Gleixner
There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com>, Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04x86/mm: Set MODULES_END to 0xffffffffff000000Andrey Ryabinin
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary. So passing page unaligned address to kasan_populate_zero_shadow() have two possible effects: 1) It may leave one page hole in supposed to be populated area. After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that hole happens to be in the shadow covering fixmap area and leads to crash: BUG: unable to handle kernel paging request at fffffbffffe8ee04 RIP: 0010:check_memory_region+0x5c/0x190 Call Trace: <NMI> memcpy+0x1f/0x50 ghes_copy_tofrom_phys+0xab/0x180 ghes_read_estatus+0xfb/0x280 ghes_notify_nmi+0x2b2/0x410 nmi_handle+0x115/0x2c0 default_do_nmi+0x57/0x110 do_nmi+0xf8/0x150 end_repeat_nmi+0x1a/0x1e Note, the crash likely disappeared after commit 92a0f81d8957, which changed kasan_populate_zero_shadow() call the way it was before commit 21506525fb8d. 2) Attempt to load module near MODULES_END will fail, because __vmalloc_node_range() called from kasan_module_alloc() will hit the WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error. To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned which means that MODULES_END should be 8*PAGE_SIZE aligned. The whole point of commit f06bdd4001c2 was to move MODULES_END down if NR_CPUS is big, so the cpu_entry_area takes a lot of space. But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") the cpu_entry_area is no longer in fixmap, so we could just set MODULES_END to a fixed 8*PAGE_SIZE aligned address. Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
2017-12-23x86/pti: Put the LDT in its own PGD if PTI is onAndy Lutomirski
With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm. An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big. Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings. This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is strightforward on top of this patch. This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations. [ tglx: Cleaned it up. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>