Age | Commit message (Collapse) | Author |
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
[akpm@linux-foundation.org: cxd2880_common.h needs bits.h for GENMASK()]
[andriy.shevchenko@linux.intel.com: delay.h: fix for removed kernel.h]
Link: https://lkml.kernel.org/r/20211028170143.56523-1-andriy.shevchenko@linux.intel.com
[akpm@linux-foundation.org: include/linux/fwnode.h needs bits.h for BIT()]
Link: https://lkml.kernel.org/r/20211027150324.79827-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Let's support multiple registered callbacks, making sure that
registering vmcore callbacks cannot fail. Make the callback return a
bool instead of an int, handling how to deal with errors internally.
Drop unused HAVE_OLDMEM_PFN_IS_RAM.
We soon want to make use of this infrastructure from other drivers:
virtio-mem, registering one callback for each virtio-mem device, to
prevent reading unplugged virtio-mem memory.
Handle it via a generic vmcore_cb structure, prepared for future
extensions: for example, once we support virtio-mem on s390x where the
vmcore is completely constructed in the second kernel, we want to detect
and add plugged virtio-mem memory ranges to the vmcore in order for them
to get dumped properly.
Handle corner cases that are unexpected and shouldn't happen in sane
setups: registering a callback after the vmcore has already been opened
(warn only) and unregistering a callback after the vmcore has already been
opened (warn and essentially read only zeroes from that point on).
Link: https://lkml.kernel.org/r/20211005121430.30136-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HVMOP_get_mem_type is not expected to fail, "This call failing is
indication of something going quite wrong and it would be good to know
about this." [1]
Let's add a pr_warn_once().
Link: https://lkml.kernel.org/r/3b935aa0-6d85-0bcd-100e-15098add3c4c@oracle.com [1]
Link: https://lkml.kernel.org/r/20211005121430.30136-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Let's simplify return handling.
Link: https://lkml.kernel.org/r/20211005121430.30136-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem,
this series tackles the last sane way how a VM could accidentially
access logically unplugged memory managed by a virtio-mem device:
/proc/vmcore
When dumping memory via "makedumpfile", PG_offline pages, used by
virtio-mem to flag logically unplugged memory, are already properly
excluded; however, especially when accessing/copying /proc/vmcore "the
usual way", we can still end up reading logically unplugged memory part
of a virtio-mem device.
Patch #1-#3 are cleanups. Patch #4 extends the existing
oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings
for patch #8, which implements the virtio-mem logic to query the state
of device blocks.
Patch #8:
"Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device
via a new feature flag. We similarly sanitized /proc/kcore access
recently.
[...]
Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump
initrd; dracut was recently [2] extended to include virtio-mem in the
generated initrd. As long as no special kdump kernels are used, this
will automatically make sure that virtio-mem will be around in the
kdump initrd and sanitize /proc/vmcore access -- with dracut"
This is the last remaining bit to support
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of
virtio-mem.
Note: this is best-effort. We'll never be able to control what runs
inside the second kernel, really, but we also don't have to care: we
only care about sane setups where we don't want our VM getting zapped
once we touch the wrong memory location while dumping. While we usually
expect sane setups to use "makedumfile", nothing really speaks against
just copying /proc/vmcore, especially in environments where HWpoisioning
isn't typically expected. Also, we really don't want to put all our
trust completely on the memmap, so sanitizing also makes sense when just
using "makedumpfile".
[1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com
[2] https://github.com/dracutdevs/dracut/pull/1157
[3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html
This patch (of 9):
The callback is only used for the vmcore nowadays.
Link: https://lkml.kernel.org/r/20211005121430.30136-1-david@redhat.com
Link: https://lkml.kernel.org/r/20211005121430.30136-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has served its purpose and is no longer used. All usercopy
violations appear to have been handled by now, any remaining instances
(or new bugs) will cause copies to be rejected.
This isn't a direct revert of commit 2d891fbc3bb6 ("usercopy: Allow
strict enforcement of whitelists"); since usercopy_fallback is
effectively 0, the fallback handling is removed too.
This also removes the usercopy_fallback module parameter on slab_common.
Link: https://github.com/KSPP/linux/issues/153
Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Stanley <joel@jms.id.au> [defconfig change]
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E . Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We want to specify flags when hotplugging memory. Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.
Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images. It's important to add the memory
directly to memblock via a single call with the right flags, instead of
adding the memory first and apply flags later: otherwise, concurrent
memblock users might temporarily stumble over memblocks with wrong
flags, which will be important in a follow-up patch that introduces a
new flag to properly handle add_memory_driver_managed().
Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shahab Vahedi <shahab@synopsys.com> [arch/arc]
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CONFIG_MEMORY_HOTPLUG was marked BROKEN over one year and we just
restricted it to 64 bit. Let's remove the unused x86 32bit
implementation and simplify the Kconfig.
Link: https://lkml.kernel.org/r/20210929143600.49379-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org> [kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
allocation
We can specify the number of hugepages to allocate at boot. But the
hugepages is balanced in all nodes at present. In some scenarios, we
only need hugepages in one node. For example: DPDK needs hugepages
which are in the same node as NIC.
If DPDK needs four hugepages of 1G size in node1 and system has 16 numa
nodes we must reserve 64 hugepages on the kernel cmdline. But only four
hugepages are used. The others should be free after boot. If the
system memory is low(for example: 64G), it will be an impossible task.
So extend the hugepages parameter to support specifying hugepages on a
specific node. For example add following parameter:
hugepagesz=1G hugepages=0:1,1:3
It will allocate 1 hugepage in node0 and 3 hugepages in node1.
Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()
The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.
@@
identifier vaddr;
expression size;
@@
(
- memblock_phys_free(__pa(vaddr), size);
+ memblock_free(vaddr, size);
|
- memblock_free_ptr(vaddr, size);
+ memblock_free(vaddr, size);
)
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().
Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().
Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
free_p2m_page() wrongly passes a virtual pointer to memblock_free() that
treats it as a physical address.
Call memblock_free_ptr() instead that gets a virtual address to free the
memory.
Link: https://lkml.kernel.org/r/20210930185031.18648-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The generic version of arch_is_kernel_initmem_freed() now does the same
as s390 version.
Remove the s390 version.
Link: https://lkml.kernel.org/r/b6feb5dfe611a322de482762fc2df3a9eece70c7.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The generic version of arch_is_kernel_initmem_freed() now does the same
as powerpc version.
Remove the powerpc version.
Link: https://lkml.kernel.org/r/c53764eb45d41491e2b21da2e7812239897dbebb.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK the kernel crashes:
Unable to handle kernel paging request at virtual address ffff7000028f2000
...
swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
[ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
pc : kasan_check_range+0x90/0x1a0
lr : memcpy+0x88/0xf4
sp : ffff80001378fe20
...
Call trace:
kasan_check_range+0x90/0x1a0
pcpu_page_first_chunk+0x3f0/0x568
setup_per_cpu_areas+0xb8/0x184
start_kernel+0x8c/0x328
The vm area used in vm_area_register_early() has no kasan shadow memory,
Let's add a new kasan_populate_early_vm_area_shadow() function to
populate the vm area shadow memory to fix the issue.
[wangkefeng.wang@huawei.com: fix redefinition of 'kasan_populate_early_vm_area_shadow']
Link: https://lkml.kernel.org/r/20211011123211.3936196-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20210910053354.26721-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Marco Elver <elver@google.com> [KASAN]
Acked-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Percpu embedded first chunk allocator is the firstly option, but it
could fails on ARM64, eg,
percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000
percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000
percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000
then we could get
WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838
and the system could not boot successfully.
Let's implement page mapping percpu first chunk allocator as a fallback
to the embedding allocator to increase the robustness of the system.
Link: https://lkml.kernel.org/r/20210910053354.26721-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull kvm fixes from Paolo Bonzini:
- Fixes for s390 interrupt delivery
- Fixes for Xen emulator bugs showing up as debug kernel WARNs
- Fix another issue with SEV/ES string I/O VMGEXITs
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Take srcu lock in post_kvm_run_save()
KVM: SEV-ES: fix another issue with string I/O VMGEXITs
KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
KVM: s390: clear kicked_mask before sleeping again
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"These are pretty late, but they do fix concrete issues.
- ensure the trap vector's address is aligned.
- avoid re-populating the KASAN shadow memory.
- allow kasan to build without warnings, which have recently become
errors"
* tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix asan-stack clang build
riscv: Do not re-populate shadow memory with kasan_populate_early_shadow
riscv: fix misalgned trap vector base address
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three commits fixing some issues introduced with the recent IOMMU
changes we merged.
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
powerpc/pseries/iommu: Check if the default window in use before removing it
powerpc/pseries/iommu: Use correct vfree for it_map
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a build-time warning in x86/sm4"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm4 - Fix invalid section entry size
|
|
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.
We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from WiFi (mac80211), and BPF.
Current release - regressions:
- skb_expand_head: adjust skb->truesize to fix socket memory
accounting
- mptcp: fix corrupt receiver key in MPC + data + checksum
Previous releases - regressions:
- multicast: calculate csum of looped-back and forwarded packets
- cgroup: fix memory leak caused by missing cgroup_bpf_offline
- cfg80211: fix management registrations locking, prevent list
corruption
- cfg80211: correct false positive in bridge/4addr mode check
- tcp_bpf: fix race in the tcp_bpf_send_verdict resulting in reusing
previous verdict
Previous releases - always broken:
- sctp: enhancements for the verification tag, prevent attackers from
killing SCTP sessions
- tipc: fix size validations for the MSG_CRYPTO type
- mac80211: mesh: fix HE operation element length check, prevent out
of bound access
- tls: fix sign of socket errors, prevent positive error codes being
reported from read()/write()
- cfg80211: scan: extend RCU protection in
cfg80211_add_nontrans_list()
- implement ->sock_is_readable() for UDP and AF_UNIX, fix poll() for
sockets in a BPF sockmap
- bpf: fix potential race in tail call compatibility check resulting
in two operations which would make the map incompatible succeeding
- bpf: prevent increasing bpf_jit_limit above max
- bpf: fix error usage of map_fd and fdget() in generic batch update
- phy: ethtool: lock the phy for consistency of results
- prevent infinite while loop in skb_tx_hash() when Tx races with
driver reconfiguring the queue <> traffic class mapping
- usbnet: fixes for bad HW conjured by syzbot
- xen: stop tx queues during live migration, prevent UAF
- net-sysfs: initialize uid and gid before calling
net_ns_get_ownership
- mlxsw: prevent Rx stalls under memory pressure"
* tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
Revert "net: hns3: fix pause config problem after autoneg disabled"
mptcp: fix corrupt receiver key in MPC + data + checksum
riscv, bpf: Fix potential NULL dereference
octeontx2-af: Fix possible null pointer dereference.
octeontx2-af: Display all enabled PF VF rsrc_alloc entries.
octeontx2-af: Check whether ipolicers exists
net: ethernet: microchip: lan743x: Fix skb allocation failure
net/tls: Fix flipped sign in async_wait.err assignment
net/tls: Fix flipped sign in tls_err_abort() calls
net/smc: Correct spelling mistake to TCPF_SYN_RECV
net/smc: Fix smc_link->llc_testlink_time overflow
nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
vmxnet3: do not stop tx queues after netif_device_detach()
r8169: Add device 10ec:8162 to driver r8169
ptp: Document the PTP_CLK_MAGIC ioctl number
usbnet: fix error return code in usbnet_probe()
net: hns3: adjust string spaces of some parameters of tx bd info in debugfs
net: hns3: expand buffer len for some debugfs command
net: hns3: add more string spaces for dumping packets number of queue info in debugfs
net: hns3: fix data endian problem of some functions of debugfs
...
|
|
The bpf_jit_binary_free() function requires a non-NULL argument. When
the RISC-V BPF JIT fails to converge in NR_JIT_ITERATIONS steps,
jit_data->header will be NULL, which triggers a NULL
dereference. Avoid this by checking the argument, prior calling the
function.
Fixes: ca6cb5447cec ("riscv, bpf: Factor common RISC-V JIT code")
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20211028125115.514587-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Xen interrupt injection for event channels relies on accessing the
guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
gfn_to_hva_cache.
This requires the srcu lock to be held, which is mostly the case except
for this code path:
[ 11.822877] WARNING: suspicious RCU usage
[ 11.822965] -----------------------------
[ 11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
[ 11.823131]
[ 11.823131] other info that might help us debug this:
[ 11.823131]
[ 11.823196]
[ 11.823196] rcu_scheduler_active = 2, debug_locks = 1
[ 11.823253] 1 lock held by dom:0/90:
[ 11.823292] #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
[ 11.823379]
[ 11.823379] stack backtrace:
[ 11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
[ 11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 11.823612] Call Trace:
[ 11.823645] dump_stack+0x7a/0xa5
[ 11.823681] lockdep_rcu_suspicious+0xc5/0x100
[ 11.823726] __kvm_xen_has_interrupt+0x179/0x190
[ 11.823773] kvm_cpu_has_extint+0x6d/0x90
[ 11.823813] kvm_cpu_accept_dm_intr+0xd/0x40
[ 11.823853] kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
< post_kvm_run_save() inlined here >
[ 11.823906] kvm_arch_vcpu_ioctl_run+0x135/0x6a0
[ 11.823947] kvm_vcpu_ioctl+0x263/0x680
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The trap vector marked by label .Lsecondary_park must align on a
4-byte boundary, as the {m,s}tvec is defined to require 4-byte
alignment.
Signed-off-by: Chen Lu <181250012@smail.nju.edu.cn>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Fixes: e011995e826f ("RISC-V: Move relocate and few other functions out of __init")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull nds32 tracing fix from Steven Rostedt:
"Fix nds32le build when DYNAMIC_FTRACE is disabled
A randconfig found that nds32le architecture fails to build due to a
prototype mismatch between a ftrace function pointer and the function
it was to be assigned to. That function pointer prototype missed being
updated when all the ftrace callbacks were updated"
* tag 'trace-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/nds32: Update the proto for ftrace_trace_function to match ftrace_stub
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fix from Dinh Nguyen:
"Fix a build error for allmodconfig"
* tag 'nios2_fixes_for_v5.15_part3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TEST
|
|
The ftrace callback prototype was changed to pass a special ftrace_regs
instead of pt_regs as the last parameter, but the static ftrace for nds32
missed updating ftrace_trace_function and this caused a warning when
compared to ftrace_stub:
../arch/nds32/kernel/ftrace.c: In function '_mcount':
../arch/nds32/kernel/ftrace.c:24:35: error: comparison of distinct pointer types lacks a cast [-Werror]
24 | if (ftrace_trace_function != ftrace_stub)
| ^~
Link: https://lore.kernel.org/all/20211027055554.19372-1-rdunlap@infradead.org/
Link: https://lkml.kernel.org/r/20211027125101.33449969@gandalf.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Fixes: d19ad0775dcd6 ("ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
If the guest requests string I/O from the hypervisor via VMGEXIT,
SW_EXITINFO2 will contain the REP count. However, sev_es_string_io
was incorrectly treating it as the size of the GHCB buffer in
bytes.
This fixes the "outsw" test in the experimental SEV tests of
kvm-unit-tests.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Marc Orr <marcorr@google.com>
Tested-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
nios2:allmodconfig builds fail with
make[1]: *** No rule to make target 'arch/nios2/boot/dts/""',
needed by 'arch/nios2/boot/dts/built-in.a'. Stop.
make: [Makefile:1868: arch/nios2/boot/dts] Error 2 (ignored)
This is seen with compile tests since those enable NIOS2_DTB_SOURCE_BOOL,
which in turn enables NIOS2_DTB_SOURCE. This causes the build error
because the default value for NIOS2_DTB_SOURCE is an empty string.
Disable NIOS2_DTB_SOURCE_BOOL for compile tests to avoid the error.
Fixes: 2fc8483fdcde ("nios2: Build infrastructure")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"One last set of small fixes for the soc tree:
- Incorrect ethernet phy settings found on i.mx and allwinner
platforms
- a revert for a Qualcomm DT change that caused a boot regression
- four patches for incorrect settings in i.MX DT files
- new MAINTAINER file entries for dhcom boards
- a Kconfig fix for a reset driver that became unselectable
- three more code changes for bugs in reset drivers"
* tag 'arm-soc-fixes-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1
Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target"
arm64: dts: imx8mm-kontron: Fix connection type for VSC8531 RGMII PHY
arm64: dts: imx8mm-kontron: Fix CAN SPI clock frequency
arm64: dts: imx8mm-kontron: Fix polarity of reg_rst_eth2
arm64: dts: imx8mm-kontron: Set lower limit of VDD_SNVS to 800 mV
arm64: dts: imx8mm-kontron: Make sure SOC and DRAM supply voltages are correct
reset: socfpga: add empty driver allowing consumers to probe
reset: tegra-bpmp: Handle errors in BPMP response
reset: pistachio: Re-enable driver selection
reset: brcmstb-rescal: fix incorrect polarity of status bit
ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-10-26
We've added 12 non-merge commits during the last 7 day(s) which contain
a total of 23 files changed, 118 insertions(+), 98 deletions(-).
The main changes are:
1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen.
2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang.
3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai.
4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer.
5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun.
6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian.
7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix potential race in tail call compatibility check
bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET
selftests/bpf: Use recv_timeout() instead of retries
net: Implement ->sock_is_readable() for UDP and AF_UNIX
skmsg: Extract and reuse sk_msg_is_readable()
net: Rename ->stream_memory_read to ->sock_is_readable
tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
cgroup: Fix memory leak caused by missing cgroup_bpf_offline
bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()
bpf: Prevent increasing bpf_jit_limit above max
bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
bpf: Define bpf_jit_alloc_exec_limit for riscv JIT
====================
Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm ARM64 DTS one more fix for 5.15
This reverts a clock change in the Qualcomm RB5 devicetree which in some
combinations of firmware and configuration causes the device to crash
during boot.
Data on an adjacent platform indicates that this is probably not be the
root cause of the problem, but this resolves the regression seen on RB5
and will allow the SM8250 platform to boot v5.15.
* tag 'qcom-arm64-fixes-for-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target"
Link: https://lore.kernel.org/r/20211025201213.1145348-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
sm8250 target"
This reverts commit 001ce9785c0674d913531345e86222c965fc8bf4.
This upstream commit broke AOSP (post Android 12 merge) build
on RB5. The device either silently crashes into USB crash mode
after android boot animation or we see a blank blue screen
with following dpu errors in dmesg:
[ T444] hw recovery is not complete for ctl:3
[ T444] [drm:dpu_encoder_phys_vid_prepare_for_kickoff:539] [dpu error]enc31 intf1 ctl 3 reset failure: -22
[ T444] [drm:dpu_encoder_phys_vid_wait_for_commit_done:513] [dpu error]vblank timeout
[ T444] [drm:dpu_kms_wait_for_commit_done:454] [dpu error]wait for commit done returned -110
[ C7] [drm:dpu_encoder_frame_done_timeout:2127] [dpu error]enc31 frame done timeout
[ T444] [drm:dpu_encoder_phys_vid_wait_for_commit_done:513] [dpu error]vblank timeout
[ T444] [drm:dpu_kms_wait_for_commit_done:454] [dpu error]wait for commit done returned -110
Fixes: 001ce9785c06 ("arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target")
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211014135410.4136412-1-dmitry.baryshkov@linaro.org
|
|
Pull ARM fixes from Russell King:
- Fix clang-related relocation warning in futex code
- Fix incorrect use of get_kernel_nofault()
- Fix bad code generation in __get_user_check() when kasan is enabled
- Ensure TLB function table is correctly aligned
- Remove duplicated string function definitions in decompressor
- Fix link-time orphan section warnings
- Fix old-style function prototype for arch_init_kprobes()
- Only warn about XIP address when not compile testing
- Handle BE32 big endian for keystone2 remapping
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S
ARM: 9141/1: only warn about XIP address when not compile testing
ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype
ARM: 9138/1: fix link warning with XIP + frame-pointer
ARM: 9134/1: remove duplicate memcpy() definition
ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned
ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images
ARM: 9125/1: fix incorrect use of get_kernel_nofault()
ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
|
|
In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
making a final check whether the vCPU should be woken from HLT by any
incoming interrupt.
This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
really shouldn't be sleeping when the task state has already been set.
I think it's actually harmless as it would just manifest itself as a
spurious wakeup, but it's causing a debug warning:
[ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
Fix the warning by turning it into an *explicit* spurious wakeup. When
invoked with !task_is_running(current) (and we might as well add
in_atomic() there while we're at it), just return 1 to indicate that
an IRQ is pending, which will cause a wakeup and then something will
call it again in a context that *can* sleep so it can fault the page
back in.
Cc: stable@vger.kernel.org
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes for interrupt delivery
Two bugs that might result in CPUs not woken up when interrupts are
pending.
|
|
On the preemption path when updating a Xen guest's runstate times, this
lock is taken inside the scheduler rq->lock, which is a raw spinlock.
This was shown in a lockdep warning:
[ 89.138354] =============================
[ 89.138356] [ BUG: Invalid wait context ]
[ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E
[ 89.138360] -----------------------------
[ 89.138361] xen_shinfo_test/2575 is trying to lock:
[ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138442] other info that might help us debug this:
[ 89.138444] context-{5:5}
[ 89.138445] 4 locks held by xen_shinfo_test/2575:
[ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
[ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
[ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
[ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
...
[ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm]
[ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
[ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm]
[ 89.138870] kvm_sched_out+0x2f/0x40 [kvm]
[ 89.138900] __schedule+0x5de/0xbd0
Cc: stable@vger.kernel.org
Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
My intel-ixp42x-welltech-epbx100 no longer boot since 4.14.
This is due to commit 463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel
mapping regression")
which forgot to handle CONFIG_CPU_ENDIAN_BE32 as possible BE config.
Suggested-by: Krzysztof Hałasa <khalasa@piap.pl>
Fixes: 463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel mapping regression")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The iommu_init_table() helper takes an address range to reserve in
the IOMMU table being initialized to exclude MMIO addresses, this is
useful if the window stretches far beyond 4GB (although wastes some TCEs).
At the moment the code searches for such MMIO32 range and fails if none
found which is considered a problem while it really is not: it is actually
better as this says there is no MMIO32 to reserve and we can use
usually wasted TCEs. Furthermore PHYP never actually allows creating
windows starting at busaddress=0 so this MMIO32 range is never useful.
This removes error exit and initializes the table with zero range if
no MMIO32 is detected.
Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-5-aik@ozlabs.ru
|
|
At the moment this check is performed after we remove the default window
which is late and disallows to revert whatever changes enable_ddw()
has made to DMA windows.
This moves the check and error exit before removing the window.
This raised the message severity from "debug" to "warning" as this
should not happen in practice and cannot be triggered by the userspace.
Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-4-aik@ozlabs.ru
|
|
The it_map array is vzalloc'ed so use vfree() for it when creating
a huge DMA window failed for whatever reason.
While at this, write zero to it_map.
Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-3-aik@ozlabs.ru
|
|
Expose the maximum amount of useable memory from the arm64 JIT.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211014142554.53120-3-lmb@cloudflare.com
|
|
Expose the maximum amount of useable memory from the riscv JIT.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Luke Nelson <luke.r.nels@gmail.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20211014142554.53120-2-lmb@cloudflare.com
|
|
Pull more x86 kvm fixes from Paolo Bonzini:
- Cache coherency fix for SEV live migration
- Fix for instruction emulation with PKU
- fixes for rare delaying of interrupt delivery
- fix for SEV-ES buffer overflow
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
KVM: SEV-ES: keep INS functions together
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
KVM: x86: split the two parts of emulator_pio_in
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
KVM: nVMX: promptly process interrupts delivered while in guest mode
KVM: x86: check for interrupts before deciding whether to exit the fast path
|
|
The PIO scratch buffer is larger than a single page, and therefore
it is not possible to copy it in a single step to vcpu->arch/pio_data.
Bound each call to emulator_pio_in/out to a single page; keep
track of how many I/O operations are left in vcpu->arch.sev_pio_count,
so that the operation can be restarted in the complete_userspace_io
callback.
For OUT, this means that the previous kvm_sev_es_outs implementation
becomes an iterator of the loop, and we can consume the sev_pio_data
buffer before leaving to userspace.
For IN, instead, consuming the buffer and decreasing sev_pio_count
is always done in the complete_userspace_io callback, because that
is when the memcpy is done into sev_pio_data.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Make the diff a little nicer when we actually get to fixing
the bug. No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|