Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga into char-misc-next
Moritz writes:
FPGA Manager changes for 5.8
Here's the first set of changes for the 5.8-rc1 merge window.
Dominic's change adds support for accessing AFU regions with gdb.
Gustavo's change is a cleanup patch regarding variable lenght arrays.
Richard's changes update dt-bindings and add support for stratix and agilex.
Sergiu's changes update spi transfers with the new delay field.
Xu's change addresses an issue with a wrong return value.
Shubhrajyoti's change makes the Zynq FPGA driver return -EPROBE_DEFER on
check of devm_clk_get failure.
Xu's change for DFL enables multiple opens.
All of these patches have been reviewed, have appropriate Acked-by's and
have been in the last few linux-next releases without issues.
Signed-off-by: Moritz Fischer <mdf@kernel.org>
* tag 'fpga-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga:
fpga: dfl: afu: support debug access to memory-mapped afu regions
fpga: dfl.h: Replace zero-length array with flexible-array member
arm64: dts: agilex: correct service layer driver's compatible value
dt-bindings, firmware: add compatible value Intel Stratix10 service layer binding
fpga: stratix10-soc: add compatible property value for intel agilex
arm64: dts: agilex: correct FPGA manager driver's compatible value
dt-bindings: fpga: add compatible value to Stratix10 SoC FPGA manager binding
fpga: machxo2-spi: Use new structure for SPI transfer delays
fpga: ice40-spi: Use new structure for SPI transfer delays
fpga: dfl: support multiple opens on feature device node.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing
page attributes. The optimization for small ranges failed to do so
when the virtual address was in the vmalloc or module space.
- Unbreak the trace event registration for syscalls without arguments
caused by the refactoring of the SYSCALL_DEFINE0() macro.
- Move the printk in the TSC deadline timer code to a place where it
is guaranteed to only be called once during boot and cannot be
rearmed by clearing warn_once after boot. If it's invoked post boot
then lockdep rightfully complains about a potential deadlock as the
calling context is different.
- A series of fixes for objtool and the ORC unwinder addressing
variety of small issues:
- Stack offset tracking for indirect CFAs in objtool ignored
subsequent pushs and pops
- Repair the unwind hints in the register clearing entry ASM code
- Make the unwinding in the low level exit to usermode code stop
after switching to the trampoline stack. The unwind hint is no
longer valid and the ORC unwinder emits a warning as it can't
find the registers anymore.
- Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
which caused objtool to generate bogus ORC data.
- Prevent unwinder warnings when dumping the stack of a
non-current task as there is no way to be sure about the
validity because the dumped stack can be a moving target.
- Make the ORC unwinder behave the same way as the frame pointer
unwinder when dumping an inactive tasks stack and do not skip
the first frame.
- Prevent ORC unwinding before ORC data has been initialized
- Immediately terminate unwinding when a unknown ORC entry type
is found.
- Prevent premature stop of the unwinder caused by IRET frames.
- Fix another infinite loop in objtool caused by a negative
offset which was not catched.
- Address a few build warnings in the ORC unwinder and add
missing static/ro_after_init annotations"
* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
x86/apic: Move TSC deadline timer debug printk
ftrace/x86: Fix trace event registration for syscalls without arguments
x86/mm/cpa: Flush direct map alias during cpa
objtool: Fix infinite loop in for_offset_range()
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
x86/unwind/orc: Fix error path for bad ORC entry type
x86/unwind/orc: Prevent unwinding before ORC initialization
x86/unwind/orc: Don't skip the first frame for inactive tasks
x86/unwind: Prevent false warnings for non-current tasks
x86/unwind/orc: Convert global variables to static
x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
x86/entry/64: Fix unwind hints in __switch_to_asm()
x86/entry/64: Fix unwind hints in kernel exit path
x86/entry/64: Fix unwind hints in register clearing code
objtool: Fix stack offset tracking for indirect CFAs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single fix for the fallout of the recent futex uacess rework.
With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser()
correctly and emits a 'maybe unitialized' warning. While we usually
ignore compiler stupidity the conditional store is pointless anyway
because the correct case has to store. For the fault case the extra
store does no harm"
* tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM: futex: Address build warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A smattering of fixes and cleanups:
- Dead code removal.
- Exporting riscv_cpuid_to_hartid_mask for modules.
- Per-CPU tracking of ISA features.
- Setting max_pfn correctly when probing memory.
- Adding a note to the VDSO so glibc can check the kernel's version
without a uname().
- A fix to force the bootloader to initialize the boot spin tables,
which still get used as a fallback when SBI-0.1 is enabled"
* tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Remove unused code from STRICT_KERNEL_RWX
riscv: force __cpu_up_ variables to put in data section
riscv: add Linux note to vdso
riscv: set max_pfn to the PFN of the last page
RISC-V: Remove N-extension related defines
RISC-V: Add bitmap reprensenting ISA features common across CPUs
RISC-V: Export riscv_cpuid_to_hartid_mask() API
|
|
Merge misc fixes from Andrew Morton:
"14 fixes and one selftest to verify the ipc fixes herein"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: limit boost_watermark on small zones
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
epoll: atomically remove wait entry on wake up
kselftests: introduce new epoll60 testcase for catching lost wakeups
percpu: make pcpu_alloc() aware of current gfp context
mm/slub: fix incorrect interpretation of s->offset
scripts/gdb: repair rb_first() and rb_last()
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
scripts/decodecode: fix trapping instruction formatting
kernel/kcov.c: fix typos in kcov_remote_start documentation
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
mm, memcg: fix error return value of mem_cgroup_css_alloc()
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
|
|
When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().
Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory(). As the
original coding of this call was correct, revert the change made by that
commit.
Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Avoid potential NULL dereference in huge_pte_alloc() on pmd_alloc()
failure"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hugetlb: avoid potential NULL dereference
|
|
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.
Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...
|
|
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may
pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:
| CC arch/arm64/mm/pageattr.o
| CC arch/arm64/mm/hugetlbpage.o
| from arch/arm64/mm/hugetlbpage.c:10:
| arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
| ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
| |arch/arm64/mm/hugetlbpage.c:232:10:
| |./arch/arm64/include/asm/pgtable-types.h:28:24:
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
This can only occur when the kernel cannot allocate a page, and so is
unlikely to happen in practice before other systems start failing.
We can avoid this by bailing out if pmd_alloc() fails, as we do earlier
in the function if pud_alloc() fails.
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: <stable@vger.kernel.org> # 4.5.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Stephen reported the following build warning on a ARM multi_v7_defconfig
build with GCC 9.2.1:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
1676 | return oldval == cmparg;
| ~~~~~~~^~~~~~~~~
kernel/futex.c:1652:6: note: 'oldval' was declared here
1652 | int oldval, ret;
| ^~~~~~
introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser()
calling conventions change").
While that change should not make any difference it confuses GCC which
fails to work out that oldval is not referenced when the return value is
not zero.
GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the
early return, the issue is with the assembly macros. GCC fails to detect
that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT
which makes oldval uninteresting. The store to the callsite supplied oldval
pointer is conditional on ret == 0.
The straight forward way to solve this is to make the store unconditional.
Aside of addressing the build warning this makes sense anyway because it
removes the conditional from the fastpath. In the error case the stored
value is uninteresting and the extra store does not matter at all.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a potential scheduling latency problem for the algorithms
used by WireGuard"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arch/nhpoly1305 - process in explicit 4k chunks
crypto: arch/lib - limit simd usage to 4k chunks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix for running nested uner z/VM
There are circumstances when running nested under z/VM that would trigger a
WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this
code more robust and flexible, but just returning instead of WARNING makes
guest bootable again.
|
|
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared
as supported. My wild guess is that userspaces like QEMU are using "#ifdef
KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be
wrong because the compilation host may not be the runtime host.
The userspace might still want to keep the old "#ifdef" though to not break the
guest debug on old kernels.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505154750.126300-1-peterx@redhat.com>
[Do the same for PPC and s390. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Using CPUID data can be useful for the processor compatibility
check, but that's it. Using it to compute guest-reserved bits
can have both false positives (such as LA57 and UMIP which we
are already handling) and false negatives: in particular, with
this patch we don't allow anymore a KVM guest to set CR4.PKE
when CR4.PKE is clear on the host.
Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU")
Reported-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so
that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the
RSB has always clobbered RFLAGS, its current incarnation just happens
clear CF and ZF in the processs. Relying on the macro to clear CF and
ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation:
Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such
that the ZF flag is always set.
Reported-by: Qian Cai <cai@lca.pw>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This patch removes the unused functions set_kernel_text_rw/ro.
Currently, it is not being invoked from anywhere and no other architecture
(except arm) uses this code. Even in ARM, these functions are not invoked
from anywhere currently.
Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
In LPAR we will only get an intercept for FC==3 for the PQAP
instruction. Running nested under z/VM can result in other intercepts as
well as ECA_APIE is an effective bit: If one hypervisor layer has
turned this bit off, the end result will be that we will get intercepts for
all function codes. Usually the first one will be a query like PQAP(QCI).
So the WARN_ON_ONCE is not right. Let us simply remove it.
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.3+
Fixes: e5282de93105 ("s390: ap: kvm: add PQAP interception for AQIC")
Link: https://lore.kernel.org/kvm/20200505083515.2720-1-borntraeger@de.ibm.com
Reported-by: Qian Cai <cailca@icloud.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
|
Put __cpu_up_stack_pointer and __cpu_up_task_pointer in data section.
Currently, these two variables are put in bss section, there is a
potential risk that secondary harts get the uninitialized value before
main hart finishing the bss clearing. In this case, all secondary
harts would pass the waiting loop and enable the MMU before main hart
set up the page table.
This issue happens on random booting of multiple harts, which means
it will manifest for BBL and OpenSBI v0.6 (or older version). In OpenSBI
v0.7 (or higher version), we have HSM extension so all the secondary harts
are brought-up by Linux kernel in an orderly fashion. This means we don't
need this change for OpenSBI v0.7 (or higher version).
Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The Linux note in the vdso allows glibc to check the running kernel
version without having to issue the uname syscall.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The current max_pfn equals to zero. In this case, I found it caused users
cannot get some page information through /proc such as kpagecount in v5.6
kernel because of new sanity checks. The following message is displayed by
stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t
1" on HiFive unleashed board.
# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [109] 4 processors online, 4 processors configured
stress-ng: info: [109] dispatching hogs: 1 physpage
stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0
stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0
stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found
stress-ng: debug: [109] cache allocate: default cache size: 2048K
stress-ng: debug: [109] starting stressors
stress-ng: debug: [109] 1 stressor spawned
stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0)
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success)
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
...
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0)
stress-ng: debug: [109] process [110] terminated
stress-ng: info: [109] successful run completed in 1.00s
#
After applying this patch, the kernel can pass the test.
# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [104] 4 processors online, 4 processors configured stress-ng: info: [104] dispatching hogs: 1 physpage
stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs
stress-ng: debug: [104] cache allocate: default cache size: 2048K
stress-ng: debug: [104] starting stressors
stress-ng: debug: [104] 1 stressor spawned
stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0) stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0) stress-ng: debug: [104] process [105] terminated
stress-ng: info: [104] successful run completed in 1.01s
#
Cc: stable@vger.kernel.org
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Yash Shah <yash.shah@sifive.com>
Tested-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The RISC-V N-extension is still in draft state hence remove
N-extension related defines from asm/csr.h.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This patch adds riscv_isa bitmap which represents Host ISA features
common across all Host CPUs. The riscv_isa is not same as elf_hwcap
because elf_hwcap will only have ISA features relevant for user-space
apps whereas riscv_isa will have ISA features relevant to both kernel
and user-space apps.
One of the use-case for riscv_isa bitmap is in KVM hypervisor where
we will use it to do following operations:
1. Check whether hypervisor extension is available
2. Find ISA features that need to be virtualized (e.g. floating
point support, vector extension, etc.)
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The riscv_cpuid_to_hartid_mask() API should be exported to allow
building KVM RISC-V as loadable module.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
the following infinite loop:
BUG: stack guard page was hit at 000000008f595917 \
(stack is 00000000bdefe5a4..00000000ae2b06f5)
kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
Call Trace:
irqfd_resampler_ack+0x32/0x90 [kvm]
kvm_notify_acked_irq+0x62/0xd0 [kvm]
kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
ioapic_set_irq+0x20e/0x240 [kvm]
kvm_ioapic_set_irq+0x5c/0x80 [kvm]
kvm_set_irq+0xbb/0x160 [kvm]
? kvm_hv_set_sint+0x20/0x20 [kvm]
irqfd_resampler_ack+0x32/0x90 [kvm]
kvm_notify_acked_irq+0x62/0xd0 [kvm]
kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
ioapic_set_irq+0x20e/0x240 [kvm]
kvm_ioapic_set_irq+0x5c/0x80 [kvm]
kvm_set_irq+0xbb/0x160 [kvm]
? kvm_hv_set_sint+0x20/0x20 [kvm]
....
The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state. That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd. But if the interrupt has _not_ gone low then
ioapic_set_irq is invoked again, causing an infinite loop.
This can only happen for a level-triggered interrupt, otherwise
irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
and then low. Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because
TMR is set. Thus, fix the bug by restricting the lazy invocation
of the ack notifier to edge-triggered interrupts, the only ones that
need it.
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: borisvk@bstnet.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://www.spinics.net/lists/kvm/msg213512.html
Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Current logic incorrectly uses the enum ioapic_irq_destination_types
to check the posted interrupt destination types. However, the value was
set using APIC_DM_XXX macros, which are left-shifted by 8 bits.
Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead.
Fixes: (fdcf75621375 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for Linux 5.7, take #2
- Fix compilation with Clang
- Correctly initialize GICv4.1 in the absence of a virtual ITS
- Move SP_EL0 save/restore to the guest entry/exit code
- Handle PC wrap around on 32bit guests, and narrow all 32bit
registers on userspace access
|
|
The corresponding code was added for VMX in commit 42dbaa5a057
("KVM: x86: Virtualize debug registers, 2008-12-15) but never for AMD.
Fix this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use BUG() in the impossible-to-hit default case when switching on the
scope of INVEPT to squash a warning with clang 11 due to clang treating
the BUG_ON() as conditional.
>> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free'
is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
BUG_ON(1);
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ce8fe7b77bd8 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix the following warnings seen with !CONFIG_MODULES:
arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable]
29 | static struct orc_entry *cur_orc_table = __start_orc_unwind;
| ^~~~~~~~~~~~~
arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable]
28 | static int *cur_orc_ip_table = __start_orc_unwind_ip;
| ^~~~~~~~~~~~~~~~
Fixes: 153eb2223c79 ("x86/unwind/orc: Convert global variables to static")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Next Mailing List <linux-next@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Add -fasynchronous-unwind-tables to the vDSO CFLAGS"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso: Add -fasynchronous-unwind-tables to cflags
|
|
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a
lockdep splat due to a lock order violation between hrtimer_base::lock and
console_sem, when the 'once' condition is reset via
/sys/kernel/debug/clear_warn_once after boot.
The initial printk cannot trigger this because that happens during boot
when the local APIC timer is set up on the boot CPU.
Prevent it by moving the printk to a place which is guaranteed to be only
called once during boot.
Mark the deadline timer check related functions and data __init while at
it.
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
|
|
The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and
simply defines __abi_sys_$NAME as alias of __do_sys_$NAME.
As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match
with the declared trace event name.
See also commit 1c758a2202a6 ("tracing/x86: Update syscall trace events to
handle new prefixed syscall func names").
Add __do_sys_ to the valid prefixes which are checked in
arch_syscall_match_sym_name().
Fixes: d2b5de495ee9 ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
|
|
In the unlikely event that a 32bit vcpu traps into the hypervisor
on an instruction that is located right at the end of the 32bit
range, the emulation of that instruction is going to increment
PC past the 32bit range. This isn't great, as userspace can then
observe this value and get a bit confused.
Conversly, userspace can do things like (in the context of a 64bit
guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
set PC to a 64bit value, change PSTATE to AArch32-USR, and observe
that PC hasn't been truncated. More confusion.
Fix both by:
- truncating PC increments for 32bit guests
- sanitizing all 32bit regs every time a core reg is changed by
userspace, and that PSTATE indicates a 32bit mode.
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As an optimization, cpa_flush() was changed to optionally only flush
the range in @cpa if it was small enough. However, this range does
not include any direct map aliases changed in cpa_process_alias(). So
small set_memory_() calls that touch that alias don't get the direct
map changes flushed. This situation can happen when the virtual
address taking variants are passed an address in vmalloc or modules
space.
In these cases, force a full TLB flush.
Note this issue does not extend to cases where the set_memory_() calls are
passed a direct map address, or page array, etc, as the primary target. In
those cases the direct map would be flushed.
Fixes: 935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net
|
|
On arm64 linux gcc uses -fasynchronous-unwind-tables -funwind-tables
by default since gcc-8, so now the de facto platform ABI is to allow
unwinding from async signal handlers.
However on bare metal targets (aarch64-none-elf), and on old gcc,
async and sync unwind tables are not enabled by default to avoid
runtime memory costs.
This means if linux is built with a baremetal toolchain the vdso.so
may not have unwind tables which breaks the gcc platform ABI guarantee
in userspace.
Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o
cflags to address the ABI change.
Fixes: 28b1a824a4f4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Cc: Will Deacon <will@kernel.org>
Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We currently save/restore sp_el0 in C code. This is a bit unsafe,
as a lot of the C code expects 'current' to be accessible from
there (and the opportunity to run kernel code in HYP is specially
great with VHE).
Instead, let's move the save/restore of sp_el0 to the assembly
code (in __guest_enter), making sure that sp_el0 is correct
very early on when we exit the guest, and is preserved as long
as possible to its host value when we enter the guest.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
SYM_CODE_START defines \label , so it is redundant to define \label again.
A redefinition at the same place is accepted by GNU as
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=159fbb6088f17a341bcaaac960623cab881b4981)
but rejected by the clang integrated assembler.
Fixes: 617a2f392c92 ("arm64: kvm: Annotate assembly using modern annoations")
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/988
Link: https://lore.kernel.org/r/20200413231016.250737-1-maskray@google.com
|
|
Rather than chunking via PAGE_SIZE, this commit changes the arch
implementations to chunk in explicit 4k parts, so that calculations on
maximum acceptable latency don't suddenly become invalid on platforms
where PAGE_SIZE isn't 4k, such as arm64.
Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Fixes: 012c82388c03 ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305")
Fixes: a00fa0c88774 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305")
Fixes: 16aae3595a9d ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The initial Zinc patchset, after some mailing list discussion, contained
code to ensure that kernel_fpu_enable would not be kept on for more than
a 4k chunk, since it disables preemption. The choice of 4k isn't totally
scientific, but it's not a bad guess either, and it's what's used in
both the x86 poly1305, blake2s, and nhpoly1305 code already (in the form
of PAGE_SIZE, which this commit corrects to be explicitly 4k for the
former two).
Ard did some back of the envelope calculations and found that
at 5 cycles/byte (overestimate) on a 1ghz processor (pretty slow), 4k
means we have a maximum preemption disabling of 20us, which Sebastian
confirmed was probably a good limit.
Unfortunately the chunking appears to have been left out of the final
patchset that added the glue code. So, this commit adds it back in.
Fixes: 84e03fa39fbe ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Fixes: b3aad5bad26a ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
Fixes: a44a3430d71b ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
Fixes: d7d7b8535662 ("crypto: x86/poly1305 - wire up faster implementations for kernel")
Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Correct the compatible property value for Intel Service Layer driver
on Intel Agilex SoC platform.
Signed-off-by: Richard Gong <richard.gong@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
|
|
Correct the compatible property value for FPGA manager driver on
Intel Agilex SoC platform.
Signed-off-by: Richard Gong <richard.gong@intel.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes.
Specifically:
- fix linker argument to allow linking with lld
- build fix for configurations without a frame pointer
- a handful of build fixes related the SBI 0.1 vs 0.2 split
- remove STRICT_KERNEL_RWX for !MMU, which isn't useful"
* tag 'riscv-for-linus-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: select ARCH_HAS_STRICT_KERNEL_RWX only if MMU
riscv: sbi: Fix undefined reference to sbi_shutdown
tty: riscv: Using RISCV_SBI_V01 instead of RISCV_SBI
riscv: sbi: Correct sbi_shutdown() and sbi_clear_ipi() export
riscv: fix vdso build with lld
RISC-V: stacktrace: Declare sp_in_global outside ifdef
|
|
Pull s390 fix from Christian Borntraeger:
"Fix a race between page table upgrade and uaccess on s390.
This fixes CVE-2020-11884 which allows for a local kernel crash or
code execution"
* tag 'cve-2020-11884' from emailed bundle:
s390/mm: fix page table upgrade vs 2ndary address mode accesses
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
- Two patches from Dexuan fixing suspension bugs
- Three cleanup patches from Andy and Michael
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyper-v: Remove internal types from UAPI header
hyper-v: Use UUID API for exporting the GUID
x86/hyperv: Suspend/resume the VP assist page for hibernation
Drivers: hv: Move AEOI determination to architecture dependent code
Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Add a few notrace annotations to avoid potential crashes when
switching ftrace tracers.
- Avoid setting affinity for floating irqs in pci code.
- Fix build issue found by kbuild test robot.
* tag 's390-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/protvirt: fix compilation issue
s390/pci: do not set affinity for floating irqs
s390/ftrace: fix potential crashes when switching tracers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- One important fix for a bug in the way we find the cache-line size
from the device tree, which was leading to the wrong size being
reported to userspace on some platforms.
- A fix for 8xx STRICT_KERNEL_RWX which was leaving TLB entries around
leading to a window at boot when the strict mapping wasn't enforced.
- A fix to enable our KUAP (kernel user access prevention) debugging on
PPC32.
- A build fix for clang in lib/mpi.
Thanks to: Chris Packham, Christophe Leroy, Nathan Chancellor, Qian Cai.
* tag 'powerpc-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
lib/mpi: Fix building for powerpc with clang
powerpc/mm: Fix CONFIG_PPC_KUAP_DEBUG on PPC32
powerpc/8xx: Fix STRICT_KERNEL_RWX startup test failure
powerpc/setup_64: Set cache-line-size based on cache-block-size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
- an uclamp accounting fix
- three frequency invariance fixes and a readability improvement"
* tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix reset-on-fork from RT with uclamp
x86, sched: Move check for CPU type to caller function
x86, sched: Don't enable static key when starting secondary CPUs
x86, sched: Account for CPUs with less than 4 cores in freq. invariance
x86, sched: Bail out of frequency invariance if base frequency is unknown
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two changes:
- fix exit event records
- extend x86 PMU driver enumeration to add Intel Jasper Lake CPU
support"
* tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix parent pid/tid in task exit events
perf/x86/cstate: Add Jasper Lake CPU support
|
|
The following execution path is possible:
fsnotify()
[ realign the stack and store previous SP in R10 ]
<IRQ>
[ only IRET regs saved ]
common_interrupt()
interrupt_entry()
<NMI>
[ full pt_regs saved ]
...
[ unwind stack ]
When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers. So the unwind stops
prematurely.
However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.
Handle this case properly. When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.
Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers. For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between. So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
|
|
If the ORC entry type is unknown, nothing else can be done other than
reporting an error. Exit the function instead of breaking out of the
switch statement.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
|