diff options
author | Linus Torvalds | 2024-03-15 13:03:13 -0700 |
---|---|---|
committer | Linus Torvalds | 2024-03-15 13:03:13 -0700 |
commit | 4f712ee0cbbd5c777d270427092bb301fc31044f (patch) | |
tree | 21feb90dbb43d3d771249558b090404b8eedc7c2 /Documentation | |
parent | 8a2fbffcbfcb60378626e5d4144a6ff43f3b6776 (diff) | |
parent | 4781179012d9380005649b0fe07f77dcaa2610e3 (diff) |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 53 |
1 files changed, 41 insertions, 12 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 09c7e585ff58..0b5a33ee71ee 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -372,7 +372,7 @@ The bits in the dirty bitmap are cleared before the ioctl returns, unless KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information, see the description of the capability. -Note that the Xen shared info page, if configured, shall always be assumed +Note that the Xen shared_info page, if configured, shall always be assumed to be dirty. KVM will not explicitly mark it such. @@ -5487,8 +5487,9 @@ KVM_PV_ASYNC_CLEANUP_PERFORM __u8 long_mode; __u8 vector; __u8 runstate_update_flag; - struct { + union { __u64 gfn; + __u64 hva; } shared_info; struct { __u32 send_port; @@ -5516,19 +5517,20 @@ type values: KVM_XEN_ATTR_TYPE_LONG_MODE Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This - determines the layout of the shared info pages exposed to the VM. + determines the layout of the shared_info page exposed to the VM. KVM_XEN_ATTR_TYPE_SHARED_INFO - Sets the guest physical frame number at which the Xen "shared info" + Sets the guest physical frame number at which the Xen shared_info page resides. Note that although Xen places vcpu_info for the first 32 vCPUs in the shared_info page, KVM does not automatically do so - and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used - explicitly even when the vcpu_info for a given vCPU resides at the - "default" location in the shared_info page. This is because KVM may - not be aware of the Xen CPU id which is used as the index into the - vcpu_info[] array, so may know the correct default location. - - Note that the shared info page may be constantly written to by KVM; + and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or + KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA be used explicitly even when + the vcpu_info for a given vCPU resides at the "default" location + in the shared_info page. This is because KVM may not be aware of + the Xen CPU id which is used as the index into the vcpu_info[] + array, so may know the correct default location. + + Note that the shared_info page may be constantly written to by KVM; it contains the event channel bitmap used to deliver interrupts to a Xen guest, amongst other things. It is exempt from dirty tracking mechanisms — KVM will not explicitly mark the page as dirty each @@ -5537,9 +5539,21 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO any vCPU has been running or any event channel interrupts can be routed to the guest. - Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info + Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared_info page. +KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA + If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the + Xen capabilities, then this attribute may be used to set the + userspace address at which the shared_info page resides, which + will always be fixed in the VMM regardless of where it is mapped + in guest physical address space. This attribute should be used in + preference to KVM_XEN_ATTR_TYPE_SHARED_INFO as it avoids + unnecessary invalidation of an internal cache when the page is + re-mapped in guest physcial address space. + + Setting the hva to zero will disable the shared_info page. + KVM_XEN_ATTR_TYPE_UPCALL_VECTOR Sets the exception vector used to deliver Xen event channel upcalls. This is the HVM-wide vector injected directly by the hypervisor @@ -5636,6 +5650,21 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable the vcpu_info. +KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA + If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the + Xen capabilities, then this attribute may be used to set the + userspace address of the vcpu_info for a given vCPU. It should + only be used when the vcpu_info resides at the "default" location + in the shared_info page. In this case it is safe to assume the + userspace address will not change, because the shared_info page is + an overlay on guest memory and remains at a fixed host address + regardless of where it is mapped in guest physical address space + and hence unnecessary invalidation of an internal cache may be + avoided if the guest memory layout is modified. + If the vcpu_info does not reside at the "default" location then + it is not guaranteed to remain at the same host address and + hence the aforementioned cache invalidation is required. + KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO Sets the guest physical address of an additional pvclock structure for a given vCPU. This is typically used for guest vsyscall support. |