Diffstat (limited to 'Documentation/virt/kvm')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 337
-rw-r--r-- | Documentation/virt/kvm/devices/vcpu.rst | 36
-rw-r--r-- | Documentation/virt/kvm/index.rst | 26
-rw-r--r-- | Documentation/virt/kvm/locking.rst | 43
-rw-r--r-- | Documentation/virt/kvm/s390/index.rst | 12
-rw-r--r-- | Documentation/virt/kvm/s390/s390-diag.rst (renamed from Documentation/virt/kvm/s390-diag.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/s390/s390-pv-boot.rst (renamed from Documentation/virt/kvm/s390-pv-boot.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/s390/s390-pv.rst (renamed from Documentation/virt/kvm/s390-pv.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/vcpu-requests.rst | 17
-rw-r--r-- | Documentation/virt/kvm/x86/amd-memory-encryption.rst (renamed from Documentation/virt/kvm/amd-memory-encryption.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/cpuid.rst (renamed from Documentation/virt/kvm/cpuid.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/errata.rst | 39
-rw-r--r-- | Documentation/virt/kvm/x86/halt-polling.rst (renamed from Documentation/virt/kvm/halt-polling.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/hypercalls.rst (renamed from Documentation/virt/kvm/hypercalls.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/index.rst | 19
-rw-r--r-- | Documentation/virt/kvm/x86/mmu.rst (renamed from Documentation/virt/kvm/mmu.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/msr.rst (renamed from Documentation/virt/kvm/msr.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/nested-vmx.rst (renamed from Documentation/virt/kvm/nested-vmx.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/running-nested-guests.rst (renamed from Documentation/virt/kvm/running-nested-guests.rst) | 0
-rw-r--r-- | Documentation/virt/kvm/x86/timekeeping.rst (renamed from Documentation/virt/kvm/timekeeping.rst) | 0
20 files changed, 423 insertions, 106 deletions
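Note on the KVM_CAP_DISABLE_QUIRKS2 text that this diff adds to api.rst: it requires the KVM_ENABLE_CAP argument to be a subset of the bitmask returned by KVM_CHECK_EXTENSION. A minimal sketch of that subset rule in C might look like the following. The helper names are hypothetical and are not part of KVM or its uapi headers, and the masks are treated as opaque 64-bit values rather than real KVM_X86_QUIRK_* constants.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM API): returns true when every quirk bit
 * that userspace wants to disable is also present in the mask reported by
 * KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2).
 */
static bool quirks_request_is_valid(uint64_t supported, uint64_t requested)
{
    /* Any requested bit outside the supported mask makes the request invalid. */
    return (requested & ~supported) == 0;
}

/*
 * Hypothetical helper (not a KVM API): drops requested bits that the host
 * does not report, so the result always satisfies the subset rule.
 */
static uint64_t quirks_clamp_request(uint64_t supported, uint64_t requested)
{
    return requested & supported;
}
```

In a real VMM, `supported` would presumably come from KVM_CHECK_EXTENSION and the validated mask would be passed as args[0] of KVM_ENABLE_CAP. Those ioctl calls are left out here so the sketch does not depend on the host.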
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 9f3172376ec3..d13fa6600467 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -151,12 +151,6 @@ In order to create user controlled virtual machines on S390, check KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as privileged user (CAP_SYS_ADMIN). -To use hardware assisted virtualization on MIPS (VZ ASE) rather than -the default trap & emulate implementation (which changes the virtual -memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the -flag KVM_VM_MIPS_VZ. - - On arm64, the physical address size for a VM (IPA Size limit) is limited to 40bits by default. The limit can be configured if the host supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use @@ -417,7 +411,7 @@ kvm_run' (see below). ----------------- :Capability: basic -:Architectures: all except ARM, arm64 +:Architectures: all except arm64 :Type: vcpu ioctl :Parameters: struct kvm_regs (out) :Returns: 0 on success, -1 on error @@ -450,7 +444,7 @@ Reads the general purpose registers from the vcpu. ----------------- :Capability: basic -:Architectures: all except ARM, arm64 +:Architectures: all except arm64 :Type: vcpu ioctl :Parameters: struct kvm_regs (in) :Returns: 0 on success, -1 on error @@ -824,7 +818,7 @@ Writes the floating point state to the vcpu. ----------------------- :Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) -:Architectures: x86, ARM, arm64, s390 +:Architectures: x86, arm64, s390 :Type: vm ioctl :Parameters: none :Returns: 0 on success, -1 on error @@ -833,7 +827,7 @@ Creates an interrupt controller model in the kernel. On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23 only go to the IOAPIC. -On ARM/arm64, a GICv2 is created. 
Any other GIC versions require the usage of +On arm64, a GICv2 is created. Any other GIC versions require the usage of KVM_CREATE_DEVICE, which also supports creating a GICv2. Using KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2. On s390, a dummy irq routing table is created. @@ -846,7 +840,7 @@ before KVM_CREATE_IRQCHIP can be used. ----------------- :Capability: KVM_CAP_IRQCHIP -:Architectures: x86, arm, arm64 +:Architectures: x86, arm64 :Type: vm ioctl :Parameters: struct kvm_irq_level :Returns: 0 on success, -1 on error @@ -870,7 +864,7 @@ capability is present (or unless it is not using the in-kernel irqchip, of course). -ARM/arm64 can signal an interrupt either at the CPU level, or at the +arm64 can signal an interrupt either at the CPU level, or at the in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for specific cpus. The irq field is interpreted like this:: @@ -896,7 +890,7 @@ When KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, the target vcpu is identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index must be zero. -Note that on arm/arm64, the KVM_CAP_IRQCHIP capability only conditions +Note that on arm64, the KVM_CAP_IRQCHIP capability only conditions injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always be used for a userspace interrupt controller. @@ -1087,7 +1081,7 @@ Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored. :Capability: KVM_CAP_VCPU_EVENTS :Extended by: KVM_CAP_INTR_SHADOW -:Architectures: x86, arm, arm64 +:Architectures: x86, arm64 :Type: vcpu ioctl :Parameters: struct kvm_vcpu_event (out) :Returns: 0 on success, -1 on error @@ -1146,8 +1140,8 @@ The following bits are defined in the flags field: fields contain a valid state. This bit will be set whenever KVM_CAP_EXCEPTION_PAYLOAD is enabled. 
-ARM/ARM64: -^^^^^^^^^^ +ARM64: +^^^^^^ If the guest accesses a device that is being emulated by the host kernel in such a way that a real device would generate a physical SError, KVM may make @@ -1206,7 +1200,7 @@ directly to the virtual CPU). :Capability: KVM_CAP_VCPU_EVENTS :Extended by: KVM_CAP_INTR_SHADOW -:Architectures: x86, arm, arm64 +:Architectures: x86, arm64 :Type: vcpu ioctl :Parameters: struct kvm_vcpu_event (in) :Returns: 0 on success, -1 on error @@ -1241,8 +1235,8 @@ can be set in the flags field to signal that the exception_has_payload, exception_payload, and exception.pending fields contain a valid state and shall be written into the VCPU. -ARM/ARM64: -^^^^^^^^^^ +ARM64: +^^^^^^ User space may need to inject several types of events to the guest. @@ -1449,7 +1443,7 @@ for vm-wide capabilities. --------------------- :Capability: KVM_CAP_MP_STATE -:Architectures: x86, s390, arm, arm64, riscv +:Architectures: x86, s390, arm64, riscv :Type: vcpu ioctl :Parameters: struct kvm_mp_state (out) :Returns: 0 on success; -1 on error @@ -1467,7 +1461,7 @@ Possible values are: ========================== =============================================== KVM_MP_STATE_RUNNABLE the vcpu is currently running - [x86,arm/arm64,riscv] + [x86,arm64,riscv] KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) which has not yet received an INIT signal [x86] KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is @@ -1476,7 +1470,7 @@ Possible values are: is waiting for an interrupt [x86] KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector accessible via KVM_GET_VCPU_EVENTS) [x86] - KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv] + KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm64,riscv] KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) [s390] @@ -1488,8 +1482,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. 
Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures. -For arm/arm64/riscv: -^^^^^^^^^^^^^^^^^^^^ +For arm64/riscv: +^^^^^^^^^^^^^^^^ The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. @@ -1498,7 +1492,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. --------------------- :Capability: KVM_CAP_MP_STATE -:Architectures: x86, s390, arm, arm64, riscv +:Architectures: x86, s390, arm64, riscv :Type: vcpu ioctl :Parameters: struct kvm_mp_state (in) :Returns: 0 on success; -1 on error @@ -1510,8 +1504,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures. -For arm/arm64/riscv: -^^^^^^^^^^^^^^^^^^^^ +For arm64/riscv: +^^^^^^^^^^^^^^^^ The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. @@ -1780,14 +1774,14 @@ The flags bitmap is defined as:: ------------------------ :Capability: KVM_CAP_IRQ_ROUTING -:Architectures: x86 s390 arm arm64 +:Architectures: x86 s390 arm64 :Type: vm ioctl :Parameters: struct kvm_irq_routing (in) :Returns: 0 on success, -1 on error Sets the GSI routing table entries, overwriting any previously set entries. -On arm/arm64, GSI routing has the following limitation: +On arm64, GSI routing has the following limitation: - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. @@ -2855,7 +2849,7 @@ after pausing the vcpu, but before it is resumed. ------------------- :Capability: KVM_CAP_SIGNAL_MSI -:Architectures: x86 arm arm64 +:Architectures: x86 arm64 :Type: vm ioctl :Parameters: struct kvm_msi (in) :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error @@ -3043,7 +3037,7 @@ into the hash PTE second double word). 
-------------- :Capability: KVM_CAP_IRQFD -:Architectures: x86 s390 arm arm64 +:Architectures: x86 s390 arm64 :Type: vm ioctl :Parameters: struct kvm_irqfd (in) :Returns: 0 on success, -1 on error @@ -3069,7 +3063,7 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. -On arm/arm64, gsi routing being supported, the following can happen: +On arm64, gsi routing being supported, the following can happen: - in case no routing entry is associated to this gsi, injection fails - in case the gsi is associated to an irqchip routing entry, @@ -3325,7 +3319,7 @@ current state. "addr" is ignored. ---------------------- :Capability: basic -:Architectures: arm, arm64 +:Architectures: arm64 :Type: vcpu ioctl :Parameters: struct kvm_vcpu_init (in) :Returns: 0 on success; -1 on error @@ -3423,7 +3417,7 @@ Possible features: ----------------------------- :Capability: basic -:Architectures: arm, arm64 +:Architectures: arm64 :Type: vm ioctl :Parameters: struct kvm_vcpu_init (out) :Returns: 0 on success; -1 on error @@ -3452,7 +3446,7 @@ VCPU matching underlying host. --------------------- :Capability: basic -:Architectures: arm, arm64, mips +:Architectures: arm64, mips :Type: vcpu ioctl :Parameters: struct kvm_reg_list (in/out) :Returns: 0 on success; -1 on error @@ -3479,7 +3473,7 @@ KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. ----------------------------------------- :Capability: KVM_CAP_ARM_SET_DEVICE_ADDR -:Architectures: arm, arm64 +:Architectures: arm64 :Type: vm ioctl :Parameters: struct kvm_arm_device_address (in) :Returns: 0 on success, -1 on error @@ -3506,13 +3500,13 @@ can access emulated or directly exposed devices, which the host kernel needs to know about. The id field is an architecture specific identifier for a specific device. 
-ARM/arm64 divides the id field into two parts, a device id and an +arm64 divides the id field into two parts, a device id and an address type id specific to the individual device:: bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 | field: | 0x00000000 | device id | addr type id | -ARM/arm64 currently only require this when using the in-kernel GIC +arm64 currently only require this when using the in-kernel GIC support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id. When setting the base address for the guest's mapping of the VGIC virtual CPU and distributor interface, the ioctl @@ -3683,15 +3677,17 @@ The fields in each entry are defined as follows: 4.89 KVM_S390_MEM_OP -------------------- -:Capability: KVM_CAP_S390_MEM_OP +:Capability: KVM_CAP_S390_MEM_OP, KVM_CAP_S390_PROTECTED, KVM_CAP_S390_MEM_OP_EXTENSION :Architectures: s390 -:Type: vcpu ioctl +:Type: vm ioctl, vcpu ioctl :Parameters: struct kvm_s390_mem_op (in) :Returns: = 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM), > 0 if an exception occurred while walking the page tables -Read or write data from/to the logical (virtual) memory of a VCPU. +Read or write data from/to the VM's memory. +The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is +supported. Parameters are specified via the following structure:: @@ -3701,33 +3697,99 @@ Parameters are specified via the following structure:: __u32 size; /* amount of bytes */ __u32 op; /* type of operation */ __u64 buf; /* buffer in userspace */ - __u8 ar; /* the access register number */ - __u8 reserved[31]; /* should be set to 0 */ + union { + struct { + __u8 ar; /* the access register number */ + __u8 key; /* access key, ignored if flag unset */ + }; + __u32 sida_offset; /* offset into the sida */ + __u8 reserved[32]; /* ignored */ + }; }; -The type of operation is specified in the "op" field. 
It is either -KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or -KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The -KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check -whether the corresponding memory access would create an access exception -(without touching the data in the memory at the destination). In case an -access exception occurred while walking the MMU tables of the guest, the -ioctl returns a positive error number to indicate the type of exception. -This exception is also raised directly at the corresponding VCPU if the -flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field. - The start address of the memory region has to be specified in the "gaddr" field, and the length of the region in the "size" field (which must not be 0). The maximum value for "size" can be obtained by checking the KVM_CAP_S390_MEM_OP capability. "buf" is the buffer supplied by the userspace application where the read data should be written to for -KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written is -stored for a KVM_S390_MEMOP_LOGICAL_WRITE. When KVM_S390_MEMOP_F_CHECK_ONLY -is specified, "buf" is unused and can be NULL. "ar" designates the access -register number to be used; the valid range is 0..15. +a read access, or where the data that should be written is stored for +a write access. The "reserved" field is meant for future extensions. +Reserved and unused values are ignored. Future extension that add members must +introduce new flags. + +The type of operation is specified in the "op" field. Flags modifying +their behavior can be set in the "flags" field. Undefined flag bits must +be set to 0. 
+ +Possible operations are: + * ``KVM_S390_MEMOP_LOGICAL_READ`` + * ``KVM_S390_MEMOP_LOGICAL_WRITE`` + * ``KVM_S390_MEMOP_ABSOLUTE_READ`` + * ``KVM_S390_MEMOP_ABSOLUTE_WRITE`` + * ``KVM_S390_MEMOP_SIDA_READ`` + * ``KVM_S390_MEMOP_SIDA_WRITE`` + +Logical read/write: +^^^^^^^^^^^^^^^^^^^ + +Access logical memory, i.e. translate the given guest address to an absolute +address given the state of the VCPU and use the absolute address as target of +the access. "ar" designates the access register number to be used; the valid +range is 0..15. +Logical accesses are permitted for the VCPU ioctl only. +Logical accesses are permitted for non-protected guests only. + +Supported flags: + * ``KVM_S390_MEMOP_F_CHECK_ONLY`` + * ``KVM_S390_MEMOP_F_INJECT_EXCEPTION`` + * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` + +The KVM_S390_MEMOP_F_CHECK_ONLY flag can be set to check whether the +corresponding memory access would cause an access exception; however, +no actual access to the data in memory at the destination is performed. +In this case, "buf" is unused and can be NULL. + +In case an access exception occurred during the access (or would occur +in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive +error number indicating the type of exception. This exception is also +raised directly at the corresponding VCPU if the flag +KVM_S390_MEMOP_F_INJECT_EXCEPTION is set. + +If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key +protection is also in effect and may cause exceptions if accesses are +prohibited given the access key designated by "key"; the valid range is 0..15. +KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION +is > 0. + +Absolute read/write: +^^^^^^^^^^^^^^^^^^^^ + +Access absolute memory. 
This operation is intended to be used with the +KVM_S390_MEMOP_F_SKEY_PROTECTION flag, to allow accessing memory and performing +the checks required for storage key protection as one operation (as opposed to +user space getting the storage keys, performing the checks, and accessing +memory thereafter, which could lead to a delay between check and access). +Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION +is > 0. +Currently absolute accesses are not permitted for VCPU ioctls. +Absolute accesses are permitted for non-protected guests only. + +Supported flags: + * ``KVM_S390_MEMOP_F_CHECK_ONLY`` + * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` -The "reserved" field is meant for future extensions. It is not used by -KVM with the currently defined set of flags. +The semantics of the flags are as for logical accesses. + +SIDA read/write: +^^^^^^^^^^^^^^^^ + +Access the secure instruction data area which contains memory operands necessary +for instruction emulation for protected guests. +SIDA accesses are available if the KVM_CAP_S390_PROTECTED capability is available. +SIDA accesses are permitted for the VCPU ioctl only. +SIDA accesses are permitted for protected guests only. + +No flags are supported. 4.90 KVM_S390_GET_SKEYS ----------------------- @@ -4013,6 +4075,11 @@ x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base register. +.. warning:: + MSR accesses coming from nested vmentry/vmexit are not filtered. + This includes both writes to individual VMCS fields and reads/writes + through the MSR lists pointed to by the VMCS. + If a bit is within one of the defined ranges, read and write accesses are guarded by the bitmap's value for the MSR index if the kind of access is included in the ``struct kvm_msr_filter_range`` flags. If no range @@ -4726,7 +4793,7 @@ to I/O ports. 
------------------------------------ :Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 -:Architectures: x86, arm, arm64, mips +:Architectures: x86, arm64, mips :Type: vm ioctl :Parameters: struct kvm_clear_dirty_log (in) :Returns: 0 on success, -1 on error @@ -4838,7 +4905,7 @@ version has the following quirks: 4.119 KVM_ARM_VCPU_FINALIZE --------------------------- -:Architectures: arm, arm64 +:Architectures: arm64 :Type: vcpu ioctl :Parameters: int feature (in) :Returns: 0 on success, -1 on error @@ -5225,6 +5292,10 @@ type values: KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO Sets the guest physical address of the vcpu_info for a given vCPU. + As with the shared_info page for the VM, the corresponding page may be + dirtied at any time if event channel interrupt delivery is enabled, so + userspace should always assume that the page is dirty without relying + on dirty logging. KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO Sets the guest physical address of an additional pvclock structure @@ -5920,7 +5991,7 @@ should put the acknowledged interrupt vector into the 'epr' field. If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered a system-level event using some architecture specific mechanism (hypercall -or some special instruction). In case of ARM/ARM64, this is triggered using +or some special instruction). In case of ARM64, this is triggered using HVC instruction based PSCI call from the vcpu. The 'type' field describes the system-level event type. The 'flags' field describes architecture specific flags for the system-level event. @@ -5939,6 +6010,11 @@ Valid values for 'type' are: to ignore the request, or to gather VM memory core dump and/or reset/shutdown of the VM. +Valid flags are: + + - KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (arm64 only) -- the guest issued + a SYSTEM_RESET2 call according to v1.1 of the PSCI specification. + :: /* KVM_EXIT_IOAPIC_EOI */ @@ -6013,7 +6089,7 @@ in send_page or recv a buffer to recv_page). 
__u64 fault_ipa; } arm_nisv; -Used on arm and arm64 systems. If a guest accesses memory not in a memslot, +Used on arm64 systems. If a guest accesses memory not in a memslot, KVM will typically return to userspace and ask it to do MMIO emulation on its behalf. However, for certain classes of instructions, no instruction decode (direction, length of memory access) is provided, and fetching and decoding @@ -6030,11 +6106,10 @@ did not fall within an I/O window. Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable this capability at VM creation. Once this is done, these types of errors will instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from -the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA -in the fault_ipa field. Userspace can either fix up the access if it's -actually an I/O access by decoding the instruction from guest memory (if it's -very brave) and continue executing the guest, or it can decide to suspend, -dump, or restart the guest. +the ESR_EL2 in the esr_iss field, and the faulting IPA in the fault_ipa field. +Userspace can either fix up the access if it's actually an I/O access by +decoding the instruction from guest memory (if it's very brave) and continue +executing the guest, or it can decide to suspend, dump, or restart the guest. Note that KVM does not skip the faulting instruction as it does for KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state @@ -6741,7 +6816,7 @@ and injected exceptions. 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 -:Architectures: x86, arm, arm64, mips +:Architectures: x86, arm64, mips :Parameters: args[0] whether feature should be enabled or not Valid flags are:: @@ -7011,6 +7086,56 @@ resource that is controlled with the H_SET_MODE hypercall. This capability allows a guest kernel to use a better-performance mode for handling interrupts and system calls. 
+7.31 KVM_CAP_DISABLE_QUIRKS2 +---------------------------- + +:Capability: KVM_CAP_DISABLE_QUIRKS2 +:Parameters: args[0] - set of KVM quirks to disable +:Architectures: x86 +:Type: vm + +This capability, if enabled, will cause KVM to disable some behavior +quirks. + +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of +quirks that can be disabled in KVM. + +The argument to KVM_ENABLE_CAP for this capability is a bitmask of +quirks to disable, and must be a subset of the bitmask returned by +KVM_CHECK_EXTENSION. + +The valid bits in cap.args[0] are: + +=================================== ============================================ + KVM_X86_QUIRK_LINT0_REENABLED By default, the reset value for the LVT + LINT0 register is 0x700 (APIC_MODE_EXTINT). + When this quirk is disabled, the reset value + is 0x10000 (APIC_LVT_MASKED). + + KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW. + When this quirk is disabled, KVM does not + change the value of CR0.CD and CR0.NW. + + KVM_X86_QUIRK_LAPIC_MMIO_HOLE By default, the MMIO LAPIC interface is + available even when configured for x2APIC + mode. When this quirk is disabled, KVM + disables the MMIO LAPIC interface if the + LAPIC is in x2APIC mode. + + KVM_X86_QUIRK_OUT_7E_INC_RIP By default, KVM pre-increments %rip before + exiting to userspace for an OUT instruction + to port 0x7e. When this quirk is disabled, + KVM does not pre-increment %rip before + exiting to userspace. + + KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets + CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if + IA32_MISC_ENABLE[bit 18] (MWAIT) is set. + Additionally, when this quirk is disabled, + KVM clears CPUID.01H:ECX[bit 3] if + IA32_MISC_ENABLE[bit 18] is cleared. +=================================== ============================================ + 8. Other capabilities. ====================== @@ -7138,7 +7263,7 @@ reserved. 
8.9 KVM_CAP_ARM_USER_IRQ ------------------------ -:Architectures: arm, arm64 +:Architectures: arm64 This capability, if KVM_CHECK_EXTENSION indicates that it is available, means that if userspace creates a VM without an in-kernel interrupt controller, it @@ -7265,7 +7390,7 @@ HvFlushVirtualAddressList, HvFlushVirtualAddressListEx. 8.19 KVM_CAP_ARM_INJECT_SERROR_ESR ---------------------------------- -:Architectures: arm, arm64 +:Architectures: arm64 This capability indicates that userspace can specify (via the KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it @@ -7575,3 +7700,71 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace the hypercalls whose corresponding bit is in the argument, and return ENOSYS for the others. + +8.35 KVM_CAP_PMU_CAPABILITY +--------------------------- + +:Capability KVM_CAP_PMU_CAPABILITY +:Architectures: x86 +:Type: vm +:Parameters: arg[0] is bitmask of PMU virtualization capabilities. +:Returns 0 on success, -EINVAL when arg[0] contains invalid bits + +This capability alters PMU virtualization in KVM. + +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of +PMU virtualization capabilities that can be adjusted on a VM. + +The argument to KVM_ENABLE_CAP is also a bitmask and selects specific +PMU virtualization capabilities to be applied to the VM. This can +only be invoked on a VM prior to the creation of VCPUs. + +At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting +this capability will disable PMU virtualization for that VM. Usermode +should adjust CPUID leaf 0xA to reflect that the PMU is disabled. + +9. Known KVM API problems +========================= + +In some cases, KVM's API has some inconsistencies or common pitfalls +that userspace need to be aware of. This section details some of +these issues. + +Most of them are architecture specific, so the section is split by +architecture. 
+ +9.1. x86 +-------- + +``KVM_GET_SUPPORTED_CPUID`` issues +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In general, ``KVM_GET_SUPPORTED_CPUID`` is designed so that it is possible +to take its result and pass it directly to ``KVM_SET_CPUID2``. This section +documents some cases in which that requires some care. + +Local APIC features +~~~~~~~~~~~~~~~~~~~ + +CPU[EAX=1]:ECX[21] (X2APIC) is reported by ``KVM_GET_SUPPORTED_CPUID``, +but it can only be enabled if ``KVM_CREATE_IRQCHIP`` or +``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` are used to enable in-kernel emulation of +the local APIC. + +The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature. + +CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``. +It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel +has enabled in-kernel emulation of the local APIC. + +Obsolete ioctls and capabilities +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +KVM_CAP_DISABLE_QUIRKS does not let userspace know which quirks are actually +available. Use ``KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2)`` instead if +available. + +Ordering of KVM_GET_*/KVM_SET_* ioctls +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TBD diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 60a29972d3f1..716aa3edae14 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -70,7 +70,7 @@ irqchip. -ENODEV PMUv3 not supported or GIC not initialized -ENXIO PMUv3 not properly configured or in-kernel irqchip not configured as required prior to calling this attribute - -EBUSY PMUv3 already initialized + -EBUSY PMUv3 already initialized or a VCPU has already run -EINVAL Invalid filter range ======= ====================================================== @@ -104,11 +104,43 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it isn't strictly speaking an event. 
Filtering the cycle counter is possible using event 0x11 (CPU_CYCLES). +1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU +------------------------------------------ + +:Parameters: in kvm_device_attr.addr the address to an int representing the PMU + identifier. + +:Returns: + + ======= ==================================================== + -EBUSY PMUv3 already initialized, a VCPU has already run or + an event filter has already been set + -EFAULT Error accessing the PMU identifier + -ENXIO PMU not found + -ENODEV PMUv3 not supported or GIC not initialized + -ENOMEM Could not allocate memory + ======= ==================================================== + +Request that the VCPU uses the specified hardware PMU when creating guest events +for the purpose of PMU emulation. The PMU identifier can be read from the "type" +file for the desired PMU instance under /sys/devices (or, equivalent, +/sys/bus/even_source). This attribute is particularly useful on heterogeneous +systems where there are at least two CPU PMUs on the system. The PMU that is set +for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU +if a PMU event filter is already present. + +Note that KVM will not make any attempts to run the VCPU on the physical CPUs +associated with the PMU specified by this attribute. This is entirely left to +userspace. However, attempting to run the VCPU on a physical CPU not supported +by the PMU will fail and KVM_RUN will return with +exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting +hardare_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and +the cpu field to the processor id. 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL ================================= -:Architectures: ARM, ARM64 +:Architectures: ARM64 2.1. 
ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER ----------------------------------------------------------------------------- diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst index b6833c7bb474..e0a2c74e1043 100644 --- a/Documentation/virt/kvm/index.rst +++ b/Documentation/virt/kvm/index.rst @@ -8,25 +8,13 @@ KVM :maxdepth: 2 api - amd-memory-encryption - cpuid - halt-polling - hypercalls - locking - mmu - msr - nested-vmx - ppc-pv - s390-diag - s390-pv - s390-pv-boot - timekeeping - vcpu-requests - - review-checklist + devices/index arm/index + s390/index + ppc-pv + x86/index - devices/index - - running-nested-guests + locking + vcpu-requests + review-checklist diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 5d27da356836..845a561629f1 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -210,32 +210,47 @@ time it will be set using the Dirty tracking mechanism described above. 3. Reference ------------ -:Name: kvm_lock +``kvm_lock`` +^^^^^^^^^^^^ + :Type: mutex :Arch: any :Protects: - vm_list -:Name: kvm_count_lock +``kvm_count_lock`` +^^^^^^^^^^^^^^^^^^ + :Type: raw_spinlock_t :Arch: any :Protects: - hardware virtualization enable/disable :Comment: 'raw' because hardware enabling/disabling must be atomic /wrt migration. -:Name: kvm_arch::tsc_write_lock -:Type: raw_spinlock +``kvm->mn_invalidate_lock`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:Type: spinlock_t +:Arch: any +:Protects: mn_active_invalidate_count, mn_memslots_update_rcuwait + +``kvm_arch::tsc_write_lock`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:Type: raw_spinlock_t :Arch: x86 :Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset} - tsc offset in vmcb :Comment: 'raw' because updating the tsc offsets must not be preempted. 

-:Name:          kvm->mmu_lock
-:Type:          spinlock_t
+``kvm->mmu_lock``
+^^^^^^^^^^^^^^^^^
+:Type:          spinlock_t or rwlock_t
 :Arch:          any
 :Protects:      -shadow page/shadow tlb entry
 :Comment:       it is a spinlock since it is used in mmu notifier.

-:Name:          kvm->srcu
+``kvm->srcu``
+^^^^^^^^^^^^^
 :Type:          srcu lock
 :Arch:          any
 :Protects:      - kvm->memslots
@@ -246,10 +261,20 @@ time it will be set using the Dirty tracking mechanism described above.
                 The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
                 if it is needed by multiple functions.

-:Name:          blocked_vcpu_on_cpu_lock
+``kvm->slots_arch_lock``
+^^^^^^^^^^^^^^^^^^^^^^^^
+:Type:          mutex
+:Arch:          any (only needed on x86 though)
+:Protects:      any arch-specific fields of memslots that have to be modified
+                in a ``kvm->srcu`` read-side critical section.
+:Comment:       must be held before reading the pointer to the current memslots,
+                until after all changes to the memslots are complete
+
+``wakeup_vcpus_on_cpu_lock``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 :Type:          spinlock_t
 :Arch:          x86
-:Protects:      blocked_vcpu_on_cpu
+:Protects:      wakeup_vcpus_on_cpu
 :Comment:       This is a per-CPU lock and it is used for VT-d posted-interrupts.
                 When VT-d posted-interrupts is supported and the VM has assigned
                 devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
diff --git a/Documentation/virt/kvm/s390/index.rst b/Documentation/virt/kvm/s390/index.rst
new file mode 100644
index 000000000000..605f488f0cc5
--- /dev/null
+++ b/Documentation/virt/kvm/s390/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+KVM for s390 systems
+====================
+
+.. toctree::
+   :maxdepth: 2
+
+   s390-diag
+   s390-pv
+   s390-pv-boot
diff --git a/Documentation/virt/kvm/s390-diag.rst b/Documentation/virt/kvm/s390/s390-diag.rst
index ca85f030eb0b..ca85f030eb0b 100644
--- a/Documentation/virt/kvm/s390-diag.rst
+++ b/Documentation/virt/kvm/s390/s390-diag.rst
diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390/s390-pv-boot.rst
index 73a6083cb5e7..73a6083cb5e7 100644
--- a/Documentation/virt/kvm/s390-pv-boot.rst
+++ b/Documentation/virt/kvm/s390/s390-pv-boot.rst
diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390/s390-pv.rst
index 8e41a3b63fa5..8e41a3b63fa5 100644
--- a/Documentation/virt/kvm/s390-pv.rst
+++ b/Documentation/virt/kvm/s390/s390-pv.rst
diff --git a/Documentation/virt/kvm/vcpu-requests.rst b/Documentation/virt/kvm/vcpu-requests.rst
index ad2915ef7020..db43ee571f5a 100644
--- a/Documentation/virt/kvm/vcpu-requests.rst
+++ b/Documentation/virt/kvm/vcpu-requests.rst
@@ -112,11 +112,10 @@ KVM_REQ_TLB_FLUSH
   choose to use the common kvm_flush_remote_tlbs() implementation will
   need to handle this VCPU request.

-KVM_REQ_MMU_RELOAD
+KVM_REQ_VM_DEAD

-  When shadow page tables are used and memory slots are removed it's
-  necessary to inform each VCPU to completely refresh the tables. This
-  request is used for that.
+  This request informs all VCPUs that the VM is dead and unusable, e.g. due to
+  fatal error or because the VM's state has been intentionally destroyed.

 KVM_REQ_UNBLOCK

@@ -136,6 +135,16 @@ KVM_REQ_UNHALT
   such as a pending signal, which does not indicate the VCPU's halt
   emulation should stop, and therefore does not make the request.

+KVM_REQ_OUTSIDE_GUEST_MODE
+
+  This "request" ensures the target vCPU has exited guest mode prior to the
+  sender of the request continuing on. No action needs be taken by the target,
+  and so no request is actually logged for the target. This request is similar
+  to a "kick", but unlike a kick it guarantees the vCPU has actually exited
+  guest mode. A kick only guarantees the vCPU will exit at some point in the
+  future, e.g. a previous kick may have started the process, but there's no
+  guarantee the to-be-kicked vCPU has fully exited guest mode.
+
 KVM_REQUEST_MASK
 ----------------

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 1c6847fff304..1c6847fff304 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/x86/cpuid.rst
index bda3e3e737d7..bda3e3e737d7 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/x86/cpuid.rst
diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
new file mode 100644
index 000000000000..806f049b6975
--- /dev/null
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -0,0 +1,39 @@
+
+=======================================
+Known limitations of CPU virtualization
+=======================================
+
+Whenever perfect emulation of a CPU feature is impossible or too hard, KVM
+has to choose between not implementing the feature at all or introducing
+behavioral differences between virtual machines and bare metal systems.
+
+This file documents some of the known limitations that KVM has in
+virtualizing CPU features.
+
+x86
+===
+
+``KVM_GET_SUPPORTED_CPUID`` issues
+----------------------------------
+
+x87 features
+~~~~~~~~~~~~
+
+Unlike most other CPUID feature bits, CPUID[EAX=7,ECX=0]:EBX[6]
+(FDP_EXCPTN_ONLY) and CPUID[EAX=7,ECX=0]:EBX[13] (ZERO_FCS_FDS) are
+clear if the features are present and set if the features are not present.
+
+Clearing these bits in CPUID has no effect on the operation of the guest;
+if these bits are set on hardware, the features will not be present on
+any virtual machine that runs on that hardware.
+
+**Workaround:** It is recommended to always set these bits in guest CPUID.
+Note however that any software (e.g. ``WIN87EM.DLL``) expecting these features
+to be present likely predates these CPUID feature bits, and therefore
+doesn't know to check for them anyway.
+
+Nested virtualization features
+------------------------------
+
+TBD
+
diff --git a/Documentation/virt/kvm/halt-polling.rst b/Documentation/virt/kvm/x86/halt-polling.rst
index 4922e4a15f18..4922e4a15f18 100644
--- a/Documentation/virt/kvm/halt-polling.rst
+++ b/Documentation/virt/kvm/x86/halt-polling.rst
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index e56fa8b9cfca..e56fa8b9cfca 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/x86/index.rst
new file mode 100644
index 000000000000..7ff588826b9f
--- /dev/null
+++ b/Documentation/virt/kvm/x86/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+KVM for x86 systems
+===================
+
+.. toctree::
+   :maxdepth: 2
+
+   amd-memory-encryption
+   cpuid
+   errata
+   halt-polling
+   hypercalls
+   mmu
+   msr
+   nested-vmx
+   running-nested-guests
+   timekeeping
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 5b1ebad24c77..5b1ebad24c77 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/x86/msr.rst
index 9315fc385fb0..9315fc385fb0 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/x86/msr.rst
diff --git a/Documentation/virt/kvm/nested-vmx.rst b/Documentation/virt/kvm/x86/nested-vmx.rst
index ac2095d41f02..ac2095d41f02 100644
--- a/Documentation/virt/kvm/nested-vmx.rst
+++ b/Documentation/virt/kvm/x86/nested-vmx.rst
diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/x86/running-nested-guests.rst
index bd70c69468ae..bd70c69468ae 100644
--- a/Documentation/virt/kvm/running-nested-guests.rst
+++ b/Documentation/virt/kvm/x86/running-nested-guests.rst
diff --git a/Documentation/virt/kvm/timekeeping.rst b/Documentation/virt/kvm/x86/timekeeping.rst
index 21ae7efa29ba..21ae7efa29ba 100644
--- a/Documentation/virt/kvm/timekeeping.rst
+++ b/Documentation/virt/kvm/x86/timekeeping.rst
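
Note on the KVM_ARM_VCPU_PMU_V3_SET_PMU hunk in devices/vcpu.rst: the new text says the PMU identifier is read from the "type" file of the desired PMU instance and passed as an int through kvm_device_attr.addr. The following is a minimal userspace sketch of the first half of that (reading the identifier), not code from the patch. The helper name read_pmu_type and its error-code conventions are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of KVM or of this patch): parse the decimal
 * PMU identifier stored in a sysfs "type" file.
 *
 * Returns 0 and stores the identifier in *pmu_id on success, or a negative
 * errno value if the file cannot be opened, read, or parsed.
 */
int read_pmu_type(const char *path, int *pmu_id)
{
	char buf[32];
	char *end;
	long val;
	FILE *f;

	assert(path != NULL && pmu_id != NULL);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	/* The file is expected to hold one non-negative decimal integer. */
	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf || (*end != '\n' && *end != '\0'))
		return -EINVAL;
	if (val < 0 || val > INT_MAX)
		return -EINVAL;

	*pmu_id = (int)val;
	return 0;
}
```

A caller would pass the path of the chosen PMU's "type" file and, on success, point kvm_device_attr.addr at the resulting int before setting the attribute on the VCPU. As the documentation added by the patch notes, keeping the VCPU on physical CPUs covered by that PMU remains the responsibility of userspace.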