aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2011-02-09KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_indexJoerg Roedel
The gs_index loading code uses the swapgs instruction to switch to the user gs_base temporarily. This is unsave in an lightweight exit-path in KVM on AMD because the KERNEL_GS_BASE MSR is switches lazily. An NMI happening in the critical path of load_gs_index may use the wrong GS_BASE value then leading to unpredictable behavior, e.g. a triple-fault. This patch fixes the issue by making sure that load_gs_index is called only with a valid KERNEL_GS_BASE value loaded in KVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-13thp: mmu_notifier_test_youngAndrea Arcangeli
For GRU and EPT, we need gup-fast to set referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero, it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow paging EPT minor fault we relay on gup-fast to signal the page is in use... We also need to check the young bits on the secondary pagetables for NPT and not nested shadow mmu as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is full backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely isn't so bad either probably but I thought it was worth it and this makes it reliable. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: kvm mmu transparent hugepage supportAndrea Arcangeli
This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: Initialize fpu state in preemptible context KVM: VMX: when entering real mode align segment base to 16 bytes KVM: MMU: handle 'map_writable' in set_spte() function KVM: MMU: audit: allow audit more guests at the same time KVM: Fetch guest cr3 from hardware on demand KVM: Replace reads of vcpu->arch.cr3 by an accessor KVM: MMU: only write protect mappings at pagetable level KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() KVM: MMU: Initialize base_role for tdp mmus KVM: VMX: Optimize atomic EFER load KVM: VMX: Add definitions for more vm entry/exit control bits KVM: SVM: copy instruction bytes from VMCB KVM: SVM: implement enhanced INVLPG intercept KVM: SVM: enhance mov DR intercept handler KVM: SVM: enhance MOV CR intercept handler KVM: SVM: add new SVM feature bit names KVM: cleanup emulate_instruction KVM: move complete_insn_gp() into x86.c KVM: x86: fix CR8 handling KVM guest: Fix kvm clock initialization when it's configured out ...
2011-01-12KVM: Initialize fpu state in preemptible contextAvi Kivity
init_fpu() (which is indirectly called by the fpu switching code) assumes it is in process context. Rather than makeing init_fpu() use an atomic allocation, which can cause a task to be killed, make sure the fpu is already initialized when we enter the run loop. KVM-Stable-Tag. Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: VMX: when entering real mode align segment base to 16 bytesGleb Natapov
VMX checks that base is equal segment shifted 4 bits left. Otherwise guest entry fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: handle 'map_writable' in set_spte() functionXiao Guangrong
Move the operation of 'writable' to set_spte() to clean up code [avi: remove unneeded booleanification] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: audit: allow audit more guests at the same timeXiao Guangrong
It only allows to audit one guest in the system since: - 'audit_point' is a glob variable - mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled after a guest exited this patch fix those issues then allow to audit more guests at the same time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: Fetch guest cr3 from hardware on demandAvi Kivity
Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: Replace reads of vcpu->arch.cr3 by an accessorAvi Kivity
This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: only write protect mappings at pagetable levelMarcelo Tosatti
If a pagetable contains a writeable large spte, all of its sptes will be write protected, including non-leaf ones, leading to endless pagefaults. Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault paths assume non-leaf sptes are writable. Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()Avi Kivity
'error' is byte sized, so use a byte register constraint. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: MMU: Initialize base_role for tdp mmusAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: VMX: Optimize atomic EFER loadAvi Kivity
When NX is enabled on the host but not on the guest, we use the entry/exit msr load facility, which is slow. Optimize it to use entry/exit efer load, which is ~1200 cycles faster. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: SVM: copy instruction bytes from VMCBAndre Przywara
In case of a nested page fault or an intercepted #PF newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: SVM: implement enhanced INVLPG interceptAndre Przywara
When the DecodeAssist feature is available, the linear address is provided in the VMCB on INVLPG intercepts. Use it directly to avoid any decoding and emulation. This is only useful for shadow paging, though. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: SVM: enhance mov DR intercept handlerAndre Przywara
Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle debug register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: SVM: enhance MOV CR intercept handlerAndre Przywara
Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle CR register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: SVM: add new SVM feature bit namesAndre Przywara
the recent APM Vol.2 and the recent AMD CPUID specification describe new CPUID features bits for SVM. Name them here for later usage. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: cleanup emulate_instructionAndre Przywara
emulate_instruction had many callers, but only one used all parameters. One parameter was unused, another one is now hidden by a wrapper function (required for a future addition anyway), so most callers use now a shorter parameter list. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: move complete_insn_gp() into x86.cAndre Przywara
move the complete_insn_gp() helper function out of the VMX part into the generic x86 part to make it usable by SVM. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: x86: fix CR8 handlingAndre Przywara
The handling of CR8 writes in KVM is currently somewhat cumbersome. This patch makes it look like the other CR register handlers and fixes a possible issue in VMX, where the RIP would be incremented despite an injected #GP. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()Takuya Yoshikawa
In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking slots_lock in the error handling path. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: return true when user space query KVM_CAP_USER_NMI extensionLai Jiangshan
userspace may check this extension in runtime. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: Correct kvm_pio tracepoint count fieldAvi Kivity
Currently, we record '1' for count regardless of the real count. Fix. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: Fix incorrect direct page write protection due to ro host pageAvi Kivity
If KVM sees a read-only host page, it will map it as read-only to prevent breaking a COW. However, if the page was part of a large guest page, KVM incorrectly extends the write protection to the entire large page frame instead of limiting it to the normal host page. This results in the instantiation of a new shadow page with read-only access. If this happens for a MOVS instruction that moves memory between two normal pages, within a single large page frame, and mapped within the guest as a large page, and if, in addition, the source operand is not writeable in the host (perhaps due to KSM), then KVM will instantiate a read-only direct shadow page, instantiate an spte for the source operand, then instantiate a new read/write direct shadow page and instantiate an spte for the destination operand. Since these two sptes are in different shadow pages, MOVS will never see them at the same time and the guest will not make progress. Fix by mapping the direct shadow page read/write, and only marking the host page read-only. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add xsetbv interceptJoerg Roedel
This patch implements the xsetbv intercept to the AMD part of KVM. This makes AVX usable in a save way for the guest on AVX capable AMD hardware. The patch is tested by using AVX in the guest and host in parallel and checking for data corruption. I also used the KVM xsave unit-tests and they all pass. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: Make the way of accessing lpage_info more genericTakuya Yoshikawa
Large page information has two elements but one of them, write_count, alone is accessed by a helper function. This patch replaces this helper function with more generic one which returns newly named kvm_lpage_info structure and use it to access the other element rmap_pde. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)Anthony Liguori
In certain use-cases, we want to allocate guests fixed time slices where idle guest cycles leave the machine idling. There are many approaches to achieve this but the most direct is to simply avoid trapping the HLT instruction which lets the guest directly execute the instruction putting the processor to sleep. Introduce this as a module-level option for kvm-vmx.ko since if you do this for one guest, you probably want to do it for all. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Implement Flush-By-Asid featureJoerg Roedel
This patch adds the new flush-by-asid of upcoming AMD processors to the KVM-AMD module. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Use svm_flush_tlb instead of force_new_asidJoerg Roedel
This patch replaces all calls to force_new_asid which are intended to flush the guest-tlb by the more appropriate function svm_flush_tlb. As a side-effect the force_new_asid function is removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Remove flush_guest_tlb functionJoerg Roedel
This function is unused and there is svm_flush_tlb which does the same. So this function can be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: retry #PF for softmmuXiao Guangrong
Retry #PF for softmmu only when the current vcpu has the same cr3 as the time when #PF occurs Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: fix accessed bit set on prefault pathXiao Guangrong
Retry #PF is the speculative path, so don't set the accessed bit Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: rename 'no_apf' to 'prefault'Xiao Guangrong
It's the speculative path if 'no_apf = 1' and we will specially handle this speculative path in the later patch, so 'prefault' is better to fit the sense. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for LBR stateJoerg Roedel
This patch implements the clean-bit for all LBR related state. This includes the debugctl, br_from, br_to, last_excp_from, and last_excp_to msrs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for CR2 registerJoerg Roedel
This patch implements the clean-bit for the cr2 register in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for Segements and CPLJoerg Roedel
This patch implements the clean-bit defined for the cs, ds, ss, an es segemnts and the current cpl saved in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for GDT and IDTJoerg Roedel
This patch implements the clean-bit for the base and limit of the gdt and idt in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for DR6 and DR7Joerg Roedel
This patch implements the clean-bit for the dr6 and dr7 debug registers in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for control registersJoerg Roedel
This patch implements the CRx clean-bit for the vmcb. This bit covers cr0, cr3, cr4, and efer. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for NPT stateJoerg Roedel
This patch implements the clean-bit for all nested paging related state in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for interrupt stateJoerg Roedel
This patch implements the clean-bit for all interrupt related state in the vmcb. This corresponds to vmcb offset 0x60-0x67. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for the ASIDJoerg Roedel
This patch implements the clean-bit for the asid in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for IOPM_BASE and MSRPM_BASEJoerg Roedel
This patch adds the clean bit for the physical addresses of the MSRPM and the IOPM. It does not need to be set in the code because the only place where these values are changed is the nested-svm vmrun and vmexit path. These functions already mark the complete VMCB as dirty. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bit for intercetps, tsc-offset and pause filter countJoerg Roedel
This patch adds the clean-bit for intercepts-vectors, the TSC offset and the pause-filter count to the appropriate places. The IO and MSR permission bitmaps are not subject to this bit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Add clean-bits infrastructure codeRoedel, Joerg
This patch adds the infrastructure for the implementation of the individual clean-bits. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: MMU: Avoid dropping accessed bit while removing write accessTakuya Yoshikawa
One more "KVM: MMU: Don't drop accessed bit while updating an spte." Sptes are accessed by both kvm and hardware. This patch uses update_spte() to fix the way of removing write access. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: VMX: Return 0 from a failed VMREADAvi Kivity
If we execute VMREAD during reboot we'll just skip over it. Instead of returning garbage, return 0, which has a much smaller chance of confusing the code. Otherwise we risk a flood of debug printk()s which block the reboot process if a serial console or netconsole is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12KVM: SVM: Use get_host_vmcb function in svm_get_msr for TSCJoerg Roedel
This patch replaces the open-coded vmcb-selection for the TSC calculation with the new get_host_vmcb helper function introduced in this patchset. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>