aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)Author
2011-03-11xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest ↵Konrad Rzeszutek Wilk
and fix overflow. If we have a guest that asked for: memory=1024 maxmem=2048 Which means we want 1GB now, and create pagetables so that we can expand up to 2GB, we would have this E820 layout: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable) Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps." we would mark the memory past the 1GB mark as unusuable resulting in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 0000000040000000 - 0000000080800000 (unusable) which meant that we could not balloon up anymore. We could balloon the guest down. The fix is to run the code introduced by the above mentioned patch only for the initial domain. We will have to revisit this once we start introducing a modified E820 for PCI passthrough so that we can utilize the P2M identity code. We also fix an overflow by having UL instead of ULL on 32-bit machines. [v2: Ian pointed to the overflow issue] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-22xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.Zhang, Fengzhe
With the hypervisor argument of dom0_mem=X we iterate over the physical (only for the initial domain) E820 and subtract the the size from each E820_RAM region the delta so that the cumulative size of all E820_RAM regions is equal to 'X'. This sometimes ends up with E820_RAM regions with zero size (which are removed by e820_sanitize) and E820_RAM that are smaller than physically. Later on the PCI API looks at the E820 and attempts to set up an resource region for the "PCI mem". The E820 (assume dom0_mem=1GB is set) compared to the physical looks as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) .. And we get [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:9efafe00) while it should have started at e0000000 (a nice big gap up to f4000000 exists). The "Allocating PCI" is part of the resource API. The users that end up using those PCI I/O regions usually supply their own BARs when calling the resource API (request_resource, or allocate_resource), but there are exceptions which provide an empty 'struct resource' and expect the API to provide the 'struct resource' to be populated with valid values. The one that triggered this bug was the intel AGP driver that requested a region for the flush page (intel_i9xx_setup_flush). Before this patch, when running under Xen hypervisor, the 'struct resource' returned could have (depending on the dom0_mem size) physical ranges of a 'System RAM' instead of 'I/O' regions. This ended up with the Hypervisor failing a request to populate PTE's with those PFNs as the domain did not have access to those 'System RAM' regions (rightly so). After this patch, the left-over E820_RAM region from the truncation, will be labeled as E820_UNUSABLE. The E820 will look as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) +[ 0.000000] Xen: 0000000040000000 - 00000000defafe00 (unusable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) For more information: http://mid.gmane.org/1A42CE6F5F474C41B63392A5F80372B2335E978C@shsmsx501.ccr.corp.intel.com BugLink: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726 Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-27xen/setup: Route halt operations to safe_halt pvop.Stefano Stabellini
With this patch, the cpuidle driver does not load and does not issue the mwait operations. Instead the hypervisor is doing them (b/c we call the safe_halt pvops call). This fixes quite a lot of bootup issues wherein the user had to force interrupts for the continuation of the bootup. Details are discussed in: http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00535.html [v2: Wrote the commit description] Reported-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-27xen/e820: Guard against E820_RAM not having page-aligned size or start.Stefano Stabellini
Under Dell Inspiron 1525, and Intel SandyBridge SDP's the BIOS e820 RAM is not page-aligned: [ 0.000000] Xen: 0000000000100000 - 00000000df66d800 (usable) We were not handling that and ended up setting up a pagetable that included up to df66e000 with the disastrous effect that when memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t)); tried to clear the page it would crash at the 2K mark. Initially reported by Michael Young @ http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00108.html The fix is to page-align the size and also take into consideration the start of the E820 (in case that is not page-aligned either). This fixes the bootup failure on those affected machines. This patch is a rework of the Micheal A Young initial patch and considers the case if the start is not page-aligned. Reported-by: Michael A Young <m.a.young@durham.ac.uk> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Michael A Young <m.a.young@durham.ac.uk>
2011-01-27xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.Stefan Bader
In case the mfn_list does not have enough entries to fill a p2m page we do not want the entries from max_pfn up to the boundary to be filled with unknown values. Hence set them to INVALID_P2M_ENTRY. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-21Merge branch 'stable/bug-fixes-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: p2m: correctly initialize partial p2m leaf xen: fix non-ANSI function warning in irq.c
2011-01-21xen: p2m: correctly initialize partial p2m leafStefan Bader
After changing the p2m mapping to a tree by commit 58e05027b530ff081ecea68e38de8d59db8f87e0 xen: convert p2m to a 3 level tree and trying to boot a DomU with 615MB of memory, the following crash was observed in the dump: kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c0107397>] xen_set_pte+0x27/0x60 *pdpt = 0000000000000000 *pde = 0000000000000000 Adding further debug statements showed that when trying to set up pfn=0x26700 the returned mapping was invalid. pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003) pfn=0x26700 calling set_pte(0xc1fe7800, 0x3) Although the last_pfn obtained from the startup info is 0x26700, which should in turn not be hit, the additional 8MB which are added as extra memory normally seem to be ok. This lead to looking into the initial p2m tree construction, which uses the smaller value and assuming that there is other code handling the extra memory. When the p2m tree is set up, the leaves are directly pointed to the array which the domain builder set up. But if the mapping is not on a boundary that fits into one p2m page, this will result in the last leaf being only partially valid. And as the invalid entries are not initialized in that case, things go badly wrong. I am trying to fix that by checking whether the current leaf is a complete map and if not, allocate a completely new page and copy only the valid pointers there. This may not be the most efficient or elegant solution, but at least it seems to allow me booting DomUs with memory assignments all over the range. BugLink: http://bugs.launchpad.net/bugs/686692 [v2: Redid a bit of commit wording and fixed a compile warning] Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-20xen: fix non-ANSI function warning in irq.cRandy Dunlap
Fix sparse warning for non-ANSI function declaration: arch/x86/xen/irq.c:129:30: warning: non-ANSI function declaration of function 'xen_init_irq_ops' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-20lockdep: Move early boot local IRQ enable/disable status to init/main.cTejun Heo
During early boot, local IRQ is disabled until IRQ subsystem is properly initialized. During this time, no one should enable local IRQ and some operations which usually are not allowed with IRQ disabled, e.g. operations which might sleep or require communications with other processors, are allowed. lockdep tracked this with early_boot_irqs_off/on() callbacks. As other subsystems need this information too, move it to init/main.c and make it generally available. While at it, toggle the boolean to early_boot_irqs_disabled instead of enabled so that it can be initialized with %false and %true indicates the exceptional condition. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110120110635.GB6036@htj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-14xen: export arbitrary_virt_to_machineStephen Rothwell
Fixes this build error: ERROR: "arbitrary_virt_to_machine" [drivers/xen/xen-gntdev.ko] undefined! Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13Merge branch 'stable/gntdev' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Fix module linking error. xen p2m: clear the old pte when adding a page to m2p_override xen gntdev: use gnttab_map_refs and gnttab_unmap_refs xen: introduce gnttab_map_refs and gnttab_unmap_refs xen p2m: transparently change the p2m mappings in the m2p override xen/gntdev: Fix circular locking dependency xen/gntdev: stop using "token" argument xen: gntdev: move use of GNTMAP_contains_pte next to the map_op xen: add m2p override mechanism xen: move p2m handling to separate file xen/gntdev: add VM_PFNMAP to vma xen/gntdev: allow usermode to map granted pages xen: define gnttab_set_map_op/unmap_op Fix up trivial conflict in drivers/xen/Kconfig
2011-01-11xen/p2m: Fix module linking error.Konrad Rzeszutek Wilk
Fixes: ERROR: "m2p_find_override_pfn" [drivers/block/xen-blkfront.ko] undefined! Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-11xen p2m: clear the old pte when adding a page to m2p_overrideJeremy Fitzhardinge
When adding a page to m2p_override we change the p2m of the page so we need to also clear the old pte of the kernel linear mapping because it doesn't correspond anymore. When we remove the page from m2p_override we restore the original p2m of the page and we also restore the old pte of the kernel linear mapping. Before changing the p2m mappings in m2p_add_override and m2p_remove_override, check that the page passed as argument is valid and return an error if it is not. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-11xen p2m: transparently change the p2m mappings in the m2p overrideStefano Stabellini
In m2p_add_override store the original mfn into page->index and then change the p2m mapping, setting mfns as FOREIGN_FRAME. In m2p_remove_override restore the original mapping. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-11xen: add m2p override mechanismJeremy Fitzhardinge
Add a simple hashtable based mechanism to override some portions of the m2p, so that we can find out the pfn corresponding to an mfn of a granted page. In fact entries corresponding to granted pages in the m2p hold the original pfn value of the page in the source domain that granted it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-11xen: move p2m handling to separate fileJeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-10Merge branch 'stable/bug-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/event: validate irq before get evtchn by irq xen/fb: fix potential memory leak xen/fb: fix xenfb suspend/resume race. xen: disable ACPI NUMA for PV guests xen/irq: Cleanup the find_unbound_irq
2011-01-10Merge branch 'stable/generic' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: HVM X2APIC support apic: Move hypervisor detection of x2apic to hypervisor.h
2011-01-10xen: disable ACPI NUMA for PV guestsIan Campbell
Xen does not currently expose PV-NUMA information to PV guests. Therefore disable NUMA for the time being to prevent the kernel picking up on an host-level NUMA information which it might come across in the firmware. [ Added comment - Jeremy ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-07xen: HVM X2APIC supportSheng Yang
This patch is similiar to Gleb Natapov's patch for KVM, which enable the hypervisor to emulate x2apic feature for the guest. By this way, the emulation of lapic would be simpler with x2apic interface(MSR), and faster. [v2: Re-organized 'xen_hvm_need_lapic' per Ian Campbell suggestion] Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-12-17Merge branch 'this_cpu_ops' into for-2.6.38Tejun Heo
2010-12-17xen: Use this_cpu_opsChristoph Lameter
Use this_cpu_ops to reduce code size and simplify things in various places. V3->V4: Move instance of this_cpu_inc_return to a later patchset so that this patch can be applied without infrastructure changes. Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-08Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and ↵Linus Torvalds
'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/pvclock: Zero last_value on resume * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf record: Fix eternal wait for stillborn child perf header: Don't assume there's no attr info if no sample ids is provided perf symbols: Figure out start address of kernel map from kallsyms perf symbols: Fix kallsyms kernel/module map splitting * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix printk_needs_cpu() return value on offline cpus printk: Fix wake_up_klogd() vs cpu hotplug
2010-12-03Merge branch '2.6.37-rc4-pvhvm-fixes' of ↵Linus Torvalds
git://xenbits.xen.org/people/sstabellini/linux-pvhvm * '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: unplug the emulated devices at resume time xen: fix save/restore for PV on HVM guests with pirq remapping xen: resume the pv console for hvm guests too xen: fix MSI setup and teardown for PV on HVM guests xen: use PHYSDEVOP_get_free_pirq to implement find_unbound_pirq
2010-12-03Merge branches 'upstream/core' and 'upstream/bugfix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: allocate irq descs on any NUMA node xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory xen: use default_idle xen: clean up "extra" memory handling some more * 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: x86/32: perform initial startup on initial_page_table xen: don't bother to stop other cpus on shutdown/reboot
2010-12-02vmalloc: eagerly clear ptes on vunmapJeremy Fitzhardinge
On stock 2.6.37-rc4, running: # mount lilith:/export /mnt/lilith # find /mnt/lilith/ -type f -print0 | xargs -0 file crashes the machine fairly quickly under Xen. Often it results in oops messages, but the couple of times I tried just now, it just hung quietly and made Xen print some rude messages: (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7) (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb) (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04 Which means the domain tried to map a pagetable page RW, which would allow it to map arbitrary memory, so Xen stopped it. This is because vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had finished with them, and those pages got recycled as pagetable pages while still having these RW aliases. Removing those mappings immediately removes the Xen-visible aliases, and so it has no problem with those pages being reused as pagetable pages. Deferring the TLB flush doesn't upset Xen because it can flush the TLB itself as needed to maintain its invariants. When unmapping a region in the vmalloc space, clear the ptes immediately. There's no point in deferring this because there's no amortization benefit. The TLBs are left dirty, and they are flushed lazily to amortize the cost of the IPIs. This specific motivation for this patch is an oops-causing regression since 2.6.36 when using NFS under Xen, triggered by the NFS client's use of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped pages") . XFS also uses vm_map_ram() and could cause similar problems. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02xen: unplug the emulated devices at resume timeStefano Stabellini
Early after being resumed we need to unplug again the emulated devices. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-11-29xen: x86/32: perform initial startup on initial_page_tableIan Campbell
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-29xen: don't bother to stop other cpus on shutdown/rebootJeremy Fitzhardinge
Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's no need to do it manually. In any case it will fail because all the IPI irqs have been pulled down by this point, so the cross-CPU calls will simply hang forever. Until change 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce the function calls were not synchronously waited for, so this wasn't apparent. However after that change the calls became synchronous leading to a hang on shutdown on multi-VCPU guests. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Alok Kataria <akataria@vmware.com>
2010-11-28x86/pvclock: Zero last_value on resumeJeremy Fitzhardinge
If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: remove duplicated #include xen: x86/32: perform initial startup on initial_page_table
2010-11-24xen: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in arch/x86/xen/setup.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-24xen: x86/32: perform initial startup on initial_page_tableIan Campbell
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-22xen: use default_idleJeremy Fitzhardinge
We just need the idle loop to drop into safe_halt, which default_idle() is perfectly capable of doing. There's no need to duplicate it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-22xen: clean up "extra" memory handling some moreJeremy Fitzhardinge
Make sure that extra_pages is added for all E820_RAM regions beyond mem_end - completely excluded regions as well as the remains of partially included regions. Also makes sure the extra region is not unnecessarily high, and simplifies the logic to decide which regions should be added. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-22Merge branches 'upstream/core', 'upstream/xenfs' and 'upstream/evtchn' into ↵Jeremy Fitzhardinge
upstream/for-linus * upstream/core: xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs. xen: set IO permission early (before early_cpu_init()) xen: re-enable boot-time ballooning xen/balloon: make sure we only include remaining extra ram xen/balloon: the balloon_lock is useless xen: add extra pages to balloon xen/events: use locked set|clear_bit() for cpu_evtchn_mask xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore xen: implement XENMEM_machphys_mapping * upstream/xenfs: Revert "xen/privcmd: create address space to allow writable mmaps" xen/xenfs: update xenfs_mount for new prototype xen: fix header export to userspace xen: set vma flag VM_PFNMAP in the privcmd mmap file_op xen: xenfs: privcmd: check put_user() return code * upstream/evtchn: xen: make evtchn's name less generic xen/evtchn: the evtchn device is non-seekable xen/evtchn: add missing static xen/evtchn: Fix name of Xen event-channel device xen/evtchn: don't do unbind_from_irqhandler under spinlock xen/evtchn: remove spurious barrier xen/evtchn: ports start enabled xen/evtchn: dynamically allocate port_user array xen/evtchn: track enabled state for each port
2010-11-22xen: set IO permission early (before early_cpu_init())Konrad Rzeszutek Wilk
This patch is based off "xen dom0: Set up basic IO permissions for dom0." by Juan Quintela <quintela@redhat.com>. On AMD machines when we boot the kernel as Domain 0 we get this nasty: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1-101116 x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff8130271b>] (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest (XEN) rax: 000000008000c068 rbx: ffffffff8186c680 rcx: 0000000000000068 (XEN) rdx: 0000000000000cf8 rsi: 000000000000c000 rdi: 0000000000000000 (XEN) rbp: ffffffff81801e98 rsp: ffffffff81801e50 r8: ffffffff81801eac (XEN) r9: ffffffff81801ea8 r10: ffffffff81801eb4 r11: 00000000ffffffff (XEN) r12: ffffffff8186c694 r13: ffffffff81801f90 r14: ffffffffffffffff (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000221803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801e50: RIP points to read_pci_config() function. The issue is that we don't set IO permissions for the Linux kernel early enough. The call sequence used to be: xen_start_kernel() x86_init.oem.arch_setup = xen_setup_arch; setup_arch: - early_cpu_init - early_init_amd - read_pci_config - x86_init.oem.arch_setup [ xen_arch_setup ] - set IO permissions. We need to set the IO permissions earlier on, which this patch does. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-19xen: re-enable boot-time ballooningJeremy Fitzhardinge
Now that the balloon driver doesn't stumble over non-RAM pages, we can enable the extra space for ballooning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-16Merge commit 'v2.6.37-rc2' into upstream/xenfsJeremy Fitzhardinge
* commit 'v2.6.37-rc2': (10093 commits) Linux 2.6.37-rc2 capabilities/syslog: open code cap_syslog logic to fix build failure i2c: Sanity checks on adapter registration i2c: Mark i2c_adapter.id as deprecated i2c: Drivers shouldn't include <linux/i2c-id.h> i2c: Delete unused adapter IDs i2c: Remove obsolete cleanup for clientdata include/linux/kernel.h: Move logging bits to include/linux/printk.h Fix gcc 4.5.1 miscompiling drivers/char/i8k.c (again) hwmon: (w83795) Check for BEEP pin availability hwmon: (w83795) Clear intrusion alarm immediately hwmon: (w83795) Read the intrusion state properly hwmon: (w83795) Print the actual temperature channels as sources hwmon: (w83795) List all usable temperature sources hwmon: (w83795) Expose fan control method hwmon: (w83795) Fix fan control mode attributes hwmon: (lm95241) Check validity of input values hwmon: Change mail address of Hans J. Koch PCI: sysfs: fix printk warnings GFS2: Fix inode deallocation race ...
2010-11-12xen: implement XENMEM_machphys_mappingIan Campbell
This hypercall allows Xen to specify a non-default location for the machine to physical mapping. This capability is used when running a 32 bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to exactly the size required. [ Impact: add Xen hypercall definitions ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-11-11xen: set vma flag VM_PFNMAP in the privcmd mmap file_opStefano Stabellini
Set VM_PFNMAP in the privcmd mmap file_op, rather than later in xen_remap_domain_mfn_range when it is too late because vma_wants_writenotify has already been called and vm_page_prot has already been modified. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-10xen: do not release any memory under 1M in domain 0Ian Campbell
We already deliberately setup a 1-1 P2M for the region up to 1M in order to allow code which assumes this region is already mapped to work without having to convert everything to ioremap. Domain 0 should not return any apparently unused memory regions (reserved or otherwise) in this region to Xen since the e820 may not accurately reflect what the BIOS has stashed in this region. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-29xen: correct size of level2_kernel_pgtIan Campbell
sizeof(pmd_t *) is 4 bytes on 32-bit PAE leading to an allocation of only 2048 bytes. The correct size is sizeof(pmd_t) giving us a full page allocation. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-28Merge branch 'stable/xen-pcifront-0.8.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: register xen pci notifier xen: initialize cpu masks for pv guests in xen_smp_init xen: add a missing #include to arch/x86/pci/xen.c xen: mask the MTRR feature from the cpuid xen: make hvc_xen console work for dom0. xen: add the direct mapping area for ISA bus access xen: Initialize xenbus for dom0. xen: use vcpu_ops to setup cpu masks xen: map a dummy page for local apic and ioapic in xen_set_fixmap xen: remap MSIs into pirqs when running as initial domain xen: remap GSIs as pirqs when running as initial domain xen: introduce XEN_DOM0 as a silent option xen: map MSIs into pirqs xen: support GSI -> pirq remapping in PV on HVM guests xen: add xen hvm acpi_register_gsi variant acpi: use indirect call to register gsi in different modes xen: implement xen_hvm_register_pirq xen: get the maximum number of pirqs from xen xen: support pirq != irq * 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits) X86/PCI: Remove the dependency on isapnp_disable. xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright. x86: xen: Sanitse irq handling (part two) swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it. MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer. xen/pci: Request ACS when Xen-SWIOTLB is activated. xen-pcifront: Xen PCI frontend driver. xenbus: prevent warnings on unhandled enumeration values xenbus: Xen paravirtualised PCI hotplug support. xen/x86/PCI: Add support for the Xen PCI subsystem x86: Introduce x86_msi_ops msi: Introduce default_[teardown|setup]_msi_irqs with fallback. x86/PCI: Export pci_walk_bus function. x86/PCI: make sure _PAGE_IOMAP it set on pci mappings x86/PCI: Clean up pci_cache_line_size xen: fix shared irq device passthrough xen: Provide a variant of xen_poll_irq with timeout. xen: Find an unbound irq number in reverse order (high to low). xen: statically initialize cpu_evtchn_mask_p ... Fix up trivial conflicts in drivers/pci/Makefile
2010-10-27Merge branch 'akpm-incoming-2'Linus Torvalds
* akpm-incoming-2: (139 commits) epoll: make epoll_wait() use the hrtimer range feature select: rename estimate_accuracy() to select_estimate_accuracy() Remove duplicate includes from many files ramoops: use the platform data structure instead of module params kernel/resource.c: handle reinsertion of an already-inserted resource kfifo: fix kfifo_alloc() to return a signed int value w1: don't allow arbitrary users to remove w1 devices alpha: remove dma64_addr_t usage mips: remove dma64_addr_t usage sparc: remove dma64_addr_t usage fuse: use release_pages() taskstats: use real microsecond granularity for CPU times taskstats: split fill_pid function taskstats: separate taskstats commands delayacct: align to 8 byte boundary on 64-bit systems delay-accounting: reimplement -c for getdelays.c to report information on a target command namespaces Kconfig: move namespace menu location after the cgroup namespaces Kconfig: remove the cgroup device whitelist experimental tag namespaces Kconfig: remove pointless cgroup dependency namespaces Kconfig: make namespace a submenu ...
2010-10-27Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu: Remove the multi-page alignment facility x86-32: Allocate irq stacks seperate from percpu area x86-32, mm: Remove duplicated #include x86, printk: Get rid of <0> from stack output x86, kexec: Make sure to stop all CPUs before exiting the kernel x86/vsmp: Eliminate kconfig dependency warning
2010-10-27Remove duplicate includes from many filesZimny Lech
Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branches 'upstream/xenfs' and 'upstream/core' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/privcmd: make privcmd visible in domU xen/privcmd: move remap_domain_mfn_range() to core xen code and export. privcmd: MMAPBATCH: Fix error handling/reporting xenbus: export xen_store_interface for xenfs xen/privcmd: make sure vma is ours before doing anything to it xen/privcmd: print SIGBUS faults xen/xenfs: set_page_dirty is supposed to return true if it dirties xen/privcmd: create address space to allow writable mmaps xen: add privcmd driver xen: add variable hypercall caller xen: add xen_set_domain_pte() xen: add /proc/xen/xsd_{kva,port} to xenfs * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits) xen: include xen/xen.h for definition of xen_initial_domain() xen: use host E820 map for dom0 xen: correctly rebuild mfn list list after migration. xen: improvements to VIRQ_DEBUG output xen: set up IRQ before binding virq to evtchn xen: ensure that all event channels start off bound to VCPU 0 xen/hvc: only notify if we actually sent something xen: don't add extra_pages for RAM after mem_end xen: add support for PAT xen: make sure xen_max_p2m_pfn is up to date xen: limit extra memory to a certain ratio of base xen: add extra pages for E820 RAM regions, even if beyond mem_end xen: make sure xen_extra_mem_start is beyond all non-RAM e820 xen: implement "extra" memory to reserve space for pages not present at boot xen: Use host-provided E820 map xen: don't map missing memory xen: defer building p2m mfn structures until kernel is mapped xen: add return value to set_phys_to_machine() xen: convert p2m to a 3 level tree xen: make install_p2mtop_page() static ... Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of 'reserve_early()' - in the new memblock world order it is now 'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.
2010-10-26xen: initialize cpu masks for pv guests in xen_smp_initStefano Stabellini
Pv guests don't have ACPI and need the cpu masks to be set correctly as early as possible so we call xen_fill_possible_map from xen_smp_init. On the other hand the initial domain supports ACPI so in this case we skip xen_fill_possible_map and rely on it. However Xen might limit the number of cpus usable by the domain, so we filter those masks during smp initialization using the VCPUOP_is_up hypercall. It is important that the filtering is done before xen_setup_vcpu_info_placement. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-25xen: include xen/xen.h for definition of xen_initial_domain()Ian Campbell
CC arch/x86/xen/setup.o arch/x86/xen/setup.c: In function 'xen_memory_setup': arch/x86/xen/setup.c:161: error: implicit declaration of function 'xen_initial_domain' Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>