powerpc: Prepare to support kernel handling of IOMMU map/unmap

The current VFIO-on-POWER implementation supports only user mode driven mapping, i.e. QEMU is sending requests to map/unmap pages. However this approach is really slow, so we want to move that to KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters where each packet needs to be mapped/unmapped) we chose to implement that as a "fast" hypercall directly in "real mode" (processor still in the guest context but MMU off). To be able to do that, we need to provide some facilities to access the struct page count within that real mode environment as things like the sparsemem vmemmap mappings aren't accessible. This adds an API function realmode_pfn_to_page() to get page struct when MMU is off. This adds to MM a new function put_page_unless_one() which drops a page if counter is bigger than 1. It is going to be used when MMU is off (for example, real mode on PPC64) and we want to make sure that page release will not happen in real mode as it may crash the kernel in a horrible way. CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported. Cc: linux-mm@kvack.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
author: Alexey Kardashevskiy 2013-08-28 18:37:42 +1000
committer: Benjamin Herrenschmidt 2013-10-11 17:24:39 +1100
commit: 8e0861fa3c4edfc2f30dd4cf4d58d3929f7c1b23 (patch)
tree: 07a0b749245da2e29918d9f2121d7ca68e9b2763 /arch/powerpc/mm
parent: 81fcfb813fe99c30f77dd3ed9a4e541d14a9ed01 (diff)
1 files changed, 50 insertions, 1 deletions
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 8ed035d2edb5..e3734edffa69 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -304,5 +304,54 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
 }
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+/*
+ * We do not have access to the sparsemem vmemmap, so we fallback to
+ * walking the list of sparsemem blocks which we already maintain for
+ * the sake of crashdump. In the long run, we might want to maintain
+ * a tree if performance of that linear walk becomes a problem.
+ *
+ * realmode_pfn_to_page functions can fail due to:
+ * 1) As real sparsemem blocks do not lay in RAM continously (they
+ * are in virtual address space which is not available in the real mode),
+ * the requested page struct can be split between blocks so get_page/put_page
+ * may fail.
+ * 2) When huge pages are used, the get_page/put_page API will fail
+ * in real mode as the linked addresses in the page struct are virtual
+ * too.
+ */
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct vmemmap_backing *vmem_back;
+	struct page *page;
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
+
+	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
+		if (pg_va < vmem_back->virt_addr)
+			continue;
+
+		/* Check that page struct is not split between real pages */
+		if ((pg_va + sizeof(struct page)) >
+				(vmem_back->virt_addr + page_size))
+			return NULL;
+
+		page = (struct page *) (vmem_back->phys + pg_va -
+				vmem_back->virt_addr);
+		return page;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#elif defined(CONFIG_FLATMEM)
+
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	return page;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
author	Alexey Kardashevskiy	2013-08-28 18:37:42 +1000
committer	Benjamin Herrenschmidt	2013-10-11 17:24:39 +1100
commit	8e0861fa3c4edfc2f30dd4cf4d58d3929f7c1b23 (patch)
tree	07a0b749245da2e29918d9f2121d7ca68e9b2763 /arch/powerpc/mm
parent	81fcfb813fe99c30f77dd3ed9a4e541d14a9ed01 (diff)