path: root/arch/powerpc/mm
Age  Commit message  Author
2022-05-08  powerpc/8xx: Simplify flush_tlb_kernel_range()  Christophe Leroy
In the same spirit as commit 63f501e07a85 ("powerpc/8xx: Simplify TLB handling"), simplify flush_tlb_kernel_range() for 8xx. 8xx cannot be SMP, and has 'tlbie' and 'tlbia' instructions, so an inline version of flush_tlb_kernel_range() for 8xx is worth it.

With this patch, the first leg of change_page_attr() is:

  2c: 55 29 00 3c   rlwinm  r9,r9,0,0,30
  30: 91 23 00 00   stw     r9,0(r3)
  34: 7c 00 22 64   tlbie   r4,r0
  38: 7c 00 04 ac   hwsync
  3c: 38 60 00 00   li      r3,0
  40: 4e 80 00 20   blr

Before the patch it was:

  30: 55 29 00 3c   rlwinm  r9,r9,0,0,30
  34: 91 2a 00 00   stw     r9,0(r10)
  38: 94 21 ff f0   stwu    r1,-16(r1)
  3c: 7c 08 02 a6   mflr    r0
  40: 38 83 10 00   addi    r4,r3,4096
  44: 90 01 00 14   stw     r0,20(r1)
  48: 48 00 00 01   bl      48 <change_page_attr+0x48>
                    48: R_PPC_REL24 flush_tlb_kernel_range
  4c: 80 01 00 14   lwz     r0,20(r1)
  50: 38 60 00 00   li      r3,0
  54: 7c 08 03 a6   mtlr    r0
  58: 38 21 00 10   addi    r1,r1,16
  5c: 4e 80 00 20   blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2610043419ce3e0e53a85386baf2c3625af5cfb.1647877442.git.christophe.leroy@csgroup.eu
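For reference, the kind of inline helper described here could look roughly like the sketch below. It is a hedged reconstruction for a non-SMP 8xx build, not the verbatim upstream code: a single page is dropped with tlbie, anything larger falls back to a full tlbia, with barriers matching the listing above.

  /* Sketch only: inline kernel-range TLB flush for a non-SMP 8xx core. */
  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  {
  	start &= PAGE_MASK;

  	if (end - start <= PAGE_SIZE)
  		asm volatile ("tlbie %0; sync" : : "r" (start) : "memory");
  	else
  		asm volatile ("sync; tlbia; isync" : : "memory");
  }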
2022-05-08  powerpc: Add missing headers  Christophe Leroy
Don't inherit headers "by chance" from asm/prom.h, asm/mpc52xx.h, asm/pci.h etc... Include the needed headers, and remove asm/prom.h when it was needed exclusively for pulling in necessary headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08  powerpc: Remove asm/prom.h from all files that don't need it  Christophe Leroy
Several files include asm/prom.h for no reason. Clean it up. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop change to prom_parse.c as reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc: fix typos in comments  Julia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-05-05  powerpc: Simplify and move arch_randomize_brk()  Christophe Leroy
arch_randomize_brk() is only needed for hash on book3s/64, for other platforms the one provided by the default mmap layout is good enough. Move it to hash_utils.c and use randomize_page() like the generic one. And properly opt out the radix case instead of making an assumption on mmu_highuser_ssize. Also change to a 32M range like most other architectures instead of 8M. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
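As a rough sketch of what a randomize_page()-based version looks like (illustrative only, not the exact hash_utils.c code; the 32M window for 32-bit tasks comes from the text above, the 64-bit window is an assumption):

  unsigned long arch_randomize_brk(struct mm_struct *mm)
  {
  	/* assumed windows: 32M like most architectures for 32-bit tasks,
  	 * a wider range for 64-bit tasks */
  	if (is_32bit_task())
  		return randomize_page(mm->brk, SZ_32M);

  	return randomize_page(mm->brk, SZ_1G);
  }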
2022-05-05  powerpc/mm: Convert to default topdown mmap layout  Christophe Leroy
Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and remove arch/powerpc/mm/mmap.c. This change reuses the generic framework added by commit 67f3977f805b ("arm64, mm: move generic mmap layout functions to mm") without any functional change. Comparison between the powerpc implementation and the generic one:
- mmap_is_legacy() is identical.
- arch_mmap_rnd() does exactly the same although it's written slightly differently.
- MIN_GAP and MAX_GAP are identical.
- mmap_base() does the same but uses STACK_RND_MASK which provides the same values as stack_maxrandom_size().
- arch_pick_mmap_layout() is identical.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/518f9def87d3c889d5958103e7463cf45a2f673d.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc/mm: Enable full randomisation of memory mappings  Christophe Leroy
Do like most other architectures and provide randomisation also to "legacy" memory mappings, by adding the random factor to mm->mmap_base in arch_pick_mmap_layout(). See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 and X86_32") for all explanations and benefits of that mmap randomisation. At the moment, slice_find_area_bottomup() doesn't use mm->mmap_base but uses the fixed TASK_UNMAPPED_BASE instead. slice_find_area_bottomup() being used as a fallback to slice_find_area_topdown(), it can't use mm->mmap_base directly. Instead of always using TASK_UNMAPPED_BASE as base address, leave it to the caller. When called from slice_find_area_topdown() TASK_UNMAPPED_BASE is used. Otherwise mm->mmap_base is used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/417fb10dde828534c73a03138b49621d74f4e5be.1649523076.git.christophe.leroy@csgroup.eu
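The effect on the legacy layout can be pictured with the shape of the generic layout helper (a sketch assuming the mmap_is_legacy(), arch_mmap_rnd() and mmap_base() helpers named elsewhere in this series; the function name is illustrative):

  void hypothetical_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
  {
  	unsigned long random_factor = 0UL;

  	if (current->flags & PF_RANDOMIZE)
  		random_factor = arch_mmap_rnd();

  	if (mmap_is_legacy(rlim_stack)) {
  		/* bottom-up: the random factor is now added here too */
  		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
  		mm->get_unmapped_area = arch_get_unmapped_area;
  	} else {
  		mm->mmap_base = mmap_base(random_factor, rlim_stack);
  		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
  	}
  }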
2022-05-05  powerpc/mm: Move get_unmapped_area functions to slice.c  Christophe Leroy
hugetlb_get_unmapped_area() is now identical to the generic version if only RADIX is enabled, so move it to slice.c and let it fallback on the generic one when HASH MMU is not compiled in. Do the same with arch_get_unmapped_area() and arch_get_unmapped_area_topdown(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b5d9c124e82889e0cb115c150915a0c0d84eb960.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc/mm: Use generic_hugetlb_get_unmapped_area()  Christophe Leroy
Use the generic version of arch_hugetlb_get_unmapped_area() which is now available at all times. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05f77014c619061638ecc52a0a4136eb04cc2799.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc/mm: Use generic_get_unmapped_area() and call it from arch_get_unmapped_area()  Christophe Leroy
Use the generic version of arch_get_unmapped_area() which is now available at all times instead of its copy radix__arch_get_unmapped_area(). To allow that for PPC64, add arch_get_mmap_base() and arch_get_mmap_end() macros. Instead of setting mm->get_unmapped_area() to either arch_get_unmapped_area() or generic_get_unmapped_area(), always set it to arch_get_unmapped_area() and call generic_get_unmapped_area() from there when radix is enabled. Do the same with radix__arch_get_unmapped_area_topdown(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/393be1fa386446443682fdb74544d733f68ef3bb.1649523076.git.christophe.leroy@csgroup.eu
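The dispatch described here can be sketched as follows (assumed shape from memory, not verbatim: radix_enabled() picks the generic path, otherwise the slice allocator is used with the context page size):

  unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
  				     unsigned long len, unsigned long pgoff,
  				     unsigned long flags)
  {
  	if (radix_enabled())
  		return generic_get_unmapped_area(filp, addr, len, pgoff, flags);

  	/* hash: fall back to the slice allocator */
  	return slice_get_unmapped_area(addr, len, flags,
  				       mm_ctx_user_psize(&current->mm->context), 0);
  }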
2022-05-05  powerpc/mm: Remove CONFIG_PPC_MM_SLICES  Christophe Leroy
CONFIG_PPC_MM_SLICES is always selected by hash book3s/64. CONFIG_PPC_MM_SLICES is never selected by other platforms. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dc2cdc204de8978574bf7c02329b6cfc4db0bce7.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc/mm: Make slice specific to book3s/64  Christophe Leroy
Since commit 555904d07eef ("powerpc/8xx: MM_SLICE is not needed anymore") only book3s/64 selects CONFIG_PPC_MM_SLICES. Move slice.c into mm/book3s64/. Move the necessary bits into asm/book3s/64/slice.h and remove asm/slice.h. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4a0d74ef1966a5902b5fd4ac4b513a760a6d675a.1649523076.git.christophe.leroy@csgroup.eu
2022-05-05  powerpc/mm: Move vma_mmu_pagesize()  Christophe Leroy
vma_mmu_pagesize() is only required for slices, otherwise there is a generic weak version doing the exact same thing. Move it to slice.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1302e000d529c93d07208f1fae90f938e7a551b4.1649523076.git.christophe.leroy@csgroup.eu
2022-05-04  powerpc/book3e: Fix sparse report in mm/nohash/fsl_book3e.c  Christophe Leroy
Make tlbcam_addrs[] static. Declare TLBCAM[] in mm/mmu_decl.h. And use NULL instead of 0 as a pointer when calling restore_to_as0(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fd0cc30b5556428ce1d8a1fc0f983b462e88a956.1647200488.git.christophe.leroy@csgroup.eu
2022-05-04  powerpc/mm: Switch from __FUNCTION__ to __func__  Dwaipayan Ray
__FUNCTION__ exists only for backwards compatibility reasons with old gcc versions. Replace it with __func__. Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> [chleroy: Fixed parenthesis alignment] Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210711084536.95394-1-dwaipayanray1@gmail.com
2022-04-07  Revert "powerpc: Set max_mapnr correctly"  Kefeng Wang
This reverts commit 602946ec2f90d5bd965857753880db29d2d9a1e9. If CONFIG_HIGHMEM is enabled, no highmem will be added with max_mapnr set to max_low_pfn, see mem_init(): for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) { ... free_highmem_page(); } Now that virt_addr_valid() has been fixed in the previous commit, we can revert the change to max_mapnr. Fixes: 602946ec2f90 ("powerpc: Set max_mapnr correctly") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: Erhard F. <erhard_f@mailbox.org> [mpe: Update change log to reflect series reordering] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220406145802.538416-2-mpe@ellerman.id.au
2022-03-31  powerpc/numa: Handle partially initialized numa nodes  Srikar Dronamraju
With commit 09f49dca570a ("mm: handle uninitialized numa nodes gracefully") NODE_DATA for even a memoryless/cpuless node is partially initialized at boot time. Before onlining the node, current Powerpc code checks for NODE_DATA to be NULL. However since NODE_DATA is partially initialized, this check will end up always being false. This causes hotplugging a CPU to a memoryless/cpuless node to fail. Before adding CPUs: $ numactl -H available: 1 nodes (4) node 4 cpus: 0 1 2 3 4 5 6 7 node 4 size: 97372 MB node 4 free: 95545 MB node distances: node 4 4: 10 $ lparstat System Configuration type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00 %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 2.66 2.67 0.16 94.51 0.00 0.00 5.33 0.00 67749 0 After hotplugging 32 cores: $ numactl -H node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 node 4 size: 97372 MB node 4 free: 93636 MB node distances: node 4 4: 10 $ lparstat System Configuration type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00 %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 0.04 0.02 0.00 99.94 0.00 0.00 0.06 0.00 1128751 3 As we can see numactl is listing only 8 cores while lparstat is showing 33 cores. Also dmesg is showing messages like: [ 2261.318350 ] BUG: arch topology borken [ 2261.318357 ] the DIE domain not a subset of the NODE domain Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully") Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220330135123.1868197-1-srikar@linux.vnet.ibm.com
2022-03-25  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds
Merge yet more updates from Andrew Morton: "This is the material which was staged after willystuff in linux-next. Subsystems affected by this patch series: mm (debug, selftests, pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise), and selftests" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits) selftests: kselftest framework: provide "finished" helper mm: madvise: MADV_DONTNEED_LOCKED mm: fix race between MADV_FREE reclaim and blkdev direct IO read mm: generalize ARCH_HAS_FILTER_PGPROT mm: unmap_mapping_range_tree() with i_mmap_rwsem shared mm: warn on deleting redirtied only if accounted mm/huge_memory: remove stale locking logic from __split_huge_pmd() mm/huge_memory: remove stale page_trans_huge_mapcount() mm/swapfile: remove stale reuse_swap_page() mm/khugepaged: remove reuse_swap_page() usage mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() mm: streamline COW logic in do_swap_page() mm: slightly clarify KSM logic in do_swap_page() mm: optimize do_wp_page() for fresh pages in local LRU pagevecs mm: optimize do_wp_page() for exclusive pages in the swapcache mm/huge_memory: make is_transparent_hugepage() static userfaultfd/selftests: enable hugetlb remap and remove event testing selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test mm: enable MADV_DONTNEED for hugetlb mappings kasan: disable LOCKDEP when printing reports ...
2022-03-25  Merge tag 'powerpc-5.18-1' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Livepatch support for 32-bit is probably the standout new feature, otherwise mostly just lots of bits and pieces all over the board. There's a series of commits cleaning up function descriptor handling, which touches a few other arches as well as LKDTM. It has acks from Arnd, Kees and Helge. Summary: - Enforce kernel RO, and implement STRICT_MODULE_RWX for 603. - Add support for livepatch to 32-bit. - Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. - Merge vdso64 and vdso32 into a single directory. - Fix build errors with newer binutils. - Add support for UADDR64 relocations, which are emitted by some toolchains. This allows powerpc to build with the latest lld. - Fix (another) potential userspace r13 corruption in transactional memory handling. - Cleanups of function descriptor handling & related fixes to LKDTM. Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET, Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar, Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek, Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding, Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean, Wedson Almeida Filho, and YueHaibing" * tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits) powerpc/pseries: Fix use after free in remove_phb_dynamic() powerpc/time: improve decrementer clockevent processing powerpc/time: Fix KVM host re-arming a timer beyond decrementer range powerpc/tm: Fix more userspace r13 corruption powerpc/xive: fix return value of __setup handler powerpc/64: Add UADDR64 relocation support powerpc: 8xx: fix a return value error in mpc8xx_pic_init powerpc/ps3: remove unneeded semicolons powerpc/64: Force inlining of prevent_user_access() and set_kuap() powerpc/bitops: Force inlining of fls() powerpc: declare unmodified attribute_group usages const powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n powerpc/secvar: fix refcount leak in format_show() powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E powerpc: Move C prototypes out of asm-prototypes.h powerpc/kexec: Declare kexec_paca static powerpc/smp: Declare current_set static powerpc: Cleanup asm-prototypes.c powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S ...
2022-03-24  mm/migration: add trace events for THP migrations  Anshuman Khandual
Patch series "mm/migration: Add trace events", v3. This adds trace events for all migration scenarios including base page, THP and HugeTLB. This patch (of 3): This adds two trace events for PMD based THP migration without split. These events closely follow the implementation details like setting and removing of PMD migration entries, which are essential operations for THP migration. This moves CREATE_TRACE_POINTS into generic THP from powerpc for these new trace events to be available on other platforms as well. Link: https://lkml.kernel.org/r/1643368182-9588-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1643368182-9588-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-08  powerpc: Move C prototypes out of asm-prototypes.h  Christophe Leroy
We originally added asm-prototypes.h in commit 42f5b4cacd78 ("powerpc: Introduce asm-prototypes.h"). Its purpose was for prototypes of C functions that are only called from asm, in order to fix sparse warnings about missing prototypes. A few months later Nick added a different use case in commit 4efca4ed05cb ("kbuild: modversions for EXPORT_SYMBOL() for asm") for C prototypes for exported asm functions. This is basically the inverse of our original usage. Since then we've added various prototypes to asm-prototypes.h for both reasons, meaning we now need to unstitch it all. Dispatch prototypes of C functions into relevant headers and keep only the prototypes for functions defined in assembly. For the time being, leave prom_init() there because moving it into asm/prom.h or asm/setup.h conflicts with drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowrom.o This will be fixed later by untangling asm/pci.h and asm/prom.h or by renaming the function in shadowrom.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62d46904eca74042097acf4cb12c175e3067f3d1.1646413435.git.christophe.leroy@csgroup.eu
2022-03-08  powerpc/64s: Don't use DSISR for SLB faults  Michael Ellerman
Since commit 46ddcb3950a2 ("powerpc/mm: Show if a bad page fault on data is read or write.") we use page_fault_is_write(regs->dsisr) in __bad_page_fault() to determine if the fault is for a read or write, and change the message printed accordingly. But SLB faults, aka Data Segment Interrupts, don't set DSISR (Data Storage Interrupt Status Register) to a useful value. All ISA versions from v2.03 through v3.1 specify that the Data Segment Interrupt sets DSISR "to an undefined value". As far as I can see there's no mention of SLB faults setting DSISR in any BookIV content either. This manifests as accesses that should be a read being incorrectly reported as writes, for example, using the xmon "dump" command: 0:mon> d 0x5deadbeef0000000 5deadbeef0000000 [359526.415354][ C6] BUG: Unable to handle kernel data access on write at 0x5deadbeef0000000 [359526.415611][ C6] Faulting instruction address: 0xc00000000010a300 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf400] pc: c00000000010a300: mread+0x90/0x190 If we disassemble the PC, we see a load instruction: 0:mon> di c00000000010a300 c00000000010a300 89490000 lbz r10,0(r9) We can also see in exceptions-64s.S that the data_access_slb block doesn't set IDSISR=1, which means it doesn't load DSISR into pt_regs. So the value we're using to determine if the fault is a read/write is some stale value in pt_regs from a previous page fault. Rework the printing logic to separate the SLB fault case out, and only print read/write in the cases where we can determine it. The result looks like eg: 0:mon> d 0x5deadbeef0000000 5deadbeef0000000 [ 721.779525][ C6] BUG: Unable to handle kernel data access at 0x5deadbeef0000000 [ 721.779697][ C6] Faulting instruction address: 0xc00000000014cbe0 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390] 0:mon> d 0 0000000000000000 [ 742.793242][ C6] BUG: Kernel NULL pointer dereference at 0x00000000 [ 742.793316][ C6] Faulting instruction address: 0xc00000000014cbe0 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390] Fixes: 46ddcb3950a2 ("powerpc/mm: Show if a bad page fault on data is read or write.") Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20220222113449.319193-1-mpe@ellerman.id.au
2022-03-03  mm: don't include <linux/memremap.h> in <linux/mm.h>  Christoph Hellwig
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-01  powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()  Daniel Henrique Barboza
Executing node_set_online() when nid = NUMA_NO_NODE results in an undefined behavior. node_set_online() will call node_set_state(), into __node_set(), into set_bit(), and since NUMA_NO_NODE is -1 we'll end up doing a negative shift operation inside arch/powerpc/include/asm/bitops.h. This potential UB was detected running a kernel with CONFIG_UBSAN. The behavior was introduced by commit 10f78fd0dabb ("powerpc/numa: Fix a regression on memoryless node 0"), where the check for nid > 0 was removed to fix a problem that was happening with nid = 0, but the result is that now we're trying to online NUMA_NO_NODE nids as well. Checking for nid >= 0 will allow node 0 to be onlined while avoiding this UB with NUMA_NO_NODE. Fixes: 10f78fd0dabb ("powerpc/numa: Fix a regression on memoryless node 0") Reported-by: Ping Fang <pifang@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224182312.1012527-1-danielhb413@gmail.com
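A minimal sketch of the guard this implies (hypothetical helper name; the only point is that the node id must be validated before node_set_online() so that -1 never reaches the node-mask bit helpers):

  static void hypothetical_try_online_node(int nid)
  {
  	/* nid may be NUMA_NO_NODE (-1); onlining that would shift by a
  	 * negative amount inside the node mask helpers */
  	if (likely(nid >= 0) && !node_online(nid))
  		node_set_online(nid);
  }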
2022-02-24  powerpc/64s/hash: Make hash faults work in NMI context  Nicholas Piggin
Hash faults are not resolved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code.

The reason to allow perf interrupts in is to better profile hash faults.

The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail. Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem.

Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults.

perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains. An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash and it does not remove as much tricky code.

Reported-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com
2022-02-12  powerpc/mm: Update default hugetlb size early  Aneesh Kumar K.V
commit: d9c234005227 ("Do not depend on MAX_ORDER when grouping pages by mobility") introduced pageblock_order which will be used to group pages better. The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT should be set before we call set_pageblock_order. set_pageblock_order happens early in the boot and the default hugetlb page size should be initialized before that to compute the right pageblock_order value.

Currently, the default hugetlb page size is set via arch_initcalls which happens late in the boot as shown via the below callstack:

  [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
  [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
  [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
  [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
  [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64

and the pageblock_order initialization is done early during the boot:

  [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
  [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
  [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
  [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
  [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
  [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98

Delaying default hugetlb page size initialization implies the kernel will initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal value for mobility grouping. IIUC we always had this issue. But it was not a problem for hash translation mode because (MAX_ORDER - 1) is the same as HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix, HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be 5 instead of 8.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220211065215.101767-1-aneesh.kumar@linux.ibm.com
2022-02-12  powerpc/32s: Enable STRICT_MODULE_RWX for the 603 core  Christophe Leroy
The book3s/32 MMU doesn't support per page execution protection and doesn't support RO protection for kernel pages. However, on the 603 which implements software loaded TLBs, execution protection is honored by the TLB Miss handler which doesn't load Instruction TLB for non executable pages. And RO protection is honored by clearing the C bit for RO pages, leading to DSI. So on the 603, STRICT_MODULE_RWX is possible without much effort. Don't disable STRICT_MODULE_RWX on book3s/32 and print a warning in case STRICT_MODULE_RWX has been selected and the platform has a Hardware HASH MMU. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1e6162f334167e75f1140082932e3a354b16daba.1642413973.git.christophe.leroy@csgroup.eu
2022-02-12  powerpc: Add set_memory_{p/np}() and remove set_memory_attr()  Christophe Leroy
set_memory_attr() was implemented by commit 4d1755b6a762 ("powerpc/mm: implement set_memory_attr()") because the set_memory_xx() couldn't be used at that time to modify memory "on the fly" as explained in the commit. But set_memory_attr() uses set_pte_at() which leads to warnings when CONFIG_DEBUG_VM is selected, because set_pte_at() is unexpected for updating existing page table entries. The check could be bypassed by using __set_pte_at() instead, as it was the case before commit c988cfd38e48 ("powerpc/32: use set_memory_attr()") but since commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") it is now possible to use set_memory_xx() functions to update page table entries "on the fly" because the update is now atomic. For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT. Add set_memory_np() and set_memory_p() for that. Replace all uses of set_memory_attr() by the relevant set_memory_xx() and remove set_memory_attr(). Fixes: c988cfd38e48 ("powerpc/32: use set_memory_attr()") Cc: stable@vger.kernel.org Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Maxime Bizon <mbizon@freebox.fr> Reviewed-by: Russell Currey <ruscur@russell.cc> Depends-on: 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cda2b44b55c96f9ac69fa92e68c01084ec9495c5.1640344012.git.christophe.leroy@csgroup.eu
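Assuming the existing change_memory_attr() dispatcher and the SET_MEMORY_* action values used by set_memory_ro() and friends, the new helpers would be thin wrappers along these lines (a sketch, not the exact header):

  static inline int set_memory_np(unsigned long addr, int numpages)
  {
  	/* clear _PAGE_PRESENT, e.g. for DEBUG_PAGEALLOC */
  	return change_memory_attr(addr, numpages, SET_MEMORY_NP);
  }

  static inline int set_memory_p(unsigned long addr, int numpages)
  {
  	/* set _PAGE_PRESENT back */
  	return change_memory_attr(addr, numpages, SET_MEMORY_P);
  }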
2022-02-12  powerpc/set_memory: Avoid spinlock recursion in change_page_attr()  Christophe Leroy
Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modified it to use pte_update() and do the operation safely against concurrent access. In the meantime, Maxime reported some spinlock recursion.

  [   15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
  [   15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
  [   15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
  [   15.373350] Workqueue: events do_free_init
  [   15.377615] Call Trace:
  [   15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
  [   15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
  [   15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
  [   15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
  [   15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
  [   15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
  [   15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
  [   15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
  [   15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
  [   15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
  [   15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
  [   15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
  [   15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
  [   15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
  [   15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
  [   15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c

Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr(). To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combinations of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need is to compare two sets to know which bits are set or cleared. For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit gets set when changing exec permission.

Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/ Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.1640344012.git.christophe.leroy@csgroup.eu
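The "compare two sets of flags" idea can be illustrated like this (illustrative helper, not the upstream function; it assumes the pte_update(mm, addr, ptep, clr, set, huge) form used on powerpc):

  static void hypothetical_change_prot(struct mm_struct *mm, unsigned long addr,
  				     pte_t *ptep, unsigned long old, unsigned long new)
  {
  	unsigned long set = new & ~old;		/* bits the new protection gains */
  	unsigned long clear = old & ~new;	/* bits the new protection loses */

  	/* one atomic update, no read/modify/write and no spinlock needed */
  	pte_update(mm, addr, ptep, clear, set, 0);
  }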
2022-02-03  powerpc/kasan: Fix early region not updated correctly  Chen Jingwen
The shadow's page table is not updated when PTE_RPN_SHIFT is 24 and PAGE_SHIFT is 12. It not only causes false positives but also false negatives, as shown in the following text. Fix it by bringing the logic of kasan_early_shadow_page_entry here.

1. False Positive:

  ==================================================================
  BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50
  Write of size 16 at addr f57f3be0 by task swapper/0/1
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-12267-gdebe436e77c7 #1
  Call Trace:
  [c80d1c20] [c07fe7b8] dump_stack_lvl+0x4c/0x6c (unreliable)
  [c80d1c40] [c02ff668] print_address_description.constprop.0+0x88/0x300
  [c80d1c70] [c02ff45c] kasan_report+0x1ec/0x200
  [c80d1cb0] [c0300b20] kasan_check_range+0x160/0x2f0
  [c80d1cc0] [c03018a4] memset+0x34/0x90
  [c80d1ce0] [c0280108] pcpu_alloc+0x508/0xa50
  [c80d1d40] [c02fd7bc] __kmem_cache_create+0xfc/0x570
  [c80d1d70] [c0283d64] kmem_cache_create_usercopy+0x274/0x3e0
  [c80d1db0] [c2036580] init_sd+0xc4/0x1d0
  [c80d1de0] [c00044a0] do_one_initcall+0xc0/0x33c
  [c80d1eb0] [c2001624] kernel_init_freeable+0x2c8/0x384
  [c80d1ef0] [c0004b14] kernel_init+0x24/0x170
  [c80d1f10] [c001b26c] ret_from_kernel_thread+0x5c/0x64
  Memory state around the buggy address:
   f57f3a80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
   f57f3b00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  >f57f3b80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                                       ^
   f57f3c00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
   f57f3c80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  ==================================================================

2. False Negative (with KASAN tests):

  ==================================================================
  Before fix:
      ok 45 - kmalloc_double_kzfree
      # vmalloc_oob: EXPECTATION FAILED at lib/test_kasan.c:1039
      KASAN failure expected in "((volatile char *)area)[3100]", but none occurred
      not ok 46 - vmalloc_oob
      not ok 1 - kasan
  ==================================================================
  After fix:
      ok 1 - kasan

Fixes: cbd18991e24fe ("powerpc/mm: Fix an Oops in kasan_mmu_init()") Cc: stable@vger.kernel.org # 5.4.x Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211229035226.59159-1-chenjingwen6@huawei.com
2022-02-02  powerpc/ptdump: Fix sparse warning in hashpagetable.c  Michael Ellerman
As reported by sparse: arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:267:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:267:36: got restricted __be64 [usertype] v arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:268:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:268:36: got restricted __be64 [usertype] r The values returned by plpar_pte_read_4() are CPU endian, not __be64, so assigning them to struct hash_pte confuses sparse. As a minimal fix open code a struct to hold the values with CPU endian types. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au
2022-01-24  powerpc/fixmap: Fix VM debug warning on unmap  Christophe Leroy
Unmapping a fixmap entry is done by calling __set_fixmap() with FIXMAP_PAGE_CLEAR as flags. Today, powerpc __set_fixmap() calls map_kernel_page(). map_kernel_page() is not happy when called a second time for the same page. WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0xc/0x1e8 CPU: 0 PID: 1 Comm: swapper Not tainted 5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty #682 NIP: c0017cd4 LR: c00187f0 CTR: 00000010 REGS: e1011d50 TRAP: 0700 Not tainted (5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42000208 XER: 00000000 GPR00: c0165fec e1011e10 c14c0000 c0ee2550 ff800000 c0f3d000 00000000 c001686c GPR08: 00001000 b00045a9 00000001 c0f58460 c0f50000 00000000 c0007e10 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GPR24: 00000000 00000000 c0ee2550 00000000 c0f57000 00000ff8 00000000 ff800000 NIP [c0017cd4] set_pte_at+0xc/0x1e8 LR [c00187f0] map_kernel_page+0x9c/0x100 Call Trace: [e1011e10] [c0736c68] vsnprintf+0x358/0x6c8 (unreliable) [e1011e30] [c0165fec] __set_fixmap+0x30/0x44 [e1011e40] [c0c13bdc] early_iounmap+0x11c/0x170 [e1011e70] [c0c06cb0] ioremap_legacy_serial_console+0x88/0xc0 [e1011e90] [c0c03634] do_one_initcall+0x80/0x178 [e1011ef0] [c0c0385c] kernel_init_freeable+0xb4/0x250 [e1011f20] [c0007e34] kernel_init+0x24/0x140 [e1011f30] [c0016268] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7fe3fb78 48019689 80010014 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 81250000 712a0001 41820008 <0fe00000> 9421ffe0 93e1001c 48000030 Implement unmap_kernel_page() which clears an existing pte. Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b0b752f6f6ecc60653e873f385c6f0dce4e9ab6a.1638789098.git.christophe.leroy@csgroup.eu
2022-01-23  Merge tag 'powerpc-5.17-2' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A series of bpf fixes, including an oops fix and some codegen fixes. - Fix a regression in syscall_get_arch() for compat processes. - Fix boot failure on some 32-bit systems with KASAN enabled. - A couple of other build/minor fixes. Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin. * tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Mask SRR0 before checking against the masked NIP powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64 powerpc/32s: Fix kasan_init_region() for KASAN powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32 powerpc/audit: Fix syscall_get_arch() powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06 tools/bpf: Rename 'struct event' to avoid naming conflict powerpc/bpf: Update ldimm64 instructions during extra pass powerpc32/bpf: Fix codegen for bpf-to-bpf calls bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-16  powerpc/32s: Fix kasan_init_region() for KASAN  Christophe Leroy
Some configurations have been reported where the kernel doesn't boot with KASAN enabled. This is due to wrong BAT allocation for the KASAN area:

  ---[ Data Block Address Translation ]---
  0: 0xc0000000-0xcfffffff 0x00000000 256M Kernel rw m
  1: 0xd0000000-0xdfffffff 0x10000000 256M Kernel rw m
  2: 0xe0000000-0xefffffff 0x20000000 256M Kernel rw m
  3: 0xf8000000-0xf9ffffff 0x2a000000 32M Kernel rw m
  4: 0xfa000000-0xfdffffff 0x2c000000 64M Kernel rw m

A BAT must have both virtual and physical addresses alignment matching the size of the BAT. This is not the case for BAT 4 above.

Fix kasan_init_region() by using the block_size() function that is in book3s32/mmu.c. To be able to reuse it here, make it non static and change its name to bat_block_size() in order to avoid a name conflict with block_size() defined in <linux/blkdev.h>. Also reuse find_free_bat() to avoid an error message from setbat() when no BAT is available. And allocate memory outside of the linear memory mapping to avoid wasting that precious space.

With this change we get correct alignment for BATs and KASAN shadow memory is allocated outside the linear memory space.

  ---[ Data Block Address Translation ]---
  0: 0xc0000000-0xcfffffff 0x00000000 256M Kernel rw
  1: 0xd0000000-0xdfffffff 0x10000000 256M Kernel rw
  2: 0xe0000000-0xefffffff 0x20000000 256M Kernel rw
  3: 0xf8000000-0xfbffffff 0x7c000000 64M Kernel rw
  4: 0xfc000000-0xfdffffff 0x7a000000 32M Kernel rw

Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()") Cc: stable@vger.kernel.org Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7a50ef902494d1325227d47d33dada01e52e5518.1641818726.git.christophe.leroy@csgroup.eu
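The alignment rule called out above can be captured in a one-line check (illustrative names, not the book3s32/mmu.c API):

  /* A BAT of a given (power-of-two) size is only valid when both the
   * virtual and the physical base are aligned to that size. */
  static bool bat_mapping_aligned(unsigned long virt, phys_addr_t phys, unsigned long size)
  {
  	return is_power_of_2(size) && IS_ALIGNED(virt, size) && IS_ALIGNED(phys, size);
  }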
2022-01-15  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds
Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...
2022-01-15  mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit  Qi Zheng
Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY for multiple times, the FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page fault path, so the following check is no longer needed: flags & FAULT_FLAG_ALLOW_RETRY So just remove it. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
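In a fault handler this boils down to the following shape (a generic sketch with an invented name, not a specific arch file):

  static vm_fault_t hypothetical_handle_fault(struct vm_area_struct *vma,
  					    unsigned long address, unsigned int flags)
  {
  	vm_fault_t fault;

  retry:
  	fault = handle_mm_fault(vma, address, flags, NULL);

  	if (unlikely(fault & VM_FAULT_RETRY)) {
  		/* FAULT_FLAG_ALLOW_RETRY can no longer be cleared on retry,
  		 * so the old "if (flags & FAULT_FLAG_ALLOW_RETRY)" test
  		 * around this block is dead code */
  		flags |= FAULT_FLAG_TRIED;
  		goto retry;
  	}

  	return fault;
  }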
2022-01-14  Merge tag 'powerpc-5.17-1' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Optimise radix KVM guest entry/exit by 2x on Power9/Power10. - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10 or later CPUs. - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits. - Several fixes and improvements to our hard lockup watchdog. - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit. - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only. - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit). - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10. - A series of small performance improvements to 64-bit interrupt entry. - Several commits fixing issues when building with the clang integrated assembler. - Many other small features and fixes. Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang Guang. * tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits) powerpc/xmon: Dump XIVE information for online-only processors. powerpc/opal: use default_groups in kobj_type powerpc/cacheinfo: use default_groups in kobj_type powerpc/sched: Remove unused TASK_SIZE_OF powerpc/xive: Add missing null check after calling kmalloc powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API selftests/powerpc: Add a test of sigreturning to an unaligned address powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings powerpc/64s: Mask NIP before checking against SRR0 powerpc/perf: Fix spelling of "its" powerpc/32: Fix boot failure with GCC latent entropy plugin powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests powerpc/code-patching: Move code patching selftests in its own file powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h powerpc/code-patching: Move patch_exception() outside code-patching.c powerpc/code-patching: Use test_trampoline for prefixed patch test powerpc/code-patching: Fix patch_branch() return on out-of-range failure powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling powerpc/code-patching: Fix unmap_patch_area() error handling powerpc/code-patching: Fix error handling in do_patch_instruction() ...
2022-01-12  Merge tag 'devicetree-for-5.17' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "Bindings: - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb - DT schema conversions for Broadcom platforms: interrupt controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA - Add binding schemas for Tegra210 EMC table, TI DC-DC converters, - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues - More fixes due to 'unevaluatedProperties' enabling - Data type fixes and clean-ups of binding examples found in preparation to move to validating DTB files directly (instead of intermediate YAML representation. - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus - Add various new compatible strings DT core: - Silence a warning for overlapping reserved memory regions - Reimplement unittest overlay tracking - Fix stack frame size warning in unittest - Clean-ups of early FDT scanning functions - Fix handling of "linux,usable-memory-range" on EFI booted systems - Add support for 'fail' status on CPU nodes - Improve error message in of_phandle_iterator_next() - kbuild: Disable duplicate unit-address warnings for disabled nodes" * tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits) dt-bindings: net: mdio: Drop resets/reset-names child properties dt-bindings: clock: samsung: convert S5Pv210 to dtschema dt-bindings: clock: samsung: convert Exynos5410 to dtschema dt-bindings: clock: samsung: convert Exynos5260 to dtschema dt-bindings: clock: samsung: extend Exynos7 bindings with UFS dt-bindings: clock: samsung: convert Exynos7 to dtschema dt-bindings: clock: samsung: convert Exynos5433 to dtschema dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712 dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp' dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example dt-bindings: clock: imx5: Drop clock consumer node from example dt-bindings: Drop required 'interrupt-parent' dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance' dt-bindings: net: wireless: mt76: Fix 8-bit property sizes dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry dt-bindings: net: stm32-dwmac: Make each example a separate entry dt-bindings: net: Cleanup MDIO node schemas ...
2021-12-23  powerpc/code-patching: Move patch_exception() outside code-patching.c  Christophe Leroy
patch_exception() is dedicated to book3e/64 and is nothing more than a normal use of patch_branch(), so move it into a place dedicated to book3e/64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0968622b98b1fb51838c35b844c42ad6609de62e.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23  powerpc/code-patching: Remove init_mem_is_free  Christophe Leroy
A new state has been added by commit d2635f2012a4 ("mm: create a new system state and fix core_kernel_text()"). That state tells when initmem is about to be released and is redundant with init_mem_is_free. Remove init_mem_is_free. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ad8c3ccb39c8edaa89fd3eda1cc7218baea1cde5.1638446239.git.christophe.leroy@csgroup.eu
2021-12-23  powerpc/mm/book3s64/hash: Switch pre 2.06 tlbiel to .long  Alexey Kardashevskiy
The llvm integrated assembler does not recognise the ISA 2.05 tlbiel version. Work around it by switching to .long when an old arch level is detected. Signed-off-by: Daniel Axtens <dja@axtens.net> [aik: did "Eventually do this more smartly"] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-7-aik@ozlabs.ru
2021-12-23  powerpc/mm: Switch obsolete dssall to .long  Alexey Kardashevskiy
The dssall ("Data Stream Stop All") instruction is obsolete altogether with other Data Cache Instructions since ISA 2.03 (year 2006). LLVM IAS does not support it but PPC970 seems to be using it. This switches dssall to .long as there is no much point in fixing LLVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23  powerpc/mm: Add __init attribute to eligible functions  Nick Child
Some functions defined in 'arch/powerpc/mm' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-4-nick.child@ibm.com
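A hypothetical example of the annotation (not taken from the patch): a helper that is only ever reached from other __init code can carry the attribute so its text is discarded once init memory is freed.

  #include <linux/init.h>
  #include <linux/printk.h>

  /* boot-only helper: called from __init code, so mark it __init too */
  static void __init report_linear_mapping(unsigned long size)
  {
  	pr_info("mapped %lu bytes of linear memory\n", size);
  }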
2021-12-21  powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion  Michael Ellerman
In note_prot_wx() we bail out without reporting anything if CONFIG_PPC_DEBUG_WX is disabled. But CONFIG_PPC_DEBUG_WX was removed in the conversion to generic ptdump, we now need to use CONFIG_DEBUG_WX instead. Fixes: e084728393a5 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/20211203124112.2912562-1-mpe@ellerman.id.au
2021-12-20  powerpc/64s/radix: Fix huge vmap false positive  Nicholas Piggin
pmd_huge() is defined to false when HUGETLB_PAGE is not configured, but the vmap code still installs huge PMDs. This leads to false bad PMD errors when vunmapping because it is not seen as a huge PTE, and the bad PMD check catches it. The end result may not be much more serious than some bad pmd warning messages, because the pmd_none_or_clear_bad() does what we wanted and clears the huge PTE anyway. Fix this by checking pmd_is_leaf(), which checks for a PTE regardless of config options. The whole huge/large/leaf stuff is a tangled mess but that's kernel-wide and not something we can improve much in arch/powerpc code. pmd_page(), pud_page(), etc., called by vmalloc_to_page() on huge vmaps can similarly trigger a false VM_BUG_ON when CONFIG_HUGETLB_PAGE=n, so those checks are adjusted. The checks were added by commit d6eacedd1f0e ("powerpc/book3s: Use config independent helpers for page table walk"), while implementing a similar fix for other page table walking functions. Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216103342.609192-1-npiggin@gmail.com
2021-12-16  of/fdt: Rework early_init_dt_scan_chosen() to call directly  Rob Herring
Use of the of_scan_flat_dt() function predates libfdt and is discouraged as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_chosen() to be called directly and use libfdt. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-09  powerpc/inst: Optimise copy_inst_from_kernel_nofault()  Christophe Leroy
copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to copy one or two 32-bit words. This means calling an out-of-line function which itself calls back copy_from_kernel_nofault_allowed() then performs a generic copy with loops. Rewrite copy_inst_from_kernel_nofault() to do everything at a single place and use __get_kernel_nofault() directly to perform single accesses without loops. Although the generic function uses pagefault_disable(), it is not required on powerpc because do_page_fault() bails earlier when a kernel mode fault happens on a kernel address. As the function has now become very small, inline it. With this change, on an 8xx the time spent in the loop in ftrace_replace_code() is reduced by 23% at function tracer activation and 27% at nop tracer activation. The overall time to activate function tracer (measured with shell command 'time') is 570ms before the patch and 470ms after the patch. Even vmlinux size is reduced (by 152 instructions).

Before the patch:

  00000018 <copy_inst_from_kernel_nofault>:
  18: 94 21 ff e0   stwu    r1,-32(r1)
  1c: 7c 08 02 a6   mflr    r0
  20: 38 a0 00 04   li      r5,4
  24: 93 e1 00 1c   stw     r31,28(r1)
  28: 7c 7f 1b 78   mr      r31,r3
  2c: 38 61 00 08   addi    r3,r1,8
  30: 90 01 00 24   stw     r0,36(r1)
  34: 48 00 00 01   bl      34 <copy_inst_from_kernel_nofault+0x1c>
                    34: R_PPC_REL24 copy_from_kernel_nofault
  38: 2c 03 00 00   cmpwi   r3,0
  3c: 40 82 00 0c   bne     48 <copy_inst_from_kernel_nofault+0x30>
  40: 81 21 00 08   lwz     r9,8(r1)
  44: 91 3f 00 00   stw     r9,0(r31)
  48: 80 01 00 24   lwz     r0,36(r1)
  4c: 83 e1 00 1c   lwz     r31,28(r1)
  50: 38 21 00 20   addi    r1,r1,32
  54: 7c 08 03 a6   mtlr    r0
  58: 4e 80 00 20   blr

After the patch (before inlining):

  00000018 <copy_inst_from_kernel_nofault>:
  18: 3d 20 b0 00   lis     r9,-20480
  1c: 7c 04 48 40   cmplw   r4,r9
  20: 7c 69 1b 78   mr      r9,r3
  24: 41 80 00 14   blt     38 <copy_inst_from_kernel_nofault+0x20>
  28: 81 44 00 00   lwz     r10,0(r4)
  2c: 38 60 00 00   li      r3,0
  30: 91 49 00 00   stw     r10,0(r9)
  34: 4e 80 00 20   blr
  38: 38 60 ff de   li      r3,-34
  3c: 4e 80 00 20   blr
  40: 38 60 ff f2   li      r3,-14
  44: 4e 80 00 20   blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Add clang workaround, with version check as suggested by Nathan] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d5b12183d5176dd702d29ad94c39c384e51c78f.1638208156.git.christophe.leroy@csgroup.eu
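A hedged reconstruction of the resulting inline helper, matching the description above (exact guards and error labels upstream may differ slightly):

  static inline int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
  {
  	unsigned int val, suffix;

  	/* do_page_fault() already rejects kernel-mode faults on kernel
  	 * addresses, so no pagefault_disable() is needed here */
  	if (unlikely(!is_kernel_addr((unsigned long)src)))
  		return -ERANGE;

  	__get_kernel_nofault(&val, src, u32, Efault);
  	if (IS_ENABLED(CONFIG_PPC64) && get_op(val) == OP_PREFIX) {
  		/* prefixed instructions are two 32-bit words */
  		__get_kernel_nofault(&suffix, src + 1, u32, Efault);
  		*inst = ppc_inst_prefix(val, suffix);
  	} else {
  		*inst = ppc_inst(val);
  	}
  	return 0;

  Efault:
  	return -EFAULT;
  }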
2021-12-09  powerpc/inst: Define ppc_inst_t  Christophe Leroy
In order to stop using 'struct ppc_inst' on PPC32, define a ppc_inst_t typedef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
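The intent can be sketched as follows (assumed shape of where the series is heading; this particular commit only introduces the typedef):

  /* one spelling everywhere; PPC32 can later back it with a plain u32
   * while PPC64 keeps the struct for prefixed instructions */
  #ifdef CONFIG_PPC64
  typedef struct ppc_inst ppc_inst_t;
  #else
  typedef u32 ppc_inst_t;
  #endif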
2021-12-09  powerpc/32s: Allocate one 256k IBAT instead of two consecutive 128k IBATs  Christophe Leroy
Today we have the following IBATs allocated: ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4M Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2M Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1M Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512K Kernel x m 4: 0xc0780000-0xc079ffff 0x00780000 128K Kernel x m 5: 0xc07a0000-0xc07bffff 0x007a0000 128K Kernel x m 6: - 7: - The two 128K should be a single 256K instead. When _etext is not aligned to 128Kbytes, the system will allocate all necessary BATs to the lower 128Kbytes boundary, then allocate an additional 128Kbytes BAT for the remaining block. Instead, align the top to 128Kbytes so that the function directly allocates a 256Kbytes last block: ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4M Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2M Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1M Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512K Kernel x m 4: 0xc0780000-0xc07bffff 0x00780000 256K Kernel x m 5: - 6: - 7: - Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.1637930421.git.christophe.leroy@csgroup.eu
2021-12-09  powerpc/kuap: Wire-up KUAP on book3e/64  Christophe Leroy
This adds KUAP support to book3e/64. This is done by reading the content of SPRN_MAS1 and checking the TID at the time user pgtable is loaded. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e2c2c9375afd4bbc06aa904d0103a5f5102a2b1a.1634627931.git.christophe.leroy@csgroup.eu