path: root/mm
Age | Commit message | Author
2022-05-13 | percpu: improve percpu_alloc_percpu event trace | Vasily Averin
Add call_site, bytes_alloc and gfp_flags fields to the output of the percpu_alloc_percpu ftrace event: mkdir-4393 [001] 169.334788: percpu_alloc_percpu: call_site=mem_cgroup_css_alloc+0xa6 reserved=0 is_atomic=0 size=2408 align=8 base_addr=0xffffc7117fc00000 off=402176 ptr=0x3dc867a62300 bytes_alloc=14448 gfp_flags=GFP_KERNEL_ACCOUNT This is required to track memcg-accounted percpu allocations. Link: https://lkml.kernel.org/r/a07be858-c8a3-7851-9086-e3262cbcf707@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
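For readers who want to post-process this event, the following is a minimal sketch of splitting the sample line above into fields. The function name `parse_percpu_alloc_event` is hypothetical, and the layout is assumed from the single sample line in this commit message rather than from the ftrace format definition.

```python
import re

# Sample line copied from the commit message above
SAMPLE = ("mkdir-4393 [001] 169.334788: percpu_alloc_percpu: "
          "call_site=mem_cgroup_css_alloc+0xa6 reserved=0 is_atomic=0 "
          "size=2408 align=8 base_addr=0xffffc7117fc00000 off=402176 "
          "ptr=0x3dc867a62300 bytes_alloc=14448 gfp_flags=GFP_KERNEL_ACCOUNT")


def parse_percpu_alloc_event(line):
    """Return the key=value fields of one percpu_alloc_percpu line as a dict."""
    _, sep, payload = line.partition("percpu_alloc_percpu:")
    if not sep:
        raise ValueError("not a percpu_alloc_percpu event")
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", payload):
        # Plain decimal fields become ints; hex addresses and symbols stay strings
        fields[key] = int(value) if value.isdigit() else value
    return fields


event = parse_percpu_alloc_event(SAMPLE)
print(event["call_site"], event["size"], event["bytes_alloc"], event["gfp_flags"])
```

In this one sample, `bytes_alloc` (14448) is exactly six times `size` (2408); the commit message does not say how the value is derived, so treat that as an observation about the sample only.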
2022-05-13 | mm/damon/reclaim: use resource_size function on resource object | Jiapeng Chong
Fix the following coccicheck warnings: ./mm/damon/reclaim.c:241:30-33: WARNING: Suspicious code. resource_size is maybe missing with res. Link: https://lkml.kernel.org/r/20220507032512.129598-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: "Boehme, Markus" <markubo@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
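As background for the warning, the kernel's `resource_size()` helper returns `end - start + 1`, because a resource range is inclusive at both ends. The sketch below models that in Python; the `Resource` class is a hypothetical stand-in for `struct resource`, and the addresses are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    start: int  # first address in the range
    end: int    # last address in the range (inclusive)


def resource_size(res):
    """Model of the kernel helper: the range is inclusive, hence the +1."""
    return res.end - res.start + 1


res = Resource(start=0x1000, end=0x1FFF)
print(resource_size(res))   # 4096
print(res.end - res.start)  # 4095: the open-coded form is off by one
```

Using the helper keeps the inclusive-range convention in one place instead of relying on each caller to remember it.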
2022-05-13 | mm: functions may simplify the use of return values | Li kunyu
p4d_clear_huge() can be simplified to return void, which also saves a few steps in its caller, vunmap_p4d_range(). Link: https://lkml.kernel.org/r/20220507150630.90399-1-kunyu@nfschina.com Signed-off-by: Li kunyu <kunyu@nfschina.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: page_table_check: move pxx_user_accessible_page into x86 | Kefeng Wang
pxx_user_accessible_page() checks PTE bits, which is architecture-specific code, so move these helpers into x86's pgtable.h. They are being moved out to make the page table check framework platform independent. Link: https://lkml.kernel.org/r/20220507110114.4128854-3-tongtiangen@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: page_table_check: using PxD_SIZE instead of PxD_PAGE_SIZE | Tong Tiangen
Patch series "mm: page_table_check: add support on arm64 and riscv", v7. Page table check performs extra verifications at the time when new pages become accessible from userspace by getting their page table entries (PTEs, PMDs, etc.) added into the table. It is supported on x86 [1]. This patchset makes some simple changes that make it easier to support new architectures, then adds support for this feature on ARM64 and RISCV. [1] https://lore.kernel.org/lkml/20211123214814.3756047-1-pasha.tatashin@soleen.com/ This patch (of 6): Compared with PxD_PAGE_SIZE, which is defined and used only on x86, PxD_SIZE is more common across architectures. Therefore, it is more reasonable to use PxD_SIZE instead of PxD_PAGE_SIZE in page_table_check.c. It also makes it easier to support page table check on other architectures. The substitution has no functional impact on x86. Link: https://lkml.kernel.org/r/20220507110114.4128854-1-tongtiangen@huawei.com Link: https://lkml.kernel.org/r/20220507110114.4128854-2-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/migrate: convert move_to_new_page() into move_to_new_folio() | Matthew Wilcox (Oracle)
Pass in the folios that we already have in each caller. Saves a lot of calls to compound_head(). Link: https://lkml.kernel.org/r/20220504182857.4013401-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio() | Matthew Wilcox (Oracle)
shmem_swapin_page() only brings in order-0 pages, which are folios by definition. Link: https://lkml.kernel.org/r/20220504182857.4013401-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: convert shmem_getpage_gfp to use a folio | Matthew Wilcox (Oracle)
Rename shmem_alloc_and_acct_page() to shmem_alloc_and_acct_folio() and have it return a folio, then use a folio throughout shmem_getpage_gfp(). It continues to return a struct page. Link: https://lkml.kernel.org/r/20220504182857.4013401-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: convert shmem_alloc_and_acct_page to use a folio | Matthew Wilcox (Oracle)
Convert shmem_alloc_hugepage() to return the folio that it uses and use a folio throughout shmem_alloc_and_acct_page(). Continue to return a page from shmem_alloc_and_acct_page() for now. Link: https://lkml.kernel.org/r/20220504182857.4013401-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: add shmem_alloc_folio() | Matthew Wilcox (Oracle)
Call vma_alloc_folio() directly instead of alloc_page_vma(). Add a shmem_alloc_page() wrapper to avoid changing the callers. Link: https://lkml.kernel.org/r/20220504182857.4013401-21-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: turn shmem_should_replace_page into shmem_should_replace_folio | Matthew Wilcox (Oracle)
This is a straightforward conversion. Link: https://lkml.kernel.org/r/20220504182857.4013401-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: convert shmem_add_to_page_cache to take a folio | Matthew Wilcox (Oracle)
Shrinks shmem_add_to_page_cache() by 16 bytes. All the callers grow, but this is temporary as they will all be converted to folios soon. Link: https://lkml.kernel.org/r/20220504182857.4013401-19-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: use a folio in shmem_unused_huge_shrink | Matthew Wilcox (Oracle)
When calling split_huge_page() we usually have to find the precise page, but that's not necessary here because we only need to unlock and put the folio afterwards. Saves 231 bytes of text (20% of this function). Link: https://lkml.kernel.org/r/20220504182857.4013401-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: remove remaining uses of page in shrink_page_list | Matthew Wilcox (Oracle)
These are all straightforward conversions to the folio API. Link: https://lkml.kernel.org/r/20220504182857.4013401-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: convert the activate_locked portion of shrink_page_list to folios | Matthew Wilcox (Oracle)
This accounts the number of pages activated correctly for large folios. Link: https://lkml.kernel.org/r/20220504182857.4013401-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: move initialisation of mapping down | Matthew Wilcox (Oracle)
Now that we don't interrogate the BDI for congestion, we can delay looking up the folio's mapping until we've got further through the function, reducing register pressure and saving a call to folio_mapping for folios we're adding to the swap cache. Link: https://lkml.kernel.org/r/20220504182857.4013401-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: convert lazy freeing to folios | Matthew Wilcox (Oracle)
Remove a hidden call to compound_head(), and account nr_pages instead of a single page. This matches the code in lru_lazyfree_fn() that accounts nr_pages to PGLAZYFREE. Link: https://lkml.kernel.org/r/20220504182857.4013401-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: convert page buffer handling to use folios | Matthew Wilcox (Oracle)
This mostly just removes calls to compound_head() although nr_reclaimed should be incremented by the number of pages, not just 1. Link: https://lkml.kernel.org/r/20220504182857.4013401-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: convert dirty page handling to folios | Matthew Wilcox (Oracle)
Mostly this just eliminates calls to compound_head(), but NR_VMSCAN_IMMEDIATE was being incremented by 1 instead of by nr_pages. Link: https://lkml.kernel.org/r/20220504182857.4013401-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | swap: convert add_to_swap() to take a folio | Matthew Wilcox (Oracle)
The only caller already has a folio available, so this saves a conversion. Also convert the return type to boolean. Link: https://lkml.kernel.org/r/20220504182857.4013401-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | swap: turn get_swap_page() into folio_alloc_swap() | Matthew Wilcox (Oracle)
This removes an assumption that a large folio is HPAGE_PMD_NR pages in size. Link: https://lkml.kernel.org/r/20220504182857.4013401-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: convert the writeback handling in shrink_page_list() to folios | Matthew Wilcox (Oracle)
Slightly more efficient due to fewer calls to compound_head(). Link: https://lkml.kernel.org/r/20220504182857.4013401-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | vmscan: use folio_mapped() in shrink_page_list() | Matthew Wilcox (Oracle)
Remove some legacy function calls. Link: https://lkml.kernel.org/r/20220504182857.4013401-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: remove alloc_pages_vma() | Matthew Wilcox (Oracle)
All callers have now been converted to use vma_alloc_folio(), so convert the body of alloc_pages_vma() to allocate folios instead. Link: https://lkml.kernel.org/r/20220504182857.4013401-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/huge_memory: convert do_huge_pmd_anonymous_page() to use vma_alloc_folio() | Matthew Wilcox (Oracle)
Remove the use of this old API, eliminating a call to prep_transhuge_page(). Link: https://lkml.kernel.org/r/20220504182857.4013401-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | shmem: convert shmem_alloc_hugepage() to use vma_alloc_folio() | Matthew Wilcox (Oracle)
Patch series "Folio patches for 5.19", v2. This patch (of 26): For now, return the head page of the folio, but remove use of the old alloc_pages_vma() API. Link: https://lkml.kernel.org/r/20220504182857.4013401-1-willy@infradead.org Link: https://lkml.kernel.org/r/20220504182857.4013401-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: remove duplicate include in memory.c | Wan Jiabing
Fix the following checkincludes.pl warning: mm/memory.c: linux/mm_inline.h is included more than once. The include is already on line 44, so remove the duplicate here. Link: https://lkml.kernel.org/r/20220427064717.803019-1-wanjiabing@vivo.com Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: don't use NUMA_NO_NODE as indicator of page on different node | Wei Yang
Now we are sure there is at least one page on page_list, so it is safe to get the nid of it. This means it is not necessary to use NUMA_NO_NODE as an indicator for the beginning of iteration or a page on different node. Link: https://lkml.kernel.org/r/20220429014426.29223-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: filter empty page_list at the beginning | Wei Yang
node_page_list would always be !empty on finishing the loop, except when page_list is empty. Let's handle an empty page_list before doing any real work, including touching the PF_MEMALLOC flag. Link: https://lkml.kernel.org/r/20220429014426.29223-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
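The control flow described here, together with the NUMA_NO_NODE change listed just above, can be modelled roughly as follows. This is a simplified sketch rather than the kernel code: `reclaim_pages`, the `(nid, page)` tuples, and the `reclaim_batch` callback are illustrative stand-ins.

```python
def reclaim_pages(page_list, reclaim_batch):
    """Reclaim pages in per-node batches and return the total reclaimed.

    page_list: list of (nid, page) tuples, a stand-in for the page list
    reclaim_batch: hypothetical callback taking (nid, pages), returning a count
    """
    if not page_list:
        # Handle the empty list before doing any real work
        return 0

    total = 0
    nid = page_list[0][0]  # safe: the list has at least one entry
    batch = []
    for page_nid, page in page_list:
        if page_nid != nid:
            # Node changed: flush the batch collected so far
            total += reclaim_batch(nid, batch)
            batch = []
            nid = page_nid
        batch.append(page)
    # The final batch is never empty at this point
    total += reclaim_batch(nid, batch)
    return total


calls = []


def fake_reclaim(nid, pages):
    """Record each batch and pretend every page was reclaimed."""
    calls.append((nid, list(pages)))
    return len(pages)


pages = [(0, "a"), (0, "b"), (1, "c"), (0, "d")]
print(reclaim_pages(pages, fake_reclaim))
print(calls)
```

Because the empty case returns early, the node id of the first entry can seed the loop directly and no sentinel value is needed.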
2022-05-13 | mm/vmscan: use helper folio_is_file_lru() | Miaohe Lin
Use helper folio_is_file_lru() to check whether folio is file lru. Minor readability improvement. [linmiaohe@huawei.com: use folio_is_file_lru()] Link: https://lkml.kernel.org/r/20220428105802.21389-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220425111232.23182-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Huang, Ying <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: remove obsolete comment in kswapd_run | Miaohe Lin
Since commit 6b700b5b3c59 ("mm/vmscan.c: remove cpu online notification for now"), cpu online notification is removed. So kswapd won't move to proper cpus if cpus are hot-added. Remove this obsolete comment. Link: https://lkml.kernel.org/r/20220425111232.23182-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Huang, Ying <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: take all base pages of THP into account when race with speculative reference | Miaohe Lin
If the page has buffers, shrink_page_list will try to free the buffer mappings associated with the page and try to free the page as well. In the rare race with speculative reference, the page will be freed shortly by speculative reference. But nr_reclaimed is not incremented correctly when we come across the THP. We need to account all the base pages in this case. Link: https://lkml.kernel.org/r/20220425111232.23182-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Huang, Ying <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
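The accounting difference is easy to see in a toy model. The sketch below is illustrative only: the `Folio` class and `count_reclaimed` are hypothetical, and order 9 assumes 4 KiB base pages with a 2 MiB THP.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    order: int  # 0 for a base page; 9 assumed for a 2 MiB THP with 4 KiB pages

    @property
    def nr_pages(self):
        return 1 << self.order


def count_reclaimed(folios, per_folio=False):
    """Sum reclaimed pages; per_folio=True reproduces the undercount."""
    if per_folio:
        return sum(1 for _ in folios)
    return sum(f.nr_pages for f in folios)


freed = [Folio(order=0), Folio(order=9)]
print(count_reclaimed(freed, per_folio=True))  # 2
print(count_reclaimed(freed))                  # 513
```

Counting one per entry reports 2 pages for this list, while counting base pages reports 513, which is the kind of gap the commit closes.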
2022-05-13 | mm/vmscan: introduce helper function reclaim_page_list() | Miaohe Lin
Introduce helper function reclaim_page_list() to eliminate the duplicated code of doing shrink_page_list() and putback_lru_page. Also we can separate node reclaim from node page list operation this way. No functional change intended. Link: https://lkml.kernel.org/r/20220425111232.23182-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Huang, Ying <ying.huang@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: add a comment about MADV_FREE pages check in folio_check_dirty_writeback | Miaohe Lin
Patch series "A few cleanup and fixup patches for vmscan". This series contains a few patches to remove an obsolete comment, introduce a helper to remove duplicated code, and so on. Also we take all base pages of THP into account in a rare race condition. More details can be found in the respective changelogs. This patch (of 6): The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to follow. Add a comment to make the code clear. Link: https://lkml.kernel.org/r/20220425111232.23182-2-linmiaohe@huawei.com Suggested-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/vmscan: not necessary to re-init the list for each iteration | Wei Yang
node_page_list is defined with LIST_HEAD and is drained until list_empty, so it is not necessary to re-init it. [akpm@linux-foundation.org: remove unneeded braces] Link: https://lkml.kernel.org/r/20220426021743.21007-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: convert sysfs input to bool using kstrtobool() | Jagdish Gediya
Sysfs input conversion to the corresponding bool value, e.g. "false" or "0" to false and "true" or "1" to true, is currently handled through strncmp in multiple places. Use kstrtobool() to convert sysfs input to a bool value. [akpm@linux-foundation.org: propagate kstrtobool() return value, per Andy] Link: https://lkml.kernel.org/r/20220426180203.70782-2-jvgediya@linux.ibm.com Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
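To illustrate the pattern, here is a simplified userspace model of a store handler that delegates parsing and propagates the error code. The accepted spellings in `kstrtobool_model` are an assumption (first-character matching, plus "on"/"off"); the exact set the kernel accepts depends on its version. `store_enabled` and the `state` dict are hypothetical.

```python
import errno


def kstrtobool_model(s):
    """Return (0, value) on success or (-EINVAL, None) on failure.

    Assumed rules, for illustration only: the first character decides,
    except for "on"/"off", where the second character is checked too.
    """
    if not s:
        return -errno.EINVAL, None
    first = s[0].lower()
    if first in ("y", "t", "1"):
        return 0, True
    if first in ("n", "f", "0"):
        return 0, False
    if first == "o" and len(s) > 1:
        second = s[1].lower()
        if second == "n":
            return 0, True
        if second == "f":
            return 0, False
    return -errno.EINVAL, None


def store_enabled(buf, state):
    """Hypothetical sysfs store handler: propagate the parse error, else update."""
    ret, value = kstrtobool_model(buf)
    if ret:
        return ret
    state["enabled"] = value
    return len(buf)  # a store handler reports how many bytes it consumed


state = {"enabled": False}
print(store_enabled("true\n", state), state["enabled"])
print(store_enabled("maybe", state), state["enabled"])
```

Returning the parser's error code unchanged mirrors the bracketed note about propagating the kstrtobool() return value.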
2022-05-13 | mm/vmscan: take min_slab_pages into account when try to call shrink_node | Miaohe Lin
Since commit 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()"), slab reclaim and lru page reclaim are done together in the shrink_node. So we should take min_slab_pages into account when try to call shrink_node. Link: https://lkml.kernel.org/r/20220425112118.20924-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: cma: use pageblock_order as the single alignment | Zi Yan
Now alloc_contig_range() works at pageblock granularity. Change CMA allocation, which uses alloc_contig_range(), to use pageblock_nr_pages alignment. Link: https://lkml.kernel.org/r/20220425143118.2850746-6-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: page_isolation: enable arbitrary range page isolation. | Zi Yan
Now start_isolate_page_range() is ready to handle arbitrary range isolation, so move the alignment check/adjustment into the function body. Do the same for its counterpart undo_isolate_page_range(). alloc_contig_range(), its caller, can pass an arbitrary range instead of a MAX_ORDER_NR_PAGES aligned one. Link: https://lkml.kernel.org/r/20220425143118.2850746-5-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: make alloc_contig_range work at pageblock granularity | Zi Yan
alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid merging pageblocks with different migratetypes. It might unnecessarily convert extra pageblocks at the beginning and at the end of the range. Change alloc_contig_range() to work at pageblock granularity. Special handling is needed for free pages and in-use pages across the boundaries of the range specified by alloc_contig_range(), because these partially isolated pages cause free page accounting issues. The free pages will be split and freed into separate migratetype lists; the in-use pages will be migrated, then the freed pages will be handled in the aforementioned way. [ziy@nvidia.com: fix deadlock/crash] Link: https://lkml.kernel.org/r/23A7297E-6C84-4138-A9FE-3598234004E6@nvidia.com Link: https://lkml.kernel.org/r/20220425143118.2850746-4-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
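A small arithmetic sketch shows why the coarser granularity can pull in extra pageblocks. The constants are assumptions for illustration (512 pages per pageblock, 1024 pages per MAX_ORDER block), and `isolation_span` is a hypothetical helper, not a kernel function.

```python
PAGEBLOCK_NR_PAGES = 512    # assumed: pageblock_order = 9
MAX_ORDER_NR_PAGES = 1024   # assumed: 2 ** (MAX_ORDER - 1) with MAX_ORDER = 11


def align_down(pfn, align):
    """Round pfn down to a multiple of align (align must be a power of two)."""
    return pfn & ~(align - 1)


def align_up(pfn, align):
    """Round pfn up to a multiple of align (align must be a power of two)."""
    return (pfn + align - 1) & ~(align - 1)


def isolation_span(start_pfn, end_pfn, granularity):
    """Return the [start, end) pfn range that isolation would cover."""
    return align_down(start_pfn, granularity), align_up(end_pfn, granularity)


start, end = 1536, 2560  # a 1024-page request that is only pageblock-aligned
old = isolation_span(start, end, MAX_ORDER_NR_PAGES)
new = isolation_span(start, end, PAGEBLOCK_NR_PAGES)
print(old, old[1] - old[0])  # (1024, 3072) 2048
print(new, new[1] - new[0])  # (1536, 2560) 1024
```

Under these assumptions the coarser rounding covers one extra pageblock on each side of the request, while pageblock rounding covers exactly the requested range.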
2022-05-13 | mm: page_isolation: check specified range for unmovable pages | Zi Yan
Enable set_migratetype_isolate() to check specified range for unmovable pages during isolation to prepare arbitrary range page isolation. The functionality will take effect in upcoming commits by adjusting the callers of start_isolate_page_range(), which uses set_migratetype_isolate(). For example, alloc_contig_range(), which calls start_isolate_page_range(), accepts unaligned ranges, but because page isolation is currently done at MAX_ORDER_NR_PAGES granularity, pages that are out of the specified range but within MAX_ORDER_NR_PAGES alignment might be attempted for isolation and the failure of isolating these unrelated pages fails the whole operation undesirably. Link: https://lkml.kernel.org/r/20220425143118.2850746-3-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c | Zi Yan
Patch series "Use pageblock_order for cma and alloc_contig_range alignment", v11. This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER adjustable at boot time[1]. The MAX_ORDER - 1 alignment requirement comes from the fact that alloc_contig_range() isolates pageblocks to remove free memory from buddy allocator but isolating only a subset of pageblocks within a page spanning across multiple pageblocks causes free page accounting issues. Isolated page might not be put into the right free list, since the code assumes the migratetype of the first pageblock as the whole free page migratetype. This is based on the discussion at [2]. To remove the requirement, this patchset: 1. isolates pages at pageblock granularity instead of max(MAX_ORDER_NR_PAGES, pageblock_nr_pages); 2. splits free pages across the specified range or migrates in-use pages across the specified range then splits the freed page to avoid free page accounting issues (it happens when multiple pageblocks within a single page have different migratetypes); 3. only checks unmovable pages within the range instead of MAX_ORDER - 1 aligned range during isolation to avoid alloc_contig_range() failure when pageblocks within a MAX_ORDER - 1 aligned range are allocated separately. 4. returns pages not in the range as it did before. One optimization might come later: 1. make MIGRATE_ISOLATE a separate bit to be able to restore the original migratetypes when isolation fails in the middle of the range. [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/ [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/ This patch (of 6): has_unmovable_pages() is only used in mm/page_isolation.c. Move it from mm/page_alloc.c and make it static.
Link: https://lkml.kernel.org/r/20220425143118.2850746-2-zi.yan@sent.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Minchan Kim <minchan@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/uffd: move USERFAULTFD configs into mm/ | Peter Xu
We used to have USERFAULTFD configs stored in init/. It makes sense as a start because that's the default place for storing syscall related configs. However userfaultfd evolved a bit in the past few years and some more config options were added. They're no longer related to syscalls and start to be not suitable to be kept in the init/ directory anymore, because they're pure mm concepts. But it's not ideal either to keep the userfaultfd configs separate from each other. Hence this patch moves the userfaultfd configs under init/ to be under mm/ so that we'll start to group all userfaultfd configs together. We do have quite a few examples of syscall related configs that are not put under init/Kconfig: FTRACE_SYSCALLS, SWAP, FILE_LOCKING, MEMFD_CREATE.. They all reside in the dir where they're more suitable for the concept. So it seems there's no restriction to keep the role of having syscall related CONFIG_* under init/ only. Link: https://lkml.kernel.org/r/20220420144823.35277-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: enable PTE markers by default | Peter Xu
Enable PTE markers by default. On x86_64 it means it'll auto-enable PTE_MARKER_UFFD_WP as well. [peterx@redhat.com: hide PTE_MARKER option] Link: https://lkml.kernel.org/r/20220419202531.27415-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220405014929.15158-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/uffd: enable write protection for shmem & hugetlbfs | Peter Xu
We've had all the necessary changes ready for both shmem and hugetlbfs. Turn on all the shmem/hugetlbfs switches for userfaultfd-wp. We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too because all existing types now support write protection mode. Since vma_can_userfault() will be used elsewhere, move into userfaultfd_k.h. Link: https://lkml.kernel.org/r/20220405014926.15101-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/khugepaged: don't recycle vma pgtable if uffd-wp registered | Peter Xu
When we're trying to collapse a 2M huge shmem page, don't retract pgtable pmd page if it's registered with uffd-wp, because that pgtable could have pte markers installed. Recycling of that pgtable means we'll lose the pte markers. That could cause data loss for a uffd-wp enabled application on shmem. Instead of disabling khugepaged on these files, simply skip retracting these special VMAs, then the page cache can still be merged into a huge thp, and other mm/vma can still map the range of file with a huge thp when proper. Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write, that avoids race like:

    khugepaged                      user thread
    ==========                      ===========
    check VM_UFFD_WP, not set
                                    UFFDIO_REGISTER with uffd-wp on shmem
                                    wr-protect some pages (install markers)
    take mmap_sem write lock
    erase pmd and free pmd page
     --> pte markers are dropped unnoticed!

Link: https://lkml.kernel.org/r/20220405014921.14994-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
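The ordering problem in this commit message can be replayed with a deterministic toy model. Everything here is hypothetical (the dict-based VMA, `retract_page_table`, `register_and_protect`); it only shows why the flag has to be read after the write lock is taken, and it is not how khugepaged is implemented.

```python
def retract_page_table(vma, check_with_lock_held, user_thread):
    """Model one retract attempt on a shmem VMA.

    user_thread(vma) stands for a racing UFFDIO_REGISTER plus marker install.
    """
    if check_with_lock_held:
        user_thread(vma)       # runs before khugepaged obtains the write lock
        # ... mmap lock taken for write ...
        skip = vma["uffd_wp"]  # read under the lock: sees the registration
    else:
        skip = vma["uffd_wp"]  # read too early: flag not set yet
        user_thread(vma)       # slips in between the check and the lock
        # ... mmap lock taken for write ...
    if skip:
        return "skipped"
    lost = vma["markers"]
    vma["markers"] = 0         # page table freed, markers go with it
    return f"retracted, {lost} markers lost"


def register_and_protect(vma):
    """User thread: register uffd-wp and write-protect three pages."""
    vma["uffd_wp"] = True
    vma["markers"] = 3


def fresh_vma():
    return {"uffd_wp": False, "markers": 0}


racy = retract_page_table(fresh_vma(), False, register_and_protect)
safe = retract_page_table(fresh_vma(), True, register_and_protect)
print(racy)  # retracted, 3 markers lost
print(safe)  # skipped
```

With the early check the model drops three markers; with the check made under the lock the retract is skipped, matching the reasoning in the commit message.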
2022-05-13mm/hugetlb: handle uffd-wp during fork()Peter Xu
Firstly, we'll need to pass in dst_vma into copy_hugetlb_page_range() because for uffd-wp it's the dst vma that matters on deciding how we should treat uffd-wp protected ptes. We should recognize pte markers during fork and do the pte copy if needed. [lkp@intel.com: vma_needs_copy can be static] Link: https://lkml.kernel.org/r/Ylb0CGeFJlc4EzLk@7ec4ff11d4ae Link: https://lkml.kernel.org/r/20220405014918.14932-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/hugetlb: only drop uffd-wp special pte if requiredPeter Xu
As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if unmapping an entire vma or synchronized such that faults can not race with the unmap operation. This requires passing zap_flags all the way to the lowest level hugetlb unmap routine: __unmap_hugepage_range. In general, unmap calls originated in hugetlbfs code will pass the ZAP_FLAG_DROP_MARKER flag as synchronization is in place to prevent faults. The exception is hole punch which will first unmap without any synchronization. Later when hole punch actually removes the page from the file, it will check to see if there was a subsequent fault and if so take the hugetlb fault mutex while unmapping again. This second unmap will pass in ZAP_FLAG_DROP_MARKER. The justification of "whether to apply ZAP_FLAG_DROP_MARKER flag when unmap a hugetlb range" is (IMHO): we should never reach a state when a page fault could erroneously fault in a page-cache page that was wr-protected to be writable, even in an extremely short period. That could happen if e.g. we pass ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls hugetlb_vmdelete_list(), because if a page faults after that call and before remove_inode_hugepages() is executed, the page cache can be mapped writable again in the small racy window, which can cause data to be overwritten unexpectedly. [peterx@redhat.com: fix sparse warning] Link: https://lkml.kernel.org/r/Ylcdw8I1L5iAoWhb@xz-m1.local [akpm@linux-foundation.org: move zap_flags_t from mm.h to mm_types.h to fix build issues] Link: https://lkml.kernel.org/r/20220405014915.14873-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
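The rule this commit argues for (a marker survives an unmap unless the caller explicitly asks to drop it) can be sketched as a userspace toy. Everything here is an assumption made for illustration: the `toy_*` names, the pte encoding and the flag value are hypothetical, and only the marker case described in the message is modelled, not the real __unmap_hugepage_range().

```c
#include <stdint.h>

typedef uint64_t toy_pte_t;
typedef unsigned int toy_zap_flags_t;

/* Hypothetical encodings; the kernel's values and layout differ. */
#define TOY_PTE_NONE             ((toy_pte_t)0)
#define TOY_PTE_MARKER_UFFD_WP   ((toy_pte_t)1)
#define TOY_ZAP_FLAG_DROP_MARKER (1u << 0)

/*
 * Unmap one entry. A uffd-wp marker is erased only when the caller
 * passes the drop flag, i.e. when it knows no fault can race with it.
 */
static toy_pte_t toy_zap_one(toy_pte_t pte, toy_zap_flags_t flags)
{
    if (pte == TOY_PTE_MARKER_UFFD_WP &&
        !(flags & TOY_ZAP_FLAG_DROP_MARKER))
        return pte;              /* keep the marker in place */
    return TOY_PTE_NONE;         /* everything else is cleared */
}
```

In this model, the first (unsynchronized) unmap of a hole punch would call `toy_zap_one(pte, 0)` and leave the marker alone, so a racing fault still sees the write protection; the later, synchronized unmap would pass `TOY_ZAP_FLAG_DROP_MARKER`.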
2022-05-13mm/hugetlb: allow uffd wr-protect none ptesPeter Xu
Teach hugetlbfs code to wr-protect none ptes just in case the page cache existed for that pte. Meanwhile we also need to be able to recognize a uffd-wp marker pte and remove it for uffd_wp_resolve. While at it, introduce a variable "psize" to replace all references to the huge page size fetcher. Link: https://lkml.kernel.org/r/20220405014912.14815-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
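A rough userspace model of the two transitions named above: wr-protecting a none pte installs a marker, and resolving the protection removes it. The `toy_*` names and encodings are hypothetical and chosen only for this sketch; the real change lives in the hugetlb protection-change path and handles more cases (present ptes, migration entries) than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical encodings, not the kernel's. */
#define TOY_PTE_NONE           ((toy_pte_t)0)
#define TOY_PTE_MARKER_UFFD_WP ((toy_pte_t)1)

/*
 * Apply a uffd-wp request to a non-present entry.
 *   wp == true : protect; a none pte becomes a marker, so that a page
 *                already sitting in the page cache is not later mapped
 *                writable by accident.
 *   wp == false: resolve; an existing marker is removed again.
 */
static toy_pte_t toy_change_uffd_wp(toy_pte_t pte, bool wp)
{
    if (wp && pte == TOY_PTE_NONE)
        return TOY_PTE_MARKER_UFFD_WP;
    if (!wp && pte == TOY_PTE_MARKER_UFFD_WP)
        return TOY_PTE_NONE;
    return pte; /* nothing to do for the cases modelled here */
}
```

Protecting and then resolving the same none entry is a round trip in this model: `toy_change_uffd_wp(toy_change_uffd_wp(TOY_PTE_NONE, true), false)` yields `TOY_PTE_NONE` again.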
2022-05-13mm/hugetlb: handle pte markers in page faultsPeter Xu
Allow hugetlb code to handle pte markers just like none ptes. It's mostly there; we just need to make sure we don't assume hugetlb_no_page() only handles none ptes, so when detecting a pte change we should use pte_same() rather than pte_none(). We need to pass in the old_pte to do the comparison. Check the original pte to see whether it's a pte marker; if it is, we should restore the uffd-wp bit on the new pte to be installed, so that the next write will be trapped by uffd. Link: https://lkml.kernel.org/r/20220405014909.14761-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
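Why the recheck has to compare against the old pte rather than test for "none" can be seen in a small userspace toy. This is a sketch, not the hugetlb fault path: the `toy_*` helpers and bit values are hypothetical stand-ins that only borrow the idea behind pte_none() and pte_same().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical encodings, not the kernel's. */
#define TOY_PTE_NONE           ((toy_pte_t)0)
#define TOY_PTE_MARKER_UFFD_WP ((toy_pte_t)1)
#define TOY_PTE_PRESENT        ((toy_pte_t)1 << 1)
#define TOY_PTE_UFFD_WP        ((toy_pte_t)1 << 2)

static bool toy_pte_none(toy_pte_t pte)
{
    return pte == TOY_PTE_NONE;
}

static bool toy_pte_same(toy_pte_t a, toy_pte_t b)
{
    return a == b;
}

/* Recheck that assumes the fault started from a none pte. */
static bool toy_changed_none_check(toy_pte_t now)
{
    return !toy_pte_none(now);
}

/* Recheck that compares with the pte seen when the fault started. */
static bool toy_changed_same_check(toy_pte_t old, toy_pte_t now)
{
    return !toy_pte_same(old, now);
}

/* Build the pte to install, carrying uffd-wp over from a marker. */
static toy_pte_t toy_make_new_pte(toy_pte_t old)
{
    toy_pte_t new_pte = TOY_PTE_PRESENT;

    if (old == TOY_PTE_MARKER_UFFD_WP)
        new_pte |= TOY_PTE_UFFD_WP;
    return new_pte;
}
```

When the fault starts on a marker and nothing else touches the entry, the none-based recheck wrongly reports a concurrent change (a marker is not none), while the same-based recheck reports no change and lets the fault proceed with the uffd-wp bit carried over.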