aboutsummaryrefslogtreecommitdiff
path: root/mm/vmstat.c
diff options
context:
space:
mode:
authorHuang Ying2023-10-16 13:30:00 +0800
committerAndrew Morton2023-10-25 16:47:10 -0700
commit51a755c56dc05a8b31ed28d24f28354946dc7529 (patch)
tree8a459c4def7c181504bb68eee58073bd18004988 /mm/vmstat.c
parent90b41691b9881376fe784e13b5766ec3676fdb55 (diff)
mm: tune PCP high automatically
The target to tune PCP high automatically is as follows, - Minimize allocation/freeing from/to shared zone - Minimize idle pages in PCP - Minimize pages in PCP if the system free pages is too few To reach these target, a tuning algorithm as follows is designed, - When we refill PCP via allocating from the zone, increase PCP high. Because if we had larger PCP, we could avoid to allocate from the zone. - In periodic vmstat updating kworker (via refresh_cpu_vm_stats()), decrease PCP high to try to free possible idle PCP pages. - When page reclaiming is active for the zone, stop increasing PCP high in allocating path, decrease PCP high and free some pages in freeing path. So, the PCP high can be tuned to the page allocating/freeing depth of workloads eventually. One issue of the algorithm is that if the number of pages allocated is much more than that of pages freed on a CPU, the PCP high may become the maximal value even if the allocating/freeing depth is small. But this isn't a severe issue, because there are no idle pages in this case. One alternative choice is to increase PCP high when we drain PCP via trying to free pages to the zone, but don't increase PCP high during PCP refilling. This can avoid the issue above. But if the number of pages allocated is much less than that of pages freed on a CPU, there will be many idle pages in PCP and it is hard to free these idle pages. 1/8 (>> 3) of PCP high will be decreased periodically. The value 1/8 is kind of arbitrary. Just to make sure that the idle PCP pages will be freed eventually. On a 2-socket Intel server with 224 logical CPU, we run 8 kbuild instances in parallel (each with `make -j 28`) in 8 cgroup. This simulates the kbuild server that is used by 0-Day kbuild service. With the patch, the build time decreases 3.5%. The cycles% of the spinlock contention (mostly for zone lock) decreases from 11.0% to 0.5%. The number of PCP draining for high order pages freeing (free_high) decreases 65.6%. The number of pages allocated from zone (instead of from PCP) decreases 83.9%. Link: https://lkml.kernel.org/r/20231016053002.756205-8-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Suggested-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/vmstat.c')
-rw-r--r--mm/vmstat.c8
1 files changed, 4 insertions, 4 deletions
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 88ea95d4221c..359460deb377 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -816,9 +816,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
for_each_populated_zone(zone) {
struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
-#ifdef CONFIG_NUMA
struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
-#endif
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
int v;
@@ -834,10 +832,12 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
#endif
}
}
-#ifdef CONFIG_NUMA
if (do_pagesets) {
cond_resched();
+
+ changes += decay_pcp_high(zone, this_cpu_ptr(pcp));
+#ifdef CONFIG_NUMA
/*
* Deal with draining the remote pageset of this
* processor
@@ -866,8 +866,8 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
drain_zone_pages(zone, this_cpu_ptr(pcp));
changes++;
}
- }
#endif
+ }
}
for_each_online_pgdat(pgdat) {