From 4e096ae1801e24b338e02715c65c3ffa8883ba5d Mon Sep 17 00:00:00 2001 From: Matthew Wilcox (Oracle) Date: Sat, 13 May 2023 01:11:01 +0100 Subject: mm: convert migrate_pages() to work on folios Almost all of the callers & implementors of migrate_pages() were already converted to use folios. compaction_alloc() & compaction_free() are trivial to convert as part of this patch and not worth splitting out. Link: https://lkml.kernel.org/r/20230513001101.276972-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: "Huang, Ying" Signed-off-by: Andrew Morton --- Documentation/mm/page_migration.rst | 7 +++---- Documentation/translations/zh_CN/mm/page_migration.rst | 2 +- 2 files changed, 4 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst index 313dce18893e..e35af7805be5 100644 --- a/Documentation/mm/page_migration.rst +++ b/Documentation/mm/page_migration.rst @@ -73,14 +73,13 @@ In kernel use of migrate_pages() It also prevents the swapper or other scans from encountering the page. -2. We need to have a function of type new_page_t that can be +2. We need to have a function of type new_folio_t that can be passed to migrate_pages(). This function should figure out - how to allocate the correct new page given the old page. + how to allocate the correct new folio given the old folio. 3. The migrate_pages() function is called which attempts to do the migration. It will call the function to allocate - the new page for each page that is considered for - moving. + the new folio for each folio that is considered for moving. 
How migrate_pages() works ========================= diff --git a/Documentation/translations/zh_CN/mm/page_migration.rst b/Documentation/translations/zh_CN/mm/page_migration.rst index 076081dc1635..f95063826a15 100644 --- a/Documentation/translations/zh_CN/mm/page_migration.rst +++ b/Documentation/translations/zh_CN/mm/page_migration.rst @@ -55,7 +55,7 @@ mbind()设置一个新的内存策略。一个进程的页面也可以通过sys_ 消失。它还可以防止交换器或其他扫描器遇到该页。 -2. 我们需要有一个new_page_t类型的函数,可以传递给migrate_pages()。这个函数应该计算 +2. 我们需要有一个new_folio_t类型的函数,可以传递给migrate_pages()。这个函数应该计算 出如何在给定的旧页面中分配正确的新页面。 3. migrate_pages()函数被调用,它试图进行迁移。它将调用该函数为每个被考虑迁移的页面分 -- cgit v1.2.3 From e0e0b4126c1f1effd480777507a61bd09360dc8f Mon Sep 17 00:00:00 2001 From: Lars R. Damerow Date: Wed, 24 May 2023 11:17:33 -0700 Subject: mm/memcontrol: export memcg.swap watermark via sysfs for v2 memcg This patch is similar to commit 8e20d4b33266 ("mm/memcontrol: export memcg->watermark via sysfs for v2 memcg"), but exports the swap counter's watermark. We allocate jobs to our compute farm using heuristics determined by memory and swap usage from previous jobs. Tracking the peak swap usage for new jobs is important for determining when jobs are exceeding their expected bounds, or when our baseline metrics are getting outdated. Our toolset was written to use the "memory.memsw.max_usage_in_bytes" file in cgroups v1, and altering it to poll cgroups v2's "memory.swap.current" would give less accurate results as well as add complication to the code. Having this watermark exposed in sysfs is much preferred. Link: https://lkml.kernel.org/r/20230524181734.125696-1-lars@pixar.com Signed-off-by: Lars R. 
Damerow Acked-by: Johannes Weiner Cc: Jonathan Corbet Cc: Michal Hocko Cc: Muchun Song Cc: Roman Gushchin Cc: Shakeel Butt Cc: Tejun Heo Cc: Zefan Li Signed-off-by: Andrew Morton --- Documentation/admin-guide/cgroup-v2.rst | 7 +++++++ mm/memcontrol.c | 13 +++++++++++++ 2 files changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index f67c0829350b..1ffe019483ac 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1582,6 +1582,13 @@ PAGE_SIZE multiple when read back. Healthy workloads are not expected to reach this limit. + memory.swap.peak + A read-only single value file which exists on non-root + cgroups. + + The max swap usage recorded for the cgroup and its + descendants since the creation of the cgroup. + memory.swap.max A read-write single value file which exists on non-root cgroups. The default is "max". diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6a3d4ce87b8a..6ee433be4c3b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7622,6 +7622,14 @@ static u64 swap_current_read(struct cgroup_subsys_state *css, return (u64)page_counter_read(&memcg->swap) * PAGE_SIZE; } +static u64 swap_peak_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + return (u64)memcg->swap.watermark * PAGE_SIZE; +} + static int swap_high_show(struct seq_file *m, void *v) { return seq_puts_memcg_tunable(m, @@ -7700,6 +7708,11 @@ static struct cftype swap_files[] = { .seq_show = swap_max_show, .write = swap_max_write, }, + { + .name = "swap.peak", + .flags = CFTYPE_NOT_ON_ROOT, + .read_u64 = swap_peak_read, + }, { .name = "swap.events", .flags = CFTYPE_NOT_ON_ROOT, -- cgit v1.2.3 From c6bb975aa60bef4b689f9811d888c79d721077e3 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:05 +0000 Subject: Docs/mm/damon/faq: remove old questions Patch series 
"Docs/mm/damon: Minor fixes and design doc update". Some of the DAMON documents are outdated, or having minor typos or grammar erros. Especially, the design doc has not updated for DAMOS, which is an important part of DAMON. Fix the minor issues and update documents. This patch (of 10): The first two questions of DAMON faqs have raised when DAMON patches were first submitted. More than one year has passed since DAMON patches get merged in the mainline, and that kind of questions are not asked nowadays. Remove the questions. Link: https://lkml.kernel.org/r/20230525214314.5204-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230525214314.5204-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/faq.rst | 23 ----------------------- 1 file changed, 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/mm/damon/faq.rst b/Documentation/mm/damon/faq.rst index dde7e2414ee6..3279dc7a8211 100644 --- a/Documentation/mm/damon/faq.rst +++ b/Documentation/mm/damon/faq.rst @@ -4,29 +4,6 @@ Frequently Asked Questions ========================== -Why a new subsystem, instead of extending perf or other user space tools? -========================================================================= - -First, because it needs to be lightweight as much as possible so that it can be -used online, any unnecessary overhead such as kernel - user space context -switching cost should be avoided. Second, DAMON aims to be used by other -programs including the kernel. Therefore, having a dependency on specific -tools like perf is not desirable. These are the two biggest reasons why DAMON -is implemented in the kernel space. - - -Can 'idle pages tracking' or 'perf mem' substitute DAMON? -========================================================= - -Idle page tracking is a low level primitive for access check of the physical -address space. 'perf mem' is similar, though it can use sampling to minimize -the overhead. 
On the other hand, DAMON is a higher-level framework for the -monitoring of various address spaces. It is focused on memory management -optimization and provides sophisticated accuracy/overhead handling mechanisms. -Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of -DAMON's output, but cannot substitute DAMON. - - Does DAMON support virtual memory only? ======================================= -- cgit v1.2.3 From 73dc57e4ef49a69681d351e0e8fe6fbfb1ba8d92 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:06 +0000 Subject: Docs/mm/damon/maintainer-profile: fix typos and grammar errors Fix a few typos and grammar errors in DAMON Maintainer Profile document. Link: https://lkml.kernel.org/r/20230525214314.5204-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/maintainer-profile.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst index 24a202f03de8..a84c14e59053 100644 --- a/Documentation/mm/damon/maintainer-profile.rst +++ b/Documentation/mm/damon/maintainer-profile.rst @@ -3,7 +3,7 @@ DAMON Maintainer Entry Profile ============================== -The DAMON subsystem covers the files that listed in 'DATA ACCESS MONITOR' +The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR' section of 'MAINTAINERS' file. The mailing lists for the subsystem are damon@lists.linux.dev and @@ -15,7 +15,7 @@ SCM Trees There are multiple Linux trees for DAMON development. Patches under development or testing are queued in damon/next [2]_ by the DAMON maintainer. -Suffieicntly reviewed patches will be queued in mm-unstable [1]_ by the memory +Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory management subsystem maintainer. 
After more sufficient tests, the patches will be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the memory management subsystem maintainer. -- cgit v1.2.3 From 45b849df7d0ea9d65b8f088b6b86d12d7f1bdee6 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:07 +0000 Subject: Docs/mm/damon/design: add a section for overall architecture The design doc is missing an overall picture of DAMON. Add a section for overall architecture and layers. Link: https://lkml.kernel.org/r/20230525214314.5204-4-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 0cff6fac6b7e..3b4ce873fa71 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -4,6 +4,21 @@ Design ====== + +Overall Architecture +==================== + +DAMON subsystem is configured with three layers including + +- Operations Set: Implements fundamental operations for DAMON that depends on + the given monitoring target address-space and available set of + software/hardware primitives, +- Core: Implements core logics including monitoring overhead/accurach control + and access-aware system operations on top of the operations set layer, and +- Modules: Implements kernel modules for various purposes that provides + interfaces for the user space, on top of the core layer. + + Configurable Layers =================== -- cgit v1.2.3 From e168962dbf7f0a6aba26174b3b09d0c313c6f7f5 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:08 +0000 Subject: Docs/mm/damon/design: update the layout based on the layers DAMON design document is describing only the operations set layer and monitoring part of the core logic. 
Update the layout based on the DAMON's layers, so that more parts of DAMON including DAMOS core logic and DAMON modules can easily be added. Link: https://lkml.kernel.org/r/20230525214314.5204-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 3b4ce873fa71..eaf52f3a9144 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -19,8 +19,8 @@ DAMON subsystem is configured with three layers including interfaces for the user space, on top of the core layer. -Configurable Layers -=================== +Configurable Operations Set +--------------------------- DAMON provides data access monitoring functionality while making the accuracy and the overhead controllable. The fundamental access monitorings require @@ -42,8 +42,8 @@ Also, if some architectures or devices support special optimized access check primitives, those will be easily configurable. -Reference Implementations of Address Space Specific Monitoring Operations -========================================================================= +Operations Set Layer +==================== The monitoring operations are defined in two parts: @@ -105,8 +105,12 @@ conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking does. -Address Space Independent Core Mechanisms -========================================= +Core Logics +=========== + + +Monitoring +---------- Below four sections describe each of the DAMON core mechanisms and the five monitoring attributes, ``sampling interval``, ``aggregation interval``, @@ -115,7 +119,7 @@ regions``. 
Access Frequency Monitoring ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ The output of DAMON says what pages are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting @@ -142,7 +146,7 @@ size of the target workload grows. Region Based Sampling ---------------------- +~~~~~~~~~~~~~~~~~~~~~ To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that assumed to have the same access frequencies into a region. As long as the @@ -159,7 +163,7 @@ assumption is not guaranteed. Adaptive Regions Adjustment ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), @@ -178,7 +182,7 @@ keeping the bounds users set for their trade-off. Dynamic Target Space Updates Handling -------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The monitoring target address range could dynamically changed. For example, virtual memory could be dynamically mapped and unmapped. Physical memory could -- cgit v1.2.3 From 69e7b88cea29f10bb8707f4056cba1e2d04cb894 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:09 +0000 Subject: Docs/mm/damon/design: rewrite configurable layers The 'Configurable Operations Set' section is a little bit outdated. Update the text. 
Link: https://lkml.kernel.org/r/20230525214314.5204-6-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index eaf52f3a9144..4a22bab124cf 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -22,24 +22,23 @@ DAMON subsystem is configured with three layers including Configurable Operations Set --------------------------- -DAMON provides data access monitoring functionality while making the accuracy -and the overhead controllable. The fundamental access monitorings require -primitives that dependent on and optimized for the target address space. On -the other hand, the accuracy and overhead tradeoff mechanism, which is the core -of DAMON, is in the pure logic space. DAMON separates the two parts in -different layers and defines its interface to allow various low level -primitives implementations configurable with the core logic. We call the low -level primitives implementations monitoring operations. - -Due to this separated design and the configurable interface, users can extend -DAMON for any address space by configuring the core logics with appropriate -monitoring operations. If appropriate one is not provided, users can implement -the operations on their own. +For data access monitoring and additional low level work, DAMON needs a set of +implementations for specific operations that are dependent on and optimized for +the given target address space. On the other hand, the accuracy and overhead +tradeoff mechanism, which is the core logic of DAMON, is in the pure logic +space. DAMON separates the two parts in different layers, namely DAMON +Operations Set and DAMON Core Logics Layers, respectively. 
It further defines +the interface between the layers to allow various operations sets to be +configured with the core logic. + +Due to this design, users can extend DAMON for any address space by configuring +the core logic to use the appropriate operations set. If any appropriate set +is unavailable, users can implement one on their own. For example, physical memory, virtual memory, swap space, those for specific processes, NUMA nodes, files, and backing memory devices would be supportable. -Also, if some architectures or devices support special optimized access check -primitives, those will be easily configurable. +Also, if some architectures or devices supporting special optimized access +check primitives, those will be easily configurable. Operations Set Layer -- cgit v1.2.3 From eaabfa4321a66b771479ce92bb5ad493762f0f0a Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:10 +0000 Subject: Docs/mm/damon/design: add a section for the relation between Core and Modules layer Add an overall description of the interface and the relation between the Core and the Modules layer under 'Overall Architecture' section. Link: https://lkml.kernel.org/r/20230525214314.5204-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 4a22bab124cf..41abd0430dd7 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -41,6 +41,16 @@ Also, if some architectures or devices supporting special optimized access check primitives, those will be easily configurable. +Programmable Modules +-------------------- + +Core layer of DAMON is implemented as a framework, and exposes its application +programming interface to all kernel space components such as subsystems and +modules. 
For common use cases of DAMON, DAMON subsystem provides kernel +modules that built on top of the core layer using the API, which can be easily +used by the user space end users. + + Operations Set Layer ==================== -- cgit v1.2.3 From 2dc4e6a509aef2d06d9cccfb2aad4cec92752a39 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:11 +0000 Subject: Docs/mm/damon/design: add sections for basic parts of DAMOS DAMOS is an important part of DAMON, but the design doc is not covering it. Add sections for covering the basic part of DAMOS. Link: https://lkml.kernel.org/r/20230525214314.5204-8-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 70 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 41abd0430dd7..9f9253529c3d 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -202,3 +202,73 @@ monitoring operations to check dynamic changes including memory mapping changes and applies it to monitoring operations-related data structures such as the abstracted monitoring target memory area only for each of a user-specified time interval (``update interval``). + + +Operation Schemes +----------------- + +One common purpose of data access monitoring is access-aware system efficiency +optimizations. For example, + + paging out memory regions that are not accessed for more than two minutes + +or + + using THP for memory regions that are larger than 2 MiB and showing a high + access frequency for more than one minute. + +One straightforward approach for such schemes would be profile-guided +optimizations. 
That is, getting data access monitoring results of the +workloads or the system using DAMON, finding memory regions of special +characteristics by profiling the monitoring results, and making system +operation changes for the regions. The changes could be made by modifying or +providing advice to the software (the application and/or the kernel), or +reconfiguring the hardware. Both offline and online approaches could be +available. + +Among those, providing advice to the kernel at runtime would be flexible and +effective, and therefore widely be used. However, implementing such schemes +could impose unnecessary redundancy and inefficiency. The profiling could be +redundant if the type of interest is common. Exchanging the information +including monitoring results and operation advice between kernel and user +spaces could be inefficient. + +To allow users to reduce such redundancy and inefficiencies by offloading the +works, DAMON provides a feature called Data Access Monitoring-based Operation +Schemes (DAMOS). It lets users specify their desired schemes at a high +level. For such specifications, DAMON starts monitoring, finds regions having +the access pattern of interest, and applies the user-desired operation actions +to the regions as soon as found. + + +Operation Action +~~~~~~~~~~~~~~~~ + +The management action that the users desire to apply to the regions of their +interest. For example, paging out, prioritizing for next reclamation victim +selection, advising ``khugepaged`` to collapse or split, or doing nothing but +collecting statistics of the regions. + +The list of supported actions is defined in DAMOS, but the implementation of +each action is in the DAMON operations set layer because the implementation +normally depends on the monitoring target address space. For example, the code +for paging specific virtual address ranges out would be different from that for +physical address ranges. 
And the monitoring operations implementation sets are +not mandated to support all actions of the list. Hence, the availability of +specific DAMOS action depends on what operations set is selected to be used +together. + +Applying an action to a region is considered as changing the region's +characteristics. Hence, DAMOS resets the age of regions when an action is +applied to those. + + +Target Access Pattern +~~~~~~~~~~~~~~~~~~~~~ + +The access pattern of the schemes' interest. The patterns are constructed with +the properties that DAMON's monitoring results provide, specifically the size, +the access frequency, and the age. Users can describe their access pattern of +interest by setting minimum and maximum values of the three properties. If a +region's three properties are in the ranges, DAMOS classifies it as one of the +regions that the scheme is having an interest in. -- cgit v1.2.3 From b138878609be31615309af6400bf05937e076525 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:12 +0000 Subject: Docs/mm/damon/design: add sections for advanced features of DAMOS Add sections for advanced features of DAMOS including quotas, prioritization, watermarks, and filters of DAMOS on the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 86 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 9f9253529c3d..706dbc17c6cb 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -272,3 +272,89 @@ the access frequency, and the age. Users can describe their access pattern of interest by setting minimum and maximum values of the three properties. 
If a region's three properties are in the ranges, DAMOS classifies it as one of the regions that the scheme is having an interest in. + + +Quotas +~~~~~~ + +DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if +the target access pattern is not properly tuned. For example, if a huge memory +region having the access pattern of interest is found, applying the scheme's +action to all pages of the huge region could consume unacceptably large system +resources. Preventing such issues by tuning the access pattern could be +challenging, especially if the access patterns of the workloads are highly +dynamic. + +To mitigate that situation, DAMOS provides an upper-bound overhead control +feature called quotas. It lets users specify an upper limit of time that DAMOS +can use for applying the action, and/or a maximum bytes of memory regions that +the action can be applied within a user-specified time duration. + + +Prioritization +^^^^^^^^^^^^^^ + +A mechanism for making a good decision under the quotas. When the action +cannot be applied to all regions of interest due to the quotas, DAMOS +prioritizes regions and applies the action to only regions having high enough +priorities so that it will not exceed the quotas. + +The prioritization mechanism should be different for each action. For example, +rarely accessed (colder) memory regions would be prioritized for page-out +scheme action. In contrast, the colder regions would be deprioritized for huge +page collapse scheme action. Hence, the prioritization mechanisms for each +action are implemented in each DAMON operations set, together with the actions. + +Though the implementation is up to the DAMON operations set, it would be common +to calculate the priority using the access pattern properties of the regions. +Some users would want the mechanisms to be personalized for their specific +case. 
For example, some users would want the mechanism to weigh the recency +(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users +to specify the weight of each access pattern property and passes the +information to the underlying mechanism. Nevertheless, how and even whether +the weight will be respected are up to the underlying prioritization mechanism +implementation. + + +Watermarks +~~~~~~~~~~ + +Conditional DAMOS (de)activation automation. Users might want DAMOS to run +only under certain situations. For example, when a sufficient amount of free +memory is guaranteed, running a scheme for proactive reclamation would only +consume unnecessary system resources. To avoid such consumption, the user would +need to manually monitor some metrics such as free memory ratio, and turn +DAMON/DAMOS on or off. + +DAMOS allows users to offload such works using three watermarks. It allows the +users to configure the metric of their interest, and three watermark values, +namely high, middle, and low. If the value of the metric becomes above the +high watermark or below the low watermark, the scheme is deactivated. If the +metric becomes below the mid watermark but above the low watermark, the scheme +is activated. If all schemes are deactivated by the watermarks, the monitoring +is also deactivated. In this case, the DAMON worker thread only periodically +checks the watermarks and therefore incurs nearly zero overhead. + + +Filters +~~~~~~~ + +Non-access pattern-based target memory regions filtering. If users run +self-written programs or have good profiling tools, they could know something +more than the kernel, such as future access patterns or some special +requirements for specific types of memory. For example, some users may know +only anonymous pages can impact their program's performance. They can also +have a list of latency-critical processes. 
+ +To let users optimize DAMOS schemes with such special knowledge, DAMOS provides +a feature called DAMOS filters. The feature allows users to set an arbitrary +number of filters for each scheme. Each filter specifies the type of target +memory, and whether it should exclude the memory of the type (filter-out), or +all except the memory of the type (filter-in). + +As of this writing, anonymous page type and memory cgroup type are supported by +the feature. Some filter target types can require additional arguments. For +example, the memory cgroup filter type asks users to specify the file path of +the memory cgroup for the filter. Hence, users can apply specific schemes to +only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages +excluding those of specific cgroups, and any combination of those. -- cgit v1.2.3 From f508a0fbd3807b9aa157612bba7cf2bafa9dbc97 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:13 +0000 Subject: Docs/mm/damon/design: add a section for DAMON core API Add a section covering the API of DAMON core layer on the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-10-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 706dbc17c6cb..0ccdd2f6af9f 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -358,3 +358,15 @@ example, the memory cgroup filter type asks users to specify the file path of the memory cgroup for the filter. Hence, users can apply specific schemes to only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages excluding those of specific cgroups, and any combination of those. 
+ + +Application Programming Interface +--------------------------------- + +The programming interface for kernel space data access-aware applications. +DAMON is a framework, so it does nothing by itself. Instead, it only helps +other kernel components such as subsystems and modules building their data +access-aware applications using DAMON's core features. For this, DAMON exposes +its all features to other kernel components via its application programming +interface, namely ``include/linux/damon.h``. Please refer to the API +:doc:`document ` for details of the interface. -- cgit v1.2.3 From da9698105c7a84f601e81ea1f5f39ba53f6b9d7c Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 25 May 2023 21:43:14 +0000 Subject: Docs/mm/damon/design: add a section for the modules layer Add a section for covering DAMON modules layer to the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-11-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 61 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 0ccdd2f6af9f..da110e89cab4 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -370,3 +370,64 @@ access-aware applications using DAMON's core features. For this, DAMON exposes its all features to other kernel components via its application programming interface, namely ``include/linux/damon.h``. Please refer to the API :doc:`document ` for details of the interface. + + +Modules +======= + +Because the core of DAMON is a framework for kernel components, it doesn't +provide any direct interface for the user space. Such interfaces should be +implemented by each DAMON API user kernel components, instead. 
DAMON subsystem +itself implements such DAMON API user modules, which are supposed to be used +for general purpose DAMON control and special purpose data access-aware system +operations, and provides stable application binary interfaces (ABI) for the +user space. The user space can build their efficient data access-aware +applications using the interfaces. + + +General Purpose User Interface Modules +-------------------------------------- + +DAMON modules that provide user space ABIs for general purpose DAMON usage in +runtime. + +DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs +interface' are DAMON API user kernel modules that provide ABIs to the +user-space. Please note that DAMON debugfs interface is currently deprecated. + +Like many other ABIs, the modules create files on sysfs and debugfs, allow +users to specify their requests to and get the answers from DAMON by writing to +and reading from the files. As a response to such I/O, DAMON user interface +modules control DAMON and retrieve the results as user requested via the DAMON +API, and return the results to the user-space. + +The ABIs are designed to be used for user space applications development, +rather than human beings' fingers. Human users are recommended to use such +user space tools. One such Python-written user space tool is available at +Github (https://github.com/awslabs/damo), Pypi +(https://pypistats.org/packages/damo), and Fedora +(https://packages.fedoraproject.org/pkgs/python-damo/damo/). + +Please refer to the ABI :doc:`document ` for +details of the interfaces. + + +Special-Purpose Access-aware Kernel Modules +------------------------------------------- + +DAMON modules that provide user space ABI for specific purpose DAMON usage. + +DAMON sysfs/debugfs user interfaces are for full control of all DAMON features +in runtime. 
For each special-purpose system-wide data access-aware system +operations such as proactive reclamation or LRU lists balancing, the interfaces +could be simplified by removing unnecessary knobs for the specific purpose, and +extended for boot-time and even compile time control. Default values of DAMON +control parameters for the usage would also need to be optimized for the +purpose. + +To support such cases, yet more DAMON API user kernel modules that provide more +simple and optimized user space interfaces are available. Currently, two +modules for proactive reclamation and LRU lists manipulation are provided. For +more detail, please read the usage documents for those +(:doc:`/admin-guide/mm/damon/reclaim` and +:doc:`/admin-guide/mm/damon/lru_sort`). -- cgit v1.2.3 From 01d6c48a828b4c1cda2fadcb811b432b757bdf8e Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Tue, 6 Jun 2023 00:16:36 -0700 Subject: Documentation: kselftest: "make headers" is a prerequisite As per a discussion with Muhammad Usama Anjum [1], the following is how one is supposed to build selftests: make headers && make -C tools/testing/selftests/mm However, that's not yet documented anywhere. So add it to Documentation/dev-tools/kselftest.rst . 
[1] https://lore.kernel.org/all/bf910fa5-0c96-3707-cce4-5bcc656b6274@collabora.com/

Link: https://lkml.kernel.org/r/20230606071637.267103-11-jhubbard@nvidia.com
Signed-off-by: John Hubbard
Reviewed-by: David Hildenbrand
Tested-by: Muhammad Usama Anjum
Cc: Peter Xu
Cc: Jonathan Corbet
Cc: Nathan Chancellor
Cc: Shuah Khan
Signed-off-by: Andrew Morton
---
 Documentation/dev-tools/kselftest.rst | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index 12b575b76b20..6e35d042199c 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -36,6 +36,7 @@ Running the selftests (hotplug tests are run in limited mode)
 
 To build the tests::
 
+  $ make headers
   $ make -C tools/testing/selftests
 
 To run the tests::
-- 
cgit v1.2.3

From 0d940a9b270b9220dcff74d8e9123c9788365751 Mon Sep 17 00:00:00 2001
From: Hugh Dickins
Date: Thu, 8 Jun 2023 18:10:32 -0700
Subject: mm/pgtable: allow pte_offset_map[_lock]() to fail

Make pte_offset_map() a wrapper for __pte_offset_map() (optionally
outputs pmdval), pte_offset_map_lock() a sparse __cond_lock wrapper for
__pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.

__pte_offset_map() do pmdval validation (including pmd_clear_bad() when
pmd_bad()), returning NULL if pmdval is not for a page table.
__pte_offset_map_lock() verify pmdval unchanged after getting the lock,
trying again if it changed.

No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done to
cover the imminent case, but we expect to generalize it later, and it
makes a mess of where to do the pmd_bad() clearing.

Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
without actually taking the lock. This will be preferred to open uses of
pte_lockptr(), because (when split ptlock is in page table's struct page)
it points to the right lock for the returned pte pointer, even if *pmd
gets changed racily afterwards.

Update corresponding Documentation.

Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
they have to wait until all architectures are balancing
pte_offset_map()s with pte_unmap()s (as in the arch series posted
earlier). But comment where they will go, so that it's easy to add them
for experiments. And only when those are in place can transient racy
failure cases be enabled. Add more safety for the PAE mismatched pmd_low
pmd_high case at that time.

Link: https://lkml.kernel.org/r/2929bfd-9893-a374-e463-4c3127ff9b9d@google.com
Signed-off-by: Hugh Dickins
Cc: Alistair Popple
Cc: Anshuman Khandual
Cc: Axel Rasmussen
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: David Hildenbrand
Cc: "Huang, Ying"
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Kirill A. Shutemov
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Miaohe Lin
Cc: Mike Kravetz
Cc: Mike Rapoport (IBM)
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Pavel Tatashin
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Qi Zheng
Cc: Ralph Campbell
Cc: Ryan Roberts
Cc: SeongJae Park
Cc: Song Liu
Cc: Steven Price
Cc: Suren Baghdasaryan
Cc: Thomas Hellström
Cc: Will Deacon
Cc: Yang Shi
Cc: Yu Zhao
Cc: Zack Rusin
Signed-off-by: Andrew Morton
---
 Documentation/mm/split_page_table_lock.rst | 17 +++++----
 include/linux/mm.h                         | 27 +++++++++-----
 include/linux/pgtable.h                    | 22 ++++++++----
 mm/pgtable-generic.c                       | 56 ++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 21 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..a834fad9de12 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
 There are helpers to lock/unlock a table and other accessor functions:
 
  - pte_offset_map_lock()
-	maps pte and takes PTE table lock, returns pointer to the taken
-	lock;
+	maps PTE and takes PTE table lock, returns pointer to PTE with
+	pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+	unmaps PTE table;
  - pte_unmap_unlock()
 	unlocks and unmaps PTE table;
  - pte_alloc_map_lock()
-	allocates PTE table if needed and take the lock, returns pointer
-	to taken lock or NULL if allocation failed;
- - pte_lockptr()
-	returns pointer to PTE table lock;
+	allocates PTE table if needed and takes its lock, returns pointer to
+	PTE with pointer to its lock, or returns NULL if allocation failed;
  - pmd_lock()
 	takes PMD table lock, returns pointer to taken lock;
  - pmd_lockptr()
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66032f0d515c..a08dc8cc48fb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2827,14 +2827,25 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_offset_map_lock(mm, pmd, address, ptlp) \
-({ \
-	spinlock_t *__ptl = pte_lockptr(mm, pmd); \
-	pte_t *__pte = pte_offset_map(pmd, address); \
-	*(ptlp) = __ptl; \
-	spin_lock(__ptl); \
-	__pte; \
-})
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
+static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
+{
+	return __pte_offset_map(pmd, addr, NULL);
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
+static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp)
+{
+	pte_t *pte;
+
+	__cond_lock(*ptlp, pte = __pte_offset_map_lock(mm, pmd, addr, ptlp));
+	return pte;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl) do { \
 	spin_unlock(ptl); \
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94235ff2706e..3fabbb018557 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -94,14 +94,22 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 #define pte_offset_kernel pte_offset_kernel
 #endif
 
-#if defined(CONFIG_HIGHPTE)
-#define pte_offset_map(dir, address) \
-	((pte_t *)kmap_local_page(pmd_page(*(dir))) + \
-	 pte_index((address)))
-#define pte_unmap(pte) kunmap_local((pte))
+#ifdef CONFIG_HIGHPTE
+#define __pte_map(pmd, address) \
+	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
+#define pte_unmap(pte) do { \
+	kunmap_local((pte)); \
+	/* rcu_read_unlock() to be added later */ \
+} while (0)
 #else
-#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
-#define pte_unmap(pte) ((void)(pte)) /* NOP */
+static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
+{
+	return pte_offset_kernel(pmd, address);
+}
+static inline void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock() to be added later */
+}
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d2fc52bffafc..c7ab18a5fb77 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -10,6 +10,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 
@@ -229,3 +231,57 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
+{
+	pmd_t pmdval;
+
+	/* rcu_read_lock() to be added later */
+	pmdval = pmdp_get_lockless(pmd);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
+	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_bad(pmdval))) {
+		pmd_clear_bad(pmd);
+		goto nomap;
+	}
+	return __pte_map(&pmdval, addr);
+nomap:
+	/* rcu_read_unlock() to be added later */
+	return NULL;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	return pte;
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	spinlock_t *ptl;
+	pmd_t pmdval;
+	pte_t *pte;
+again:
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (unlikely(!pte))
+		return pte;
+	ptl = pte_lockptr(mm, &pmdval);
+	spin_lock(ptl);
+	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		*ptlp = ptl;
+		return pte;
+	}
+	pte_unmap_unlock(pte, ptl);
+	goto again;
+}
-- 
cgit v1.2.3

From 6c77b607ee26472fb945aa41734281c39d06d68f Mon Sep 17 00:00:00 2001
From: Kefeng Wang
Date: Wed, 14 Jun 2023 22:36:12 +0800
Subject: mm: kill lock|unlock_page_memcg()

Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()"), no more
user, kill lock_page_memcg() and unlock_page_memcg().

Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang
Acked-by: Johannes Weiner
Reviewed-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  2 +-
 include/linux/memcontrol.h                     | 12 +-----------
 mm/filemap.c                                   |  2 +-
 mm/memcontrol.c                                | 18 ++++--------------
 mm/page-writeback.c                            |  6 +++---
 5 files changed, 10 insertions(+), 30 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 47d1d7d932a8..fabaad3fd9c2 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -297,7 +297,7 @@ Lock order is as follows::
 
   Page lock (PG_locked bit of page->flags)
     mm->page_table_lock or split pte_lock
-      lock_page_memcg (memcg->move_lock)
+      folio_memcg_lock (memcg->move_lock)
         mapping->i_pages lock
           lruvec->lru_lock.
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 00a88cf947e1..c3d3a0c09315 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -419,7 +419,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
  *
  * - the folio lock
  * - LRU isolation
- * - lock_page_memcg()
+ * - folio_memcg_lock()
  * - exclusive reference
  * - mem_cgroup_trylock_pages()
  *
@@ -949,8 +949,6 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 
 void folio_memcg_lock(struct folio *folio);
 void folio_memcg_unlock(struct folio *folio);
-void lock_page_memcg(struct page *page);
-void unlock_page_memcg(struct page *page);
 
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val);
 
@@ -1438,14 +1436,6 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
 }
 
-static inline void lock_page_memcg(struct page *page)
-{
-}
-
-static inline void unlock_page_memcg(struct page *page)
-{
-}
-
 static inline void folio_memcg_lock(struct folio *folio)
 {
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index 00933089b8b6..758bbdf300e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -117,7 +117,7 @@
  *    ->i_pages lock		(page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock		(page_remove_rmap->set_page_dirty)
  *    ->inode->i_lock		(page_remove_rmap->set_page_dirty)
- *    ->memcg->move_lock	(page_remove_rmap->lock_page_memcg)
+ *    ->memcg->move_lock	(page_remove_rmap->folio_memcg_lock)
  *    bdi.wb->list_lock		(zap_pte_range->set_page_dirty)
  *    ->inode->i_lock		(zap_pte_range->set_page_dirty)
  *    ->private_lock		(zap_pte_range->block_dirty_folio)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 93056918e956..cf06b1c9b3bb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2148,17 +2148,12 @@ again:
 	 * When charge migration first begins, we can have multiple
 	 * critical sections holding the fast-path RCU lock and one
 	 * holding the slowpath move_lock. Track the task who has the
-	 * move_lock for unlock_page_memcg().
+	 * move_lock for folio_memcg_unlock().
 	 */
 	memcg->move_lock_task = current;
 	memcg->move_lock_flags = flags;
 }
 
-void lock_page_memcg(struct page *page)
-{
-	folio_memcg_lock(page_folio(page));
-}
-
 static void __folio_memcg_unlock(struct mem_cgroup *memcg)
 {
 	if (memcg && memcg->move_lock_task == current) {
@@ -2186,11 +2181,6 @@ void folio_memcg_unlock(struct folio *folio)
 	__folio_memcg_unlock(folio_memcg(folio));
 }
 
-void unlock_page_memcg(struct page *page)
-{
-	folio_memcg_unlock(page_folio(page));
-}
-
 struct memcg_stock_pcp {
 	local_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
@@ -2866,7 +2856,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 	 *
 	 * - the page lock
 	 * - LRU isolation
-	 * - lock_page_memcg()
+	 * - folio_memcg_lock()
 	 * - exclusive reference
 	 * - mem_cgroup_trylock_pages()
 	 */
@@ -5829,7 +5819,7 @@ static int mem_cgroup_move_account(struct page *page,
 	 * with (un)charging, migration, LRU putback, or anything else
 	 * that would rely on a stable page's memory cgroup.
 	 *
-	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * Note that folio_memcg_lock is a memcg lock, not a page lock,
 	 * to save space. As soon as we switch page's memory cgroup to a
 	 * new memcg that isn't locked, the above state can change
 	 * concurrently again. Make sure we're truly done with it.
@@ -6320,7 +6310,7 @@ static void mem_cgroup_move_charge(void)
 {
 	lru_add_drain_all();
 	/*
-	 * Signal lock_page_memcg() to take the memcg's move_lock
+	 * Signal folio_memcg_lock() to take the memcg's move_lock
 	 * while we're moving its pages to another memcg. Then wait
 	 * for already started RCU-only updates to finish.
 	 */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index db7943999007..1d17fb1ec863 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2597,7 +2597,7 @@ EXPORT_SYMBOL(noop_dirty_folio);
 /*
  * Helper function for set_page_dirty family.
  *
- * Caller must hold lock_page_memcg().
+ * Caller must hold folio_memcg_lock().
  *
  * NOTE: This relies on being atomic wrt interrupts.
  */
@@ -2631,7 +2631,7 @@ static void folio_account_dirtied(struct folio *folio,
 /*
  * Helper function for deaccounting dirty page without writeback.
  *
- * Caller must hold lock_page_memcg().
+ * Caller must hold folio_memcg_lock().
  */
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
 {
@@ -2650,7 +2650,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
  * If warn is true, then emit a warning if the folio is not uptodate and has
  * not been truncated.
  *
- * The caller must hold lock_page_memcg(). Most callers have the folio
+ * The caller must hold folio_memcg_lock(). Most callers have the folio
  * locked. A few have the folio blocked from truncation through other
  * means (eg zap_vma_pages() has it mapped and is holding the page table
  * lock). This can also be called from mark_buffer_dirty(), which I
-- 
cgit v1.2.3

From 452c03fdbed0d19f907c877a6a9edd226b1ebad9 Mon Sep 17 00:00:00 2001
From: Marco Elver
Date: Wed, 14 Jun 2023 11:51:16 +0200
Subject: kasan: add support for kasan.fault=panic_on_write

KASAN's boot time kernel parameter 'kasan.fault=' currently supports
'report' and 'panic', which results in either only reporting bugs or also
panicking on reports.

However, some users may wish to have more control over when KASAN reports
result in a kernel panic: in particular, KASAN reported invalid _writes_
are of special interest, because they have greater potential to corrupt
random kernel memory or be more easily exploited.

To panic on invalid writes only, introduce 'kasan.fault=panic_on_write',
which allows users to choose to continue running on invalid reads, but
panic only on invalid writes.

Link: https://lkml.kernel.org/r/20230614095158.1133673-1-elver@google.com
Signed-off-by: Marco Elver
Reviewed-by: Alexander Potapenko
Cc: Aleksandr Nogikh
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Jonathan Corbet
Cc: Taras Madan
Cc: Vincenzo Frascino
Signed-off-by: Andrew Morton
---
 Documentation/dev-tools/kasan.rst |  7 ++++---
 mm/kasan/report.c                 | 31 ++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index e66916a483cd..7f37a46af574 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -107,9 +107,10 @@ effectively disables ``panic_on_warn`` for KASAN reports.
 Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=`` boot
 parameter can be used to control panic and reporting behaviour:
 
-- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
-  report or also panic the kernel (default: ``report``). The panic happens even
-  if ``kasan_multi_shot`` is enabled.
+- ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls whether
+  to only print a KASAN report, panic the kernel, or panic the kernel on
+  invalid writes only (default: ``report``). The panic happens even if
+  ``kasan_multi_shot`` is enabled.
 
 Software and Hardware Tag-Based KASAN modes (see the section about various
 modes below) support altering stack trace collection behavior:
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 84d9f3b37014..ca4b6ff080a6 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -43,6 +43,7 @@ enum kasan_arg_fault {
 	KASAN_ARG_FAULT_DEFAULT,
 	KASAN_ARG_FAULT_REPORT,
 	KASAN_ARG_FAULT_PANIC,
+	KASAN_ARG_FAULT_PANIC_ON_WRITE,
 };
 
 static enum kasan_arg_fault kasan_arg_fault __ro_after_init = KASAN_ARG_FAULT_DEFAULT;
@@ -57,6 +58,8 @@ static int __init early_kasan_fault(char *arg)
 		kasan_arg_fault = KASAN_ARG_FAULT_REPORT;
 	else if (!strcmp(arg, "panic"))
 		kasan_arg_fault = KASAN_ARG_FAULT_PANIC;
+	else if (!strcmp(arg, "panic_on_write"))
+		kasan_arg_fault = KASAN_ARG_FAULT_PANIC_ON_WRITE;
 	else
 		return -EINVAL;
 
@@ -211,7 +214,7 @@ static void start_report(unsigned long *flags, bool sync)
 	pr_err("==================================================================\n");
 }
 
-static void end_report(unsigned long *flags, const void *addr)
+static void end_report(unsigned long *flags, const void *addr, bool is_write)
 {
 	if (addr)
 		trace_error_report_end(ERROR_DETECTOR_KASAN,
@@ -220,8 +223,18 @@ static void end_report(unsigned long *flags, const void *addr)
 	spin_unlock_irqrestore(&report_lock, *flags);
 	if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
 		check_panic_on_warn("KASAN");
-	if (kasan_arg_fault == KASAN_ARG_FAULT_PANIC)
+	switch (kasan_arg_fault) {
+	case KASAN_ARG_FAULT_DEFAULT:
+	case KASAN_ARG_FAULT_REPORT:
+		break;
+	case KASAN_ARG_FAULT_PANIC:
 		panic("kasan.fault=panic set ...\n");
+		break;
+	case KASAN_ARG_FAULT_PANIC_ON_WRITE:
+		if (is_write)
+			panic("kasan.fault=panic_on_write set ...\n");
+		break;
+	}
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 	lockdep_on();
 	report_suppress_stop();
@@ -536,7 +549,11 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
 
 	print_report(&info);
 
-	end_report(&flags, ptr);
+	/*
+	 * Invalid free is considered a "write" since the allocator's metadata
+	 * updates involves writes.
+	 */
+	end_report(&flags, ptr, true);
 }
 
 /*
@@ -570,7 +587,7 @@ bool kasan_report(const void *addr, size_t size, bool is_write,
 
 	print_report(&info);
 
-	end_report(&irq_flags, (void *)addr);
+	end_report(&irq_flags, (void *)addr, is_write);
 
 out:
 	user_access_restore(ua_flags);
@@ -596,7 +613,11 @@ void kasan_report_async(void)
 	pr_err("Asynchronous fault: no details available\n");
 	pr_err("\n");
 	dump_stack_lvl(KERN_ERR);
-	end_report(&flags, NULL);
+	/*
+	 * Conservatively set is_write=true, because no details are available.
+	 * In this mode, kasan.fault=panic_on_write is like kasan.fault=panic.
+	 */
+	end_report(&flags, NULL, true);
 }
 #endif /* CONFIG_KASAN_HW_TAGS */
 
-- 
cgit v1.2.3

From b16b54c9db8bb4ed34a489ed56cb33d08a12da1a Mon Sep 17 00:00:00 2001
From: SeongJae Park
Date: Fri, 16 Jun 2023 19:17:36 +0000
Subject: Docs/mm/damon/design: document 'age' of region

Patch series "Docs/{mm,admin-guide}damon: update design and usage docs".

Update DAMON design and usage documents for outdated and unnecessarily
duplicated parts.

This patch (of 7):

The 'age' of each region in DAMON monitoring results is an important
concept for both monitoring part and DAMOS. And DAMOS section of the
design document is mentioning it. However, the age itself is not
explained in the document. Add a section for that.
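
The rule the new section documents is: once per aggregation interval, reset a region's ``age`` to zero if its size or access frequency (``nr_accesses``) changed significantly, and increase it otherwise. A minimal userspace sketch of that rule follows. It is not DAMON's code: ``struct region_sketch``, ``update_age()``, ``abs_diff()`` and the two threshold parameters are hypothetical names, and how "significantly" is decided is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified region record; NOT the kernel's struct damon_region. */
struct region_sketch {
	unsigned long start;		/* region start address */
	unsigned long end;		/* region end address (exclusive) */
	unsigned int nr_accesses;	/* access frequency seen in this interval */
	unsigned long last_size;	/* size at the previous interval */
	unsigned int last_nr_accesses;	/* frequency at the previous interval */
	unsigned int age;		/* intervals the current pattern has lasted */
};

static unsigned long abs_diff(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/*
 * Called once per aggregation interval.  The two thresholds are assumed
 * tunables that define what "significantly changed" means in this sketch.
 */
static void update_age(struct region_sketch *r, unsigned long sz_thres,
		       unsigned int acc_thres)
{
	unsigned long size;
	bool changed;

	assert(r);
	size = r->end - r->start;
	changed = abs_diff(size, r->last_size) > sz_thres ||
		  abs_diff(r->nr_accesses, r->last_nr_accesses) > acc_thres;

	if (changed)
		r->age = 0;	/* pattern changed: restart the count */
	else
		r->age++;	/* pattern maintained for one more interval */

	/* Remember this interval's state for the next comparison. */
	r->last_size = size;
	r->last_nr_accesses = r->nr_accesses;
}
```

With such a counter, a consumer can combine frequency (``nr_accesses``) and recency (``age``), for example by treating a region with zero accesses and a large age as cold.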
Link: https://lkml.kernel.org/r/20230616191742.87531-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230616191742.87531-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/mm/damon/design.rst | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation') diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index da110e89cab4..a98af99bb705 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -190,6 +190,20 @@ In this way, DAMON provides its best-effort quality and minimal overhead while keeping the bounds users set for their trade-off. +Age Tracking +~~~~~~~~~~~~ + +By analyzing the monitoring results, users can also find how long the current +access pattern of a region has maintained. That could be used for good +understanding of the access pattern. For example, page placement algorithm +utilizing both the frequency and the recency could be implemented using that. +To make such access pattern maintained period analysis easier, DAMON maintains +yet another counter called ``age`` in each region. For each ``aggregation +interval``, DAMON checks if the region's size and access frequency +(``nr_accesses``) has significantly changed. If so, the counter is reset to +zero. Otherwise, the counter is increased. + + Dynamic Target Space Updates Handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- cgit v1.2.3 From 60eb644b012799562d1f7b9238434fe89ed3a940 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:37 +0000 Subject: Docs/admin-guide/mm/damon/start: update DAMOS example command DAMON user-space tool, damo, has deprecated[1] its old DAMOS schemes specification format. However, an example of DAMON documentation is still using it. Update the example to use one of the alternative options. 
[1] https://github.com/awslabs/damo/commit/e9950ae68f6c Link: https://lkml.kernel.org/r/20230616191742.87531-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/start.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst index 9f88afc734da..7aa0071ff1c3 100644 --- a/Documentation/admin-guide/mm/damon/start.rst +++ b/Documentation/admin-guide/mm/damon/start.rst @@ -119,9 +119,9 @@ set size has chronologically changed.:: Data Access Pattern Aware Memory Management =========================================== -Below three commands make every memory region of size >=4K that doesn't -accessed for >=60 seconds in your workload to be swapped out. :: +Below command makes every memory region of size >=4K that has not accessed for +>=60 seconds in your workload to be swapped out. :: - $ echo "#min-size max-size min-acc max-acc min-age max-age action" > test_scheme - $ echo "4K max 0 0 60s max pageout" >> test_scheme - $ damo schemes -c test_scheme + $ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \ + --damos_age 60s max --damos_action pageout \ + -- cgit v1.2.3 From cc5ece5979dadc76ff9b9c7ac1795327a4f1ecb1 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:38 +0000 Subject: Docs/admin-guide/mm/damon/usage: fix typos in references and commas Fix typos including a unnecessary comma and incomplete ':ref:' keywords. 
Link: https://lkml.kernel.org/r/20230616191742.87531-4-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 9b823fec974d..d2435dcc22f4 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -139,7 +139,7 @@ scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based operating scheme action tried regions directory for each DAMON-based operation scheme of the kdamond. For details of the DAMON-based operation scheme action tried regions directory, please refer to -:ref:tried_regions section `. +:ref:`tried_regions section `. If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. @@ -282,7 +282,7 @@ memory regions having specific access pattern of the interest. The keywords that can be written to and read from the file and their meaning are as below. Note that support of each action depends on the running DAMON operations set -`implementation `. +:ref:`implementation `. - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Supported by ``vaddr`` and ``fvaddr`` operations set. @@ -432,8 +432,7 @@ starting from ``0`` under this directory. Each directory contains files exposing detailed information about each of the memory region that the corresponding scheme's ``action`` has tried to be applied under this directory, during next :ref:`aggregation interval `. The -information includes address range, ``nr_accesses``, , and ``age`` of the -region. +information includes address range, ``nr_accesses``, and ``age`` of the region. 
The directories will be removed when another special keyword, ``clear_schemes_tried_regions``, is written to the relevant -- cgit v1.2.3 From ddb7d012b1018c820848be051f84e962fd30dd78 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:39 +0000 Subject: Docs/admin-guide/mm/damon/usage: remove unnecessary sentences about supported address spaces Brief explanation of DAMON user space tool and sysfs interface are unnecessarily and repeatedly mentioning the list of address spaces that DAMON is supporting. Remove those. Link: https://lkml.kernel.org/r/20230616191742.87531-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index d2435dcc22f4..82a79838a47d 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -10,9 +10,8 @@ DAMON provides below interfaces for different users. `This `_ is for privileged people such as system administrators who want a just-working human-friendly interface. Using this, users can use the DAMON’s major features in a human-friendly way. - It may not be highly tuned for special cases, though. It supports both - virtual and physical address spaces monitoring. For more detail, please - refer to its `usage document + It may not be highly tuned for special cases, though. For more detail, + please refer to its `usage document `_. - *sysfs interface.* :ref:`This ` is for privileged user space programmers who @@ -20,10 +19,9 @@ DAMON provides below interfaces for different users. features by reading from and writing to special sysfs files. Therefore, you can write and use your personalized DAMON sysfs wrapper programs that reads/writes the sysfs files instead of you. 
The `DAMON user space tool - `_ is one example of such programs. It - supports both virtual and physical address spaces monitoring. Note that this - interface provides only simple :ref:`statistics ` for the - monitoring results. For detailed monitoring results, DAMON provides a + `_ is one example of such programs. Note + that this interface provides only simple :ref:`statistics ` for + the monitoring results. For detailed monitoring results, DAMON provides a :ref:`tracepoint `. - *debugfs interface. (DEPRECATED!)* :ref:`This ` is almost identical to :ref:`sysfs interface -- cgit v1.2.3 From 01e08737daed0135c20c0844272a358d63a9cafe Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:40 +0000 Subject: Docs/admin-guide/mm/damon/usage: link design document for DAMOS The background and concept of DAMOS is redundantly documented, in the design document and the usage document. Replace the duplicated ones in usage document with links to the design document. Link: https://lkml.kernel.org/r/20230616191742.87531-6-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 104 ++++++++++----------------- Documentation/mm/damon/design.rst | 14 ++++ 2 files changed, 51 insertions(+), 67 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 82a79838a47d..ea6a5dc8930e 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -257,12 +257,9 @@ be equal or smaller than ``start`` of directory ``N+1``. contexts//schemes/ --------------------- -For usual DAMON-based data access aware memory management optimizations, users -would normally want the system to apply a memory management action to a memory -region of a specific access pattern. 
DAMON receives such formalized operation -schemes from the user and applies those to the target memory regions. Users -can get and set the schemes by reading from and writing to files under this -directory. +The directory for DAMON-based Operation Schemes (:ref:`DAMOS +`). Users can get and set the schemes by reading from and +writing to files under this directory. In the beginning, this directory has only one file, ``nr_schemes``. Writing a number (``N``) to the file creates the number of child directories named ``0`` @@ -275,9 +272,9 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``, ``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file (``action``) exist. -The ``action`` file is for setting and getting what action you want to apply to -memory regions having specific access pattern of the interest. The keywords -that can be written to and read from the file and their meaning are as below. +The ``action`` file is for setting and getting the scheme's :ref:`action +`. The keywords that can be written to and read +from the file and their meaning are as below. Note that support of each action depends on the running DAMON operations set :ref:`implementation `. @@ -302,10 +299,8 @@ Note that support of each action depends on the running DAMON operations set schemes//access_pattern/ --------------------------- -The target access pattern of each DAMON-based operation scheme is constructed -with three ranges including the size of the region in bytes, number of -monitored accesses per aggregate interval, and number of aggregated intervals -for the age of the region. +The directory for the target access :ref:`pattern +` of the given DAMON-based operation scheme. 
Under the ``access_pattern`` directory, three directories (``sz``, ``nr_accesses``, and ``age``) each having two files (``min`` and ``max``) @@ -316,18 +311,8 @@ to and reading from the ``min`` and ``max`` files under ``sz``, schemes//quotas/ ------------------- -Optimal ``target access pattern`` for each ``action`` is workload dependent, so -not easy to find. Worse yet, setting a scheme of some action too aggressive -can cause severe overhead. To avoid such overhead, users can limit time and -size quota for each scheme. In detail, users can ask DAMON to try to use only -up to specific time (``time quota``) for applying the action, and to apply the -action to only up to specific amount (``size quota``) of memory regions having -the target access pattern within a given time interval (``reset interval``). - -When the quota limit is expected to be exceeded, DAMON prioritizes found memory -regions of the ``target access pattern`` based on their size, access frequency, -and age. For personalized prioritization, users can set the weights for the -three properties. +The directory for the :ref:`quotas ` of the given +DAMON-based operation scheme. Under ``quotas`` directory, three files (``ms``, ``bytes``, ``reset_interval_ms``) and one directory (``weights``) having three files @@ -335,23 +320,20 @@ Under ``quotas`` directory, three files (``ms``, ``bytes``, You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and ``reset interval`` in milliseconds by writing the values to the three files, -respectively. You can also set the prioritization weights for size, access -frequency, and age in per-thousand unit by writing the values to the three -files under the ``weights`` directory. +respectively. You can also set the :ref:`prioritization weights +` for size, access frequency, and age +in per-thousand unit by writing the values to the three files under the +``weights`` directory. 
schemes//watermarks/ ----------------------- -To allow easy activation and deactivation of each scheme based on system -status, DAMON provides a feature called watermarks. The feature receives five -values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The -``metric`` is the system metric such as free memory ratio that can be measured. -If the metric value of the system is higher than the value in ``high`` or lower -than ``low`` at the memoent, the scheme is deactivated. If the value is lower -than ``mid``, the scheme is activated. +The directory for the :ref:`watermarks ` of the +given DAMON-based operation scheme. Under the watermarks directory, five files (``metric``, ``interval_us``, -``high``, ``mid``, and ``low``) for setting each value exist. You can set and +``high``, ``mid``, and ``low``) for setting the metric, the time interval +between check of the metric, and the three watermarks exist. You can set and get the five values by writing to the files, respectively. Keywords and meanings of those that can be written to the ``metric`` file are @@ -365,12 +347,8 @@ The ``interval`` should written in microseconds unit. schemes//filters/ -------------------- -Users could know something more than the kernel for specific types of memory. -In the case, users could do their own management for the memory and hence -doesn't want DAMOS bothers that. Users could limit DAMOS by setting the access -pattern of the scheme and/or the monitoring regions for the purpose, but that -can be inefficient in some cases. In such cases, users could set non-access -pattern driven filters using files in this directory. +The directory for the :ref:`filters ` of the given +DAMON-based operation scheme. In the beginning, this directory has only one file, ``nr_filters``. Writing a number (``N``) to the file creates the number of child directories named ``0`` @@ -597,15 +575,10 @@ update. 
Schemes ------- -For usual DAMON-based data access aware memory management optimizations, users -would simply want the system to apply a memory management action to a memory -region of a specific access pattern. DAMON receives such formalized operation -schemes from the user and applies those to the target processes. - -Users can get and set the schemes by reading from and writing to ``schemes`` -debugfs file. Reading the file also shows the statistics of each scheme. To -the file, each of the schemes should be represented in each line in below -form:: +Users can get and set the DAMON-based operation :ref:`schemes +` by reading from and writing to ``schemes`` debugfs file. +Reading the file also shows the statistics of each scheme. To the file, each +of the schemes should be represented in each line in below form:: @@ -614,8 +587,9 @@ You can disable schemes by simply writing an empty string to the file. Target Access Pattern ~~~~~~~~~~~~~~~~~~~~~ -The ```` is constructed with three ranges in below -form:: +The target access :ref:`pattern ` of the +scheme. The ```` is constructed with three ranges in +below form:: min-size max-size min-acc max-acc min-age max-age @@ -628,9 +602,9 @@ closed interval. Action ~~~~~~ -The ```` is a predefined integer for memory management actions, which -DAMON will apply to the regions having the target access pattern. The -supported numbers and their meanings are as below. +The ```` is a predefined integer for memory management :ref:`actions +`. The supported numbers and their meanings are as +below. - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if ``target`` is ``paddr``. @@ -646,10 +620,8 @@ supported numbers and their meanings are as below. Quota ~~~~~ -Optimal ``target access pattern`` for each ``action`` is workload dependent, so -not easy to find. Worse yet, setting a scheme of some action too aggressive -can cause severe overhead. 
To avoid such overhead, users can limit time and -size quota for the scheme via the ```` in below form:: +Users can set the :ref:`quotas ` of the given scheme +via the ```` in below form:: @@ -659,19 +631,17 @@ the action to memory regions of the ``target access pattern`` within the ```` bytes of memory regions within the ````. Setting both ```` and ```` zero disables the quota limits. -When the quota limit is expected to be exceeded, DAMON prioritizes found memory -regions of the ``target access pattern`` based on their size, access frequency, -and age. For personalized prioritization, users can set the weights for the -three properties in ```` in below form:: +For the :ref:`prioritization `, users +can set the weights for the three properties in ```` in below +form:: Watermarks ~~~~~~~~~~ -Some schemes would need to run based on current value of the system's specific -metrics like free memory ratio. For such cases, users can specify watermarks -for the condition.:: +Users can specify :ref:`watermarks ` of the +given scheme via ```` in below form:: diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index a98af99bb705..4bfdf1d30c4a 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -218,6 +218,8 @@ abstracted monitoring target memory area only for each of a user-specified time interval (``update interval``). +.. _damon_design_damos: + Operation Schemes ----------------- @@ -255,6 +257,8 @@ the access pattern of interest, and applies the user-desired operation actions to the regions as soon as found. +.. _damon_design_damos_action: + Operation Action ~~~~~~~~~~~~~~~~ @@ -277,6 +281,8 @@ characteristics. Hence, DAMOS resets the age of regions when an action is applied to those. +.. 
_damon_design_damos_access_pattern: + Target Access Pattern ~~~~~~~~~~~~~~~~~~~~~ @@ -288,6 +294,8 @@ region's three properties are in the ranges, DAMOS classifies it as one of the regions that the scheme is having an interest in. +.. _damon_design_damos_quotas: + Quotas ~~~~~~ @@ -305,6 +313,8 @@ can use for applying the action, and/or a maximum bytes of memory regions that the action can be applied within a user-specified time duration. +.. _damon_design_damos_quotas_prioritization: + Prioritization ^^^^^^^^^^^^^^ @@ -330,6 +340,8 @@ the weight will be respected are up to the underlying prioritization mechanism implementation. +.. _damon_design_damos_watermarks: + Watermarks ~~~~~~~~~~ @@ -350,6 +362,8 @@ is also deactivated. In this case, the DAMON worker thread only periodically checks the watermarks and therefore incurs nearly zero overhead. +.. _damon_design_damos_filters: + Filters ~~~~~~~ -- cgit v1.2.3 From 67c34f6c6af8f3bdf794a6b91d9f063faeac0575 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:41 +0000 Subject: Docs/admin-guide/mm/damon/usage: clarify quotas and watermarks sysfs interface The explanations of DAMOS quotas and watermarks do not clearly describe the meaning and expectation of each file. Add more clarification for those. Link: https://lkml.kernel.org/r/20230616191742.87531-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index ea6a5dc8930e..9d3ebd70772f 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -306,7 +306,8 @@ Under the ``access_pattern`` directory, three directories (``sz``, ``nr_accesses``, and ``age``) each having two files (``min`` and ``max``) exist.
You can set and get the access pattern for the given scheme by writing to and reading from the ``min`` and ``max`` files under ``sz``, -``nr_accesses``, and ``age`` directories, respectively. +``nr_accesses``, and ``age`` directories, respectively. Note that the ``min`` +and the ``max`` form a closed interval. schemes//quotas/ ------------------- @@ -320,7 +321,13 @@ Under ``quotas`` directory, three files (``ms``, ``bytes``, You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and ``reset interval`` in milliseconds by writing the values to the three files, -respectively. You can also set the :ref:`prioritization weights +respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds +for applying the ``action`` to memory regions of the ``access_pattern``, and to +apply the action to only up to ``bytes`` bytes of memory regions within the +``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the +quota limits. + +You can also set the :ref:`prioritization weights ` for size, access frequency, and age in per-thousand unit by writing the values to the three files under the ``weights`` directory. -- cgit v1.2.3 From ff71f26f9774267bcdfa4759ea8879562f079da4 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 16 Jun 2023 19:17:42 +0000 Subject: Docs/admin-guide/mm/damon/usage: update the ways for getting monitoring results The recommended ways of getting DAMON monitoring results are the tried_regions sysfs directory, for a partial snapshot of the results, and the DAMON tracepoint, for a full record of the results. However, the usage of the tried_regions sysfs directory has not been sufficiently updated in some sections of the DAMON usage document. Update those.
Link: https://lkml.kernel.org/r/20230616191742.87531-8-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 9d3ebd70772f..2d495fa85a0e 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -19,10 +19,7 @@ DAMON provides below interfaces for different users. features by reading from and writing to special sysfs files. Therefore, you can write and use your personalized DAMON sysfs wrapper programs that reads/writes the sysfs files instead of you. The `DAMON user space tool - `_ is one example of such programs. Note - that this interface provides only simple :ref:`statistics ` for - the monitoring results. For detailed monitoring results, DAMON provides a - :ref:`tracepoint `. + `_ is one example of such programs. - *debugfs interface. (DEPRECATED!)* :ref:`This ` is almost identical to :ref:`sysfs interface `. This is deprecated, so users should move to the @@ -421,6 +418,11 @@ The directories will be removed when another special keyword, ``clear_schemes_tried_regions``, is written to the relevant ``kdamonds//state`` file. +The expected usage of this directory is investigations of schemes' behaviors, +and query-like efficient data access monitoring results retrievals. For the +latter use case, in particular, users can set the ``action`` as ``stat`` and +set the ``access pattern`` as their interested pattern that they want to query. + tried_regions// ------------------ @@ -771,10 +773,12 @@ root directory only. Tracepoint for Monitoring Results ================================= -DAMON provides the monitoring results via a tracepoint, -``damon:damon_aggregated``. 
While the monitoring is turned on, you could -record the tracepoint events and show results using tracepoint supporting tools -like ``perf``. For example:: +Users can get the monitoring results via the :ref:`tried_regions +` or a tracepoint, ``damon:damon_aggregated``. +While the tried regions directory is useful for getting a snapshot, the +tracepoint is useful for getting a full record of the results. While the +monitoring is turned on, you could record the tracepoint events and show +results using tracepoint supporting tools like ``perf``. For example:: # echo on > monitor_on # perf record -e damon:damon_aggregated & -- cgit v1.2.3 From 8c293a6353d663879ec0e5db1db052319ca2100f Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Tue, 20 Jun 2023 19:17:35 +0200 Subject: kasan, doc: note kasan.fault=panic_on_write behaviour for async modes Note the behaviour of kasan.fault=panic_on_write for async modes, since all asynchronous faults will result in panic (even if they are reads). Link: https://lkml.kernel.org/r/ZJHfL6vavKUZ3Yd8@elver.google.com Fixes: 452c03fdbed0 ("kasan: add support for kasan.fault=panic_on_write") Signed-off-by: Marco Elver Reviewed-by: Andrey Konovalov Cc: Aleksandr Nogikh Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Jonathan Corbet Cc: Taras Madan Cc: Vincenzo Frascino Signed-off-by: Andrew Morton --- Documentation/dev-tools/kasan.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 7f37a46af574..f4acf9c2e90f 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -110,7 +110,9 @@ parameter can be used to control panic and reporting behaviour: - ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls whether to only print a KASAN report, panic the kernel, or panic the kernel on invalid writes only (default: ``report``). 
The panic happens even if - ``kasan_multi_shot`` is enabled. + ``kasan_multi_shot`` is enabled. Note that when using asynchronous mode of + Hardware Tag-Based KASAN, ``kasan.fault=panic_on_write`` always panics on + asynchronously checked accesses (including reads). Software and Hardware Tag-Based KASAN modes (see the section about various modes below) support altering stack trace collection behavior: -- cgit v1.2.3
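The interaction noted in the KASAN patch above can be checked mechanically. The following is a minimal sketch and is not part of the patch or the kernel tree: the helper name ``kasan_async_panic_on_write`` is hypothetical, and it assumes the boot parameters are available as one space-separated string, such as the contents of ``/proc/cmdline``. It only looks for the literal combination of ``kasan.mode=async`` and ``kasan.fault=panic_on_write``; it does not detect whether Hardware Tag-Based KASAN is actually enabled on the running kernel.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not a kernel-provided tool).
# Succeeds (exit status 0) when the given kernel command line combines
# kasan.fault=panic_on_write with kasan.mode=async, the case in which the
# documentation says asynchronously checked reads also cause a panic.
kasan_async_panic_on_write() {
    fault=report   # documented default of kasan.fault
    mode=          # stays empty when kasan.mode is not on the command line
    # Intentionally unquoted: split the command line into separate arguments.
    for arg in $1; do
        case "$arg" in
            kasan.fault=*) fault=${arg#kasan.fault=} ;;
            kasan.mode=*)  mode=${arg#kasan.mode=} ;;
        esac
    done
    [ "$fault" = "panic_on_write" ] && [ "$mode" = "async" ]
}

# Possible use on a running system:
#   if kasan_async_panic_on_write "$(cat /proc/cmdline)"; then
#       echo "note: asynchronously checked reads will also panic"
#   fi
```

The function returns a status instead of printing, so a boot-configuration check script can decide how to report the result itself.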