diff options
author | Linus Torvalds | 2023-06-28 10:28:11 -0700 |
---|---|---|
committer | Linus Torvalds | 2023-06-28 10:28:11 -0700 |
commit | 6e17c6de3ddf3073741d9c91a796ee696914d8a0 (patch) | |
tree | 2c425707f78642625dbe2c824c7fded2021e3dc7 /Documentation | |
parent | 6aeadf7896bff4ca230702daba8788455e6b866e (diff) | |
parent | acc72d59c7509540c27c49625cb4b5a8db1f1a84 (diff) |
Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs
- Yosry has also eliminated cgroup's atomic rstat flushing
- Nhat Pham adds the new cachestat() syscall. It provides userspace
with the ability to query pagecache status - a similar concept to
mincore() but more powerful and with improved usability
- Mel Gorman provides more optimizations for compaction, reducing the
prevalence of page rescanning
- Lorenzo Stoakes has done some maintanance work on the
get_user_pages() interface
- Liam Howlett continues with cleanups and maintenance work to the
maple tree code. Peng Zhang also does some work on maple tree
- Johannes Weiner has done some cleanup work on the compaction code
- David Hildenbrand has contributed additional selftests for
get_user_pages()
- Thomas Gleixner has contributed some maintenance and optimization
work for the vmalloc code
- Baolin Wang has provided some compaction cleanups,
- SeongJae Park continues maintenance work on the DAMON code
- Huang Ying has done some maintenance on the swap code's usage of
device refcounting
- Christoph Hellwig has some cleanups for the filemap/directio code
- Ryan Roberts provides two patch series which yield some
rationalization of the kernel's access to pte entries - use the
provided APIs rather than open-coding accesses
- Lorenzo Stoakes has some fixes to the interaction between pagecache
and directio access to file mappings
- John Hubbard has a series of fixes to the MM selftesting code
- ZhangPeng continues the folio conversion campaign
- Hugh Dickins has been working on the pagetable handling code, mainly
with a view to reducing the load on the mmap_lock
- Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
from 128 to 8
- Domenico Cerasuolo has improved the zswap reclaim mechanism by
reorganizing the LRU management
- Matthew Wilcox provides some fixups to make gfs2 work better with the
buffer_head code
- Vishal Moola also has done some folio conversion work
- Matthew Wilcox has removed the remnants of the pagevec code - their
functionality is migrated over to struct folio_batch
* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
mm/hugetlb: remove hugetlb_set_page_subpool()
mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
hugetlb: revert use of page_cache_next_miss()
Revert "page cache: fix page_cache_next/prev_miss off by one"
mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
mm: memcg: rename and document global_reclaim()
mm: kill [add|del]_page_to_lru_list()
mm: compaction: convert to use a folio in isolate_migratepages_block()
mm: zswap: fix double invalidate with exclusive loads
mm: remove unnecessary pagevec includes
mm: remove references to pagevec
mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
mm: remove struct pagevec
net: convert sunrpc from pagevec to folio_batch
i915: convert i915_gpu_error to use a folio_batch
pagevec: rename fbatch_count()
mm: remove check_move_unevictable_pages()
drm: convert drm_gem_put_pages() to use a folio_batch
i915: convert shmem_sg_free_table() to use a folio_batch
scatterlist: add sg_set_folio()
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memory.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 7 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/start.rst | 10 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/usage.rst | 146 | ||||
-rw-r--r-- | Documentation/dev-tools/kasan.rst | 9 | ||||
-rw-r--r-- | Documentation/dev-tools/kselftest.rst | 1 | ||||
-rw-r--r-- | Documentation/mm/damon/design.rst | 337 | ||||
-rw-r--r-- | Documentation/mm/damon/faq.rst | 23 | ||||
-rw-r--r-- | Documentation/mm/damon/maintainer-profile.rst | 4 | ||||
-rw-r--r-- | Documentation/mm/page_migration.rst | 7 | ||||
-rw-r--r-- | Documentation/mm/split_page_table_lock.rst | 17 | ||||
-rw-r--r-- | Documentation/translations/zh_CN/mm/page_migration.rst | 2 |
12 files changed, 410 insertions, 155 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 47d1d7d932a8..fabaad3fd9c2 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -297,7 +297,7 @@ Lock order is as follows:: Page lock (PG_locked bit of page->flags) mm->page_table_lock or split pte_lock - lock_page_memcg (memcg->move_lock) + folio_memcg_lock (memcg->move_lock) mapping->i_pages lock lruvec->lru_lock. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 9badcb21db6f..4ef890191196 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1580,6 +1580,13 @@ PAGE_SIZE multiple when read back. Healthy workloads are not expected to reach this limit. + memory.swap.peak + A read-only single value file which exists on non-root + cgroups. + + The max swap usage recorded for the cgroup and its + descendants since the creation of the cgroup. + memory.swap.max A read-write single value file which exists on non-root cgroups. The default is "max". diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst index 9f88afc734da..7aa0071ff1c3 100644 --- a/Documentation/admin-guide/mm/damon/start.rst +++ b/Documentation/admin-guide/mm/damon/start.rst @@ -119,9 +119,9 @@ set size has chronologically changed.:: Data Access Pattern Aware Memory Management =========================================== -Below three commands make every memory region of size >=4K that doesn't -accessed for >=60 seconds in your workload to be swapped out. :: +Below command makes every memory region of size >=4K that has not accessed for +>=60 seconds in your workload to be swapped out. :: - $ echo "#min-size max-size min-acc max-acc min-age max-age action" > test_scheme - $ echo "4K max 0 0 60s max pageout" >> test_scheme - $ damo schemes -c test_scheme <pid of your workload> + $ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \ + --damos_age 60s max --damos_action pageout \ + <pid of your workload> diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 9b823fec974d..2d495fa85a0e 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -10,9 +10,8 @@ DAMON provides below interfaces for different users. `This <https://github.com/awslabs/damo>`_ is for privileged people such as system administrators who want a just-working human-friendly interface. Using this, users can use the DAMON’s major features in a human-friendly way. - It may not be highly tuned for special cases, though. It supports both - virtual and physical address spaces monitoring. For more detail, please - refer to its `usage document + It may not be highly tuned for special cases, though. For more detail, + please refer to its `usage document <https://github.com/awslabs/damo/blob/next/USAGE.md>`_. - *sysfs interface.* :ref:`This <sysfs_interface>` is for privileged user space programmers who @@ -20,11 +19,7 @@ DAMON provides below interfaces for different users. features by reading from and writing to special sysfs files. Therefore, you can write and use your personalized DAMON sysfs wrapper programs that reads/writes the sysfs files instead of you. The `DAMON user space tool - <https://github.com/awslabs/damo>`_ is one example of such programs. It - supports both virtual and physical address spaces monitoring. Note that this - interface provides only simple :ref:`statistics <damos_stats>` for the - monitoring results. For detailed monitoring results, DAMON provides a - :ref:`tracepoint <tracepoint>`. + <https://github.com/awslabs/damo>`_ is one example of such programs. - *debugfs interface. (DEPRECATED!)* :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface <sysfs_interface>`. This is deprecated, so users should move to the @@ -139,7 +134,7 @@ scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based operating scheme action tried regions directory for each DAMON-based operation scheme of the kdamond. For details of the DAMON-based operation scheme action tried regions directory, please refer to -:ref:tried_regions section <sysfs_schemes_tried_regions>`. +:ref:`tried_regions section <sysfs_schemes_tried_regions>`. If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. @@ -259,12 +254,9 @@ be equal or smaller than ``start`` of directory ``N+1``. contexts/<N>/schemes/ --------------------- -For usual DAMON-based data access aware memory management optimizations, users -would normally want the system to apply a memory management action to a memory -region of a specific access pattern. DAMON receives such formalized operation -schemes from the user and applies those to the target memory regions. Users -can get and set the schemes by reading from and writing to files under this -directory. +The directory for DAMON-based Operation Schemes (:ref:`DAMOS +<damon_design_damos>`). Users can get and set the schemes by reading from and +writing to files under this directory. In the beginning, this directory has only one file, ``nr_schemes``. Writing a number (``N``) to the file creates the number of child directories named ``0`` @@ -277,12 +269,12 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``, ``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file (``action``) exist. -The ``action`` file is for setting and getting what action you want to apply to -memory regions having specific access pattern of the interest. The keywords -that can be written to and read from the file and their meaning are as below. +The ``action`` file is for setting and getting the scheme's :ref:`action +<damon_design_damos_action>`. The keywords that can be written to and read +from the file and their meaning are as below. Note that support of each action depends on the running DAMON operations set -`implementation <sysfs_contexts>`. +:ref:`implementation <sysfs_contexts>`. - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Supported by ``vaddr`` and ``fvaddr`` operations set. @@ -304,32 +296,21 @@ Note that support of each action depends on the running DAMON operations set schemes/<N>/access_pattern/ --------------------------- -The target access pattern of each DAMON-based operation scheme is constructed -with three ranges including the size of the region in bytes, number of -monitored accesses per aggregate interval, and number of aggregated intervals -for the age of the region. +The directory for the target access :ref:`pattern +<damon_design_damos_access_pattern>` of the given DAMON-based operation scheme. Under the ``access_pattern`` directory, three directories (``sz``, ``nr_accesses``, and ``age``) each having two files (``min`` and ``max``) exist. You can set and get the access pattern for the given scheme by writing to and reading from the ``min`` and ``max`` files under ``sz``, -``nr_accesses``, and ``age`` directories, respectively. +``nr_accesses``, and ``age`` directories, respectively. Note that the ``min`` +and the ``max`` form a closed interval. schemes/<N>/quotas/ ------------------- -Optimal ``target access pattern`` for each ``action`` is workload dependent, so -not easy to find. Worse yet, setting a scheme of some action too aggressive -can cause severe overhead. To avoid such overhead, users can limit time and -size quota for each scheme. In detail, users can ask DAMON to try to use only -up to specific time (``time quota``) for applying the action, and to apply the -action to only up to specific amount (``size quota``) of memory regions having -the target access pattern within a given time interval (``reset interval``). - -When the quota limit is expected to be exceeded, DAMON prioritizes found memory -regions of the ``target access pattern`` based on their size, access frequency, -and age. For personalized prioritization, users can set the weights for the -three properties. +The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given +DAMON-based operation scheme. Under ``quotas`` directory, three files (``ms``, ``bytes``, ``reset_interval_ms``) and one directory (``weights``) having three files @@ -337,23 +318,26 @@ Under ``quotas`` directory, three files (``ms``, ``bytes``, You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and ``reset interval`` in milliseconds by writing the values to the three files, -respectively. You can also set the prioritization weights for size, access -frequency, and age in per-thousand unit by writing the values to the three -files under the ``weights`` directory. +respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds +for applying the ``action`` to memory regions of the ``access_pattern``, and to +apply the action to only up to ``bytes`` bytes of memory regions within the +``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the +quota limits. + +You can also set the :ref:`prioritization weights +<damon_design_damos_quotas_prioritization>` for size, access frequency, and age +in per-thousand unit by writing the values to the three files under the +``weights`` directory. schemes/<N>/watermarks/ ----------------------- -To allow easy activation and deactivation of each scheme based on system -status, DAMON provides a feature called watermarks. The feature receives five -values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The -``metric`` is the system metric such as free memory ratio that can be measured. -If the metric value of the system is higher than the value in ``high`` or lower -than ``low`` at the memoent, the scheme is deactivated. If the value is lower -than ``mid``, the scheme is activated. +The directory for the :ref:`watermarks <damon_design_damos_watermarks>` of the +given DAMON-based operation scheme. Under the watermarks directory, five files (``metric``, ``interval_us``, -``high``, ``mid``, and ``low``) for setting each value exist. You can set and +``high``, ``mid``, and ``low``) for setting the metric, the time interval +between check of the metric, and the three watermarks exist. You can set and get the five values by writing to the files, respectively. Keywords and meanings of those that can be written to the ``metric`` file are @@ -367,12 +351,8 @@ The ``interval`` should written in microseconds unit. schemes/<N>/filters/ -------------------- -Users could know something more than the kernel for specific types of memory. -In the case, users could do their own management for the memory and hence -doesn't want DAMOS bothers that. Users could limit DAMOS by setting the access -pattern of the scheme and/or the monitoring regions for the purpose, but that -can be inefficient in some cases. In such cases, users could set non-access -pattern driven filters using files in this directory. +The directory for the :ref:`filters <damon_design_damos_filters>` of the given +DAMON-based operation scheme. In the beginning, this directory has only one file, ``nr_filters``. Writing a number (``N``) to the file creates the number of child directories named ``0`` @@ -432,13 +412,17 @@ starting from ``0`` under this directory. Each directory contains files exposing detailed information about each of the memory region that the corresponding scheme's ``action`` has tried to be applied under this directory, during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The -information includes address range, ``nr_accesses``, , and ``age`` of the -region. +information includes address range, ``nr_accesses``, and ``age`` of the region. The directories will be removed when another special keyword, ``clear_schemes_tried_regions``, is written to the relevant ``kdamonds/<N>/state`` file. +The expected usage of this directory is investigations of schemes' behaviors, +and query-like efficient data access monitoring results retrievals. For the +latter use case, in particular, users can set the ``action`` as ``stat`` and +set the ``access pattern`` as their interested pattern that they want to query. + tried_regions/<N>/ ------------------ @@ -600,15 +584,10 @@ update. Schemes ------- -For usual DAMON-based data access aware memory management optimizations, users -would simply want the system to apply a memory management action to a memory -region of a specific access pattern. DAMON receives such formalized operation -schemes from the user and applies those to the target processes. - -Users can get and set the schemes by reading from and writing to ``schemes`` -debugfs file. Reading the file also shows the statistics of each scheme. To -the file, each of the schemes should be represented in each line in below -form:: +Users can get and set the DAMON-based operation :ref:`schemes +<damon_design_damos>` by reading from and writing to ``schemes`` debugfs file. +Reading the file also shows the statistics of each scheme. To the file, each +of the schemes should be represented in each line in below form:: <target access pattern> <action> <quota> <watermarks> @@ -617,8 +596,9 @@ You can disable schemes by simply writing an empty string to the file. Target Access Pattern ~~~~~~~~~~~~~~~~~~~~~ -The ``<target access pattern>`` is constructed with three ranges in below -form:: +The target access :ref:`pattern <damon_design_damos_access_pattern>` of the +scheme. The ``<target access pattern>`` is constructed with three ranges in +below form:: min-size max-size min-acc max-acc min-age max-age @@ -631,9 +611,9 @@ closed interval. Action ~~~~~~ -The ``<action>`` is a predefined integer for memory management actions, which -DAMON will apply to the regions having the target access pattern. The -supported numbers and their meanings are as below. +The ``<action>`` is a predefined integer for memory management :ref:`actions +<damon_design_damos_action>`. The supported numbers and their meanings are as +below. - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if ``target`` is ``paddr``. @@ -649,10 +629,8 @@ supported numbers and their meanings are as below. Quota ~~~~~ -Optimal ``target access pattern`` for each ``action`` is workload dependent, so -not easy to find. Worse yet, setting a scheme of some action too aggressive -can cause severe overhead. To avoid such overhead, users can limit time and -size quota for the scheme via the ``<quota>`` in below form:: +Users can set the :ref:`quotas <damon_design_damos_quotas>` of the given scheme +via the ``<quota>`` in below form:: <ms> <sz> <reset interval> <priority weights> @@ -662,19 +640,17 @@ the action to memory regions of the ``target access pattern`` within the ``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both ``<ms>`` and ``<sz>`` zero disables the quota limits. -When the quota limit is expected to be exceeded, DAMON prioritizes found memory -regions of the ``target access pattern`` based on their size, access frequency, -and age. For personalized prioritization, users can set the weights for the -three properties in ``<priority weights>`` in below form:: +For the :ref:`prioritization <damon_design_damos_quotas_prioritization>`, users +can set the weights for the three properties in ``<priority weights>`` in below +form:: <size weight> <access frequency weight> <age weight> Watermarks ~~~~~~~~~~ -Some schemes would need to run based on current value of the system's specific -metrics like free memory ratio. For such cases, users can specify watermarks -for the condition.:: +Users can specify :ref:`watermarks <damon_design_damos_watermarks>` of the +given scheme via ``<watermarks>`` in below form:: <metric> <check interval> <high mark> <middle mark> <low mark> @@ -797,10 +773,12 @@ root directory only. Tracepoint for Monitoring Results ================================= -DAMON provides the monitoring results via a tracepoint, -``damon:damon_aggregated``. While the monitoring is turned on, you could -record the tracepoint events and show results using tracepoint supporting tools -like ``perf``. For example:: +Users can get the monitoring results via the :ref:`tried_regions +<sysfs_schemes_tried_regions>` or a tracepoint, ``damon:damon_aggregated``. +While the tried regions directory is useful for getting a snapshot, the +tracepoint is useful for getting a full record of the results. While the +monitoring is turned on, you could record the tracepoint events and show +results using tracepoint supporting tools like ``perf``. For example:: # echo on > monitor_on # perf record -e damon:damon_aggregated & diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index e66916a483cd..f4acf9c2e90f 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -107,9 +107,12 @@ effectively disables ``panic_on_warn`` for KASAN reports. Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=`` boot parameter can be used to control panic and reporting behaviour: -- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN - report or also panic the kernel (default: ``report``). The panic happens even - if ``kasan_multi_shot`` is enabled. +- ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls whether + to only print a KASAN report, panic the kernel, or panic the kernel on + invalid writes only (default: ``report``). The panic happens even if + ``kasan_multi_shot`` is enabled. Note that when using asynchronous mode of + Hardware Tag-Based KASAN, ``kasan.fault=panic_on_write`` always panics on + asynchronously checked accesses (including reads). Software and Hardware Tag-Based KASAN modes (see the section about various modes below) support altering stack trace collection behavior: diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index dd214af7b7ff..deede972f254 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -36,6 +36,7 @@ Running the selftests (hotplug tests are run in limited mode) To build the tests:: + $ make headers $ make -C tools/testing/selftests To run the tests:: diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 0cff6fac6b7e..4bfdf1d30c4a 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -4,31 +4,55 @@ Design ====== -Configurable Layers -=================== - -DAMON provides data access monitoring functionality while making the accuracy -and the overhead controllable. The fundamental access monitorings require -primitives that dependent on and optimized for the target address space. On -the other hand, the accuracy and overhead tradeoff mechanism, which is the core -of DAMON, is in the pure logic space. DAMON separates the two parts in -different layers and defines its interface to allow various low level -primitives implementations configurable with the core logic. We call the low -level primitives implementations monitoring operations. - -Due to this separated design and the configurable interface, users can extend -DAMON for any address space by configuring the core logics with appropriate -monitoring operations. If appropriate one is not provided, users can implement -the operations on their own. + +Overall Architecture +==================== + +DAMON subsystem is configured with three layers including + +- Operations Set: Implements fundamental operations for DAMON that depends on + the given monitoring target address-space and available set of + software/hardware primitives, +- Core: Implements core logics including monitoring overhead/accurach control + and access-aware system operations on top of the operations set layer, and +- Modules: Implements kernel modules for various purposes that provides + interfaces for the user space, on top of the core layer. + + +Configurable Operations Set +--------------------------- + +For data access monitoring and additional low level work, DAMON needs a set of +implementations for specific operations that are dependent on and optimized for +the given target address space. On the other hand, the accuracy and overhead +tradeoff mechanism, which is the core logic of DAMON, is in the pure logic +space. DAMON separates the two parts in different layers, namely DAMON +Operations Set and DAMON Core Logics Layers, respectively. It further defines +the interface between the layers to allow various operations sets to be +configured with the core logic. + +Due to this design, users can extend DAMON for any address space by configuring +the core logic to use the appropriate operations set. If any appropriate set +is unavailable, users can implement one on their own. For example, physical memory, virtual memory, swap space, those for specific processes, NUMA nodes, files, and backing memory devices would be supportable. -Also, if some architectures or devices support special optimized access check -primitives, those will be easily configurable. +Also, if some architectures or devices supporting special optimized access +check primitives, those will be easily configurable. -Reference Implementations of Address Space Specific Monitoring Operations -========================================================================= +Programmable Modules +-------------------- + +Core layer of DAMON is implemented as a framework, and exposes its application +programming interface to all kernel space components such as subsystems and +modules. For common use cases of DAMON, DAMON subsystem provides kernel +modules that built on top of the core layer using the API, which can be easily +used by the user space end users. + + +Operations Set Layer +==================== The monitoring operations are defined in two parts: @@ -90,8 +114,12 @@ conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking does. -Address Space Independent Core Mechanisms -========================================= +Core Logics +=========== + + +Monitoring +---------- Below four sections describe each of the DAMON core mechanisms and the five monitoring attributes, ``sampling interval``, ``aggregation interval``, @@ -100,7 +128,7 @@ regions``. Access Frequency Monitoring ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ The output of DAMON says what pages are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting @@ -127,7 +155,7 @@ size of the target workload grows. Region Based Sampling ---------------------- +~~~~~~~~~~~~~~~~~~~~~ To avoid the unbounded increase of the overhead, DAMON groups adjacent pages that assumed to have the same access frequencies into a region. As long as the @@ -144,7 +172,7 @@ assumption is not guaranteed. Adaptive Regions Adjustment ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), @@ -162,8 +190,22 @@ In this way, DAMON provides its best-effort quality and minimal overhead while keeping the bounds users set for their trade-off. +Age Tracking +~~~~~~~~~~~~ + +By analyzing the monitoring results, users can also find how long the current +access pattern of a region has maintained. That could be used for good +understanding of the access pattern. For example, page placement algorithm +utilizing both the frequency and the recency could be implemented using that. +To make such access pattern maintained period analysis easier, DAMON maintains +yet another counter called ``age`` in each region. For each ``aggregation +interval``, DAMON checks if the region's size and access frequency +(``nr_accesses``) has significantly changed. If so, the counter is reset to +zero. Otherwise, the counter is increased. + + Dynamic Target Space Updates Handling -------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The monitoring target address range could dynamically changed. For example, virtual memory could be dynamically mapped and unmapped. Physical memory could @@ -174,3 +216,246 @@ monitoring operations to check dynamic changes including memory mapping changes and applies it to monitoring operations-related data structures such as the abstracted monitoring target memory area only for each of a user-specified time interval (``update interval``). + + +.. _damon_design_damos: + +Operation Schemes +----------------- + +One common purpose of data access monitoring is access-aware system efficiency +optimizations. For example, + + paging out memory regions that are not accessed for more than two minutes + +or + + using THP for memory regions that are larger than 2 MiB and showing a high + access frequency for more than one minute. + +One straightforward approach for such schemes would be profile-guided +optimizations. That is, getting data access monitoring results of the +workloads or the system using DAMON, finding memory regions of special +characteristics by profiling the monitoring results, and making system +operation changes for the regions. The changes could be made by modifying or +providing advice to the software (the application and/or the kernel), or +reconfiguring the hardware. Both offline and online approaches could be +available. + +Among those, providing advice to the kernel at runtime would be flexible and +effective, and therefore widely be used. However, implementing such schemes +could impose unnecessary redundancy and inefficiency. The profiling could be +redundant if the type of interest is common. Exchanging the information +including monitoring results and operation advice between kernel and user +spaces could be inefficient. + +To allow users to reduce such redundancy and inefficiencies by offloading the +works, DAMON provides a feature called Data Access Monitoring-based Operation +Schemes (DAMOS). It lets users specify their desired schemes at a high +level. For such specifications, DAMON starts monitoring, finds regions having +the access pattern of interest, and applies the user-desired operation actions +to the regions as soon as found. + + +.. _damon_design_damos_action: + +Operation Action +~~~~~~~~~~~~~~~~ + +The management action that the users desire to apply to the regions of their +interest. For example, paging out, prioritizing for next reclamation victim +selection, advising ``khugepaged`` to collapse or split, or doing nothing but +collecting statistics of the regions. + +The list of supported actions is defined in DAMOS, but the implementation of +each action is in the DAMON operations set layer because the implementation +normally depends on the monitoring target address space. For example, the code +for paging specific virtual address ranges out would be different from that for +physical address ranges. And the monitoring operations implementation sets are +not mandated to support all actions of the list. Hence, the availability of +specific DAMOS action depends on what operations set is selected to be used +together. + +Applying an action to a region is considered as changing the region's +characteristics. Hence, DAMOS resets the age of regions when an action is +applied to those. + + +.. _damon_design_damos_access_pattern: + +Target Access Pattern +~~~~~~~~~~~~~~~~~~~~~ + +The access pattern of the schemes' interest. The patterns are constructed with +the properties that DAMON's monitoring results provide, specifically the size, +the access frequency, and the age. Users can describe their access pattern of +interest by setting minimum and maximum values of the three properties. If a +region's three properties are in the ranges, DAMOS classifies it as one of the +regions that the scheme is having an interest in. + + +.. _damon_design_damos_quotas: + +Quotas +~~~~~~ + +DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if +the target access pattern is not properly tuned. For example, if a huge memory +region having the access pattern of interest is found, applying the scheme's +action to all pages of the huge region could consume unacceptably large system +resources. Preventing such issues by tuning the access pattern could be +challenging, especially if the access patterns of the workloads are highly +dynamic. + +To mitigate that situation, DAMOS provides an upper-bound overhead control +feature called quotas. It lets users specify an upper limit of time that DAMOS +can use for applying the action, and/or a maximum bytes of memory regions that +the action can be applied within a user-specified time duration. + + +.. _damon_design_damos_quotas_prioritization: + +Prioritization +^^^^^^^^^^^^^^ + +A mechanism for making a good decision under the quotas. When the action +cannot be applied to all regions of interest due to the quotas, DAMOS +prioritizes regions and applies the action to only regions having high enough +priorities so that it will not exceed the quotas. + +The prioritization mechanism should be different for each action. For example, +rarely accessed (colder) memory regions would be prioritized for page-out +scheme action. In contrast, the colder regions would be deprioritized for huge +page collapse scheme action. Hence, the prioritization mechanisms for each +action are implemented in each DAMON operations set, together with the actions. + +Though the implementation is up to the DAMON operations set, it would be common +to calculate the priority using the access pattern properties of the regions. +Some users would want the mechanisms to be personalized for their specific +case. For example, some users would want the mechanism to weigh the recency +(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users +to specify the weight of each access pattern property and passes the +information to the underlying mechanism. Nevertheless, how and even whether +the weight will be respected are up to the underlying prioritization mechanism +implementation. + + +.. _damon_design_damos_watermarks: + +Watermarks +~~~~~~~~~~ + +Conditional DAMOS (de)activation automation. Users might want DAMOS to run +only under certain situations. For example, when a sufficient amount of free +memory is guaranteed, running a scheme for proactive reclamation would only +consume unnecessary system resources. To avoid such consumption, the user would +need to manually monitor some metrics such as free memory ratio, and turn +DAMON/DAMOS on or off. + +DAMOS allows users to offload such works using three watermarks. It allows the +users to configure the metric of their interest, and three watermark values, +namely high, middle, and low. If the value of the metric becomes above the +high watermark or below the low watermark, the scheme is deactivated. If the +metric becomes below the mid watermark but above the low watermark, the scheme +is activated. If all schemes are deactivated by the watermarks, the monitoring +is also deactivated. In this case, the DAMON worker thread only periodically +checks the watermarks and therefore incurs nearly zero overhead. + + +.. _damon_design_damos_filters: + +Filters +~~~~~~~ + +Non-access pattern-based target memory regions filtering. If users run +self-written programs or have good profiling tools, they could know something +more than the kernel, such as future access patterns or some special +requirements for specific types of memory. For example, some users may know +only anonymous pages can impact their program's performance. They can also +have a list of latency-critical processes. + +To let users optimize DAMOS schemes with such special knowledge, DAMOS provides +a feature called DAMOS filters. The feature allows users to set an arbitrary +number of filters for each scheme. Each filter specifies the type of target +memory, and whether it should exclude the memory of the type (filter-out), or +all except the memory of the type (filter-in). + +As of this writing, anonymous page type and memory cgroup type are supported by +the feature. Some filter target types can require additional arguments. For +example, the memory cgroup filter type asks users to specify the file path of +the memory cgroup for the filter. Hence, users can apply specific schemes to +only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages +excluding those of specific cgroups, and any combination of those. + + +Application Programming Interface +--------------------------------- + +The programming interface for kernel space data access-aware applications. +DAMON is a framework, so it does nothing by itself. Instead, it only helps +other kernel components such as subsystems and modules building their data +access-aware applications using DAMON's core features. For this, DAMON exposes +its all features to other kernel components via its application programming +interface, namely ``include/linux/damon.h``. Please refer to the API +:doc:`document </mm/damon/api>` for details of the interface. + + +Modules +======= + +Because the core of DAMON is a framework for kernel components, it doesn't +provide any direct interface for the user space. Such interfaces should be +implemented by each DAMON API user kernel components, instead. DAMON subsystem +itself implements such DAMON API user modules, which are supposed to be used +for general purpose DAMON control and special purpose data access-aware system +operations, and provides stable application binary interfaces (ABI) for the +user space. The user space can build their efficient data access-aware +applications using the interfaces. + + +General Purpose User Interface Modules +-------------------------------------- + +DAMON modules that provide user space ABIs for general purpose DAMON usage in +runtime. + +DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs +interface' are DAMON API user kernel modules that provide ABIs to the +user-space. Please note that DAMON debugfs interface is currently deprecated. + +Like many other ABIs, the modules create files on sysfs and debugfs, allow +users to specify their requests to and get the answers from DAMON by writing to +and reading from the files. As a response to such I/O, DAMON user interface +modules control DAMON and retrieve the results as user requested via the DAMON +API, and return the results to the user-space. + +The ABIs are designed to be used for user space applications development, +rather than human beings' fingers. Human users are recommended to use such +user space tools. One such Python-written user space tool is available at +Github (https://github.com/awslabs/damo), Pypi +(https://pypistats.org/packages/damo), and Fedora +(https://packages.fedoraproject.org/pkgs/python-damo/damo/). + +Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for +details of the interfaces. + + +Special-Purpose Access-aware Kernel Modules +------------------------------------------- + +DAMON modules that provide user space ABI for specific purpose DAMON usage. + +DAMON sysfs/debugfs user interfaces are for full control of all DAMON features +in runtime. For each special-purpose system-wide data access-aware system +operations such as proactive reclamation or LRU lists balancing, the interfaces +could be simplified by removing unnecessary knobs for the specific purpose, and +extended for boot-time and even compile time control. Default values of DAMON +control parameters for the usage would also need to be optimized for the +purpose. + +To support such cases, yet more DAMON API user kernel modules that provide more +simple and optimized user space interfaces are available. Currently, two +modules for proactive reclamation and LRU lists manipulation are provided. For +more detail, please read the usage documents for those +(:doc:`/admin-guide/mm/damon/reclaim` and +:doc:`/admin-guide/mm/damon/lru_sort`). diff --git a/Documentation/mm/damon/faq.rst b/Documentation/mm/damon/faq.rst index dde7e2414ee6..3279dc7a8211 100644 --- a/Documentation/mm/damon/faq.rst +++ b/Documentation/mm/damon/faq.rst @@ -4,29 +4,6 @@ Frequently Asked Questions ========================== -Why a new subsystem, instead of extending perf or other user space tools? -========================================================================= - -First, because it needs to be lightweight as much as possible so that it can be -used online, any unnecessary overhead such as kernel - user space context -switching cost should be avoided. Second, DAMON aims to be used by other -programs including the kernel. Therefore, having a dependency on specific -tools like perf is not desirable. These are the two biggest reasons why DAMON -is implemented in the kernel space. - - -Can 'idle pages tracking' or 'perf mem' substitute DAMON? -========================================================= - -Idle page tracking is a low level primitive for access check of the physical -address space. 'perf mem' is similar, though it can use sampling to minimize -the overhead. On the other hand, DAMON is a higher-level framework for the -monitoring of various address spaces. It is focused on memory management -optimization and provides sophisticated accuracy/overhead handling mechanisms. -Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of -DAMON's output, but cannot substitute DAMON. - - Does DAMON support virtual memory only? ======================================= diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst index 24a202f03de8..a84c14e59053 100644 --- a/Documentation/mm/damon/maintainer-profile.rst +++ b/Documentation/mm/damon/maintainer-profile.rst @@ -3,7 +3,7 @@ DAMON Maintainer Entry Profile ============================== -The DAMON subsystem covers the files that listed in 'DATA ACCESS MONITOR' +The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR' section of 'MAINTAINERS' file. The mailing lists for the subsystem are damon@lists.linux.dev and @@ -15,7 +15,7 @@ SCM Trees There are multiple Linux trees for DAMON development. Patches under development or testing are queued in damon/next [2]_ by the DAMON maintainer. -Suffieicntly reviewed patches will be queued in mm-unstable [1]_ by the memory +Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory management subsystem maintainer. After more sufficient tests, the patches will be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the memory management subsystem maintainer. diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst index 313dce18893e..e35af7805be5 100644 --- a/Documentation/mm/page_migration.rst +++ b/Documentation/mm/page_migration.rst @@ -73,14 +73,13 @@ In kernel use of migrate_pages() It also prevents the swapper or other scans from encountering the page. -2. We need to have a function of type new_page_t that can be +2. We need to have a function of type new_folio_t that can be passed to migrate_pages(). This function should figure out - how to allocate the correct new page given the old page. + how to allocate the correct new folio given the old folio. 3. The migrate_pages() function is called which attempts to do the migration. It will call the function to allocate - the new page for each page that is considered for - moving. + the new folio for each folio that is considered for moving. How migrate_pages() works ========================= diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst index 50ee0dfc95be..a834fad9de12 100644 --- a/Documentation/mm/split_page_table_lock.rst +++ b/Documentation/mm/split_page_table_lock.rst @@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock. There are helpers to lock/unlock a table and other accessor functions: - pte_offset_map_lock() - maps pte and takes PTE table lock, returns pointer to the taken - lock; + maps PTE and takes PTE table lock, returns pointer to PTE with + pointer to its PTE table lock, or returns NULL if no PTE table; + - pte_offset_map_nolock() + maps PTE, returns pointer to PTE with pointer to its PTE table + lock (not taken), or returns NULL if no PTE table; + - pte_offset_map() + maps PTE, returns pointer to PTE, or returns NULL if no PTE table; + - pte_unmap() + unmaps PTE table; - pte_unmap_unlock() unlocks and unmaps PTE table; - pte_alloc_map_lock() - allocates PTE table if needed and take the lock, returns pointer - to taken lock or NULL if allocation failed; - - pte_lockptr() - returns pointer to PTE table lock; + allocates PTE table if needed and takes its lock, returns pointer to + PTE with pointer to its lock, or returns NULL if allocation failed; - pmd_lock() takes PMD table lock, returns pointer to taken lock; - pmd_lockptr() diff --git a/Documentation/translations/zh_CN/mm/page_migration.rst b/Documentation/translations/zh_CN/mm/page_migration.rst index 076081dc1635..f95063826a15 100644 --- a/Documentation/translations/zh_CN/mm/page_migration.rst +++ b/Documentation/translations/zh_CN/mm/page_migration.rst @@ -55,7 +55,7 @@ mbind()设置一个新的内存策略。一个进程的页面也可以通过sys_ 消失。它还可以防止交换器或其他扫描器遇到该页。 -2. 我们需要有一个new_page_t类型的函数,可以传递给migrate_pages()。这个函数应该计算 +2. 我们需要有一个new_folio_t类型的函数,可以传递给migrate_pages()。这个函数应该计算 出如何在给定的旧页面中分配正确的新页面。 3. migrate_pages()函数被调用,它试图进行迁移。它将调用该函数为每个被考虑迁移的页面分 |