diff options
Diffstat (limited to 'Documentation/admin-guide')
28 files changed, 1206 insertions, 183 deletions
diff --git a/Documentation/admin-guide/acpi/fan_performance_states.rst b/Documentation/admin-guide/acpi/fan_performance_states.rst index 98fe5c333121..b9e4b4d146c1 100644 --- a/Documentation/admin-guide/acpi/fan_performance_states.rst +++ b/Documentation/admin-guide/acpi/fan_performance_states.rst @@ -60,3 +60,31 @@ For example:: When a given field is not populated or its value provided by the platform firmware is invalid, the "not-defined" string is shown instead of the value. + +ACPI Fan Fine Grain Control +============================= + +When _FIF object specifies support for fine grain control, then fan speed +can be set from 0 to 100% with the recommended minimum "step size" via +_FSL object. User can adjust fan speed using thermal sysfs cooling device. + +Here use can look at fan performance states for a reference speed (speed_rpm) +and set it by changing cooling device cur_state. If the fine grain control +is supported then user can also adjust to some other speeds which are +not defined in the performance states. + +The support of fine grain control is presented via sysfs attribute +"fine_grain_control". If fine grain control is present, this attribute +will show "1" otherwise "0". + +This sysfs attribute is presented in the same directory as performance states. + +ACPI Fan Performance Feedback +============================= + +The optional _FST object provides status information for the fan device. +This includes field to provide current fan speed in revolutions per minute +at which the fan is rotating. + +This speed is presented in the sysfs using the attribute "fan_speed_rpm", +in the same directory as performance states. diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst index 3e11926a4df9..54fe63745ed8 100644 --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -315,8 +315,8 @@ To use the feature, admin should set up backing device via:: echo /dev/sda5 > /sys/block/zramX/backing_dev -before disksize setting. It supports only partition at this moment. -If admin wants to use incompressible page writeback, they could do via:: +before disksize setting. It supports only partitions at this moment. +If admin wants to use incompressible page writeback, they could do it via:: echo huge > /sys/block/zramX/writeback @@ -341,9 +341,9 @@ Admin can request writeback of those idle pages at right timing via:: echo idle > /sys/block/zramX/writeback -With the command, zram writeback idle pages from memory to the storage. +With the command, zram will writeback idle pages from memory to the storage. -If admin want to write a specific page in zram device to backing device, +If an admin wants to write a specific page in zram device to the backing device, they could write a page index into the interface. echo "page_index=1251" > /sys/block/zramX/writeback @@ -354,7 +354,7 @@ to guarantee storage health for entire product life. To overcome the concern, zram supports "writeback_limit" feature. The "writeback_limit_enable"'s default value is 0 so that it doesn't limit -any writeback. IOW, if admin wants to apply writeback budget, he should +any writeback. IOW, if admin wants to apply writeback budget, they should enable writeback_limit_enable via:: $ echo 1 > /sys/block/zramX/writeback_limit_enable @@ -365,7 +365,7 @@ until admin sets the budget via /sys/block/zramX/writeback_limit. (If admin doesn't enable writeback_limit_enable, writeback_limit's value assigned via /sys/block/zramX/writeback_limit is meaningless.) -If admin want to limit writeback as per-day 400M, he could do it +If admin wants to limit writeback as per-day 400M, they could do it like below:: $ MB_SHIFT=20 @@ -375,16 +375,16 @@ like below:: $ echo 1 > /sys/block/zram0/writeback_limit_enable If admins want to allow further write again once the budget is exhausted, -he could do it like below:: +they could do it like below:: $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \ /sys/block/zram0/writeback_limit -If admin wants to see remaining writeback budget since last set:: +If an admin wants to see the remaining writeback budget since last set:: $ cat /sys/block/zramX/writeback_limit -If admin want to disable writeback limit, he could do:: +If an admin wants to disable writeback limit, they could do:: $ echo 0 > /sys/block/zramX/writeback_limit_enable @@ -393,7 +393,7 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of writeback happened until you reset the zram to allocate extra writeback budget in next setting is user's job. -If admin wants to measure writeback count in a certain period, he could +If admin wants to measure writeback count in a certain period, they could know it via /sys/block/zram0/bd_stat's 3rd column. memory tracking diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index faac50149a22..2cc502a75ef6 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -64,6 +64,7 @@ Brief summary of control files. threads cgroup.procs show list of processes cgroup.event_control an interface for event_fd() + This knob is not available on CONFIG_PREEMPT_RT systems. memory.usage_in_bytes show current usage for memory (See 5.5 for details) memory.memsw.usage_in_bytes show current usage for memory+Swap @@ -75,6 +76,7 @@ Brief summary of control files. memory.max_usage_in_bytes show max memory usage recorded memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded memory.soft_limit_in_bytes set/show soft limit of memory usage + This knob is not available on CONFIG_PREEMPT_RT systems. memory.stat show various statistics memory.use_hierarchy set/show hierarchical account enabled This knob is deprecated and shouldn't be diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 5aa368d165da..69d7a6983f78 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1301,6 +1301,11 @@ PAGE_SIZE multiple when read back. Amount of memory used to cache filesystem data, including tmpfs and shared memory. + kernel (npn) + Amount of total kernel memory, including + (kernel_stack, pagetables, percpu, vmalloc, slab) in + addition to other kernel memory use cases. + kernel_stack Amount of memory allocated to kernel stacks. diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index a2b22d5640ec..9e9556826450 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -60,8 +60,8 @@ privileged data touched during the speculative execution. Spectre variant 1 attacks take advantage of speculative execution of conditional branches, while Spectre variant 2 attacks use speculative execution of indirect branches to leak privileged memory. -See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>` -:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`. +See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[6] <spec_ref6>` +:ref:`[7] <spec_ref7>` :ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`. Spectre variant 1 (Bounds Check Bypass) --------------------------------------- @@ -131,6 +131,19 @@ steer its indirect branch speculations to gadget code, and measure the speculative execution's side effects left in level 1 cache to infer the victim's data. +Yet another variant 2 attack vector is for the attacker to poison the +Branch History Buffer (BHB) to speculatively steer an indirect branch +to a specific Branch Target Buffer (BTB) entry, even if the entry isn't +associated with the source address of the indirect branch. Specifically, +the BHB might be shared across privilege levels even in the presence of +Enhanced IBRS. + +Currently the only known real-world BHB attack vector is via +unprivileged eBPF. Therefore, it's highly recommended to not enable +unprivileged eBPF, especially when eIBRS is used (without retpolines). +For a full mitigation against BHB attacks, it's recommended to use +retpolines (or eIBRS combined with retpolines). + Attack scenarios ---------------- @@ -364,13 +377,15 @@ The possible values in this file are: - Kernel status: - ==================================== ================================= - 'Not affected' The processor is not vulnerable - 'Vulnerable' Vulnerable, no mitigation - 'Mitigation: Full generic retpoline' Software-focused mitigation - 'Mitigation: Full AMD retpoline' AMD-specific software mitigation - 'Mitigation: Enhanced IBRS' Hardware-focused mitigation - ==================================== ================================= + ======================================== ================================= + 'Not affected' The processor is not vulnerable + 'Mitigation: None' Vulnerable, no mitigation + 'Mitigation: Retpolines' Use Retpoline thunks + 'Mitigation: LFENCE' Use LFENCE instructions + 'Mitigation: Enhanced IBRS' Hardware-focused mitigation + 'Mitigation: Enhanced IBRS + Retpolines' Hardware-focused + Retpolines + 'Mitigation: Enhanced IBRS + LFENCE' Hardware-focused + LFENCE + ======================================== ================================= - Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is used to protect against Spectre variant 2 attacks when calling firmware (x86 only). @@ -583,12 +598,13 @@ kernel command line. Specific mitigations can also be selected manually: - retpoline - replace indirect branches - retpoline,generic - google's original retpoline - retpoline,amd - AMD-specific minimal thunk + retpoline auto pick between generic,lfence + retpoline,generic Retpolines + retpoline,lfence LFENCE; indirect branch + retpoline,amd alias for retpoline,lfence + eibrs enhanced IBRS + eibrs,retpoline enhanced IBRS + Retpolines + eibrs,lfence enhanced IBRS + LFENCE Not specifying this option is equivalent to spectre_v2=auto. @@ -599,7 +615,7 @@ kernel command line. spectre_v2=off. Spectre variant 1 mitigations cannot be disabled. -For spectre_v2_user see :doc:`/admin-guide/kernel-parameters`. +For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt Mitigation selection guide -------------------------- @@ -681,7 +697,7 @@ AMD white papers: .. _spec_ref6: -[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_. +[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf>`_. ARM white papers: diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 1bedab498104..5bfafcbb9562 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -35,6 +35,7 @@ problems and bugs in particular. :maxdepth: 1 reporting-issues + reporting-regressions security-bugs bug-hunting bug-bisect diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst index 9b14b0c2c9c4..609a3201fd4e 100644 --- a/Documentation/admin-guide/iostats.rst +++ b/Documentation/admin-guide/iostats.rst @@ -76,7 +76,7 @@ Field 3 -- # of sectors read (unsigned long) Field 4 -- # of milliseconds spent reading (unsigned int) This is the total number of milliseconds spent by all reads (as - measured from __make_request() to end_that_request_last()). + measured from blk_mq_alloc_request() to __blk_mq_end_request()). Field 5 -- # of writes completed (unsigned long) This is the total number of writes completed successfully. @@ -89,7 +89,7 @@ Field 7 -- # of sectors written (unsigned long) Field 8 -- # of milliseconds spent writing (unsigned int) This is the total number of milliseconds spent by all writes (as - measured from __make_request() to end_that_request_last()). + measured from blk_mq_alloc_request() to __blk_mq_end_request()). Field 9 -- # of I/Os currently in progress (unsigned int) The only field that should go to zero. Incremented as requests are @@ -120,7 +120,7 @@ Field 14 -- # of sectors discarded (unsigned long) Field 15 -- # of milliseconds spent discarding (unsigned int) This is the total number of milliseconds spent by all discards (as - measured from __make_request() to end_that_request_last()). + measured from blk_mq_alloc_request() to __blk_mq_end_request()). Field 16 -- # of flush requests completed This is the total number of flush requests completed successfully. diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index cb30ca3df27c..a748e7eb4429 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -146,9 +146,9 @@ System kernel config options CONFIG_SYSFS=y Note that "sysfs file system support" might not appear in the "Pseudo - filesystems" menu if "Configure standard kernel features (for small - systems)" is not enabled in "General Setup." In this case, check the - .config file itself to ensure that sysfs is turned on, as follows:: + filesystems" menu if "Configure standard kernel features (expert users)" + is not enabled in "General Setup." In this case, check the .config file + itself to ensure that sysfs is turned on, as follows:: grep 'CONFIG_SYSFS' .config @@ -533,6 +533,10 @@ the following command:: cp /proc/vmcore <dump-file> +or use scp to write out the dump file between hosts on a network, e.g:: + + scp /proc/vmcore remote_username@remote_ip:<dump-file> + You can also use makedumpfile utility to write out the dump file with specified options to filter out unwanted contents, e.g:: diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index 3861a25faae1..8419019b6a88 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -494,6 +494,14 @@ architecture which is used to lookup the page-tables for the Virtual addresses in the higher VA range (refer to ARMv8 ARM document for more details). +MODULES_VADDR|MODULES_END|VMALLOC_START|VMALLOC_END|VMEMMAP_START|VMEMMAP_END +----------------------------------------------------------------------------- + +Used to get the correct ranges: + MODULES_VADDR ~ MODULES_END-1 : Kernel module space. + VMALLOC_START ~ VMALLOC_END-1 : vmalloc() / ioremap() space. + VMEMMAP_START ~ VMEMMAP_END-1 : vmemmap region, used for struct page array. + arm === diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index f5a27f067db9..b7ccaa2ea867 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -724,6 +724,12 @@ hvc<n> Use the hypervisor console device <n>. This is for both Xen and PowerPC hypervisors. + { null | "" } + Use to disable console output, i.e., to have kernel + console messages discarded. + This must be the only console= parameter used on the + kernel command line. + If the device connected to the port is not a TTY but a braille device, prepend "brl," before the device type, for instance console=brl,ttyS0 @@ -944,6 +950,30 @@ dump out devices still on the deferred probe list after retrying. + dell_smm_hwmon.ignore_dmi= + [HW] Continue probing hardware even if DMI data + indicates that the driver is running on unsupported + hardware. + + dell_smm_hwmon.force= + [HW] Activate driver even if SMM BIOS signature does + not match list of supported models and enable otherwise + blacklisted features. + + dell_smm_hwmon.power_status= + [HW] Report power status in /proc/i8k + (disabled by default). + + dell_smm_hwmon.restricted= + [HW] Allow controlling fans only if SYS_ADMIN + capability is set. + + dell_smm_hwmon.fan_mult= + [HW] Factor to multiply fan speed with. + + dell_smm_hwmon.fan_max= + [HW] Maximum configurable fan speed. + dfltcc= [HW,S390] Format: { on | off | def_only | inf_only | always } on: s390 zlib hardware support for compression on @@ -1435,6 +1465,14 @@ as early as possible in order to facilitate early boot debugging. + ftrace_boot_snapshot + [FTRACE] On boot up, a snapshot will be taken of the + ftrace ring buffer that can be read at: + /sys/kernel/tracing/snapshot. + This is useful if you need tracing information from kernel + boot up that is likely to be overridden by user space + start up functionality. + ftrace_dump_on_oops[=orig_cpu] [FTRACE] will dump the trace buffers on oops. If no parameter is passed, ftrace will dump @@ -1625,7 +1663,7 @@ [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP enabled. Allows heavy hugetlb users to free up some more - memory (6 * PAGE_SIZE for each 2MB hugetlb page). + memory (7 * PAGE_SIZE for each 2MB hugetlb page). Format: { on | off (default) } on: enable the feature @@ -1703,17 +1741,6 @@ i810= [HW,DRM] - i8k.ignore_dmi [HW] Continue probing hardware even if DMI data - indicates that the driver is running on unsupported - hardware. - i8k.force [HW] Activate i8k driver even if SMM BIOS signature - does not match list of supported models. - i8k.power_status - [HW] Report power status in /proc/i8k - (disabled by default) - i8k.restricted [HW] Allow controlling fans only if SYS_ADMIN - capability is set. - i915.invert_brightness= [DRM] Invert the sense of the variable that is used to set the brightness of the panel backlight. Normally a @@ -2339,13 +2366,35 @@ kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) + kvm.eager_page_split= + [KVM,X86] Controls whether or not KVM will try to + proactively split all huge pages during dirty logging. + Eager page splitting reduces interruptions to vCPU + execution by eliminating the write-protection faults + and MMU lock contention that would otherwise be + required to split huge pages lazily. + + VM workloads that rarely perform writes or that write + only to a small region of VM memory may benefit from + disabling eager page splitting to allow huge pages to + still be used for reads. + + The behavior of eager page splitting depends on whether + KVM_DIRTY_LOG_INITIALLY_SET is enabled or disabled. If + disabled, all huge pages in a memslot will be eagerly + split when dirty logging is enabled on that memslot. If + enabled, eager page splitting will be performed during + the KVM_CLEAR_DIRTY ioctl, and only for the pages being + cleared. + + Eager page splitting currently only supports splitting + huge pages mapped by the TDP MMU. + + Default is Y (on). + kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface. Default is false (don't support). - kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit - KVM MMU at runtime. - Default is 0 (off) - kvm.nx_huge_pages= [KVM] Controls the software workaround for the X86_BUG_ITLB_MULTIHIT bug. @@ -2827,6 +2876,9 @@ For details see: Documentation/admin-guide/hw-vuln/mds.rst + mem=nn[KMG] [HEXAGON] Set the memory size. + Must be specified, otherwise memory size will be 0. + mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory Amount of memory to be used in cases as follows: @@ -2834,6 +2886,13 @@ 2 when the kernel is not able to see the whole system memory; 3 memory that lies after 'mem=' boundary is excluded from the hypervisor, then assigned to KVM guests. + 4 to limit the memory available for kdump kernel. + + [ARC,MICROBLAZE] - the limit applies only to low memory, + high memory is not affected. + + [ARM64] - only limits memory covered by the linear + mapping. The NOMAP regions are not affected. [X86] Work as limiting max address. Use together with memmap= to avoid physical address space collisions. @@ -2844,6 +2903,14 @@ in above case 3, memory may need be hot added after boot if system memory of hypervisor is not sufficient. + mem=nn[KMG]@ss[KMG] + [ARM,MIPS] - override the memory layout reported by + firmware. + Define a memory region of size nn[KMG] starting at + ss[KMG]. + Multiple different regions can be specified with + multiple mem= parameters on the command line. + mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel memory. @@ -3485,8 +3552,7 @@ difficult since unequal pointers can no longer be compared. However, if this command-line option is specified, then all normal pointers will have their true - value printed. Pointers printed via %pK may still be - hashed. This option should only be specified when + value printed. This option should only be specified when debugging the kernel. Please do not use on production kernels. @@ -3726,6 +3792,11 @@ bit 3: print locks info if CONFIG_LOCKDEP is on bit 4: print ftrace buffer bit 5: print all printk messages in buffer + bit 6: print all CPUs backtrace (if available in the arch) + *Be aware* that this option may print a _lot_ of lines, + so there are risks of losing older messages in the log. + Use this option carefully, maybe worth to setup a + bigger log buffer with "log_buf_len" along with this. panic_on_taint= Bitmask for conditionally calling panic() in add_taint() Format: <hex>[,nousertaint] @@ -4504,6 +4575,8 @@ (the least-favored priority). Otherwise, when RCU_BOOST is not set, valid values are 0-99 and the default is zero (non-realtime operation). + When RCU_NOCB_CPU is set, also adjust the + priority of NOCB callback kthreads. rcutree.rcu_nocb_gp_stride= [KNL] Set the number of NOCB callback kthreads in @@ -5361,8 +5434,12 @@ Specific mitigations can also be selected manually: retpoline - replace indirect branches - retpoline,generic - google's original retpoline - retpoline,amd - AMD-specific minimal thunk + retpoline,generic - Retpolines + retpoline,lfence - LFENCE; indirect branch + retpoline,amd - alias for retpoline,lfence + eibrs - enhanced IBRS + eibrs,retpoline - enhanced IBRS + Retpolines + eibrs,lfence - enhanced IBRS + LFENCE Not specifying this option is equivalent to spectre_v2=auto. diff --git a/Documentation/admin-guide/laptops/lg-laptop.rst b/Documentation/admin-guide/laptops/lg-laptop.rst index 6fbe165dcd27..67fd6932cef4 100644 --- a/Documentation/admin-guide/laptops/lg-laptop.rst +++ b/Documentation/admin-guide/laptops/lg-laptop.rst @@ -38,7 +38,7 @@ FN lock. Battery care limit ------------------ -Writing 80/100 to /sys/devices/platform/lg-laptop/battery_care_limit +Writing 80/100 to /sys/class/power_supply/CMB0/charge_control_end_threshold sets the maximum capacity to charge the battery. Limiting the charge reduces battery capacity loss over time. diff --git a/Documentation/admin-guide/media/fimc.rst b/Documentation/admin-guide/media/fimc.rst index 56b149d9a527..267ef52fe387 100644 --- a/Documentation/admin-guide/media/fimc.rst +++ b/Documentation/admin-guide/media/fimc.rst @@ -14,7 +14,7 @@ data from LCD controller (FIMD) through the SoC internal writeback data path. There are multiple FIMC instances in the SoCs (up to 4), having slightly different capabilities, like pixel alignment constraints, rotator availability, LCD writeback support, etc. The driver is located at -drivers/media/platform/exynos4-is directory. +drivers/media/platform/samsung/exynos4-is directory. Supported SoCs -------------- diff --git a/Documentation/admin-guide/media/i2c-cardlist.rst b/Documentation/admin-guide/media/i2c-cardlist.rst index db17f39b56cf..ef3b5fff3b01 100644 --- a/Documentation/admin-guide/media/i2c-cardlist.rst +++ b/Documentation/admin-guide/media/i2c-cardlist.rst @@ -284,7 +284,7 @@ tda9887 TDA 9885/6/7 analog IF demodulator tea5761 TEA 5761 radio tuner tea5767 TEA 5767 radio tuner tua9001 Infineon TUA9001 silicon tuner -tuner-xc2028 XCeive xc2028/xc3028 tuners +xc2028 XCeive xc2028/xc3028 tuners xc4000 Xceive XC4000 silicon tuner xc5000 Xceive XC5000 silicon tuner ============ ================================================== diff --git a/Documentation/admin-guide/media/imx7.rst b/Documentation/admin-guide/media/imx7.rst index 4785ae8ac978..2fa27718f52a 100644 --- a/Documentation/admin-guide/media/imx7.rst +++ b/Documentation/admin-guide/media/imx7.rst @@ -33,7 +33,7 @@ reference manual [#f1]_. Entities -------- -imx7-mipi-csi2 +imx-mipi-csi2 -------------- This is the MIPI CSI-2 receiver entity. It has one sink pad to receive the pixel diff --git a/Documentation/admin-guide/media/omap3isp.rst b/Documentation/admin-guide/media/omap3isp.rst index bc447bbec7ce..f32e7375a1a2 100644 --- a/Documentation/admin-guide/media/omap3isp.rst +++ b/Documentation/admin-guide/media/omap3isp.rst @@ -17,7 +17,7 @@ Introduction ------------ This file documents the Texas Instruments OMAP 3 Image Signal Processor (ISP) -driver located under drivers/media/platform/omap3isp. The original driver was +driver located under drivers/media/platform/ti/omap3isp. The original driver was written by Texas Instruments but since that it has been rewritten (twice) at Nokia. diff --git a/Documentation/admin-guide/media/omap4_camera.rst b/Documentation/admin-guide/media/omap4_camera.rst index 24db4222d36d..2ada9b1e6897 100644 --- a/Documentation/admin-guide/media/omap4_camera.rst +++ b/Documentation/admin-guide/media/omap4_camera.rst @@ -25,7 +25,7 @@ As of Revision AB, the ISS is described in detail in section 8. This driver is supporting **only** the CSI2-A/B interfaces for now. It makes use of the Media Controller framework [#f2]_, and inherited most of the -code from OMAP3 ISP driver (found under drivers/media/platform/omap3isp/\*), +code from OMAP3 ISP driver (found under drivers/media/platform/ti/omap3isp/\*), except that it doesn't need an IOMMU now for ISS buffers memory mapping. Supports usage of MMAP buffers only (for now). diff --git a/Documentation/admin-guide/media/vimc.rst b/Documentation/admin-guide/media/vimc.rst index 180507d455f2..0b07f05dde25 100644 --- a/Documentation/admin-guide/media/vimc.rst +++ b/Documentation/admin-guide/media/vimc.rst @@ -76,3 +76,16 @@ vimc-capture: * 1 Pad sink * 1 Pad source + +Module options +-------------- + +Vimc has a module parameter to configure the driver. + +* ``allocator=<unsigned int>`` + + memory allocator selection, default is 0. It specifies the way buffers + will be allocated. + + - 0: vmalloc + - 1: dma-contig diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 59b84904a854..592ea9a50881 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -4,7 +4,7 @@ Detailed Usages =============== -DAMON provides below three interfaces for different users. +DAMON provides below interfaces for different users. - *DAMON user space tool.* `This <https://github.com/awslabs/damo>`_ is for privileged people such as @@ -14,17 +14,21 @@ DAMON provides below three interfaces for different users. virtual and physical address spaces monitoring. For more detail, please refer to its `usage document <https://github.com/awslabs/damo/blob/next/USAGE.md>`_. -- *debugfs interface.* - :ref:`This <debugfs_interface>` is for privileged user space programmers who +- *sysfs interface.* + :ref:`This <sysfs_interface>` is for privileged user space programmers who want more optimized use of DAMON. Using this, users can use DAMON’s major - features by reading from and writing to special debugfs files. Therefore, - you can write and use your personalized DAMON debugfs wrapper programs that - reads/writes the debugfs files instead of you. The `DAMON user space tool + features by reading from and writing to special sysfs files. Therefore, + you can write and use your personalized DAMON sysfs wrapper programs that + reads/writes the sysfs files instead of you. The `DAMON user space tool <https://github.com/awslabs/damo>`_ is one example of such programs. It supports both virtual and physical address spaces monitoring. Note that this interface provides only simple :ref:`statistics <damos_stats>` for the monitoring results. For detailed monitoring results, DAMON provides a :ref:`tracepoint <tracepoint>`. +- *debugfs interface.* + :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface + <sysfs_interface>`. This will be removed after next LTS kernel is released, + so users should move to the :ref:`sysfs interface <sysfs_interface>`. - *Kernel Space Programming Interface.* :doc:`This </vm/damon/api>` is for kernel space programmers. Using this, users can utilize every feature of DAMON most flexibly and efficiently by @@ -32,6 +36,340 @@ DAMON provides below three interfaces for different users. DAMON for various address spaces. For detail, please refer to the interface :doc:`document </vm/damon/api>`. +.. _sysfs_interface: + +sysfs Interface +=============== + +DAMON sysfs interface is built when ``CONFIG_DAMON_SYSFS`` is defined. It +creates multiple directories and files under its sysfs directory, +``<sysfs>/kernel/mm/damon/``. You can control DAMON by writing to and reading +from the files under the directory. + +For a short example, users can monitor the virtual address space of a given +workload as below. :: + + # cd /sys/kernel/mm/damon/admin/ + # echo 1 > kdamonds/nr && echo 1 > kdamonds/0/contexts/nr + # echo vaddr > kdamonds/0/contexts/0/operations + # echo 1 > kdamonds/0/contexts/0/targets/nr + # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid + # echo on > kdamonds/0/state + +Files Hierarchy +--------------- + +The files hierarchy of DAMON sysfs interface is shown below. In the below +figure, parents-children relations are represented with indentations, each +directory is having ``/`` suffix, and files in each directory are separated by +comma (","). :: + + /sys/kernel/mm/damon/admin + │ kdamonds/nr_kdamonds + │ │ 0/state,pid + │ │ │ contexts/nr_contexts + │ │ │ │ 0/operations + │ │ │ │ │ monitoring_attrs/ + │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us + │ │ │ │ │ │ nr_regions/min,max + │ │ │ │ │ targets/nr_targets + │ │ │ │ │ │ 0/pid_target + │ │ │ │ │ │ │ regions/nr_regions + │ │ │ │ │ │ │ │ 0/start,end + │ │ │ │ │ │ │ │ ... + │ │ │ │ │ │ ... + │ │ │ │ │ schemes/nr_schemes + │ │ │ │ │ │ 0/action + │ │ │ │ │ │ │ access_pattern/ + │ │ │ │ │ │ │ │ sz/min,max + │ │ │ │ │ │ │ │ nr_accesses/min,max + │ │ │ │ │ │ │ │ age/min,max + │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms + │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil + │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low + │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds + │ │ │ │ │ │ ... + │ │ │ │ ... + │ │ ... + +Root +---- + +The root of the DAMON sysfs interface is ``<sysfs>/kernel/mm/damon/``, and it +has one directory named ``admin``. The directory contains the files for +privileged user space programs' control of DAMON. User space tools or deamons +having the root permission could use this directory. + +kdamonds/ +--------- + +The monitoring-related information including request specifications and results +are called DAMON context. DAMON executes each context with a kernel thread +called kdamond, and multiple kdamonds could run in parallel. + +Under the ``admin`` directory, one directory, ``kdamonds``, which has files for +controlling the kdamonds exist. In the beginning, this directory has only one +file, ``nr_kdamonds``. Writing a number (``N``) to the file creates the number +of child directories named ``0`` to ``N-1``. Each directory represents each +kdamond. + +kdamonds/<N>/ +------------- + +In each kdamond directory, two files (``state`` and ``pid``) and one directory +(``contexts``) exist. + +Reading ``state`` returns ``on`` if the kdamond is currently running, or +``off`` if it is not running. Writing ``on`` or ``off`` makes the kdamond be +in the state. Writing ``update_schemes_stats`` to ``state`` file updates the +contents of stats files for each DAMON-based operation scheme of the kdamond. +For details of the stats, please refer to :ref:`stats section +<sysfs_schemes_stats>`. + +If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. + +``contexts`` directory contains files for controlling the monitoring contexts +that this kdamond will execute. + +kdamonds/<N>/contexts/ +---------------------- + +In the beginning, this directory has only one file, ``nr_contexts``. Writing a +number (``N``) to the file creates the number of child directories named as +``0`` to ``N-1``. Each directory represents each monitoring context. At the +moment, only one context per kdamond is supported, so only ``0`` or ``1`` can +be written to the file. + +contexts/<N>/ +------------- + +In each context directory, one file (``operations``) and three directories +(``monitoring_attrs``, ``targets``, and ``schemes``) exist. + +DAMON supports multiple types of monitoring operations, including those for +virtual address space and the physical address space. You can set and get what +type of monitoring operations DAMON will use for the context by writing one of +below keywords to, and reading from the file. + + - vaddr: Monitor virtual address spaces of specific processes + - paddr: Monitor the physical address space of the system + +contexts/<N>/monitoring_attrs/ +------------------------------ + +Files for specifying attributes of the monitoring including required quality +and efficiency of the monitoring are in ``monitoring_attrs`` directory. +Specifically, two directories, ``intervals`` and ``nr_regions`` exist in this +directory. + +Under ``intervals`` directory, three files for DAMON's sampling interval +(``sample_us``), aggregation interval (``aggr_us``), and update interval +(``update_us``) exist. You can set and get the values in micro-seconds by +writing to and reading from the files. + +Under ``nr_regions`` directory, two files for the lower-bound and upper-bound +of DAMON's monitoring regions (``min`` and ``max``, respectively), which +controls the monitoring overhead, exist. You can set and get the values by +writing to and rading from the files. + +For more details about the intervals and monitoring regions range, please refer +to the Design document (:doc:`/vm/damon/design`). + +contexts/<N>/targets/ +--------------------- + +In the beginning, this directory has only one file, ``nr_targets``. Writing a +number (``N``) to the file creates the number of child directories named ``0`` +to ``N-1``. Each directory represents each monitoring target. + +targets/<N>/ +------------ + +In each target directory, one file (``pid_target``) and one directory +(``regions``) exist. + +If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should +be a process. You can specify the process to DAMON by writing the pid of the +process to the ``pid_target`` file. + +targets/<N>/regions +------------------- + +When ``vaddr`` monitoring operations set is being used (``vaddr`` is written to +the ``contexts/<N>/operations`` file), DAMON automatically sets and updates the +monitoring target regions so that entire memory mappings of target processes +can be covered. However, users could want to set the initial monitoring region +to specific address ranges. + +In contrast, DAMON do not automatically sets and updates the monitoring target +regions when ``paddr`` monitoring operations set is being used (``paddr`` is +written to the ``contexts/<N>/operations``). Therefore, users should set the +monitoring target regions by themselves in the case. + +For such cases, users can explicitly set the initial monitoring target regions +as they want, by writing proper values to the files under this directory. + +In the beginning, this directory has only one file, ``nr_regions``. Writing a +number (``N``) to the file creates the number of child directories named ``0`` +to ``N-1``. Each directory represents each initial monitoring target region. + +regions/<N>/ +------------ + +In each region directory, you will find two files (``start`` and ``end``). You +can set and get the start and end addresses of the initial monitoring target +region by writing to and reading from the files, respectively. + +contexts/<N>/schemes/ +--------------------- + +For usual DAMON-based data access aware memory management optimizations, users +would normally want the system to apply a memory management action to a memory +region of a specific access pattern. DAMON receives such formalized operation +schemes from the user and applies those to the target memory regions. Users +can get and set the schemes by reading from and writing to files under this +directory. + +In the beginning, this directory has only one file, ``nr_schemes``. Writing a +number (``N``) to the file creates the number of child directories named ``0`` +to ``N-1``. Each directory represents each DAMON-based operation scheme. + +schemes/<N>/ +------------ + +In each scheme directory, four directories (``access_pattern``, ``quotas``, +``watermarks``, and ``stats``) and one file (``action``) exist. + +The ``action`` file is for setting and getting what action you want to apply to +memory regions having specific access pattern of the interest. The keywords +that can be written to and read from the file and their meaning are as below. + + - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED`` + - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD`` + - ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT`` + - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE`` + - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE`` + - ``stat``: Do nothing but count the statistics + +schemes/<N>/access_pattern/ +--------------------------- + +The target access pattern of each DAMON-based operation scheme is constructed +with three ranges including the size of the region in bytes, number of +monitored accesses per aggregate interval, and number of aggregated intervals +for the age of the region. + +Under the ``access_pattern`` directory, three directories (``sz``, +``nr_accesses``, and ``age``) each having two files (``min`` and ``max``) +exist. You can set and get the access pattern for the given scheme by writing +to and reading from the ``min`` and ``max`` files under ``sz``, +``nr_accesses``, and ``age`` directories, respectively. + +schemes/<N>/quotas/ +------------------- + +Optimal ``target access pattern`` for each ``action`` is workload dependent, so +not easy to find. Worse yet, setting a scheme of some action too aggressive +can cause severe overhead. To avoid such overhead, users can limit time and +size quota for each scheme. In detail, users can ask DAMON to try to use only +up to specific time (``time quota``) for applying the action, and to apply the +action to only up to specific amount (``size quota``) of memory regions having +the target access pattern within a given time interval (``reset interval``). + +When the quota limit is expected to be exceeded, DAMON prioritizes found memory +regions of the ``target access pattern`` based on their size, access frequency, +and age. For personalized prioritization, users can set the weights for the +three properties. + +Under ``quotas`` directory, three files (``ms``, ``bytes``, +``reset_interval_ms``) and one directory (``weights``) having three files +(``sz_permil``, ``nr_accesses_permil``, and ``age_permil``) in it exist. + +You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and +``reset interval`` in milliseconds by writing the values to the three files, +respectively. You can also set the prioritization weights for size, access +frequency, and age in per-thousand unit by writing the values to the three +files under the ``weights`` directory. + +schemes/<N>/watermarks/ +----------------------- + +To allow easy activation and deactivation of each scheme based on system +status, DAMON provides a feature called watermarks. The feature receives five +values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The +``metric`` is the system metric such as free memory ratio that can be measured. +If the metric value of the system is higher than the value in ``high`` or lower +than ``low`` at the memoent, the scheme is deactivated. If the value is lower +than ``mid``, the scheme is activated. + +Under the watermarks directory, five files (``metric``, ``interval_us``, +``high``, ``mid``, and ``low``) for setting each value exist. You can set and +get the five values by writing to the files, respectively. + +Keywords and meanings of those that can be written to the ``metric`` file are +as below. + + - none: Ignore the watermarks + - free_mem_rate: System's free memory rate (per thousand) + +The ``interval`` should written in microseconds unit. + +.. _sysfs_schemes_stats: + +schemes/<N>/stats/ +------------------ + +DAMON counts the total number and bytes of regions that each scheme is tried to +be applied, the two numbers for the regions that each scheme is successfully +applied, and the total number of the quota limit exceeds. This statistics can +be used for online analysis or tuning of the schemes. + +The statistics can be retrieved by reading the files under ``stats`` directory +(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and +``qt_exceeds``), respectively. The files are not updated in real time, so you +should ask DAMON sysfs interface to updte the content of the files for the +stats by writing a special keyword, ``update_schemes_stats`` to the relevant +``kdamonds/<N>/state`` file. + +Example +~~~~~~~ + +Below commands applies a scheme saying "If a memory region of size in [4KiB, +8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate +interval in [10, 20], page out the region. For the paging out, use only up to +10ms per second, and also don't page out more than 1GiB per second. Under the +limitation, page out memory regions having longer age first. Also, check the +free memory rate of the system every 5 seconds, start the monitoring and paging +out when the free memory rate becomes lower than 50%, but stop it if the free +memory rate becomes larger than 60%, or lower than 30%". :: + + # cd <sysfs>/kernel/mm/damon/admin + # # populate directories + # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts; + # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes + # cd kdamonds/0/contexts/0/schemes/0 + # # set the basic access pattern and the action + # echo 4096 > access_patterns/sz/min + # echo 8192 > access_patterns/sz/max + # echo 0 > access_patterns/nr_accesses/min + # echo 5 > access_patterns/nr_accesses/max + # echo 10 > access_patterns/age/min + # echo 20 > access_patterns/age/max + # echo pageout > action + # # set quotas + # echo 10 > quotas/ms + # echo $((1024*1024*1024)) > quotas/bytes + # echo 1000 > quotas/reset_interval_ms + # # set watermark + # echo free_mem_rate > watermarks/metric + # echo 5000000 > watermarks/interval_us + # echo 600 > watermarks/high + # echo 500 > watermarks/mid + # echo 300 > watermarks/low + +Please note that it's highly recommended to use user space tools like `damo +<https://github.com/awslabs/damo>`_ rather than manually reading and writing +the files as above. Above is only for an example. .. _debugfs_interface: @@ -47,7 +385,7 @@ Attributes ---------- Users can get and set the ``sampling interval``, ``aggregation interval``, -``regions update interval``, and min/max number of monitoring target regions by +``update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. To know about the monitoring attributes in detail, please refer to the :doc:`/vm/damon/design`. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and @@ -108,24 +446,28 @@ In such cases, users can explicitly set the initial monitoring target regions as they want, by writing proper values to the ``init_regions`` file. Each line of the input should represent one region in below form.:: - <target id> <start address> <end address> + <target idx> <start address> <end address> -The ``target id`` should already in ``target_ids`` file, and the regions should -be passed in address order. For example, below commands will set a couple of -address ranges, ``1-100`` and ``100-200`` as the initial monitoring target -region of process 42, and another couple of address ranges, ``20-40`` and -``50-100`` as that of process 4242.:: +The ``target idx`` should be the index of the target in ``target_ids`` file, +starting from ``0``, and the regions should be passed in address order. For +example, below commands will set a couple of address ranges, ``1-100`` and +``100-200`` as the initial monitoring target region of pid 42, which is the +first one (index ``0``) in ``target_ids``, and another couple of address +ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one +(index ``1``) in ``target_ids``.:: # cd <debugfs>/damon - # echo "42 1 100 - 42 100 200 - 4242 20 40 - 4242 50 100" > init_regions + # cat target_ids + 42 4242 + # echo "0 1 100 + 0 100 200 + 1 20 40 + 1 50 100" > init_regions Note that this sets the initial monitoring target regions only. In case of virtual memory monitoring, DAMON will automatically updates the boundary of the -regions after one ``regions update interval``. Therefore, users should set the -``regions update interval`` large enough in this case, if they don't want the +regions after one ``update interval``. Therefore, users should set the +``update interval`` large enough in this case, if they don't want the update. diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst index bfc28704856c..6e2e416af783 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -23,7 +23,7 @@ There are four components to pagemap: * Bit 56 page exclusively mapped (since 4.2) * Bit 57 pte is uffd-wp write-protected (since 5.13) (see :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`) - * Bits 57-60 zero + * Bits 58-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) * Bit 62 page swapped * Bit 63 page present diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst index 8edb8d578caf..6e6f7b0d6562 100644 --- a/Documentation/admin-guide/mm/zswap.rst +++ b/Documentation/admin-guide/mm/zswap.rst @@ -130,9 +130,25 @@ attribute, e.g.:: echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled When zswap same-filled page identification is disabled at runtime, it will stop -checking for the same-value filled pages during store operation. However, the -existing pages which are marked as same-value filled pages remain stored -unchanged in zswap until they are either loaded or invalidated. +checking for the same-value filled pages during store operation. +In other words, every page will be then considered non-same-value filled. +However, the existing pages which are marked as same-value filled pages remain +stored unchanged in zswap until they are either loaded or invalidated. + +In some circumstances it might be advantageous to make use of just the zswap +ability to efficiently store same-filled pages without enabling the whole +compressed page storage. +In this case the handling of non-same-value pages by zswap (enabled by default) +can be disabled by setting the ``non_same_filled_pages_enabled`` attribute +to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``. +It can also be enabled and disabled at runtime using the sysfs +``non_same_filled_pages_enabled`` attribute, e.g.:: + + echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled + +Disabling both ``zswap.same_filled_pages_enabled`` and +``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new +pages by zswap. To prevent zswap from shrinking pool when zswap is full and there's a high pressure on swap (this will result in flipping pages in and out zswap pool diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 5a8f2529a033..69b23f087c05 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -8,6 +8,7 @@ Performance monitor support :maxdepth: 1 hisi-pmu + hisi-pcie-pmu imx-ddr qcom_l2_pmu qcom_l3_pmu diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 2f066df4ee9c..1923cb25073b 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -369,6 +369,32 @@ governor (for the policies it is attached to), or by the ``CPUFreq`` core (for t policies with other scaling governors). +Tracer Tool +------------- + +``amd_pstate_tracer.py`` can record and parse ``amd-pstate`` trace log, then +generate performance plots. This utility can be used to debug and tune the +performance of ``amd-pstate`` driver. The tracer tool needs to import intel +pstate tracer. + +Tracer tool located in ``linux/tools/power/x86/amd_pstate_tracer``. It can be +used in two ways. If trace file is available, then directly parse the file +with command :: + + ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name> + +Or generate trace file with root privilege, then parse and plot with command :: + + sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes] + +The test result can be found in ``results/test_name``. Following is the example +about part of the output. :: + + common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm + CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40 + CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264 + + Reference =========== diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst new file mode 100644 index 000000000000..09169d935835 --- /dev/null +++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst @@ -0,0 +1,60 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +============================== +Intel Uncore Frequency Scaling +============================== + +:Copyright: |copy| 2022 Intel Corporation + +:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> + +Introduction +------------ + +The uncore can consume significant amount of power in Intel's Xeon servers based +on the workload characteristics. To optimize the total power and improve overall +performance, SoCs have internal algorithms for scaling uncore frequency. These +algorithms monitor workload usage of uncore and set a desirable frequency. + +It is possible that users have different expectations of uncore performance and +want to have control over it. The objective is similar to allowing users to set +the scaling min/max frequencies via cpufreq sysfs to improve CPU performance. +Users may have some latency sensitive workloads where they do not want any +change to uncore frequency. Also, users may have workloads which require +different core and uncore performance at distinct phases and they may want to +use both cpufreq and the uncore scaling interface to distribute power and +improve overall performance. + +Sysfs Interface +--------------- + +To control uncore frequency, a sysfs interface is provided in the directory: +`/sys/devices/system/cpu/intel_uncore_frequency/`. + +There is one directory for each package and die combination as the scope of +uncore scaling control is per die in multiple die/package SoCs or per +package for single die per package SoCs. The name represents the +scope of control. For example: 'package_00_die_00' is for package id 0 and +die 0. + +Each package_*_die_* contains the following attributes: + +``initial_max_freq_khz`` + Out of reset, this attribute represent the maximum possible frequency. + This is a read-only attribute. If users adjust max_freq_khz, + they can always go back to maximum using the value from this attribute. + +``initial_min_freq_khz`` + Out of reset, this attribute represent the minimum possible frequency. + This is a read-only attribute. If users adjust min_freq_khz, + they can always go back to minimum using the value from this attribute. + +``max_freq_khz`` + This attribute is used to set the maximum uncore frequency. + +``min_freq_khz`` + This attribute is used to set the minimum uncore frequency. + +``current_freq_khz`` + This attribute is used to get the current uncore frequency. diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst index 5d2757e2de65..ee45887811ff 100644 --- a/Documentation/admin-guide/pm/working-state.rst +++ b/Documentation/admin-guide/pm/working-state.rst @@ -15,3 +15,4 @@ Working-State Power Management cpufreq_drivers intel_epb intel-speed-select + intel_uncore_frequency_scaling diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst index d7ac13f789cc..ec62151fe672 100644 --- a/Documentation/admin-guide/reporting-issues.rst +++ b/Documentation/admin-guide/reporting-issues.rst @@ -1,14 +1,5 @@ .. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) -.. - If you want to distribute this text under CC-BY-4.0 only, please use 'The - Linux kernel developers' for author attribution and link this as source: - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst -.. - Note: Only the content of this RST file as found in the Linux kernel sources - is available under CC-BY-4.0, as versions of this text that were processed - (for example by the kernel's build system) might contain content taken from - files which use a more restrictive license. - +.. See the bottom of this file for additional redistribution information. Reporting issues ++++++++++++++++ @@ -395,22 +386,16 @@ fixed as soon as possible, hence there are 'issues of high priority' that get handled slightly differently in the reporting process. Three type of cases qualify: regressions, security issues, and really severe problems. -You deal with a 'regression' if something that worked with an older version of -the Linux kernel does not work with a newer one or somehow works worse with it. -It thus is a regression when a WiFi driver that did a fine job with Linux 5.7 -somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if -an application shows erratic behavior with a newer kernel, which might happen -due to incompatible changes in the interface between the kernel and the -userland (like procfs and sysfs). Significantly reduced performance or -increased power consumption also qualify as regression. But keep in mind: the -new kernel needs to be built with a configuration that is similar to the one -from the old kernel (see below how to achieve that). That's because the kernel -developers sometimes can not avoid incompatibilities when implementing new -features; but to avoid regressions such features have to be enabled explicitly -during build time configuration. +You deal with a regression if some application or practical use case running +fine with one Linux kernel works worse or not at all with a newer version +compiled using a similar configuration. The document +Documentation/admin-guide/reporting-regressions.rst explains this in more +detail. It also provides a good deal of other information about regressions you +might want to be aware of; it for example explains how to add your issue to the +list of tracked regressions, to ensure it won't fall through the cracks. What qualifies as security issue is left to your judgment. Consider reading -'Documentation/admin-guide/security-bugs.rst' before proceeding, as it +Documentation/admin-guide/security-bugs.rst before proceeding, as it provides additional details how to best handle security issues. An issue is a 'really severe problem' when something totally unacceptably bad @@ -517,7 +502,7 @@ line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was not tainted when it noticed the problem; it was tainted if you see 'Tainted:' followed by a few spaces and some letters. -If your kernel is tainted, study 'Documentation/admin-guide/tainted-kernels.rst' +If your kernel is tainted, study Documentation/admin-guide/tainted-kernels.rst to find out why. Try to eliminate the reason. Often it's caused by one these three things: @@ -1043,7 +1028,7 @@ down the culprit, as maintainers often won't have the time or setup at hand to reproduce it themselves. To find the change there is a process called 'bisection' which the document -'Documentation/admin-guide/bug-bisect.rst' describes in detail. That process +Documentation/admin-guide/bug-bisect.rst describes in detail. That process will often require you to build about ten to twenty kernel images, trying to reproduce the issue with each of them before building the next. Yes, that takes some time, but don't worry, it works a lot quicker than most people assume. @@ -1073,10 +1058,11 @@ When dealing with regressions make sure the issue you face is really caused by the kernel and not by something else, as outlined above already. In the whole process keep in mind: an issue only qualifies as regression if the -older and the newer kernel got built with a similar configuration. The best way -to archive this: copy the configuration file (``.config``) from the old working -kernel freshly to each newer kernel version you try. Afterwards run ``make -olddefconfig`` to adjust it for the needs of the new version. +older and the newer kernel got built with a similar configuration. This can be +achieved by using ``make olddefconfig``, as explained in more detail by +Documentation/admin-guide/reporting-regressions.rst; that document also +provides a good deal of other information about regressions you might want to be +aware of. Write and send the report @@ -1283,7 +1269,7 @@ them when sending the report by mail. If you filed it in a bug tracker, forward the report's text to these addresses; but on top of it put a small note where you mention that you filed it with a link to the ticket. -See 'Documentation/admin-guide/security-bugs.rst' for more information. +See Documentation/admin-guide/security-bugs.rst for more information. Duties after the report went out @@ -1571,7 +1557,7 @@ Once your report is out your might get asked to do a proper one, as it allows to pinpoint the exact change that causes the issue (which then can easily get reverted to fix the issue quickly). Hence consider to do a proper bisection right away if time permits. See the section 'Special care for regressions' and -the document 'Documentation/admin-guide/bug-bisect.rst' for details how to +the document Documentation/admin-guide/bug-bisect.rst for details how to perform one. In case of a successful bisection add the author of the culprit to the recipients; also CC everyone in the signed-off-by chain, which you find at the end of its commit message. @@ -1594,7 +1580,7 @@ Some fixes are too complex Even small and seemingly obvious code-changes sometimes introduce new and totally unexpected problems. The maintainers of the stable and longterm kernels are very aware of that and thus only apply changes to these kernels that are -within rules outlined in 'Documentation/process/stable-kernel-rules.rst'. +within rules outlined in Documentation/process/stable-kernel-rules.rst. Complex or risky changes for example do not qualify and thus only get applied to mainline. Other fixes are easy to get backported to the newest stable and @@ -1756,10 +1742,23 @@ art will lay some groundwork to improve the situation over time. .. - This text is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If you - spot a typo or small mistake, feel free to let him know directly and he'll - fix it. You are free to do the same in a mostly informal way if you want - to contribute changes to the text, but for copyright reasons please CC + end-of-content +.. + This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If + you spot a typo or small mistake, feel free to let him know directly and + he'll fix it. You are free to do the same in a mostly informal way if you + want to contribute changes to the text, but for copyright reasons please CC linux-doc@vger.kernel.org and "sign-off" your contribution as Documentation/process/submitting-patches.rst outlines in the section "Sign your work - the Developer's Certificate of Origin". +.. + This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top + of the file. If you want to distribute this text under CC-BY-4.0 only, + please use "The Linux kernel developers" for author attribution and link + this as source: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst +.. + Note: Only the content of this RST file as found in the Linux kernel sources + is available under CC-BY-4.0, as versions of this text that were processed + (for example by the kernel's build system) might contain content taken from + files which use a more restrictive license. diff --git a/Documentation/admin-guide/reporting-regressions.rst b/Documentation/admin-guide/reporting-regressions.rst new file mode 100644 index 000000000000..d8adccdae23f --- /dev/null +++ b/Documentation/admin-guide/reporting-regressions.rst @@ -0,0 +1,451 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) +.. [see the bottom of this file for redistribution information] + +Reporting regressions ++++++++++++++++++++++ + +"*We don't cause regressions*" is the first rule of Linux kernel development; +Linux founder and lead developer Linus Torvalds established it himself and +ensures it's obeyed. + +This document describes what the rule means for users and how the Linux kernel's +development model ensures to address all reported regressions; aspects relevant +for kernel developers are left to Documentation/process/handling-regressions.rst. + + +The important bits (aka "TL;DR") +================================ + +#. It's a regression if something running fine with one Linux kernel works worse + or not at all with a newer version. Note, the newer kernel has to be compiled + using a similar configuration; the detailed explanations below describes this + and other fine print in more detail. + +#. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst, + it already covers all aspects important for regressions and repeated + below for convenience. Two of them are important: start your report's subject + with "[REGRESSION]" and CC or forward it to `the regression mailing list + <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev). + +#. Optional, but recommended: when sending or forwarding your report, make the + Linux kernel regression tracking bot "regzbot" track the issue by specifying + when the regression started like this:: + + #regzbot introduced v5.13..v5.14-rc1 + + +All the details on Linux kernel regressions relevant for users +============================================================== + + +The important basics +-------------------- + + +What is a "regression" and what is the "no regressions rule"? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It's a regression if some application or practical use case running fine with +one Linux kernel works worse or not at all with a newer version compiled using a +similar configuration. The "no regressions rule" forbids this to take place; if +it happens by accident, developers that caused it are expected to quickly fix +the issue. + +It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with +5.14 doesn't work at all, works significantly slower, or misbehaves somehow. +It's also a regression if a perfectly working application suddenly shows erratic +behavior with a newer kernel version; such issues can be caused by changes in +procfs, sysfs, or one of the many other interfaces Linux provides to userland +software. But keep in mind, as mentioned earlier: 5.14 in this example needs to +be built from a configuration similar to the one from 5.13. This can be achieved +using ``make olddefconfig``, as explained in more detail below. + +Note the "practical use case" in the first sentence of this section: developers +despite the "no regressions" rule are free to change any aspect of the kernel +and even APIs or ABIs to userland, as long as no existing application or use +case breaks. + +Also be aware the "no regressions" rule covers only interfaces the kernel +provides to the userland. It thus does not apply to kernel-internal interfaces +like the module API, which some externally developed drivers use to hook into +the kernel. + +How do I report a regression? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Just report the issue as outlined in +Documentation/admin-guide/reporting-issues.rst, it already describes the +important points. The following aspects outlined there are especially relevant +for regressions: + + * When checking for existing reports to join, also search the `archives of the + Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and + `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. + + * Start your report's subject with "[REGRESSION]". + + * In your report, clearly mention the last kernel version that worked fine and + the first broken one. Ideally try to find the exact change causing the + regression using a bisection, as explained below in more detail. + + * Remember to let the Linux regressions mailing list + (regressions@lists.linux.dev) know about your report: + + * If you report the regression by mail, CC the regressions list. + + * If you report your regression to some bug tracker, forward the submitted + report by mail to the regressions list while CCing the maintainer and the + mailing list for the subsystem in question. + + If it's a regression within a stable or longterm series (e.g. + v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list + <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org). + + In case you performed a successful bisection, add everyone to the CC the + culprit's commit message mentions in lines starting with "Signed-off-by:". + +When CCing for forwarding your report to the list, consider directly telling the +aforementioned Linux kernel regression tracking bot about your report. To do +that, include a paragraph like this in your mail:: + + #regzbot introduced: v5.13..v5.14-rc1 + +Regzbot will then consider your mail a report for a regression introduced in the +specified version range. In above case Linux v5.13 still worked fine and Linux +v5.14-rc1 was the first version where you encountered the issue. If you +performed a bisection to find the commit that caused the regression, specify the +culprit's commit-id instead:: + + #regzbot introduced: 1f2e3d4c5d + +Placing such a "regzbot command" is in your interest, as it will ensure the +report won't fall through the cracks unnoticed. If you omit this, the Linux +kernel's regressions tracker will take care of telling regzbot about your +regression, as long as you send a copy to the regressions mailing lists. But the +regression tracker is just one human which sometimes has to rest or occasionally +might even enjoy some time away from computers (as crazy as that might sound). +Relying on this person thus will result in an unnecessary delay before the +regressions becomes mentioned `on the list of tracked and unresolved Linux +kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the +weekly regression reports sent by regzbot. Such delays can result in Linus +Torvalds being unaware of important regressions when deciding between "continue +development or call this finished and release the final?". + +Are really all regressions fixed? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Nearly all of them are, as long as the change causing the regression (the +"culprit commit") is reliably identified. Some regressions can be fixed without +this, but often it's required. + +Who needs to find the root cause of a regression? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Developers of the affected code area should try to locate the culprit on their +own. But for them that's often impossible to do with reasonable effort, as quite +a lot of issues only occur in a particular environment outside the developer's +reach -- for example, a specific hardware platform, firmware, Linux distro, +system's configuration, or application. That's why in the end it's often up to +the reporter to locate the culprit commit; sometimes users might even need to +run additional tests afterwards to pinpoint the exact root cause. Developers +should offer advice and reasonably help where they can, to make this process +relatively easy and achievable for typical users. + +How can I find the culprit? +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Perform a bisection, as roughly outlined in +Documentation/admin-guide/reporting-issues.rst and described in more detail by +Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but +in many cases finds the culprit relatively quickly. If it's hard or +time-consuming to reliably reproduce the issue, consider teaming up with other +affected users to narrow down the search range together. + +Who can I ask for advice when it comes to regressions? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Send a mail to the regressions mailing list (regressions@lists.linux.dev) while +CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the +issue might better be dealt with in private, feel free to omit the list. + + +Additional details about regressions +------------------------------------ + + +What is the goal of the "no regressions rule"? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Users should feel safe when updating kernel versions and not have to worry +something might break. This is in the interest of the kernel developers to make +updating attractive: they don't want users to stay on stable or longterm Linux +series that are either abandoned or more than one and a half years old. That's +in everybody's interest, as `those series might have known bugs, security +issues, or other problematic aspects already fixed in later versions +<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_. +Additionally, the kernel developers want to make it simple and appealing for +users to test the latest pre-release or regular release. That's also in +everybody's interest, as it's a lot easier to track down and fix problems, if +they are reported shortly after being introduced. + +Is the "no regressions" rule really adhered in practice? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It's taken really seriously, as can be seen by many mailing list posts from +Linux creator and lead developer Linus Torvalds, some of which are quoted in +Documentation/process/handling-regressions.rst. + +Exceptions to this rule are extremely rare; in the past developers almost always +turned out to be wrong when they assumed a particular situation was warranting +an exception. + +Who ensures the "no regressions" is actually followed? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The subsystem maintainers should take care of that, which are watched and +supported by the tree maintainers -- e.g. Linus Torvalds for mainline and +Greg Kroah-Hartman et al. for various stable/longterm series. + +All of them are helped by people trying to ensure no regression report falls +through the cracks. One of them is Thorsten Leemhuis, who's currently acting as +the Linux kernel's "regressions tracker"; to facilitate this work he relies on +regzbot, the Linux kernel regression tracking bot. That's why you want to bring +your report on the radar of these people by CCing or forwarding each report to +the regressions mailing list, ideally with a "regzbot command" in your mail to +get it tracked immediately. + +How quickly are regressions normally fixed? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Developers should fix any reported regression as quickly as possible, to provide +affected users with a solution in a timely manner and prevent more users from +running into the issue; nevertheless developers need to take enough time and +care to ensure regression fixes do not cause additional damage. + +The answer thus depends on various factors like the impact of a regression, its +age, or the Linux series in which it occurs. In the end though, most regressions +should be fixed within two weeks. + +Is it a regression, if the issue can be avoided by updating some software? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Almost always: yes. If a developer tells you otherwise, ask the regression +tracker for advice as outlined above. + +Is it a regression, if a newer kernel works slower or consumes more energy? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Yes, but the difference has to be significant. A five percent slow-down in a +micro-benchmark thus is unlikely to qualify as regression, unless it also +influences the results of a broad benchmark by more than one percent. If in +doubt, ask for advice. + +Is it a regression, if an external kernel module breaks when updating Linux? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +No, as the "no regression" rule is about interfaces and services the Linux +kernel provides to the userland. It thus does not cover building or running +externally developed kernel modules, as they run in kernel-space and hook into +the kernel using internal interfaces occasionally changed. + +How are regressions handled that are caused by security fixes? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In extremely rare situations security issues can't be fixed without causing +regressions; those fixes are given way, as they are the lesser evil in the end. +Luckily this middling almost always can be avoided, as key developers for the +affected area and often Linus Torvalds himself try very hard to fix security +issues without causing regressions. + +If you nevertheless face such a case, check the mailing list archives if people +tried their best to avoid the regression. If not, report it; if in doubt, ask +for advice as outlined above. + +What happens if fixing a regression is impossible without causing another? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Sadly these things happen, but luckily not very often; if they occur, expert +developers of the affected code area should look into the issue to find a fix +that avoids regressions or at least their impact. If you run into such a +situation, do what was outlined already for regressions caused by security +fixes: check earlier discussions if people already tried their best and ask for +advice if in doubt. + +A quick note while at it: these situations could be avoided, if people would +regularly give mainline pre-releases (say v5.15-rc1 or -rc3) from each +development cycle a test run. This is best explained by imagining a change +integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at +the same time is a hard requirement for some other improvement applied for +5.15-rc1. All these changes often can simply be reverted and the regression thus +solved, if someone finds and reports it before 5.15 is released. A few days or +weeks later this solution can become impossible, as some software might have +started to rely on aspects introduced by one of the follow-up changes: reverting +all changes would then cause a regression for users of said software and thus is +out of the question. + +Is it a regression, if some feature I relied on was removed months ago? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is, but often it's hard to fix such regressions due to the aspects outlined +in the previous section. It hence needs to be dealt with on a case-by-case +basis. This is another reason why it's in everybody's interest to regularly test +mainline pre-releases. + +Does the "no regression" rule apply if I seem to be the only affected person? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It does, but only for practical usage: the Linux developers want to be free to +remove support for hardware only to be found in attics and museums anymore. + +Note, sometimes regressions can't be avoided to make progress -- and the latter +is needed to prevent Linux from stagnation. Hence, if only very few users seem +to be affected by a regression, it for the greater good might be in their and +everyone else's interest to lettings things pass. Especially if there is an +easy way to circumvent the regression somehow, for example by updating some +software or using a kernel parameter created just for this purpose. + +Does the regression rule apply for code in the staging tree as well? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Not according to the `help text for the configuration option covering all +staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_, +which since its early days states:: + + Please note that these drivers are under heavy development, may or + may not work, and may contain userspace interfaces that most likely + will be changed in the near future. + +The staging developers nevertheless often adhere to the "no regressions" rule, +but sometimes bend it to make progress. That's for example why some users had to +deal with (often negligible) regressions when a WiFi driver from the staging +tree was replaced by a totally different one written from scratch. + +Why do later versions have to be "compiled with a similar configuration"? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Because the Linux kernel developers sometimes integrate changes known to cause +regressions, but make them optional and disable them in the kernel's default +configuration. This trick allows progress, as the "no regressions" rule +otherwise would lead to stagnation. + +Consider for example a new security feature blocking access to some kernel +interfaces often abused by malware, which at the same time are required to run a +few rarely used applications. The outlined approach makes both camps happy: +people using these applications can leave the new security feature off, while +everyone else can enable it without running into trouble. + +How to create a configuration similar to the one of an older kernel? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Start your machine with a known-good kernel and configure the newer Linux +version with ``make olddefconfig``. This makes the kernel's build scripts pick +up the configuration file (the ".config" file) from the running kernel as base +for the new one you are about to compile; afterwards they set all new +configuration options to their default value, which should disable new features +that might cause regressions. + +Can I report a regression I found with pre-compiled vanilla kernels? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You need to ensure the newer kernel was compiled with a similar configuration +file as the older one (see above), as those that built them might have enabled +some known-to-be incompatible feature for the newer kernel. If in doubt, report +the matter to the kernel's provider and ask for advice. + + +More about regression tracking with "regzbot" +--------------------------------------------- + +What is regression tracking and why should I care about it? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Rules like "no regressions" need someone to ensure they are followed, otherwise +they are broken either accidentally or on purpose. History has shown this to be +true for Linux kernel development as well. That's why Thorsten Leemhuis, the +Linux Kernel's regression tracker, and some people try to ensure all regression +are fixed by keeping an eye on them until they are resolved. Neither of them are +paid for this, that's why the work is done on a best effort basis. + +Why and how are Linux kernel regressions tracked using a bot? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Tracking regressions completely manually has proven to be quite hard due to the +distributed and loosely structured nature of Linux kernel development process. +That's why the Linux kernel's regression tracker developed regzbot to facilitate +the work, with the long term goal to automate regression tracking as much as +possible for everyone involved. + +Regzbot works by watching for replies to reports of tracked regressions. +Additionally, it's looking out for posted or committed patches referencing such +reports with "Link:" tags; replies to such patch postings are tracked as well. +Combined this data provides good insights into the current state of the fixing +process. + +How to see which regressions regzbot tracks currently? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_. + +What kind of issues are supposed to be tracked by regzbot? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The bot is meant to track regressions, hence please don't involve regzbot for +regular issues. But it's okay for the Linux kernel's regression tracker if you +involve regzbot to track severe issues, like reports about hangs, corrupted +data, or internal errors (Panic, Oops, BUG(), warning, ...). + +How to change aspects of a tracked regression? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By using a 'regzbot command' in a direct or indirect reply to the mail with the +report. The easiest way to do that: find the report in your "Sent" folder or the +mailing list archive and reply to it using your mailer's "Reply-all" function. +In that mail, use one of the following commands in a stand-alone paragraph (IOW: +use blank lines to separate one or multiple of these commands from the rest of +the mail's text). + + * Update when the regression started to happen, for example after performing a + bisection:: + + #regzbot introduced: 1f2e3d4c5d + + * Set or update the title:: + + #regzbot title: foo + + * Monitor a discussion or bugzilla.kernel.org ticket where additions aspects of + the issue or a fix are discussed::: + + #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/ + #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 + + * Point to a place with further details of interest, like a mailing list post + or a ticket in a bug tracker that are slightly related, but about a different + topic:: + + #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 + + * Mark a regression as invalid:: + + #regzbot invalid: wasn't a regression, problem has always existed + +Regzbot supports a few other commands primarily used by developers or people +tracking regressions. They and more details about the aforementioned regzbot +commands can be found in the `getting started guide +<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and +the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_ +for regzbot. + +.. + end-of-content +.. + This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top + of the file. If you want to distribute this text under CC-BY-4.0 only, + please use "The Linux kernel developers" for author attribution and link + this as source: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst +.. + Note: Only the content of this RST file as found in the Linux kernel sources + is available under CC-BY-4.0, as versions of this text that were processed + (for example by the kernel's build system) might contain content taken from + files which use a more restrictive license. diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index d359bcfadd39..1144ea3229a3 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -595,65 +595,33 @@ Documentation/admin-guide/kernel-parameters.rst). numa_balancing ============== -Enables/disables automatic page fault based NUMA memory -balancing. Memory is moved automatically to nodes -that access it often. +Enables/disables and configures automatic page fault based NUMA memory +balancing. Memory is moved automatically to nodes that access it often. +The value to set can be the result of ORing the following: -Enables/disables automatic NUMA memory balancing. On NUMA machines, there -is a performance penalty if remote memory is accessed by a CPU. When this -feature is enabled the kernel samples what task thread is accessing memory -by periodically unmapping pages and later trapping a page fault. At the -time of the page fault, it is determined if the data being accessed should -be migrated to a local memory node. += ================================= +0 NUMA_BALANCING_DISABLED +1 NUMA_BALANCING_NORMAL +2 NUMA_BALANCING_MEMORY_TIERING += ================================= + +Or NUMA_BALANCING_NORMAL to optimize page placement among different +NUMA nodes to reduce remote accessing. On NUMA machines, there is a +performance penalty if remote memory is accessed by a CPU. When this +feature is enabled the kernel samples what task thread is accessing +memory by periodically unmapping pages and later trapping a page +fault. At the time of the page fault, it is determined if the data +being accessed should be migrated to a local memory node. The unmapping of pages and trapping faults incur additional overhead that ideally is offset by improved memory locality but there is no universal guarantee. If the target workload is already bound to NUMA nodes then this -feature should be disabled. Otherwise, if the system overhead from the -feature is too high then the rate the kernel samples for NUMA hinting -faults may be controlled by the `numa_balancing_scan_period_min_ms, -numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, -numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls. - - -numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb -=============================================================================================================================== - - -Automatic NUMA balancing scans tasks address space and unmaps pages to -detect if pages are properly placed or if the data should be migrated to a -memory node local to where the task is running. Every "scan delay" the task -scans the next "scan size" number of pages in its address space. When the -end of the address space is reached the scanner restarts from the beginning. - -In combination, the "scan delay" and "scan size" determine the scan rate. -When "scan delay" decreases, the scan rate increases. The scan delay and -hence the scan rate of every task is adaptive and depends on historical -behaviour. If pages are properly placed then the scan delay increases, -otherwise the scan delay decreases. The "scan size" is not adaptive but -the higher the "scan size", the higher the scan rate. - -Higher scan rates incur higher system overhead as page faults must be -trapped and potentially data must be migrated. However, the higher the scan -rate, the more quickly a tasks memory is migrated to a local node if the -workload pattern changes and minimises performance impact due to remote -memory accesses. These sysctls control the thresholds for scan delays and -the number of pages scanned. - -``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to -scan a tasks virtual memory. It effectively controls the maximum scanning -rate for each task. - -``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task -when it initially forks. - -``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to -scan a tasks virtual memory. It effectively controls the minimum scanning -rate for each task. - -``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are -scanned for a given scan. +feature should be disabled. +Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among +different types of memory (represented as different NUMA nodes) to +place the hot pages in the fast memory. This is implemented based on +unmapping and page fault too. oops_all_cpu_backtrace ====================== @@ -795,6 +763,8 @@ bit 1 print system memory info bit 2 print timer info bit 3 print locks info if ``CONFIG_LOCKDEP`` is on bit 4 print ftrace buffer +bit 5 print all printk messages in buffer +bit 6 print all CPUs backtrace (if available in the arch) ===== ============================================ So for example to print tasks and memory info on panic, user can:: @@ -1029,23 +999,17 @@ This is a directory, with the following entries: * ``poolsize``: the entropy pool size, in bits; * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum - number of seconds between urandom pool reseeding). + number of seconds between urandom pool reseeding). This file is + writable for compatibility purposes, but writing to it has no effect + on any RNG behavior. * ``uuid``: a UUID generated every time this is retrieved (this can thus be used to generate UUIDs at will); * ``write_wakeup_threshold``: when the entropy count drops below this (as a number of bits), processes waiting to write to ``/dev/random`` - are woken up. - -If ``drivers/char/random.c`` is built with ``ADD_INTERRUPT_BENCH`` -defined, these additional entries are present: - -* ``add_interrupt_avg_cycles``: the average number of cycles between - interrupts used to feed the pool; - -* ``add_interrupt_avg_deviation``: the standard deviation seen on the - number of cycles between interrupts used to feed the pool. + are woken up. This file is writable for compatibility purposes, but + writing to it has no effect on any RNG behavior. randomize_va_space diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 4150f74c521a..f86b5e1623c6 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -365,6 +365,15 @@ new netns has been created. Default : 0 (for compatibility reasons) +txrehash +-------- + +Controls default hash rethink behaviour on listening socket when SO_TXREHASH +option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt). + +If set to 1 (default), hash rethink is performed on listening socket. +If set to 0, hash rethink is not performed. + 2. /proc/sys/net/unix - Parameters for Unix domain sockets ---------------------------------------------------------- |