diff options
author | Linus Torvalds | 2022-03-25 12:34:53 -0700 |
---|---|---|
committer | Linus Torvalds | 2022-03-25 12:34:53 -0700 |
commit | 636f64db07f33a18630248b4c57e182cd315b0da (patch) | |
tree | b27478715e415b5324924e0f6fccc47f28899c0a /Documentation | |
parent | ebcb577aee1448fd60904fc4126cbf7ec012bd0b (diff) | |
parent | 7f1b8e0d6360178e3527d4f14e6921c254a86035 (diff) |
Merge tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- More noinstr fixes
- Add an erratum workaround for Intel CPUs which, in certain
circumstances, end up consuming an unrelated uncorrectable memory
error when using fast string copy insns
- Remove the MCE tolerance level control as it is not really needed or
used anymore
* tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Remove the tolerance level control
x86/mce: Work around an erratum on fast string copy instructions
x86/mce: Use arch atomic and bit helpers
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/removed/sysfs-mce | 37 | ||||
-rw-r--r-- | Documentation/ABI/testing/sysfs-mce | 32 | ||||
-rw-r--r-- | Documentation/vm/hwpoison.rst | 2 | ||||
-rw-r--r-- | Documentation/x86/x86_64/boot-options.rst | 9 |
4 files changed, 38 insertions, 42 deletions
diff --git a/Documentation/ABI/removed/sysfs-mce b/Documentation/ABI/removed/sysfs-mce new file mode 100644 index 000000000000..ef5dd2a80918 --- /dev/null +++ b/Documentation/ABI/removed/sysfs-mce @@ -0,0 +1,37 @@ +What: /sys/devices/system/machinecheck/machinecheckX/tolerant +Contact: Borislav Petkov <bp@suse.de> +Date: Dec, 2021 +Description: + Unused and obsolete after the advent of recoverable machine + checks (see last sentence below) and those are present since + 2010 (Nehalem). + + Original description: + + The entries appear for each CPU, but they are truly shared + between all CPUs. + + Tolerance level. When a machine check exception occurs for a + non corrected machine check the kernel can take different + actions. + + Since machine check exceptions can happen any time it is + sometimes risky for the kernel to kill a process because it + defies normal kernel locking rules. The tolerance level + configures how hard the kernel tries to recover even at some + risk of deadlock. Higher tolerant values trade potentially + better uptime with the risk of a crash or even corruption + (for tolerant >= 3). + + == =========================================================== + 0 always panic on uncorrected errors, log corrected errors + 1 panic or SIGBUS on uncorrected errors, log corrected errors + 2 SIGBUS or log uncorrected errors, log corrected errors + 3 never panic or SIGBUS, log all errors (for testing only) + == =========================================================== + + Default: 1 + + Note this only makes a difference if the CPU allows recovery + from a machine check exception. Current x86 CPUs generally + do not. diff --git a/Documentation/ABI/testing/sysfs-mce b/Documentation/ABI/testing/sysfs-mce index c8cd989034b4..83172f50e27c 100644 --- a/Documentation/ABI/testing/sysfs-mce +++ b/Documentation/ABI/testing/sysfs-mce @@ -53,38 +53,6 @@ Description: (but some corrected errors might be still reported in other ways) -What: /sys/devices/system/machinecheck/machinecheckX/tolerant -Contact: Andi Kleen <ak@linux.intel.com> -Date: Feb, 2007 -Description: - The entries appear for each CPU, but they are truly shared - between all CPUs. - - Tolerance level. When a machine check exception occurs for a - non corrected machine check the kernel can take different - actions. - - Since machine check exceptions can happen any time it is - sometimes risky for the kernel to kill a process because it - defies normal kernel locking rules. The tolerance level - configures how hard the kernel tries to recover even at some - risk of deadlock. Higher tolerant values trade potentially - better uptime with the risk of a crash or even corruption - (for tolerant >= 3). - - == =========================================================== - 0 always panic on uncorrected errors, log corrected errors - 1 panic or SIGBUS on uncorrected errors, log corrected errors - 2 SIGBUS or log uncorrected errors, log corrected errors - 3 never panic or SIGBUS, log all errors (for testing only) - == =========================================================== - - Default: 1 - - Note this only makes a difference if the CPU allows recovery - from a machine check exception. Current x86 CPUs generally - do not. - What: /sys/devices/system/machinecheck/machinecheckX/trigger Contact: Andi Kleen <ak@linux.intel.com> Date: Feb, 2007 diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst index 89b5f7a52077..c742de1769d1 100644 --- a/Documentation/vm/hwpoison.rst +++ b/Documentation/vm/hwpoison.rst @@ -60,8 +60,6 @@ There are two (actually three) modes memory failure recovery can be in: vm.memory_failure_recovery sysctl set to zero: All memory failures cause a panic. Do not attempt recovery. - (on x86 this can be also affected by the tolerant level of the - MCE subsystem) early kill (can be controlled globally and per process) diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst index ccb7e86bf8d9..07aa0007f346 100644 --- a/Documentation/x86/x86_64/boot-options.rst +++ b/Documentation/x86/x86_64/boot-options.rst @@ -47,14 +47,7 @@ Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables. in a reboot. On Intel systems it is enabled by default. mce=nobootlog Disable boot machine check logging. - mce=tolerancelevel[,monarchtimeout] (number,number) - tolerance levels: - 0: always panic on uncorrected errors, log corrected errors - 1: panic or SIGBUS on uncorrected errors, log corrected errors - 2: SIGBUS or log uncorrected errors, log corrected errors - 3: never panic or SIGBUS, log all errors (for testing only) - Default is 1 - Can be also set using sysfs which is preferable. + mce=monarchtimeout (number) monarchtimeout: Sets the time in us to wait for other CPUs on machine checks. 0 to disable. |