aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPaul E. McKenney2021-07-14 11:46:55 -0700
committerPaul E. McKenney2021-07-20 13:36:33 -0700
commit99c0974ffeeab111bc709fc77b6900593e2e9078 (patch)
tree0e494e722bb53d7fb1178233f0cba5fcd5fb9177
parentc28adacc14e70e3260063e97ebb8dd984e6f7a07 (diff)
doc: Update stallwarn.rst with recent changes
This commit calls out the possibility of self-detected stalls, adds new messages, and calls out the use for stack traces. Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r--Documentation/RCU/stallwarn.rst21
1 files changed, 15 insertions, 6 deletions
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index f1c49c626e93..5036df24ae61 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -189,8 +189,8 @@ rcupdate.rcu_task_stall_timeout
Interpreting RCU's CPU Stall-Detector "Splats"
==============================================
-For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following::
+For non-RCU-tasks flavors of RCU, when a CPU detects that some other
+CPU is stalling, it will print a message similar to the following::
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@@ -204,6 +204,8 @@ PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
the tasks will be indicated by PID, for example, "P3421". It is even
possible for an rcu_state stall to be caused by both CPUs *and* tasks,
in which case the offending CPUs and tasks will all be called out in the list.
+In some cases, CPUs will detect themselves stalling, which will result
+in a self-detected stall.
CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
the RCU core for the past three grace periods. In contrast, CPU 16's "(0
@@ -283,7 +285,8 @@ If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed::
- kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+ rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+ Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
Starving the grace-period kthreads of CPU time can of course result
in RCU CPU stall warnings even when all CPUs and tasks have passed
@@ -313,15 +316,21 @@ is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not
change on successive RCU CPU stall warnings, there is further reason to
suspect a timer problem.
+These messages are usually followed by stack dumps of the CPUs and tasks
+involved in the stall. These stack traces can help you locate the cause
+of the stall, keeping in mind that the CPU detecting the stall will have
+an interrupt frame that is mainly devoted to detecting the stall.
+
Multiple Warnings From One Stall
================================
-If a stall lasts long enough, multiple stall-warning messages will be
-printed for it. The second and subsequent messages are printed at
+If a stall lasts long enough, multiple stall-warning messages will
+be printed for it. The second and subsequent messages are printed at
longer intervals, so that the time between (say) the first and second
message will be about three times the interval between the beginning
-of the stall and the first message.
+of the stall and the first message. It can be helpful to compare the
+stack dumps for the different messages for the same stalled grace period.
Stall Warnings for Expedited Grace Periods