aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-12rcu: Add CPU online/offline state to dump_blkd_tasks()Paul E. McKenney
Interactions between CPU-hotplug operations and grace-period initialization can result in dump_blkd_tasks(). One of the first debugging actions in this case is to search back in dmesg to work out which of the affected rcu_node structure's CPUs are online and to determine the last CPU-hotplug operation affecting any of those CPUs. This can be laborious and error-prone, especially when console output is lost. This commit therefore causes dump_blkd_tasks() to dump the state of the affected rcu_node structure's CPUs and the last grace period during which the last offline and online operation affected each of these CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Add up-tree information to dump_blkd_tasks() diagnosticsPaul E. McKenney
This commit updates dump_blkd_tasks() to print out quiescent-state bitmasks for the rcu_node structures further up the tree. This information helps debugging of interactions between CPU-hotplug operations and RCU grace-period initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove CPU-hotplug failsafe from force-quiescent-state code pathPaul E. McKenney
Now that quiescent states for newly offlined CPUs are reported either when that CPU goes offline or at the end of grace-period initialization, the CPU-hotplug failsafe in the force-quiescent-state code path is no longer needed. This commit therefore removes this failsafe. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove failsafe check for lost quiescent statePaul E. McKenney
Now that quiescent-state reporting is fully event-driven, this commit removes the check for a lost quiescent state from force_qs_rnp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Move grace-period pre-init delay after pre-initPaul E. McKenney
The main race with the early part of grace-period initialization appears to be with CPU hotplug. To more fully open this race window, this commit moves the rcu_gp_slow() from the beginning of the early initialization loop to follow that loop, thus widening the race window, especially for the rcu_node structures that are initialized last. This commit also expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall delay in the grace period, but concentrated in the spot where it will do the most good. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Add RCU-preempt check for waiting on newly onlined CPUPaul E. McKenney
RCU should only be waiting on CPUs that were online at the time that the current grace period started. Failure to abide by this rule can result in confusing splats during grace-period cleanup and initialization. This commit therefore adds a check to RCU-preempt's preempted-task queuing that checks for waiting on newly onlined CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Fix grace-period hangs due to race with CPU offlinePaul E. McKenney
Without special fail-safe quiescent-state-propagation checks, grace-period hangs can result from the following scenario: 1. CPU 1 goes offline. 2. Because CPU 1 is the only CPU in the system blocking the current grace period, the grace period ends as soon as rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp() returns. 3. At this point, the leaf rcu_node structure's ->lock is no longer held: rcu_report_qs_rnp() has released it, as it must in order to awaken the RCU grace-period kthread. 4. At this point, that same leaf rcu_node structure's ->qsmaskinitnext field still records CPU 1 as being online. This is absolutely necessary because the scheduler uses RCU (in this case on the wake-up path while awakening RCU's grace-period kthread), and ->qsmaskinitnext contains RCU's idea as to which CPUs are online. Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's bit from ->qsmaskinitnext would result in a lockdep-RCU splat due to RCU being used from an offline CPU. 5. RCU's grace-period kthread awakens, sees that the old grace period has completed and that a new one is needed. It therefore starts a new grace period, but because CPU 1's leaf rcu_node structure's ->qsmaskinitnext field still shows CPU 1 as being online, this new grace period is initialized to wait for a quiescent state from the now-offline CPU 1. 6. Without the fail-safe force-quiescent-state checks, there would be no quiescent state from the now-offline CPU 1, which would eventually result in RCU CPU stall warnings and memory exhaustion. It would be good to get rid of the special fail-safe quiescent-state propagation checks, and thus it would be good to fix things so that the above scenario cannot happen. This commit therefore adds a new ->ofl_lock to the rcu_state structure. This lock is held by rcu_gp_init() across the applying of buffered online and offline operations to the rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu() when buffering a new offline operation. This prevents rcu_gp_init() from acquiring the leaf rcu_node structure's lock during the interval between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(), which releases ->lock and the re-acquisition of that same lock. This in turn prevents the failure scenario outlined above, and will hopefully eventually allow removal of the offline-CPU checks from the force-quiescent-state code path. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Fix grace-period hangs from mid-init task resumePaul E. McKenney
Without special fail-safe quiescent-state-propagation checks, grace-period hangs can result from the following scenario: 1. A task running on a given CPU is preempted in its RCU read-side critical section. 2. That CPU goes offline, and there are now no online CPUs corresponding to that CPU's leaf rcu_node structure. 3. The rcu_gp_init() function does the first phase of grace-period initialization, and sets the aforementioned leaf rcu_node structure's ->qsmaskinit field to all zeroes. Because there is a blocked task, it does not propagate the zeroing of either ->qsmaskinit or ->qsmaskinitnext up the rcu_node tree. 4. The task resumes on some other CPU and exits its critical section. There is no grace period in progress, so the resulting quiescent state is not reported up the tree. 5. The rcu_gp_init() function does the second phase of grace-period initialization, which results in the leaf rcu_node structure being initialized to expect no further quiescent states, but with that structure's parent expecting a quiescent-state report. The parent will never receive a quiescent state from this leaf rcu_node structure, so the grace period will hang, resulting in RCU CPU stall warnings. It would be good to get rid of the special fail-safe quiescent-state propagation checks. This commit therefore checks the leaf rcu_node structure's ->wait_blkd_tasks field during grace-period initialization. If this flag is set, the rcu_report_qs_rnp() is invoked to immediately report the possible quiescent state. While in the neighborhood, this commit also report quiescent states for any CPUs that went offline between the two phases of grace-period initialization, thus reducing grace-period delays and hopefully eventually allowing removal of offline-CPU checks from the force-quiescent-state code path. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Suppress false-positive splats from mid-init task resumePaul E. McKenney
Consider the following sequence of events in a PREEMPT=y kernel: 1. All CPUs corresponding to a given leaf rcu_node structure are offline. 2. The first phase of the rcu_gp_init() function's grace-period initialization runs, and sets that rcu_node structure's ->qsmaskinit to zero, as it should. 3. One of the CPUs corresponding to that rcu_node structure comes back online. Note that because this CPU came online after the grace period started, this grace period can safely ignore this newly onlined CPU. 4. A task running on the newly onlined CPU enters an RCU-preempt read-side critical section, and is then preempted. Because the corresponding rcu_node structure's ->qsmask is zero, rcu_preempt_ctxt_queue() leaves the rcu_node structure's ->gp_tasks field NULL, as it should. 5. The rcu_gp_init() function continues running the second phase of grace-period initialization. The ->qsmask field of the parent of the aforementioned leaf rcu_node structure is set to not expect a quiescent state from the leaf, as is only right and proper. However, when rcu_gp_init() reaches the leaf, it invokes rcu_preempt_check_blocked_tasks(), which sees that the leaf's ->blkd_tasks list is non-empty, and therefore sets the leaf's ->gp_tasks field to reference the first task on that list. 6. The grace period ends before the preempted task resumes, which is perfectly fine, given that this grace period was under no obligation to wait for that task to exit its late-starting RCU-preempt read-side critical section. Unfortunately, the leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats. After all, it appears to rcu_gp_cleanup() that the grace period failed to wait for a task that was supposed to be blocking that grace period. This commit avoids this false-positive splat by adding a check of both ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks(). If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must have entered its RCU-preempt read-side critical section late (after all, the CPU that it is running on was not online at that time), which means that the upper-level rcu_node structure won't be waiting for anything on the leaf anyway. If ->wait_blkd_tasks is non-zero, then there is at least one task on ths rcu_node structure's ->blkd_tasks list whose RCU read-side critical section predates the current grace period. If ->qsmaskinit is non-zero, there is at least one CPU that was online at the start of the current grace period. Thus, if both are zero, there is nothing to wait for. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Suppress more involved false-positive preempted-task splatsPaul E. McKenney
Consider the following sequence of events in a PREEMPT=y kernel: 1. All but one of the CPUs corresponding to a given leaf rcu_node structure go offline. Each of these CPUs clears its bit in that structure's ->qsmaskinitnext field. 2. A new grace period starts, and rcu_gp_init() scans the leaf rcu_node structures, applying CPU-hotplug changes since the start of the previous grace period, including those changes in #1 above. This copies each leaf structure's ->qsmaskinitnext to its ->qsmask field, which represents the CPUs that this new grace period will wait on. Each copy operation is done holding the corresponding leaf rcu_node structure's ->lock, and at the end of this scan, rcu_gp_init() holds no locks. 3. The last CPU corresponding to #1's leaf rcu_node structure goes offline, clearing its bit in that structure's ->qsmaskinitnext field, but not touching the ->qsmaskinit field. Note that rcu_gp_init() is not currently holding any locks! This CPU does -not- report a quiescent state because the grace period has not yet initialized itself sufficiently to have set any bits in any of the leaf rcu_node structures' ->qsmask fields. 4. The rcu_gp_init() function continues initializing the new grace period, copying each leaf rcu_node structure's ->qsmaskinit field to its ->qsmask field while holding the corresponding ->lock. This sets the ->qsmask bit corresponding to #3's CPU. 5. Before the grace period ends, #3's CPU comes back online. Because te grace period has not yet done any force-quiescent-state scans (which would report a quiescent state on behalf of any offline CPUs), this CPU's ->qsmask bit is still set. 6. A task running on the newly onlined CPU is preempted while in an RCU read-side critical section. Because this CPU's ->qsmask bit is net, not only does this task queue itself on the leaf rcu_node structure's ->blkd_tasks list, it also sets that structure's ->gp_tasks pointer to reference it. 7. The grace period started in #1 above comes to an end. This results in rcu_gp_cleanup() being invoked, which, among other things, checks to make sure that there are no tasks blocking the just-ended grace period, that is, that all ->gp_tasks pointers are NULL. The ->gp_tasks pointer corresponding to the task preempted in #3 above is non-NULL, which results in a splat. This splat is a false positive. The task's RCU read-side critical section cannot have begun before the just-ended grace period because this would mean either: (1) The CPU came online before the grace period started, which cannot have happened because the grace period started before that CPU went offline, or (2) The task started its RCU read-side critical section on some other CPU, but then it would have had to have been preempted before migrating to this CPU, which would mean that it would have instead queued itself on that other CPU's rcu_node structure. RCU's grace periods thus are working correctly. Or, more accurately, that remaining bugs in RCU's grace periods are elsewhere. This commit eliminates this false positive by adding code to the end of rcu_cpu_starting() that reports a quiescent state to RCU, which has the side-effect of clearing that CPU's ->qsmask bit, preventing the above scenario. This approach has the added benefit of more promptly reporting quiescent states corresponding to offline CPUs. Nevertheless, this commit does -not- remove the need for the force-quiescent-state scans to check for offline CPUs, given that a CPU might remain offline indefinitely. And without the checks in the force-quiescent-state scans, the grace period would also persist indefinitely, which could result in hangs or memory exhaustion. Note well that the call to rcu_report_qs_rnp() reporting the quiescent state must come -after- the setting of this CPU's bit in the leaf rcu_node structure's ->qsmaskinitnext field. Otherwise, lockdep-RCU will complain bitterly about quiescent states coming from an offline CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Suppress false-positive preempted-task splatsPaul E. McKenney
Consider the following sequence of events in a PREEMPT=y kernel: 1. All CPUs corresponding to a given rcu_node structure go offline. A new grace period starts just after the CPU-hotplug code path does its synchronize_rcu() for the last CPU, so at least this CPU is present in that structure's ->qsmask. 2. Before the grace period ends, a CPU comes back online, and not just any CPU, but the one corresponding to a non-zero bit in the leaf rcu_node structure's ->qsmask. 3. A task running on the newly onlined CPU is preempted while in an RCU read-side critical section. Because this CPU's ->qsmask bit is net, not only does this task queue itself on the leaf rcu_node structure's ->blkd_tasks list, it also sets that structure's ->gp_tasks pointer to reference it. 4. The grace period started in #1 above comes to an end. This results in rcu_gp_cleanup() being invoked, which, among other things, checks to make sure that there are no tasks blocking the just-ended grace period, that is, that all ->gp_tasks pointers are NULL. The ->gp_tasks pointer corresponding to the task preempted in #3 above is non-NULL, which results in a splat. This splat is a false positive. The task's RCU read-side critical section cannot have begun before the just-ended grace period because this would mean either: (1) The CPU came online before the grace period started, which cannot have happened because the grace period started before that CPU was all the way offline, or (2) The task started its RCU read-side critical section on some other CPU, but then it would have had to have been preempted before migrating to this CPU, which would mean that it would have instead queued itself on that other CPU's rcu_node structure. This commit eliminates this false positive by adding code to the end of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU, which has the side-effect of clearing that CPU's ->qsmask bit, preventing the above scenario. This approach has the added benefit of more promptly reporting quiescent states corresponding to offline CPUs. Note well that the call to rcu_report_qs_rnp() reporting the quiescent state must come -before- the clearing of this CPU's bit in the leaf rcu_node structure's ->qsmaskinitnext field. Otherwise, lockdep-RCU will complain bitterly about quiescent states coming from an offline CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Suppress false-positive offline-CPU lockdep-RCU splatPaul E. McKenney
The rcu_lockdep_current_cpu_online() function currently checks only the RCU-sched data structures to determine whether or not RCU believes that a given CPU is offline. Unfortunately, there are multiple flavors of RCU, which means that there is a short window of time during which the various flavors disagree as to whether or not a given CPU is offline. This can result in false-positive lockdep-RCU splats in which some other flavor of RCU tries to do something based on its view that the CPU is online, only to get hit with a lockdep-RCU splat because RCU-sched instead believes that the CPU is offline. This commit therefore changes rcu_lockdep_current_cpu_online() to scan all RCU flavors and to consider a given CPU to be online if any of the RCU flavors believe it to be online, thus preventing these false-positive splats. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Prevent useless FQS scan after all CPUs have checked inPaul E. McKenney
The force_qs_rnp() function checks for ->qsmask being all zero, that is, all CPUs for the current rcu_node structure having already passed through quiescent states. But with RCU-preempt, this is not sufficient to report quiescent states further up the tree, so there are further checks that can initiate RCU priority boosting and also for races with CPU-hotplug operations. However, if neither of these further checks apply, the code proceeds to carry out a useless scan of an all-zero ->qsmask. This commit therefore adds code to release the current rcu_node structure's lock and continue on to the next rcu_node structure, thereby avoiding this useless scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Replace smp_wmb() with smp_store_release() for stall checkPaul E. McKenney
This commit gets rid of the smp_wmb() in record_gp_stall_check_time() in favor of an smp_store_release(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Fix typo and add additional debugPaul E. McKenney
This commit fixes a typo and adds some additional debugging to the message emitted when a task blocking the current grace period is listed as blocking it when either that grace period ends or the next grace period begins. This commit also reformats the console message for readability. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditionsPaul E. McKenney
If rcu_report_unblock_qs_rnp() is invoked on something other than preemptible RCU or if there are still preempted tasks blocking the current grace period, something went badly wrong in the caller. This commit therefore adds WARN_ON_ONCE() to these conditions, but leaving the legitimate reason for early exit (rnp->qsmask != 0) unwarned. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make rcu_init_new_rnp() stop upon already-set bitPaul E. McKenney
Currently, rcu_init_new_rnp() walks up the rcu_node combining tree, setting bits in the ->qsmaskinit fields on the way up. It walks up unconditionally, regardless of the initial state of these bits. This is OK because only the corresponding RCU grace-period kthread ever tests or sets these bits during runtime. However, it is also pointless, and it increases both memory and lock contention (albeit only slightly), so this commit stops the walk as soon as an already-set bit is encountered. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Fix an obsolete ->qsmaskinit commentPaul E. McKenney
Back in the old days, when grace-period initialization blocked CPU hotplug, the ->qsmaskinit mask was indeed updated at the time that a given CPU went offline. However, with the deferral of these updates until the beginning of the next grace period in commit 0aa04b055e71 ("rcu: Process offlining and onlining only at grace-period start"), it is instead ->qsmaskinitnext that gets updated at that time. This commit therefore updates the obsolete comment. It also fixes punctuation while on the topic of comments mentioning ->qsmaskinit. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Clean up handling of tasks blocked across full-rcu_node offlinePaul E. McKenney
Commit 0aa04b055e71 ("rcu: Process offlining and onlining only at grace-period start") deferred handling of CPU-hotplug events until the start of the next grace period, but consider the following sequence of events: 1. A task is preempted within an RCU-preempt read-side critical section. 2. The CPU that this task was running on goes offline, along with all other CPUs sharing the corresponding leaf rcu_node structure. 3. The task resumes execution. 4. One of those CPUs comes back online before a new grace period starts. In step 2, the code in the next rcu_gp_init() invocation will (correctly) defer removing the leaf rcu_node structure from the upper-level bitmasks, and will (correctly) set that structure's ->wait_blkd_tasks field. During the ensuing interval, RCU will (correctly) track the tasks preempted on that structure because they must block any subsequent grace period. In step 3, the code in rcu_read_unlock_special() will (correctly) remove the task from the leaf rcu_node structure. From this point forward, RCU need not pay attention to this structure, at least not until one of the corresponding CPUs comes back online. In step 4, the code in the next rcu_gp_init() invocation will (incorrectly) invoke rcu_init_new_rnp(). This is incorrect because the corresponding rcu_cleanup_dead_rnp() was never invoked. This is nevertheless harmless because the upper-level bits are still set. So, no harm, no foul, right? At least, all is well until a little further into rcu_gp_init() invocation, which will notice that there are no longer any tasks blocked on the leaf rcu_node structure, conclude that there is no longer anything left over from step 2's offline operation, and will therefore invoke rcu_cleanup_dead_rnp(). But this invocation of rcu_cleanup_dead_rnp() is for the beginning of the earlier offline interval, and the previous invocation of rcu_init_new_rnp() is for the end of that same interval. That is right, they are invoked out of order. That cannot be good, can it? It turns out that this is not a (correctness!) problem because rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs are online, and refuses to do anything if so. In other words, in the case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of order, they both have no effect. But this is at best an accident waiting to happen. This commit therefore adds logic to rcu_gp_init() so that rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in order, and so that neither are invoked at all in cases where RCU had to pay attention to the leaf rcu_node structure during the entire time that all corresponding CPUs were offline. And, while in the area, this commit reduces confusion by using formal parameters rather than local variables that just happen to have the same value at that particular point in the code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Identify grace period is in progress as we advance up the treeJoel Fernandes (Google)
There's no need to keep checking the same starting node for whether a grace period is in progress as we advance up the funnel lock loop. Its sufficient if we just checked it in the start, and then subsequently checked the internal nodes as we advanced up the combining tree. This also makes sense because the grace-period updates propogate from the root to the leaf, so there's a chance we may find a grace period has started as we advance up, lets check for the same. Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Use better variable names in funnel locking loopJoel Fernandes (Google)
The funnel locking loop in rcu_start_this_gp uses rcu_root as a temporary variable while walking the combining tree. This causes a tiresome exercise of a code reader reminding themselves that rcu_root may not be root. Lets just call it rnp, and rename other variables as well to be more appropriate. Original patch: https://patchwork.kernel.org/patch/10396577/ Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix name in comment as well. ]
2018-07-12rcu: Rename the grace-period-request variables and parametersJoel Fernandes
The name 'c' is used for variables and parameters holding the requested grace-period sequence number. However it is no longer very meaningful given the conversions from ->gpnum and (especially) ->completed to ->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'. Previous patch discussion is at: https://patchwork.kernel.org/patch/10396579/ Signed-off-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Regularize resetting of rcu_data wrap indicatorPaul E. McKenney
The rcu_data structure's ->gpwrap indicator is currently reset only when the CPU in question detects a new grace period. This is in theory sufficient because any CPU that has been out of action for long enough that its ->gpwrap indicator is set is guaranteed to see both the end of an old grace period and the start of a new one. However, the current code leaves a short window during which the ->gpwrap indicator has been reset but the corresponding ->gp_seq counter has not yet been brought up to date. This is harmless because interrupts are disabled, but it is likely to (at the very least) cause confusion. This commit therefore moves the resetting of ->gpwrap to follow the updating of ->gp_seq. While in the area, it also resets ->gp_seq_needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Correctly handle grace-period sequence wrapPaul E. McKenney
The new ->gq_seq grace-period sequence numbers must be shifted down, which give artifacts when these numbers wrap. This commit therefore enables rcutorture and rcuperf to handle grace-period sequence numbers even if they do wrap. It does this by allowing a special subtraction function to be specified, and this function subtracts before shifting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make rcu_start_this_gp() check for grace period already startedPaul E. McKenney
In the old days of ->gpnum and ->completed, the code requesting a new grace period checked to see if that grace period had already started, bailing early if so. The new-age ->gp_seq approach instead checks whether the grace period has already finished. A compensating change pushed the requested grace period down to the bottom of the tree, thus reducing lock contention and even eliminating it in some cases. But why not further reduce contention, especially on large systems, by doing both, especially given that the cost of doing both is extremely small? This commit therefore adds a new rcu_seq_started() function that checks whether a specified grace period has already started. It then uses this new function in place of rcu_seq_done() in the rcu_start_this_gp() function's funnel locking code. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Fix cpustart tracepoint gp_seq numberJoel Fernandes (Google)
The "cpustart" trace event shows a stale gp_seq. This is because it uses rdp->gp_seq, which is updated only at the end of the __note_gp_changes() function. This commit therefore instead uses rnp->gp_seq. An alternative fix would be to update rdp->gp_seq earlier, but this would break RCU's detection of the beginning of a new-to-this-CPU grace period. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Produce last "CleanupMore" trace only if late-breaking requestJoel Fernandes (Google)
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in response to late-arriving grace-period requests even if the grace period was already requested. This makes "CleanupMore" show up an extra time (in addition to once for each rcu_node structure that was previously marked with the request), and for no good reason. This commit therefore avoids emitting this trace message unless the the only request for this next grace period arrived during or after the cleanup scan of the rcu_node structures. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Don't funnel-lock above leaf node if GP in progressPaul E. McKenney
The old grace-period start code would acquire only the leaf's rcu_node structure's ->lock if that structure believed that a grace period was in progress. The new code advances to the leaf's parent in this case, needlessly acquiring then leaf's parent's ->lock. This commit therefore checks the grace-period state after marking the leaf with the need for the specified grace period, and if the leaf believes that a grace period is in progress, takes an early exit. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
2018-07-12doc: Update RCU CPU stall-warning documentationPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12doc: Update memory-ordering documentation for ->gp-seqPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12doc: Update data-structure documentation for ->gp_seqPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make simple callback acceleration refer to rdp->gp_seq_neededPaul E. McKenney
Now that the rcu_data structure contains ->gp_seq_needed, create an rcu_accelerate_cbs_unlocked() helper function that locklessly checks to see if new callbacks' required grace period has already been requested. If so, update the callback list locally and again locklessly. (Though interrupts must be and are disabled to avoid racing with conflicting updates in interrupt handlers.) Otherwise, call rcu_accelerate_cbs() as before. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove ->gpnum and ->completedPaul E. McKenney
Now that everything has been converted to use ->gp_seq instead of ->gpnum and ->completed, this commit removes ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_fqs tracepoint to ->gp_seqPaul E. McKenney
This commit makes the rcu_fqs tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seqPaul E. McKenney
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seqPaul E. McKenney
This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_preempt_task tracepoint to ->gp_seqPaul E. McKenney
This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_grace_period_init tracepoint to gp_seqPaul E. McKenney
This commit makes the rcu_grace_period_init tracepoint use gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_future_grace_period tracepoint to gp_seqPaul E. McKenney
This commit makes the rcu_future_grace_period tracepoint use gp_seq instead of ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert rcu_grace_period tracepoint to gp_seqPaul E. McKenney
This commit makes the rcu_grace_period tracepoint use gp_seq instead of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to less obscurely indicate when a CPU has gone offline while a grace period is waiting on it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make rcu_nocb_wait_gp() check if GP already requestedPaul E. McKenney
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see if the current CPU already knows about the needed grace period having already been requested. If so, it avoids acquiring the corresponding leaf rcu_node structure's ->lock, thus decreasing contention. This optimization is intended for cases where either multiple leader rcuo kthreads are running on the same CPU or these kthreads are running on a non-offloaded (e.g., housekeeping) CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ] [ paulmck: Fix caching of furthest-future requested grace period. ]
2018-07-12rcu: Move from ->need_future_gp[] to ->gp_seq_neededPaul E. McKenney
One problem with the ->need_future_gp[] array is that the grace-period assignment of each element changes as the grace periods complete. This means that it is necessary to hold a lock when checking this array to learn if a given grace period has already been requested. This increase lock contention, which is the opposite of helpful. This commit therefore replaces the ->need_future_gp[] with a single ->gp_seq_needed value and keeps it updated in the rcu_data structure. This will enable reliable lockless checking of whether or not a given grace period has already been requested. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Convert rcutorture_get_gp_data() to ->gp_seqPaul E. McKenney
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq. This commit therefore moves the rcutorture_get_gp_data() function from a ->gpnum / ->completed pair to ->gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Make RCU CPU stall warnings use ->gp_seqPaul E. McKenney
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(), print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert grace-period requests to ->gp_seqPaul E. McKenney
This commit converts the grace-period request code paths from ->completed and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates the shift operation required to use ->gp_seq as an index to the ->need_future_gp[] array. The rcu_cbs_completed() function is removed in favor of the rcu_seq_snap() function. The rcu_start_this_gp() gets some temporary consistency checks and uses rcu_seq_done(), rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place of the earlier open-coded comparisons of ->gpnum and ->completed. The rcu_future_gp_cleanup() function replaces use of ->completed with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs() function replaces an access to >completed with one to ->gp_seq and adds some temporary warnings. The rcu_nocb_wait_gp() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded comparison with rcu_seq_done(). The temporary warnings will be removed when the various ->gpnum and ->completed fields are removed. Their purpose is to locate code who might still be using ->gpnum and ->completed. (Much easier that way than trying to trace down the causes of too-short grace periods and grace-period hangs!) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert ->completedqs to ->gp_seqPaul E. McKenney
This commit switches the quiescent-state no-backtracking checks from ->gpnum and ->completed to ->gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Convert ->rcu_iw_gpnum to ->gp_seqPaul E. McKenney
This commit switches the interrupt-disabled detection mechanism to ->gp_seq. This mechanism is used as part of RCU CPU stall warnings, and detects cases where the stall is due to a CPU having interrupts disabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Move rcu_gp_in_progress() to ->gp_seqPaul E. McKenney
This commit makes rcu_gp_in_progress() use ->gp_seq instead of ->completed and ->gpnum. The READ_ONCE() invocations are buried in rcu_seq_current(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Move rcu_nocb_gp_get() to ->gp_seqPaul E. McKenney
This commit makes rcu_try_advance_all_cbs() use ->gp_seq. It uses rcu_seq_ctr() in order to shift away the state bits, so that the low-order bits of the result may safely be used to index ->nocb_gp_wq[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Move rcu_try_advance_all_cbs() to ->gp_seqPaul E. McKenney
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the exception of tracing, which will be converted later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>