aboutsummaryrefslogtreecommitdiff
path: root/tools/memory-model/Documentation/explanation.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tools/memory-model/Documentation/explanation.txt')
-rw-r--r--tools/memory-model/Documentation/explanation.txt409
1 files changed, 280 insertions, 129 deletions
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index a727c82bd434..35bff92cc773 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -27,8 +27,9 @@ Explanation of the Linux-Kernel Memory Consistency Model
19. AND THEN THERE WAS ALPHA
20. THE HAPPENS-BEFORE RELATION: hb
21. THE PROPAGATES-BEFORE RELATION: pb
- 22. RCU RELATIONS: link, gp-link, rscs-link, and rcu-path
- 23. ODDS AND ENDS
+ 22. RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb
+ 23. LOCKING
+ 24. ODDS AND ENDS
@@ -804,7 +805,7 @@ type of fence:
Second, some types of fence affect the way the memory subsystem
propagates stores. When a fence instruction is executed on CPU C:
- For each other CPU C', smb_wmb() forces all po-earlier stores
+ For each other CPU C', smp_wmb() forces all po-earlier stores
on C to propagate to C' before any po-later stores do.
For each other CPU C', any store which propagates to C before
@@ -1067,28 +1068,6 @@ allowing out-of-order writes like this to occur. The model avoided
violating the write-write coherence rule by requiring the CPU not to
send the W write to the memory subsystem at all!)
-There is one last example of preserved program order in the LKMM: when
-a load-acquire reads from an earlier store-release. For example:
-
- smp_store_release(&x, 123);
- r1 = smp_load_acquire(&x);
-
-If the smp_load_acquire() ends up obtaining the 123 value that was
-stored by the smp_store_release(), the LKMM says that the load must be
-executed after the store; the store cannot be forwarded to the load.
-This requirement does not arise from the operational model, but it
-yields correct predictions on all architectures supported by the Linux
-kernel, although for differing reasons.
-
-On some architectures, including x86 and ARMv8, it is true that the
-store cannot be forwarded to the load. On others, including PowerPC
-and ARMv7, smp_store_release() generates object code that starts with
-a fence and smp_load_acquire() generates object code that ends with a
-fence. The upshot is that even though the store may be forwarded to
-the load, it is still true that any instruction preceding the store
-will be executed before the load or any following instructions, and
-the store will be executed before any instruction following the load.
-
AND THEN THERE WAS ALPHA
------------------------
@@ -1451,8 +1430,8 @@ they execute means that it cannot have cycles. This requirement is
the content of the LKMM's "propagation" axiom.
-RCU RELATIONS: link, gp-link, rscs-link, and rcu-path
------------------------------------------------------
+RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb
+----------------------------------------------------
RCU (Read-Copy-Update) is a powerful synchronization mechanism. It
rests on two concepts: grace periods and read-side critical sections.
@@ -1509,8 +1488,8 @@ y, which occurs before the end of the critical section, did not
propagate to P1 before the end of the grace period, violating the
Guarantee.
-In the kernel's implementations of RCU, the business about stores
-propagating to every CPU is realized by placing strong fences at
+In the kernel's implementations of RCU, the requirements for stores
+to propagate to every CPU are fulfilled by placing strong fences at
suitable places in the RCU-related code. Thus, if a critical section
starts before a grace period does then the critical section's CPU will
execute an smp_mb() fence after the end of the critical section and
@@ -1523,72 +1502,124 @@ executes.
What exactly do we mean by saying that a critical section "starts
before" or "ends after" a grace period? Some aspects of the meaning
are pretty obvious, as in the example above, but the details aren't
-entirely clear. The LKMM formalizes this notion by means of a
-relation with the unfortunately generic name "link". It is a very
-general relation; among other things, X ->link Z includes cases where
-X happens-before or is equal to some event Y which is equal to or
-comes before Z in the coherence order. Taking Y = Z, this says that
-X ->rfe Z implies X ->link Z, and taking Y = X, it says that X ->fr Z
-and X ->co Z each imply X ->link Z.
-
-The formal definition of the link relation is more than a little
+entirely clear. The LKMM formalizes this notion by means of the
+rcu-link relation. rcu-link encompasses a very general notion of
+"before": Among other things, X ->rcu-link Z includes cases where X
+happens-before or is equal to some event Y which is equal to or comes
+before Z in the coherence order. When Y = Z this says that X ->rfe Z
+implies X ->rcu-link Z. In addition, when Y = X it says that X ->fr Z
+and X ->co Z each imply X ->rcu-link Z.
+
+The formal definition of the rcu-link relation is more than a little
obscure, and we won't give it here. It is closely related to the pb
relation, and the details don't matter unless you want to comb through
a somewhat lengthy formal proof. Pretty much all you need to know
-about link is the information in the preceding paragraph.
-
-The LKMM goes on to define the gp-link and rscs-link relations. They
-bring grace periods and read-side critical sections into the picture,
-in the following way:
-
- E ->gp-link F means there is a synchronize_rcu() fence event S
- and an event X such that E ->po S, either S ->po X or S = X,
- and X ->link F. In other words, E and F are connected by a
- grace period followed by an instance of link.
-
- E ->rscs-link F means there is a critical section delimited by
- an rcu_read_lock() fence L and an rcu_read_unlock() fence U,
- and an event X such that E ->po U, either L ->po X or L = X,
- and X ->link F. Roughly speaking, this says that some event
- in the same critical section as E is connected by link to F.
-
-If we think of the link relation as standing for an extended "before",
-then E ->gp-link F says that E executes before a grace period which
-ends before F executes. (In fact it says more than this, because it
-includes cases where E executes before a grace period and some store
-propagates to F's CPU before F executes and doesn't propagate to some
-other CPU until after the grace period ends.) Similarly,
-E ->rscs-link F says that E is part of (or before the start of) a
-critical section which starts before F executes.
+about rcu-link is the information in the preceding paragraph.
+
+The LKMM also defines the gp and rscs relations. They bring grace
+periods and read-side critical sections into the picture, in the
+following way:
+
+ E ->gp F means there is a synchronize_rcu() fence event S such
+ that E ->po S and either S ->po F or S = F. In simple terms,
+ there is a grace period po-between E and F.
+
+ E ->rscs F means there is a critical section delimited by an
+ rcu_read_lock() fence L and an rcu_read_unlock() fence U, such
+ that E ->po U and either L ->po F or L = F. You can think of
+ this as saying that E and F are in the same critical section
+ (in fact, it also allows E to be po-before the start of the
+ critical section and F to be po-after the end).
+
+If we think of the rcu-link relation as standing for an extended
+"before", then X ->gp Y ->rcu-link Z says that X executes before a
+grace period which ends before Z executes. (In fact it covers more
+than this, because it also includes cases where X executes before a
+grace period and some store propagates to Z's CPU before Z executes
+but doesn't propagate to some other CPU until after the grace period
+ends.) Similarly, X ->rscs Y ->rcu-link Z says that X is part of (or
+before the start of) a critical section which starts before Z
+executes.
+
+The LKMM goes on to define the rcu-fence relation as a sequence of gp
+and rscs links separated by rcu-link links, in which the number of gp
+links is >= the number of rscs links. For example:
+
+ X ->gp Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
+
+would imply that X ->rcu-fence V, because this sequence contains two
+gp links and only one rscs link. (It also implies that X ->rcu-fence T
+and Z ->rcu-fence V.) On the other hand:
+
+ X ->rscs Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
+
+does not imply X ->rcu-fence V, because the sequence contains only
+one gp link but two rscs links.
+
+The rcu-fence relation is important because the Grace Period Guarantee
+means that rcu-fence acts kind of like a strong fence. In particular,
+if W is a write and we have W ->rcu-fence Z, the Guarantee says that W
+will propagate to every CPU before Z executes.
+
+To prove this in full generality requires some intellectual effort.
+We'll consider just a very simple case:
+
+ W ->gp X ->rcu-link Y ->rscs Z.
+
+This formula means that there is a grace period G and a critical
+section C such that:
+
+ 1. W is po-before G;
+
+ 2. X is equal to or po-after G;
+
+ 3. X comes "before" Y in some sense;
+
+ 4. Y is po-before the end of C;
+
+ 5. Z is equal to or po-after the start of C.
+
+From 2 - 4 we deduce that the grace period G ends before the critical
+section C. Then the second part of the Grace Period Guarantee says
+not only that G starts before C does, but also that W (which executes
+on G's CPU before G starts) must propagate to every CPU before C
+starts. In particular, W propagates to every CPU before Z executes
+(or finishes executing, in the case where Z is equal to the
+rcu_read_lock() fence event which starts C.) This sort of reasoning
+can be expanded to handle all the situations covered by rcu-fence.
+
+Finally, the LKMM defines the RCU-before (rb) relation in terms of
+rcu-fence. This is done in essentially the same way as the pb
+relation was defined in terms of strong-fence. We will omit the
+details; the end result is that E ->rb F implies E must execute before
+F, just as E ->pb F does (and for much the same reasons).
Putting this all together, the LKMM expresses the Grace Period
-Guarantee by requiring that there are no cycles consisting of gp-link
-and rscs-link connections in which the number of gp-link instances is
->= the number of rscs-link instances. It does this by defining the
-rcu-path relation to link events E and F whenever it is possible to
-pass from E to F by a sequence of gp-link and rscs-link connections
-with at least as many of the former as the latter. The LKMM's "rcu"
-axiom then says that there are no events E such that E ->rcu-path E.
-
-Justifying this axiom takes some intellectual effort, but it is in
-fact a valid formalization of the Grace Period Guarantee. We won't
-attempt to go through the detailed argument, but the following
-analysis gives a taste of what is involved. Suppose we have a
-violation of the first part of the Guarantee: A critical section
-starts before a grace period, and some store propagates to the
-critical section's CPU before the end of the critical section but
-doesn't propagate to some other CPU until after the end of the grace
-period.
+Guarantee by requiring that the rb relation does not contain a cycle.
+Equivalently, this "rcu" axiom requires that there are no events E and
+F with E ->rcu-link F ->rcu-fence E. Or to put it a third way, the
+axiom requires that there are no cycles consisting of gp and rscs
+alternating with rcu-link, where the number of gp links is >= the
+number of rscs links.
+
+Justifying the axiom isn't easy, but it is in fact a valid
+formalization of the Grace Period Guarantee. We won't attempt to go
+through the detailed argument, but the following analysis gives a
+taste of what is involved. Suppose we have a violation of the first
+part of the Guarantee: A critical section starts before a grace
+period, and some store propagates to the critical section's CPU before
+the end of the critical section but doesn't propagate to some other
+CPU until after the end of the grace period.
Putting symbols to these ideas, let L and U be the rcu_read_lock() and
rcu_read_unlock() fence events delimiting the critical section in
question, and let S be the synchronize_rcu() fence event for the grace
period. Saying that the critical section starts before S means there
are events E and F where E is po-after L (which marks the start of the
-critical section), E is "before" F in the sense of the link relation,
-and F is po-before the grace period S:
+critical section), E is "before" F in the sense of the rcu-link
+relation, and F is po-before the grace period S:
- L ->po E ->link F ->po S.
+ L ->po E ->rcu-link F ->po S.
Let W be the store mentioned above, let Z come before the end of the
critical section and witness that W propagates to the critical
@@ -1600,16 +1631,19 @@ some event X which is po-after S. Symbolically, this amounts to:
The fr link from Y to W indicates that W has not propagated to Y's CPU
at the time that Y executes. From this, it can be shown (see the
-discussion of the link relation earlier) that X and Z are connected by
-link, yielding:
+discussion of the rcu-link relation earlier) that X and Z are related
+by rcu-link, yielding:
- S ->po X ->link Z ->po U.
+ S ->po X ->rcu-link Z ->po U.
-These formulas say that S is po-between F and X, hence F ->gp-link Z
-via X. They also say that Z comes before the end of the critical
-section and E comes after its start, hence Z ->rscs-link F via E. But
-now we have a forbidden cycle: F ->gp-link Z ->rscs-link F. Thus the
-"rcu" axiom rules out this violation of the Grace Period Guarantee.
+The formulas say that S is po-between F and X, hence F ->gp X. They
+also say that Z comes before the end of the critical section and E
+comes after its start, hence Z ->rscs E. From all this we obtain:
+
+ F ->gp X ->rcu-link Z ->rscs E ->rcu-link F,
+
+a forbidden cycle. Thus the "rcu" axiom rules out this violation of
+the Grace Period Guarantee.
For something a little more down-to-earth, let's see how the axiom
works out in practice. Consider the RCU code example from above, this
@@ -1635,18 +1669,18 @@ time with statement labels added to the memory access instructions:
}
-If r2 = 0 at the end then P0's store at X overwrites the value
-that P1's load at Z reads from, so we have Z ->fre X and thus
-Z ->link X. In addition, there is a synchronize_rcu() between Y and
-Z, so therefore we have Y ->gp-link X.
+If r2 = 0 at the end then P0's store at X overwrites the value that
+P1's load at Z reads from, so we have Z ->fre X and thus Z ->rcu-link X.
+In addition, there is a synchronize_rcu() between Y and Z, so therefore
+we have Y ->gp Z.
If r1 = 1 at the end then P1's load at Y reads from P0's store at W,
-so we have W ->link Y. In addition, W and X are in the same critical
-section, so therefore we have X ->rscs-link Y.
+so we have W ->rcu-link Y. In addition, W and X are in the same critical
+section, so therefore we have X ->rscs W.
-This gives us a cycle, Y ->gp-link X ->rscs-link Y, with one gp-link
-and one rscs-link, violating the "rcu" axiom. Hence the outcome is
-not allowed by the LKMM, as we would expect.
+Then X ->rscs W ->rcu-link Y ->gp Z ->rcu-link X is a forbidden cycle,
+violating the "rcu" axiom. Hence the outcome is not allowed by the
+LKMM, as we would expect.
For contrast, let's see what can happen in a more complicated example:
@@ -1682,15 +1716,11 @@ For contrast, let's see what can happen in a more complicated example:
}
If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
-that W ->rscs-link Y via X, Y ->gp-link U via Z, and U ->rscs-link W
-via V. And just as before, this gives a cycle:
-
- W ->rscs-link Y ->gp-link U ->rscs-link W.
-
-However, this cycle has fewer gp-link instances than rscs-link
-instances, and consequently the outcome is not forbidden by the LKMM.
-The following instruction timing diagram shows how it might actually
-occur:
+that W ->rscs X ->rcu-link Y ->gp Z ->rcu-link U ->rscs V ->rcu-link W.
+However this cycle is not forbidden, because the sequence of relations
+contains fewer instances of gp (one) than of rscs (two). Consequently
+the outcome is allowed by the LKMM. The following instruction timing
+diagram shows how it might actually occur:
P0 P1 P2
-------------------- -------------------- --------------------
@@ -1715,6 +1745,147 @@ before it does, and the critical section in P2 both starts after P1's
grace period does and ends after it does.
+LOCKING
+-------
+
+The LKMM includes locking. In fact, there is special code for locking
+in the formal model, added in order to make tools run faster.
+However, this special code is intended to be more or less equivalent
+to concepts we have already covered. A spinlock_t variable is treated
+the same as an int, and spin_lock(&s) is treated almost the same as:
+
+ while (cmpxchg_acquire(&s, 0, 1) != 0)
+ cpu_relax();
+
+This waits until s is equal to 0 and then atomically sets it to 1,
+and the read part of the cmpxchg operation acts as an acquire fence.
+An alternate way to express the same thing would be:
+
+ r = xchg_acquire(&s, 1);
+
+along with a requirement that at the end, r = 0. Similarly,
+spin_trylock(&s) is treated almost the same as:
+
+ return !cmpxchg_acquire(&s, 0, 1);
+
+which atomically sets s to 1 if it is currently equal to 0 and returns
+true if it succeeds (the read part of the cmpxchg operation acts as an
+acquire fence only if the operation is successful). spin_unlock(&s)
+is treated almost the same as:
+
+ smp_store_release(&s, 0);
+
+The "almost" qualifiers above need some explanation. In the LKMM, the
+store-release in a spin_unlock() and the load-acquire which forms the
+first half of the atomic rmw update in a spin_lock() or a successful
+spin_trylock() -- we can call these things lock-releases and
+lock-acquires -- have two properties beyond those of ordinary releases
+and acquires.
+
+First, when a lock-acquire reads from a lock-release, the LKMM
+requires that every instruction po-before the lock-release must
+execute before any instruction po-after the lock-acquire. This would
+naturally hold if the release and acquire operations were on different
+CPUs, but the LKMM says it holds even when they are on the same CPU.
+For example:
+
+ int x, y;
+ spinlock_t s;
+
+ P0()
+ {
+ int r1, r2;
+
+ spin_lock(&s);
+ r1 = READ_ONCE(x);
+ spin_unlock(&s);
+ spin_lock(&s);
+ r2 = READ_ONCE(y);
+ spin_unlock(&s);
+ }
+
+ P1()
+ {
+ WRITE_ONCE(y, 1);
+ smp_wmb();
+ WRITE_ONCE(x, 1);
+ }
+
+Here the second spin_lock() reads from the first spin_unlock(), and
+therefore the load of x must execute before the load of y. Thus we
+cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
+MP pattern).
+
+This requirement does not apply to ordinary release and acquire
+fences, only to lock-related operations. For instance, suppose P0()
+in the example had been written as:
+
+ P0()
+ {
+ int r1, r2, r3;
+
+ r1 = READ_ONCE(x);
+ smp_store_release(&s, 1);
+ r3 = smp_load_acquire(&s);
+ r2 = READ_ONCE(y);
+ }
+
+Then the CPU would be allowed to forward the s = 1 value from the
+smp_store_release() to the smp_load_acquire(), executing the
+instructions in the following order:
+
+ r3 = smp_load_acquire(&s); // Obtains r3 = 1
+ r2 = READ_ONCE(y);
+ r1 = READ_ONCE(x);
+ smp_store_release(&s, 1); // Value is forwarded
+
+and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
+
+Second, when a lock-acquire reads from a lock-release, and some other
+stores W and W' occur po-before the lock-release and po-after the
+lock-acquire respectively, the LKMM requires that W must propagate to
+each CPU before W' does. For example, consider:
+
+ int x, y;
+ spinlock_t x;
+
+ P0()
+ {
+ spin_lock(&s);
+ WRITE_ONCE(x, 1);
+ spin_unlock(&s);
+ }
+
+ P1()
+ {
+ int r1;
+
+ spin_lock(&s);
+ r1 = READ_ONCE(x);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&s);
+ }
+
+ P2()
+ {
+ int r2, r3;
+
+ r2 = READ_ONCE(y);
+ smp_rmb();
+ r3 = READ_ONCE(x);
+ }
+
+If r1 = 1 at the end then the spin_lock() in P1 must have read from
+the spin_unlock() in P0. Hence the store to x must propagate to P2
+before the store to y does, so we cannot have r2 = 1 and r3 = 0.
+
+These two special requirements for lock-release and lock-acquire do
+not arise from the operational model. Nevertheless, kernel developers
+have come to expect and rely on them because they do hold on all
+architectures supported by the Linux kernel, albeit for various
+differing reasons.
+
+
ODDS AND ENDS
-------------
@@ -1780,26 +1951,6 @@ they behave as follows:
events and the events preceding them against all po-later
events.
-The LKMM includes locking. In fact, there is special code for locking
-in the formal model, added in order to make tools run faster.
-However, this special code is intended to be exactly equivalent to
-concepts we have already covered. A spinlock_t variable is treated
-the same as an int, and spin_lock(&s) is treated the same as:
-
- while (cmpxchg_acquire(&s, 0, 1) != 0)
- cpu_relax();
-
-which waits until s is equal to 0 and then atomically sets it to 1,
-and where the read part of the atomic update is also an acquire fence.
-An alternate way to express the same thing would be:
-
- r = xchg_acquire(&s, 1);
-
-along with a requirement that at the end, r = 0. spin_unlock(&s) is
-treated the same as:
-
- smp_store_release(&s, 0);
-
Interestingly, RCU and locking each introduce the possibility of
deadlock. When faced with code sequences such as: