diff options
Diffstat (limited to 'tools/memory-model/Documentation/explanation.txt')
-rw-r--r-- | tools/memory-model/Documentation/explanation.txt | 409 |
1 files changed, 280 insertions, 129 deletions
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt index a727c82bd434..35bff92cc773 100644 --- a/tools/memory-model/Documentation/explanation.txt +++ b/tools/memory-model/Documentation/explanation.txt @@ -27,8 +27,9 @@ Explanation of the Linux-Kernel Memory Consistency Model 19. AND THEN THERE WAS ALPHA 20. THE HAPPENS-BEFORE RELATION: hb 21. THE PROPAGATES-BEFORE RELATION: pb - 22. RCU RELATIONS: link, gp-link, rscs-link, and rcu-path - 23. ODDS AND ENDS + 22. RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb + 23. LOCKING + 24. ODDS AND ENDS @@ -804,7 +805,7 @@ type of fence: Second, some types of fence affect the way the memory subsystem propagates stores. When a fence instruction is executed on CPU C: - For each other CPU C', smb_wmb() forces all po-earlier stores + For each other CPU C', smp_wmb() forces all po-earlier stores on C to propagate to C' before any po-later stores do. For each other CPU C', any store which propagates to C before @@ -1067,28 +1068,6 @@ allowing out-of-order writes like this to occur. The model avoided violating the write-write coherence rule by requiring the CPU not to send the W write to the memory subsystem at all!) -There is one last example of preserved program order in the LKMM: when -a load-acquire reads from an earlier store-release. For example: - - smp_store_release(&x, 123); - r1 = smp_load_acquire(&x); - -If the smp_load_acquire() ends up obtaining the 123 value that was -stored by the smp_store_release(), the LKMM says that the load must be -executed after the store; the store cannot be forwarded to the load. -This requirement does not arise from the operational model, but it -yields correct predictions on all architectures supported by the Linux -kernel, although for differing reasons. - -On some architectures, including x86 and ARMv8, it is true that the -store cannot be forwarded to the load. On others, including PowerPC -and ARMv7, smp_store_release() generates object code that starts with -a fence and smp_load_acquire() generates object code that ends with a -fence. The upshot is that even though the store may be forwarded to -the load, it is still true that any instruction preceding the store -will be executed before the load or any following instructions, and -the store will be executed before any instruction following the load. - AND THEN THERE WAS ALPHA ------------------------ @@ -1451,8 +1430,8 @@ they execute means that it cannot have cycles. This requirement is the content of the LKMM's "propagation" axiom. -RCU RELATIONS: link, gp-link, rscs-link, and rcu-path ------------------------------------------------------ +RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb +---------------------------------------------------- RCU (Read-Copy-Update) is a powerful synchronization mechanism. It rests on two concepts: grace periods and read-side critical sections. @@ -1509,8 +1488,8 @@ y, which occurs before the end of the critical section, did not propagate to P1 before the end of the grace period, violating the Guarantee. -In the kernel's implementations of RCU, the business about stores -propagating to every CPU is realized by placing strong fences at +In the kernel's implementations of RCU, the requirements for stores +to propagate to every CPU are fulfilled by placing strong fences at suitable places in the RCU-related code. Thus, if a critical section starts before a grace period does then the critical section's CPU will execute an smp_mb() fence after the end of the critical section and @@ -1523,72 +1502,124 @@ executes. What exactly do we mean by saying that a critical section "starts before" or "ends after" a grace period? Some aspects of the meaning are pretty obvious, as in the example above, but the details aren't -entirely clear. The LKMM formalizes this notion by means of a -relation with the unfortunately generic name "link". It is a very -general relation; among other things, X ->link Z includes cases where -X happens-before or is equal to some event Y which is equal to or -comes before Z in the coherence order. Taking Y = Z, this says that -X ->rfe Z implies X ->link Z, and taking Y = X, it says that X ->fr Z -and X ->co Z each imply X ->link Z. - -The formal definition of the link relation is more than a little +entirely clear. The LKMM formalizes this notion by means of the +rcu-link relation. rcu-link encompasses a very general notion of +"before": Among other things, X ->rcu-link Z includes cases where X +happens-before or is equal to some event Y which is equal to or comes +before Z in the coherence order. When Y = Z this says that X ->rfe Z +implies X ->rcu-link Z. In addition, when Y = X it says that X ->fr Z +and X ->co Z each imply X ->rcu-link Z. + +The formal definition of the rcu-link relation is more than a little obscure, and we won't give it here. It is closely related to the pb relation, and the details don't matter unless you want to comb through a somewhat lengthy formal proof. Pretty much all you need to know -about link is the information in the preceding paragraph. - -The LKMM goes on to define the gp-link and rscs-link relations. They -bring grace periods and read-side critical sections into the picture, -in the following way: - - E ->gp-link F means there is a synchronize_rcu() fence event S - and an event X such that E ->po S, either S ->po X or S = X, - and X ->link F. In other words, E and F are connected by a - grace period followed by an instance of link. - - E ->rscs-link F means there is a critical section delimited by - an rcu_read_lock() fence L and an rcu_read_unlock() fence U, - and an event X such that E ->po U, either L ->po X or L = X, - and X ->link F. Roughly speaking, this says that some event - in the same critical section as E is connected by link to F. - -If we think of the link relation as standing for an extended "before", -then E ->gp-link F says that E executes before a grace period which -ends before F executes. (In fact it says more than this, because it -includes cases where E executes before a grace period and some store -propagates to F's CPU before F executes and doesn't propagate to some -other CPU until after the grace period ends.) Similarly, -E ->rscs-link F says that E is part of (or before the start of) a -critical section which starts before F executes. +about rcu-link is the information in the preceding paragraph. + +The LKMM also defines the gp and rscs relations. They bring grace +periods and read-side critical sections into the picture, in the +following way: + + E ->gp F means there is a synchronize_rcu() fence event S such + that E ->po S and either S ->po F or S = F. In simple terms, + there is a grace period po-between E and F. + + E ->rscs F means there is a critical section delimited by an + rcu_read_lock() fence L and an rcu_read_unlock() fence U, such + that E ->po U and either L ->po F or L = F. You can think of + this as saying that E and F are in the same critical section + (in fact, it also allows E to be po-before the start of the + critical section and F to be po-after the end). + +If we think of the rcu-link relation as standing for an extended +"before", then X ->gp Y ->rcu-link Z says that X executes before a +grace period which ends before Z executes. (In fact it covers more +than this, because it also includes cases where X executes before a +grace period and some store propagates to Z's CPU before Z executes +but doesn't propagate to some other CPU until after the grace period +ends.) Similarly, X ->rscs Y ->rcu-link Z says that X is part of (or +before the start of) a critical section which starts before Z +executes. + +The LKMM goes on to define the rcu-fence relation as a sequence of gp +and rscs links separated by rcu-link links, in which the number of gp +links is >= the number of rscs links. For example: + + X ->gp Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V + +would imply that X ->rcu-fence V, because this sequence contains two +gp links and only one rscs link. (It also implies that X ->rcu-fence T +and Z ->rcu-fence V.) On the other hand: + + X ->rscs Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V + +does not imply X ->rcu-fence V, because the sequence contains only +one gp link but two rscs links. + +The rcu-fence relation is important because the Grace Period Guarantee +means that rcu-fence acts kind of like a strong fence. In particular, +if W is a write and we have W ->rcu-fence Z, the Guarantee says that W +will propagate to every CPU before Z executes. + +To prove this in full generality requires some intellectual effort. +We'll consider just a very simple case: + + W ->gp X ->rcu-link Y ->rscs Z. + +This formula means that there is a grace period G and a critical +section C such that: + + 1. W is po-before G; + + 2. X is equal to or po-after G; + + 3. X comes "before" Y in some sense; + + 4. Y is po-before the end of C; + + 5. Z is equal to or po-after the start of C. + +From 2 - 4 we deduce that the grace period G ends before the critical +section C. Then the second part of the Grace Period Guarantee says +not only that G starts before C does, but also that W (which executes +on G's CPU before G starts) must propagate to every CPU before C +starts. In particular, W propagates to every CPU before Z executes +(or finishes executing, in the case where Z is equal to the +rcu_read_lock() fence event which starts C.) This sort of reasoning +can be expanded to handle all the situations covered by rcu-fence. + +Finally, the LKMM defines the RCU-before (rb) relation in terms of +rcu-fence. This is done in essentially the same way as the pb +relation was defined in terms of strong-fence. We will omit the +details; the end result is that E ->rb F implies E must execute before +F, just as E ->pb F does (and for much the same reasons). Putting this all together, the LKMM expresses the Grace Period -Guarantee by requiring that there are no cycles consisting of gp-link -and rscs-link connections in which the number of gp-link instances is ->= the number of rscs-link instances. It does this by defining the -rcu-path relation to link events E and F whenever it is possible to -pass from E to F by a sequence of gp-link and rscs-link connections -with at least as many of the former as the latter. The LKMM's "rcu" -axiom then says that there are no events E such that E ->rcu-path E. - -Justifying this axiom takes some intellectual effort, but it is in -fact a valid formalization of the Grace Period Guarantee. We won't -attempt to go through the detailed argument, but the following -analysis gives a taste of what is involved. Suppose we have a -violation of the first part of the Guarantee: A critical section -starts before a grace period, and some store propagates to the -critical section's CPU before the end of the critical section but -doesn't propagate to some other CPU until after the end of the grace -period. +Guarantee by requiring that the rb relation does not contain a cycle. +Equivalently, this "rcu" axiom requires that there are no events E and +F with E ->rcu-link F ->rcu-fence E. Or to put it a third way, the +axiom requires that there are no cycles consisting of gp and rscs +alternating with rcu-link, where the number of gp links is >= the +number of rscs links. + +Justifying the axiom isn't easy, but it is in fact a valid +formalization of the Grace Period Guarantee. We won't attempt to go +through the detailed argument, but the following analysis gives a +taste of what is involved. Suppose we have a violation of the first +part of the Guarantee: A critical section starts before a grace +period, and some store propagates to the critical section's CPU before +the end of the critical section but doesn't propagate to some other +CPU until after the end of the grace period. Putting symbols to these ideas, let L and U be the rcu_read_lock() and rcu_read_unlock() fence events delimiting the critical section in question, and let S be the synchronize_rcu() fence event for the grace period. Saying that the critical section starts before S means there are events E and F where E is po-after L (which marks the start of the -critical section), E is "before" F in the sense of the link relation, -and F is po-before the grace period S: +critical section), E is "before" F in the sense of the rcu-link +relation, and F is po-before the grace period S: - L ->po E ->link F ->po S. + L ->po E ->rcu-link F ->po S. Let W be the store mentioned above, let Z come before the end of the critical section and witness that W propagates to the critical @@ -1600,16 +1631,19 @@ some event X which is po-after S. Symbolically, this amounts to: The fr link from Y to W indicates that W has not propagated to Y's CPU at the time that Y executes. From this, it can be shown (see the -discussion of the link relation earlier) that X and Z are connected by -link, yielding: +discussion of the rcu-link relation earlier) that X and Z are related +by rcu-link, yielding: - S ->po X ->link Z ->po U. + S ->po X ->rcu-link Z ->po U. -These formulas say that S is po-between F and X, hence F ->gp-link Z -via X. They also say that Z comes before the end of the critical -section and E comes after its start, hence Z ->rscs-link F via E. But -now we have a forbidden cycle: F ->gp-link Z ->rscs-link F. Thus the -"rcu" axiom rules out this violation of the Grace Period Guarantee. +The formulas say that S is po-between F and X, hence F ->gp X. They +also say that Z comes before the end of the critical section and E +comes after its start, hence Z ->rscs E. From all this we obtain: + + F ->gp X ->rcu-link Z ->rscs E ->rcu-link F, + +a forbidden cycle. Thus the "rcu" axiom rules out this violation of +the Grace Period Guarantee. For something a little more down-to-earth, let's see how the axiom works out in practice. Consider the RCU code example from above, this @@ -1635,18 +1669,18 @@ time with statement labels added to the memory access instructions: } -If r2 = 0 at the end then P0's store at X overwrites the value -that P1's load at Z reads from, so we have Z ->fre X and thus -Z ->link X. In addition, there is a synchronize_rcu() between Y and -Z, so therefore we have Y ->gp-link X. +If r2 = 0 at the end then P0's store at X overwrites the value that +P1's load at Z reads from, so we have Z ->fre X and thus Z ->rcu-link X. +In addition, there is a synchronize_rcu() between Y and Z, so therefore +we have Y ->gp Z. If r1 = 1 at the end then P1's load at Y reads from P0's store at W, -so we have W ->link Y. In addition, W and X are in the same critical -section, so therefore we have X ->rscs-link Y. +so we have W ->rcu-link Y. In addition, W and X are in the same critical +section, so therefore we have X ->rscs W. -This gives us a cycle, Y ->gp-link X ->rscs-link Y, with one gp-link -and one rscs-link, violating the "rcu" axiom. Hence the outcome is -not allowed by the LKMM, as we would expect. +Then X ->rscs W ->rcu-link Y ->gp Z ->rcu-link X is a forbidden cycle, +violating the "rcu" axiom. Hence the outcome is not allowed by the +LKMM, as we would expect. For contrast, let's see what can happen in a more complicated example: @@ -1682,15 +1716,11 @@ For contrast, let's see what can happen in a more complicated example: } If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows -that W ->rscs-link Y via X, Y ->gp-link U via Z, and U ->rscs-link W -via V. And just as before, this gives a cycle: - - W ->rscs-link Y ->gp-link U ->rscs-link W. - -However, this cycle has fewer gp-link instances than rscs-link -instances, and consequently the outcome is not forbidden by the LKMM. -The following instruction timing diagram shows how it might actually -occur: +that W ->rscs X ->rcu-link Y ->gp Z ->rcu-link U ->rscs V ->rcu-link W. +However this cycle is not forbidden, because the sequence of relations +contains fewer instances of gp (one) than of rscs (two). Consequently +the outcome is allowed by the LKMM. The following instruction timing +diagram shows how it might actually occur: P0 P1 P2 -------------------- -------------------- -------------------- @@ -1715,6 +1745,147 @@ before it does, and the critical section in P2 both starts after P1's grace period does and ends after it does. +LOCKING +------- + +The LKMM includes locking. In fact, there is special code for locking +in the formal model, added in order to make tools run faster. +However, this special code is intended to be more or less equivalent +to concepts we have already covered. A spinlock_t variable is treated +the same as an int, and spin_lock(&s) is treated almost the same as: + + while (cmpxchg_acquire(&s, 0, 1) != 0) + cpu_relax(); + +This waits until s is equal to 0 and then atomically sets it to 1, +and the read part of the cmpxchg operation acts as an acquire fence. +An alternate way to express the same thing would be: + + r = xchg_acquire(&s, 1); + +along with a requirement that at the end, r = 0. Similarly, +spin_trylock(&s) is treated almost the same as: + + return !cmpxchg_acquire(&s, 0, 1); + +which atomically sets s to 1 if it is currently equal to 0 and returns +true if it succeeds (the read part of the cmpxchg operation acts as an +acquire fence only if the operation is successful). spin_unlock(&s) +is treated almost the same as: + + smp_store_release(&s, 0); + +The "almost" qualifiers above need some explanation. In the LKMM, the +store-release in a spin_unlock() and the load-acquire which forms the +first half of the atomic rmw update in a spin_lock() or a successful +spin_trylock() -- we can call these things lock-releases and +lock-acquires -- have two properties beyond those of ordinary releases +and acquires. + +First, when a lock-acquire reads from a lock-release, the LKMM +requires that every instruction po-before the lock-release must +execute before any instruction po-after the lock-acquire. This would +naturally hold if the release and acquire operations were on different +CPUs, but the LKMM says it holds even when they are on the same CPU. +For example: + + int x, y; + spinlock_t s; + + P0() + { + int r1, r2; + + spin_lock(&s); + r1 = READ_ONCE(x); + spin_unlock(&s); + spin_lock(&s); + r2 = READ_ONCE(y); + spin_unlock(&s); + } + + P1() + { + WRITE_ONCE(y, 1); + smp_wmb(); + WRITE_ONCE(x, 1); + } + +Here the second spin_lock() reads from the first spin_unlock(), and +therefore the load of x must execute before the load of y. Thus we +cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the +MP pattern). + +This requirement does not apply to ordinary release and acquire +fences, only to lock-related operations. For instance, suppose P0() +in the example had been written as: + + P0() + { + int r1, r2, r3; + + r1 = READ_ONCE(x); + smp_store_release(&s, 1); + r3 = smp_load_acquire(&s); + r2 = READ_ONCE(y); + } + +Then the CPU would be allowed to forward the s = 1 value from the +smp_store_release() to the smp_load_acquire(), executing the +instructions in the following order: + + r3 = smp_load_acquire(&s); // Obtains r3 = 1 + r2 = READ_ONCE(y); + r1 = READ_ONCE(x); + smp_store_release(&s, 1); // Value is forwarded + +and thus it could load y before x, obtaining r2 = 0 and r1 = 1. + +Second, when a lock-acquire reads from a lock-release, and some other +stores W and W' occur po-before the lock-release and po-after the +lock-acquire respectively, the LKMM requires that W must propagate to +each CPU before W' does. For example, consider: + + int x, y; + spinlock_t x; + + P0() + { + spin_lock(&s); + WRITE_ONCE(x, 1); + spin_unlock(&s); + } + + P1() + { + int r1; + + spin_lock(&s); + r1 = READ_ONCE(x); + WRITE_ONCE(y, 1); + spin_unlock(&s); + } + + P2() + { + int r2, r3; + + r2 = READ_ONCE(y); + smp_rmb(); + r3 = READ_ONCE(x); + } + +If r1 = 1 at the end then the spin_lock() in P1 must have read from +the spin_unlock() in P0. Hence the store to x must propagate to P2 +before the store to y does, so we cannot have r2 = 1 and r3 = 0. + +These two special requirements for lock-release and lock-acquire do +not arise from the operational model. Nevertheless, kernel developers +have come to expect and rely on them because they do hold on all +architectures supported by the Linux kernel, albeit for various +differing reasons. + + ODDS AND ENDS ------------- @@ -1780,26 +1951,6 @@ they behave as follows: events and the events preceding them against all po-later events. -The LKMM includes locking. In fact, there is special code for locking -in the formal model, added in order to make tools run faster. -However, this special code is intended to be exactly equivalent to -concepts we have already covered. A spinlock_t variable is treated -the same as an int, and spin_lock(&s) is treated the same as: - - while (cmpxchg_acquire(&s, 0, 1) != 0) - cpu_relax(); - -which waits until s is equal to 0 and then atomically sets it to 1, -and where the read part of the atomic update is also an acquire fence. -An alternate way to express the same thing would be: - - r = xchg_acquire(&s, 1); - -along with a requirement that at the end, r = 0. spin_unlock(&s) is -treated the same as: - - smp_store_release(&s, 0); - Interestingly, RCU and locking each introduce the possibility of deadlock. When faced with code sequences such as: |