Memory Consistency Models

Memory Consistency: the problem of defining how parallel threads can observe their shared memory state.

A memory consistency model is a contract between the hardware and software. The hardware promises to only reorder operations in ways allowed by the model, and in return, the software acknowledges that all such reorderings are possible and that it needs to account for them.

Sequential Consistency

  • Sequential consistency (SC): all memory operations appear to act on a single main memory, in program order.
    • e.g. the classic store-buffering litmus test cannot print 00.

[Figure: sequential consistency analysis]
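The litmus test above can be sketched with C++ `std::atomic`, whose default ordering is sequentially consistent (the function name `run_litmus` is ours, not from the original): each thread stores to one variable and then loads the other. Under SC some total interleaving of the four operations must exist, so whichever load comes last must observe at least one completed store, and the outcome 0/0 is impossible.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Store-buffering litmus test. With seq_cst (the default for
// std::atomic), the outcome r0 == 0 && r1 == 0 is forbidden:
// in any total order of the four operations, the later of the
// two loads must follow the other thread's store.
std::pair<int, int> run_litmus() {
    std::atomic<int> X{0}, Y{0};
    int r0 = -1, r1 = -1;
    std::thread a([&] { X.store(1); r0 = Y.load(); });
    std::thread b([&] { Y.store(1); r1 = X.load(); });
    a.join();
    b.join();
    return {r0, r1};
}
```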

Total Store Ordering

  • Relaxed Memory Models

    • models with multiple (private) memories but shared data: there is no single main memory, due to cache hierarchies (e.g. private L1/L2 and a shared L3);
  • Total store ordering (TSO): adds an on-core store buffer;

    • writes go into the buffer, and do not need to wait to reach the shared memory (e.g. the shared L3);
    • this hides write latency (the cache hierarchy will pull each write from the store buffer and propagate it through the caches so that it becomes visible to other threads);
    • preserves single-threaded behavior.
    • e.g. can now print 00.

[Figure: store buffer]
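The same litmus test with relaxed atomics illustrates the store-buffer effect (a sketch; C++ `memory_order_relaxed` permits at least the reorderings TSO allows, so it is a fair stand-in here, and `run_litmus_relaxed` is our own name): each thread's load may complete while its own store still sits in the store buffer, so both loads can return 0.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Store-buffering litmus test with relaxed ordering: the
// hardware may satisfy each thread's load before its own store
// drains from the store buffer into shared memory, so
// r0 == 0 && r1 == 0 is now a legal outcome.
std::pair<int, int> run_litmus_relaxed() {
    std::atomic<int> X{0}, Y{0};
    int r0 = -1, r1 = -1;
    std::thread a([&] {
        X.store(1, std::memory_order_relaxed);
        r0 = Y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        Y.store(1, std::memory_order_relaxed);
        r1 = X.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return {r0, r1};
}
```

Whether 0/0 actually shows up in a given run depends on the hardware and timing; the point is that the model no longer forbids it.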

  • Litmus tests: small programs, like the example above, whose allowed outcomes distinguish memory models.
  • Formalization: “x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors” by Peter Sewell et al.

Weaker memory models

ARM: weak ordering + barriers.

The ARM memory model is notoriously underspecified, but is essentially a form of weak ordering.

Barrier: a barrier instruction forces all memory operations before it to complete before any memory operation after it can begin. That is, a barrier effectively reinstates sequential consistency at that point in program execution.
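Portable code rarely writes ARM barrier instructions by hand; a common route is a C++ `std::atomic_thread_fence`, which compilers lower to `dmb ish` on ARM. A minimal sketch (the function name `thread_a` and the globals are ours):

```cpp
#include <atomic>

std::atomic<int> X{0}, Y{0};
int r0 = -1;

void thread_a() {
    X.store(1, std::memory_order_relaxed);
    // Full fence: on ARM this compiles to a `dmb ish` barrier
    // (on RISC-V, a `fence rw,rw`). No memory operation after
    // the fence may be reordered before it, and vice versa, so
    // the store to X is globally visible before Y is loaded.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r0 = Y.load(std::memory_order_relaxed);
}
```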

RISC-V RVWMO

RISC-V Weak Memory Ordering

Defined in terms of the global memory order, a total ordering of the memory operations produced by all harts.

Under RVWMO, code running on a single hart appears to execute in order from the perspective of other memory instructions on the same hart, but memory instructions from another hart may observe them executing in a different order.

Therefore, multithreaded code may require explicit synchronization to guarantee ordering between memory instructions from different harts.

  • The base RISC-V ISA provides the FENCE instruction.
  • Extension ‘A’ additionally defines load-reserved/store-conditional and atomic read-modify-write (AMO) instructions.
  • Extension ‘Zam’ permits misaligned atomics.
  • Extension ‘Ztso’ provides total store ordering, with additional rules that augment RVWMO.
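For a sense of how these instructions surface in practice (an illustrative sketch, not from the original): on RISC-V with the ‘A’ extension, a C++ `fetch_add` typically compiles to a single `amoadd.w` AMO, while compare-and-swap loops compile to `lr.w`/`sc.w` (load-reserved/store-conditional) retry loops.

```cpp
#include <atomic>

// On RISC-V with the 'A' extension, this fetch_add typically
// lowers to one amoadd.w instruction with acquire/release
// ordering bits set; it returns the value held before the add.
int increment(std::atomic<int>& counter) {
    return counter.fetch_add(1, std::memory_order_acq_rel);
}
```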

Languages’ memory models

Compilers also reorder memory operations.

Program 1:

X = 0
for i in range(100):
    X = 1
    print(X)

Program 2 (an optimized version of 1):

X = 1
for i in range(100):
    print(X)

Now suppose another thread runs in parallel and writes X = 0 during the execution of Program 1 or Program 2.

Program 1 outputs: 111011111...
Program 2 outputs: 111000000...
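The two programs can be sketched in C++ to show what governs the hoisting (function names are ours): with a plain `int`, the compiler is entitled to transform Program 1 into Program 2, because it may assume no other thread touches the variable; with `std::atomic`, compilers in practice re-load the value on every iteration, so a concurrent write of 0 stays observable.

```cpp
#include <atomic>

int plain_X = 0;
std::atomic<int> atomic_X{0};

// Counts how many of the 100 iterations observe X == 1.
// With a plain int, the compiler may rewrite this into
// Program 2's shape: store 1 once, then reuse the constant.
int program1_plain() {
    int ones = 0;
    for (int i = 0; i < 100; ++i) {
        plain_X = 1;
        if (plain_X == 1) ones++;
    }
    return ones;
}

// With std::atomic, compilers in practice perform the load on
// every iteration instead of caching it, so a concurrent store
// of 0 by another thread can be seen between iterations.
int program1_atomic() {
    int ones = 0;
    for (int i = 0; i < 100; ++i) {
        atomic_X.store(1, std::memory_order_relaxed);
        if (atomic_X.load(std::memory_order_relaxed) == 1) ones++;
    }
    return ones;
}
```

Run single-threaded, both functions count 100 ones; the difference only appears once a second thread writes concurrently.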

In C++ and Java:

In fact, languages such as C++ and Java offer a guarantee known as sequential consistency for data-race-free programs (or the buzzwordy version, “SC for DRF”). This guarantee says that if your program has no data races, the compiler will insert all the necessary fences to preserve the appearance of sequential consistency. If your program does have data races, however, all bets are off, and the compiler is free to do whatever it likes. The intuition is that programs with data races are quite likely to be buggy, and there’s no need to provide strong guarantees for such buggy programs. If a program has deliberate data races, the programmer likely knows what they’re doing anyway, and so can be responsible for memory ordering issues themselves.
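A minimal sketch of the SC-for-DRF contract using C++ `std::atomic` (the names `message_passing`, `data`, and `ready` are ours): the plain variable `data` is race-free because every access to it is ordered through the `ready` flag, so the language guarantees the reader observes 42, with the compiler inserting whatever fences the target needs.

```cpp
#include <atomic>
#include <thread>

int data = 0;                 // plain variable, but race-free
std::atomic<bool> ready{false};

// Race-free message passing: the writer's plain store to `data`
// happens-before the reader's plain load, via the seq_cst
// (default) store/load on `ready`. SC-for-DRF then guarantees
// the reader sees data == 42 -- never a stale 0.
int message_passing() {
    data = 0;
    ready.store(false);
    int seen = -1;
    std::thread writer([&] {
        data = 42;
        ready.store(true);
    });
    std::thread reader([&] {
        while (!ready.load()) { /* spin */ }
        seen = data;
    });
    writer.join();
    reader.join();
    return seen;
}
```

If `ready` were a plain `bool`, the program would have a data race and, per the paragraph above, all bets would be off.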

Created Jul 6, 2022 // Last Updated Jul 6, 2022
