Coming from C++, I’m just trying to better understand the memory model of safe languages like C#.
Imagine the following:
- Thread A creates a new object and stores it in a global variable (initialized to null) without using any synchronization mechanism.
- Thread B is in a loop reading this global variable until it detects it’s no longer null. Once again, no synchronization mechanism is employed.
- When Thread B detects a non-null value in this global variable, it calls a virtual method on the object stored in this global var.
My question is:
- From my understanding, the equivalent program in C++ could crash because when Thread B reads the object pointed to by the global variable, that object’s vtable might not even have been initialized yet (due to reorderings by the compiler or hardware), or the variable might hold a spurious value. (That’s why you’d need a mutex or something to protect accesses to this global var.)
- On a low-level, how does C# guarantee that the program won’t crash due to thread B reading a non-fully initialized object? (If it makes such a guarantee which I believe it does).
I actually expect C# to never crash or have undefined behavior even when not using synchronization mechanisms.
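For concreteness, here is a minimal sketch of the scenario I have in mind (all names are made up; it is exactly this kind of unsynchronized code I am asking about):

```csharp
using System;
using System.Threading;

class Worker
{
    public virtual void Run() => Console.WriteLine("running");
}

static class Globals
{
    // Global variable, initialized to null; no synchronization around it.
    public static Worker Instance;
}

class Program
{
    static void Main()
    {
        // Thread B: spin until the global is non-null, then call a virtual method.
        // (As written, nothing even guarantees this loop ever observes the write.)
        var reader = new Thread(() =>
        {
            Worker w;
            while ((w = Globals.Instance) == null) { /* busy-wait */ }
            w.Run();
        });
        reader.Start();

        // Thread A: create the object and publish it through the global variable.
        Globals.Instance = new Worker();

        reader.Join();
    }
}
```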
I am not sure I can answer your question entirely (I believe there are some missing bits in what you do not understand which you haven’t articulated out loud), yet let me try.
First and most importantly: while any modern compiler indeed aggressively reorders a program’s instructions (and the hardware does so again “at runtime”, regardless of what the compiler did), it does so while maintaining certain invariants. Depending upon the particular invariants (and their precise definitions), one gets a significantly different memory model (nowadays broadly classified as weak or strong). So you don’t speak about reorderings by themselves; you speak about the reorderings allowed by a particular memory model, and each language may have its own (and some have none at all, at least not formalized properly; that used to be the case for C++ before it got a formal memory model along with `<atomic>`/`stdatomic.h`, actually).
There are many ways to define a memory model. The sharpest precision is achievable only by mathematical means, of course. Aside from mathematics, there are also somewhat less clear (at least, in my eyes) definitions like the following:
The C# memory model permits the reordering of memory operations in a method, as long as the behavior of single-threaded execution doesn’t change.
That demonstrates a typical approach common to many programming languages: whatever gets reordered, the single-threaded execution of the program must not be affected (while a multi-threaded execution may, and often will, be affected just fine). That immediately shows that your original question is not quite complete, since you are reasoning about what a multi-threaded program might observe rather than a single-threaded one, and on its own that is not something the C# memory model constrains; you have to add more context to the question so that it does.
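To make that definition concrete, here is a tiny hypothetical example: the two writes below (or the two reads) may be reordered, because no single-threaded execution of either method could tell the difference; only another thread could.

```csharp
using System;

class Example
{
    int _data;
    bool _ready;

    void Writer()
    {
        _data = 42;      // (1)
        _ready = true;   // (2)
        // A single-threaded run cannot tell whether (1) and (2) were swapped,
        // so such a memory model permits the compiler/hardware to swap them.
    }

    void Reader()
    {
        if (_ready)
        {
            // Without synchronization, another thread is allowed to observe
            // _ready == true while still seeing a stale value of _data.
            Console.WriteLine(_data);
        }
    }
}
```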
In the general case, to maintain a desired invariant with respect to a multi-threaded run, one must deal with what `<stdatomic.h>` offers. There is no escape. One must declare `_Atomic` memory locations and read/write them through the appropriate API (which the compiler must respect by avoiding lots and lots of “harmful” reorderings and keeping only the harmless ones, if any remain at all for a particular program and hardware).
C# does it a bit differently:
The C# ECMA specification guarantees that the following types will be written atomically: reference types, bool, char, byte, sbyte, short, ushort, uint, int and float. Values of other types—including user-defined value types—could be written into memory in multiple atomic writes. As a result, a reading thread could observe a torn value consisting of pieces of different values.
In particular, that means that `var foo = new Foo(whatever, else, it, does, not, really, matter)` has to 1) ensure, in any valid undocumented hardware-compatible way, that `Foo` is initialized completely (with all the reorderings allowed; for example, we do not know in which order the `whatever`, `else`, `it`, `does`, `not`, `really`, `matter` arguments get written into `private` variables inside its constructor – that could happen literally in any order, because every order works exactly the same with respect to single-threaded reads); and 2) atomically swap `foo` so that it points to the beginning of the new `Foo` just created and initialized. The last bit – the swap’s atomicity – is ultimately guaranteed by the hardware, and different hardware requires distinct instructions to provide that level of confidence.
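As a rough sketch of what those two steps look like in source form (a hypothetical `Foo` with only two of the arguments; the order of the field writes inside the constructor is not something an unsynchronized observer can rely on):

```csharp
class Foo
{
    private readonly int _whatever;
    private readonly int _else;

    public Foo(int whatever, int @else)
    {
        // Step 1: the constructor initializes the instance. The individual
        // field writes may happen in any order; a single-threaded caller
        // cannot tell the difference.
        _whatever = whatever;
        _else = @else;
    }
}

static class Publisher
{
    static Foo foo;

    static void Publish()
    {
        // Step 2: the reference assignment itself is a single pointer-sized
        // write, atomic for reference types per the ECMA guarantee quoted
        // above, so no reader can ever see a "torn" reference.
        foo = new Foo(1, 2);
    }
}
```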
More to say. A typical – sane – way to publish a global variable in C# (and in many other languages down to good old C) is by marking it `static`. Compilers, of course, are notoriously sensitive to such markers, for many reasons including preserving the memory-model guarantees they are supposed to implement and maintain. So at this point you shouldn’t really be surprised by the following fact:
Another way to safely publish a value to multiple threads is to write the value to a static field in a static initializer or a static constructor.
That is safe precisely because any modern compiler would treat `static` differently from non-`static`.
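A sketch of that pattern with made-up names: the value is published through a static field assigned in a static constructor, and the runtime runs the type initializer before any thread can touch the field.

```csharp
class Settings
{
    public int Timeout { get; }
    public Settings(int timeout) => Timeout = timeout;
}

static class Config
{
    // Assigned in the static constructor; the runtime guarantees the type
    // initializer completes before any thread reads Shared, so every
    // thread observes a fully initialized object.
    public static readonly Settings Shared;

    static Config()
    {
        Shared = new Settings(timeout: 30);
    }
}
```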
P.S. I am referencing quite outdated docs which were written for the .NET Framework. Since that time, .NET Core has emerged. Nonetheless, I am not aware of any change in the memory model implemented by either: it could hardly have changed, if only because exactly the same codebase which ran fine under the Framework is supposed to run (and seemingly does run) just as well under the Core runtime and its guarantees.
P.P.S. I suggest not learning the subject from the C++ docs on `stdatomic.h` and its internals. Like the C# docs, they lack the formalism and razor-sharp definitions needed for a newcomer to get the subject right. Academia has developed rather powerful and much clearer theoretical models, at the cost of having to learn the logic and mathematical notation required to parse their definitions.
Even if the thread that created the object made sure this object was fully created before writing the global pointer, this isn’t enough to wholly guarantee that the reader will see it this way. This question asks us to consider the semantics of computing hardware, specifically the memory model.
But this is a start: the writing thread can make sure that it finishes constructing the object before updating the global pointer to point to this new object. As the language controls the compiler (or, to put it another way, the compiler is created for the language), there is no need to worry about the compiler reordering things here. It’s not allowed to, for exactly this reason: it must emit code where the object is fully constructed before the global pointer is updated.
But there is what seems like a significant hole. If the reading thread reads the global pointer and sees the new value, what guarantees that when it follows the pointer down to the object’s data it sees up-to-date memory rather than old, invalid data? This, I believe, is the crux of your question. It is especially relevant on architectures with a weak memory model.
The x86/x64 architecture wouldn’t have an issue here, because this cannot happen: it has a strong memory model. On a strong-memory-model architecture a store isn’t allowed to pass a store, nor is a load allowed to pass a load, and one of those would have to happen for an inconsistency to be observed.
A machine with a weak memory model, however, would have an issue.

Basically, because of this, C# cannot run on a truly weak memory model architecture. Luckily, such architectures no longer really exist. All machines we’d think of as weak memory model (such as ARM) are actually weak memory model with data dependency ordering.
What a weak memory model with data dependency ordering guarantees is that if a thread follows a pointer, the pointed-to memory will be at least as new as the memory the pointer itself was read from. This is exactly what is needed to close the hole.
See https://preshing.com/20120930/weak-vs-strong-memory-models/ which mentions:
These families have memory models which are, in various ways, almost as weak as the Alpha’s, except for one common detail of particular interest to programmers: they maintain data dependency ordering. What does that mean? It means that if you write A->B in C/C++, you are always guaranteed to load a value of B which is at least as new as the value of A.
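In C# terms, the reader-side shape of that “A->B” pattern is simply a dependent load through a freshly read reference; a minimal sketch with hypothetical names:

```csharp
class Node
{
    public int Payload;
}

static class Reader
{
    static Node sharedNode; // written by some other thread

    static int ReadPayload()
    {
        // The second read depends on the value produced by the first: you
        // cannot dereference a reference you have not loaded yet. That data
        // dependency is what the quoted guarantee is about ("B is at least
        // as new as A").
        Node node = sharedNode;  // load the shared reference ("A")
        return node.Payload;     // dependent load through it ("B")
    }
}
```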
(Another option would be for C# to insert some kind of synchronisation primitive pretty much everywhere it acts on global/shared variables, but that obviously looks like a bad choice given how much overhead it would add.)
How does C# guarantee that the program won’t crash due to thread B reading a non-fully initialized object? (If it makes such a guarantee which I believe it does).
It doesn’t. If you read and write to a shared location from multiple threads without synchronization, your program is erroneous and its behavior is undefined.¹
For entry-level multithreading, the recommended way to ensure the correctness of your program is to use the `lock` statement each and every time you interact with the shared state of your program, using the same locker object. The `lock` statement inserts the appropriate memory barriers when the locker is acquired and released, so that the threads that successively interact with the shared state have a consistent view of it.
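A minimal sketch of that pattern for the scenario in the question (hypothetical names); both threads take the same locker object around every access to the shared field:

```csharp
class Worker
{
    public virtual void Run() { }
}

static class Shared
{
    static readonly object Locker = new object();
    static Worker instance;

    // Thread A: publish the new object under the lock.
    public static void Publish()
    {
        lock (Locker)
        {
            instance = new Worker();
        }
    }

    // Thread B: read under the same lock; the memory barriers inserted on
    // acquire/release are what make the observed object consistent.
    public static Worker TryGet()
    {
        lock (Locker)
        {
            return instance;
        }
    }
}
```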
For advanced multithreading (low-lock/lock-free) you can use the `volatile` keyword or the `Volatile` class to update a shared field, and be sure that all threads will see fully initialized objects. Lock-free multithreading is considered to be very hard, though.
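For the low-lock variant, a sketch using the `Volatile` class (hypothetical names): `Volatile.Write` publishes the reference with release semantics and `Volatile.Read` reads it with acquire semantics.

```csharp
using System.Threading;

class Worker
{
    public virtual void Run() { }
}

static class LowLock
{
    static Worker instance;

    // Writer thread: construct first, then release-write the reference.
    public static void Publish()
    {
        var w = new Worker();
        Volatile.Write(ref instance, w);
    }

    // Reader thread: acquire-read the reference; a non-null result refers
    // to a fully initialized object.
    public static Worker TryGet()
    {
        return Volatile.Read(ref instance);
    }
}
```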
¹ From the perspective of a code reviewer who goes by the ECMA specification of the C# language, prioritizes correctness and portability, and is willing to make zero assumptions about hardware and CLR implementation.
The object will be fully created before the reference is updated to refer to it, so the code can never observe an incomplete object.
Do not fall into the trap of looking for correspondences of behavior between languages. Clear your mind and start from scratch by reading how the new language behaves, from the bottom up. Lifecycle management in .NET is fundamentally different: reference semantics plus garbage collection (which are thread-safe). But don’t expect everything to be thread-safe.
Related: Is locking required when swapping variable reference in multithreaded application?