99

Isn't `atomic<bool>` redundant because `bool` is atomic by nature? I don't think it's possible to have a partially modified `bool` value. When do I really need to use `atomic<bool>` instead of `bool`?

Peter Cordes
  • You need `atomic` to avoid race-conditions. A race-condition occurs if two threads access the same memory location, and at least one of them is a write operation. If your program contains race-conditions, the behavior is undefined. – nosid May 01 '13 at 15:14
  • @nosid: Yes, but what the OP is saying is that he doesn't believe that you can have a *partial* write operation on a bool like you can with, say, an `int` value where you are copying each byte or word of that value individually. There therefore shouldn't be any race condition, if the write is already atomic. – Robert Harvey May 01 '13 at 15:15
  • `sizeof(bool)` is implementation-defined and may conceivably be > 1, so it's possible that it could be non-atomic in some cases. – Paul R May 01 '13 at 15:19
  • Related: http://stackoverflow.com/questions/5067492/why-the-sizeofbool-is-not-defined-to-be-one-by-the-standard-itself – Paul R May 01 '13 at 15:20
  • Without atomic there is no guarantee that you'll ever see the update in the other thread at all, or that you'll see updates to variables in the same order that you make them in a different thread. – jcoder May 01 '13 at 15:21
  • Related: [Do I have to use atomic for “exit” bool variable?](http://stackoverflow.com/q/16111663/341970) – Ali May 01 '13 at 16:01
  • @jcoder: To be super pedantic, I believe the Standard doesn't actually mandate cache coherency (or rather "visibility propagation") -- it's left as a "best effort quality of implementation". That is, you can have an atomic variable synchronize two threads, but there is no guarantee (in the Standard) that a change ever propagates. It's only that *if* the change propagates, then it transfers the happens-before relationship. (For example, thread A could store "unlocked", but thread B could forever continue to read "locked". Only *if* it reads "unlocked" will it proceed safely.) – Kerrek SB May 01 '13 at 19:35
  • @KerrekSB isn't that exactly what the store and load functions of std::atomic are for? – jcoder May 01 '13 at 19:40
  • @jcoder: I'm not sure. Store and load (with the relevant orderings!) are *synchronisation points*. That means that *if* you load a certain value, then you know that the store of that value has happened. But there's no guarantee *that* you eventually load the stored value. You might also forever continue loading the old value. ("Exchange" would be different, though, and necessarily have to propagate.) – Kerrek SB May 01 '13 at 20:22
  • One of the main reasons to use atomics is to suppress local-caching optimization of variable state. There's nothing that guarantees a global variable or class member set in one thread will be seen in another thread that is doing "while (condition) ..." In this use, they replace the poorly defined *volatile* keyword with precise semantics. – Wheezil Dec 07 '18 at 20:41

6 Answers

104

No type in C++ is "atomic by nature" unless it is an std::atomic*-something. That's because the standard says so.

In practice, the actual hardware instructions that are emitted to manipulate an std::atomic<bool> may (or may not) be the same as those for an ordinary bool, but being atomic is a larger concept with wider ramifications (e.g. restrictions on compiler re-ordering). Furthermore, some operations (like negation) are overloaded on the atomic operation to create a distinctly different instruction on the hardware than the native, non-atomic read-modify-write sequence of a non-atomic variable.

Kerrek SB
78

Remember memory barriers. Although it may be impossible to change a bool partially, it is possible that a multiprocessor system has this variable in multiple copies, and one thread can see a stale value even after another thread has changed it to the new one. Atomic introduces a memory barrier, so this becomes impossible.

Dims
  • can the keyword `volatile` fix the multiprocessor issue? – Vincent Xue Jun 29 '15 at 14:44
  • No. Volatile has nothing to do with memory fences. – unexpectedvalue Oct 04 '15 at 20:18
  • Just for clarity's sake: @Vincent's comment may have originated from an understanding of the keyword `volatile` in Java. The `volatile` keyword in Java does control memory fences but has a very different behavior than the `volatile` keyword in C, which does not. [This question](http://stackoverflow.com/questions/19923352/whats-the-difference-of-the-usage-of-volatile-between-c-c-and-c-java) explains the difference further. – Pace Dec 19 '16 at 19:53
  • Why is atomicity tied to memory ordering? Does std::atomic imply barriers? If so isn't that going a bit further than merely atomic? – nmr Oct 18 '17 at 20:05
  • https://stackoverflow.com/a/14625122/580677 Yeah, turns out std::atomic does more than what it says on the tin. Of course it does. – nmr Oct 18 '17 at 22:25
  • I think that's the real correct answer. Because the answer about "standards bla-bla-bla... sizeof(bool) can be > 1" is something that never happens in real life. All major compilers have sizeof(bool) == 1 and all read/write operations will work in a similar way for bool and atomic. But a multi-core CPU and a missed memory barrier is something that will happen with nearly 100% chance for any modern application and hardware. – Ezh Sep 02 '18 at 09:53
  • @nmr: atomicity is tied to ordering so you *can* use it to create synchronization between threads. If you don't need that, use `std::memory_order_relaxed` to get atomicity without ordering. This answer is totally wrong: barriers don't create atomicity because for example they don't stop a store from *another* thread from appearing between `tmp=var; tmp++; var=tmp;`. Special CPU instructions are needed to make that sequence into an atomic RMW. See also [Can num++ be atomic for 'int num'?](//stackoverflow.com/a/39396999) – Peter Cordes Jul 03 '19 at 20:18
  • @Dims: please delete this answer and stop spreading this misconception about barriers and how cache coherency works. If you want to say that `atomic` defaults to sequential-consistency ordering, say that. That's not required for atomicity, and conflicting values in cache for the same variable aren't possible: MESI cache coherence prevents that. `atomic` implies some of the same things as `volatile` so the compiler doesn't hoist the variable's value into a register, though. *That* isn't coherent. – Peter Cordes Jul 03 '19 at 20:19
  • @PeterCordes I didn't say barriers imply atomicity, please re-read answer. – Dims Jul 04 '19 at 11:47
  • Oh right. But what you did say is still wrong. Barriers aren't needed to make a store globally visible. You only need them if you need *this* thread to wait until after that happens on its own. In theory you could have an inefficient C++ implementation on a system with non-coherent shared memory, but normally you use MPI or other message-passing for communication between coherency domains in the very rare huge clusters with some shared but not coherent memory. What `atomic` really does on normal systems is stop the compiler from keeping the value in a thread-private *register* – Peter Cordes Jul 05 '19 at 02:34
  • Memory barriers aren't related to multithreading – G Huxley Oct 31 '19 at 23:35
26

C++'s atomic types deal with three potential problems. First, a read or write can be torn by a task switch if the operation requires more than one bus operation (and that can happen to a bool, depending on how it's implemented). Second, a read or write may affect only the cache associated with the processor that's doing the operation, and other processors may have a different value in their cache. Third, the compiler can rearrange the order of operations if they don't affect the result (the constraints are a bit more complicated, but that's sufficient for now).

You can deal with each of these three problems on your own by making assumptions about how the types you are using are implemented, by explicitly flushing caches, and by using compiler-specific options to prevent reordering (and, no, volatile doesn't do this unless your compiler documentation says it does).

But why go through all that? atomic takes care of it for you, and probably does a better job than you can do on your own.

Pete Becker
  • Task switches don't cause tearing unless it took multiple *instructions* to store the variable. Whole instructions are atomic wrt. interrupts on a single core (they either fully complete before the interrupt, or any partial work is discarded. This is part of what store buffers are for.) Tearing is far more likely between threads on separate cores that are actually running simultaneously, because then yes you can get tearing between the parts of a store done by one instruction, e.g. an unaligned store or one too wide for the bus. – Peter Cordes Jul 03 '19 at 20:27
  • No, a core can't write a cache line until it has exclusive ownership of that line. The MESI cache coherency protocol ensures this. (See [Can num++ be atomic for 'int num'?](//stackoverflow.com/a/39396999)). The real problem for C++ is that the compiler is allowed to assume that non-atomic variables aren't changed by other threads, so it can hoist loads out of loops and keep them in *registers* or optimize away. e.g. turning `while(!var) {}` into `if(!var) infloop();`. This part of `atomic` is similar to what `volatile` does: always re-read from memory (which is cached but coherent). – Peter Cordes Jul 03 '19 at 20:30
  • @PeterCordes — I don’t have the wisdom to make assertions about the behavior of every possible hardware architecture that C++ code could be run on. Maybe you do, but that doesn’t mean you should resurrect a six-year old thread. – Pete Becker Jul 03 '19 at 20:32
  • To roll your own atomics, you don't need to flush caches; you would use `volatile` + barriers. And you'd need inline asm for RMW atomics like `var += 1;` to be a single atomic increment instead of an atomic load, increment inside the CPU, then a separate atomic store. – Peter Cordes Jul 03 '19 at 20:33
  • I was simplifying in my comment to talk about normal machines: sure it's possible to have a C++ implementation on a machine that requires explicit flushing for coherency, but the C++ memory model and the concept of release-stores is only efficient with coherent memory. Otherwise every release-store or seq-cst store would have to flush *everything*, barring clever as-if optimizations. All mainstream SMP systems are cache-coherent. There are non-coherent big clusters with shared memory, but they use that for message passing not for running threads of a single program. – Peter Cordes Jul 03 '19 at 20:37
  • @PeterCordes — “simplifying to talk about normal machines” means that your comments do not address the meaning a “atomic” **in the C++ standard**, which describes requirements for implementations on **any** machine. “There are more things in heaven and Earth ... than we dreamt of in your philosophy.” – Pete Becker Jul 05 '19 at 00:38
  • Your answer introduced discussion of implementation details. There's lots of possible gotchas you could make up if you want to invent hypothetical hardware. But yeah, the language in this answer doesn't go as far as implying that's an issue on normal hardware, unlike your answer on [Can a bool read/write operation be not atomic on x86?](//stackoverflow.com/a/14625122) (one of the duplicates of this question, but which is tagged x86 and thus can't have non-coherent caches across threads). – Peter Cordes Jul 05 '19 at 00:52
  • An efficient C++ implementation on a machine that required explicit coherency sounds unlikely, so it's a weird one to make up when keeping values in registers produces the same problem you're talking about via a mechanism that does exist on all real CPUs. What bugs me about this answer is that it's not helping to clear up the common misconception about cache coherency in the real systems we do use. Many people think that explicit flushing of some kind is necessary on x86 or ARM, and that reading stale data *from cache* is possible. – Peter Cordes Jul 05 '19 at 00:56
  • If the C++ standard cared at all about efficiency on non-coherent shared memory running multiple threads, there'd be mechanisms like release-stores that only made a certain array or other object globally visible, not *every* other operation before that point (including all non-atomic ops). On coherent systems, release stores just have to wait for preceding in-flight loads/stores to complete and commit, not write back the whole contents of any private caches. Access to our dirty private caches by other cores happens on demand. – Peter Cordes Jul 05 '19 at 01:00
  • @PeterCordes — this answer wasn’t intended to address cache coherency in systems most people use. It was intended to suggest that that’s **irrelevant** if you use C++ atomics, since the implementation will handle whatever issues are present on the target hardware. But you obviously are impervious to the implications of writing a standard, and I’m not going to waste any more time trying to educate you. – Pete Becker Jul 05 '19 at 01:03
  • I think you're missing the point of my comments. I know the C++ standard is written in a hardware-agnostic way, and that's definitely a good thing. But this answer starts out by saying "C++'s atomic types deal with three potential problems", so you're claiming that you're going to cover *every* possible hardware detail that might be a problem for C++ `atomic`, and that there are only 3 of them. ISO C++ doesn't even mention caches; that's on you so I think it's fair to criticise your choice of what to talk about as far as caches. You're not technically wrong, just IMO misleading. – Peter Cordes Jul 05 '19 at 01:18
  • OTOH you have convinced me that fearmongering about non-coherent caching actually makes some sense here: it's something that `atomic` will take care of for you "if it's an issue on the target system". Even though it isn't on any C++ implementation I'm aware of, if the reader didn't know that then they definitely aren't ready to roll their own atomics on top of `volatile` or compiler-specific memory barriers. – Peter Cordes Jul 05 '19 at 01:20
  • `std::atomic` gives you at least 2 other things you didn't mention: well-defined behaviour if another thread changes a value you're reading in a loop. (So it has that in common with `volatile`: force a re-read from memory). And make read-modify-write operations like `b ^= 1;` atomic. Except `atomic` doesn't have a negate function, but there is `b.compare_exchange_weak` or `.exchange` which are atomic. e.g. on x86 you get `lock cmpxchg` instead of just load/branch or whatever. [How to atomically negate an std::atomic\_bool?](//stackoverflow.com/q/9806200) – Peter Cordes Jul 05 '19 at 01:33
  • @PeterCordes - How does it give you well defined behavior for reading the volatile in a loop? As far as I know that's a common misconception: that forces an up-to-date read hence solving the "non-volatile loop bool" problem - but as far as I know the standard makes no guarantees here. It is hard to see how such guarantees would be written in any case, since the model is largely about relative behavior in the "happens before" style and makes no reference to a global clock (AFAIK). – BeeOnRope Jul 05 '19 at 01:54
  • @BeeOnRope: `volatile` does *not* give you well-defined behaviour for this in ISO C++. Only on specific implementations (like GNU C for a known set of ISAs) can you usefully roll your own atomics on top of `volatile`, ignoring the fact that it's technically UB, like the Linux kernel. I should have said *and* instead of *or* implementation-defined stuff. I think in practice you'd be hard-pressed to find an implementation where `volatile` would break for this; like I said I don't think there are any C++ implementations on non-coherent shared memory hardware, and that's highly non-standard. – Peter Cordes Jul 05 '19 at 02:02
  • Sorry @PeterCordes, I was talking about `atomic` not `volatile`. My claim is that `atomic` doesn't give you the behavior that one thread reading a variable will see the new value after another thread writes it, in theory. In practice it does, as a QoI issue and because the optimization that would break this is somewhat unlikely. – BeeOnRope Jul 05 '19 at 02:07
  • @BeeOnRope: That was the design-intent of the standard I think, while still allowing optimization of atomics in some cases. Yes, [optimization of atomics is a thorny problem](//stackoverflow.com/q/41820539). But I don't think you can ever justify hoisting a relaxed-atomic load out of a spin-wait loop according to any sane reading of the as-if rule. Any real compiler target will have some kind of maximum plausible reordering timespan, and it will be less than infinity. So assuming that *all* infinity of the reads (including the first one) happened before a write isn't sane. – Peter Cordes Jul 05 '19 at 02:11
  • The language isn't written in those terms though ("hoisting", "reordering", etc). You don't need to look for the reasons why such an optimization would be allowed via "as if", because the base case doesn't guarantee this. You need to look for any language which suggests that a write by one thread is guaranteed to be seen by another thread, *ever*. As far as I know, there isn't. So it's not a question of optimization breaking something otherwise guaranteed in the standard: AFAIK it's not guaranteed at all. @PeterCordes – BeeOnRope Jul 05 '19 at 02:14
  • @BeeOnRope: But actually the reasoning about possible orderings should be about orderings allowed in the C++ abstract machine, and then picking one such ordering at compile time. So oops, the target reality doesn't actually come into it that early. – Peter Cordes Jul 05 '19 at 02:14
  • @BeeOnRope: Good point. For `if(!b) infloop();` to be equivalent according to the as-if rule to `while(!b){}`, you'd have to decide that all infinity of the reads of `b` are contiguous in the global order of reads and writes for `b`. i.e. that they all *happen-before* any possible write from another thread. I guess that's possible in theory for a DeathStation 9000 implementation, but very obviously isn't the *intent* of the standard. It might not even be standard-compliant depending on the order of the program starting its threads. – Peter Cordes Jul 05 '19 at 02:22
  • @BeeOnRope: There *is* language in a footnote/guideline in the standard that says implementations *should* ensure that even relaxed-atomic stores are promptly visible to all threads. *32.4.12 Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.* http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf But you're right it doesn't actually guarantee it, I was forgetting that. So I guess this answer is doubly wrong, because `atomic<>` in the ISO standard *doesn't* quite guarantee cache flushing on a non-coherent system. – Peter Cordes Jul 05 '19 at 02:24
25

Consider a compare and exchange operation:

bool a = ...;
bool b = ...;

if (a)
    swap(a,b);

After we read a and get true, another thread could come along and set a to false; we then swap a and b, so on exit b is false, even though the swap was made.

Using std::atomic::compare_exchange we can do the entire if/swap logic atomically, such that the other thread cannot set a to false in between the if and the swap (without locking). In such a circumstance, if the swap was made then b must be true on exit.

This is just one example of an atomic operation that applies to a two value type such as bool.

Andrew Tomazos
  • How come this is the lowest rated answer? This (or test_and_set in std::atomic_flag) is the main reason to use an atomic bool type. – Szocske Dec 02 '15 at 18:55
20

Atomic operations are about more than just torn values, so while I agree with you and the other posters that I am not aware of an environment where a torn bool is a possibility, there is more at stake.

Herb Sutter gave a great talk about this, "Atomic Weapons", which you can view online. Be warned: it is a long and involved talk. The issue boils down to avoiding data races, because that is what lets you maintain the illusion of sequential consistency.

huskerchad
10

Atomicity of certain types depends exclusively on the underlying hardware. Each processor architecture has different guarantees about atomicity of certain operations. For example:

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

  • Reading or writing a byte
  • Reading or writing a word aligned on a 16-bit boundary
  • Reading or writing a doubleword aligned on a 32-bit boundary

Other architectures have different specifications on which operations are atomic.

C++ is a high-level programming language that strives to abstract you from the underlying hardware. For this reason, the standard simply cannot permit you to rely on such low-level assumptions, because otherwise your application wouldn't be portable. Accordingly, a C++11-compliant standard library provides atomic counterparts for all the primitive types out-of-the-box.

Alexander Shukaev
  • Another critical part is that C++ compilers are normally allowed to keep variables in registers or optimize away accesses, because they can assume that no other threads are changing the value. (Because of data-race UB). `atomic` sort of includes this property of `volatile`, so `while(!var){}` can't optimize into `if(!var) infinite_loop();`. See [MCU programming - C++ O2 optimization breaks while loop](//electronics.stackexchange.com/a/387478) – Peter Cordes Jul 03 '19 at 20:46