Can inline substitution cause an infinite loop in multithreaded code?

Question

Please note: this question is purely out of curiosity, not about writing better multithreaded code. I don't and won't write code like this in real projects, of course.

Inline substitution may occur when the inline keyword is added. So I'm curious.

Let's say we have code like this:

static bool done = false;

inline void setDone(bool newState) {
    done = newState;
}

inline bool getDone() {
    return done;
}

void someWorkerThreadJob() {
    // Accessing without any lock/atomic/volatile
    while (getDone() == false) {
    }
}

Can someWorkerThreadJob() be compiled like below and run into an infinite loop?

void someWorkerThreadJob() {
    while (done == false) {
    }
}

This also leads me to the next question. What about the getters and setters in classes? Member functions defined inside a class are implicitly inline, so I think inline substitution can occur, thus the same problem. Is this correct?

The inline is a red herring. You don't have any synchronization/atomics around `done`, if more than one thread accesses it (and at least one writes), you have undefined behavior (race condition) regardless of inlining. — Mat, May 07 '21 at 08:07
You may have a misunderstanding about when and why something gets inlined. The keyword `inline` plays no role in this. -> [inline specifier](https://en.cppreference.com/w/cpp/language/inline) The compiler is free to inline any function (for optimization purposes), though this must not change the observable behavior. The `inline` keyword tells the compiler that multiple occurrences of a definition (e.g. a global variable definition in a header) have to be merged into one. (In former standards, such multiple definitions were a violation of the ODR and usually resulted in link errors.) – Scheff's Cat, May 07 '21 at 08:16
@Mat Yeah, as said in the first part, this isn't the code we would write. I just suddenly became curious about this situation. I thought my theory was obvious, but wanted answers from experts. — Jenix, May 07 '21 at 08:16
@Scheff If I'm not wrong, what you said is the primary thing we expect from the `inline` keyword. But there's another thing. Inline substitution, where copy&paste happens. — Jenix, May 07 '21 at 08:20
_Inline substitution, where copy&paste happens._ As I already said: the compiler is free to inline on its own, beyond the control of the human author. This must not change the observable behavior ([as-if rule](https://en.cppreference.com/w/cpp/language/as_if)) as long as the code doesn't exhibit [Undefined Behavior](https://en.cppreference.com/w/cpp/language/ub). That said, I would consider concurrent access to an unguarded variable to be Undefined Behavior... – Scheff's Cat, May 07 '21 at 08:23
@Scheff If you read it again, I said "Inline substitution may occur". I was talking about the situation where it happened, but yeah, didn't much think about the "undefined behavior". Each compiler may substitute differently, so we never know.. — Jenix, May 07 '21 at 08:36
There once was a similar question why a certain code was working when compiled without optimization and stopped working with optimization enabled. (There was nothing mentioned about `inline` but IMHO the latter doesn't play a role actually - and it may not.) [SO: Multithreading program stuck in optimized mode but runs normally in -O0](https://stackoverflow.com/a/58516119/7478597) — Scheff's Cat, May 07 '21 at 08:46
I’m not sure how the above substitution (replacing the getter by a direct use of the global variable) would change any behaviour? The real issue here (as mentioned by other comments/answer) are memory barriers and cache synchronisation between CPUs. — op414, May 07 '21 at 09:01
@op414 I realized my question isn't valid anymore. But if you ask me, I was asking if compilers would just copy-and-paste like above, which is even more problematic. — Jenix, May 07 '21 at 09:09

Jérôme Richard · Accepted Answer · 2021-05-07T08:47:42.160

6

Access to `done` must be synchronized between threads. Otherwise, the compiler or the processor can produce or execute a sequence of instructions that does not do what you expect. Your current program has a data race, which is undefined behavior.

The problem you are facing is twofold: the value of `done` seen by one core can lag behind a write made on another core (processor dependent), and the compiler can optimize away the repeated access to the `done` global variable (typically by keeping it in a register, although many compilers do not do so in practice).
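As a rough illustration of the register optimization mentioned above, the sketch below puts the loop as written next to one form an optimizer would be allowed to produce. The names `flag`, `getFlag`, `waitAsWritten`, `waitAsHoisted` and `checkBothReturnWhenSet` are hypothetical and not taken from the question, and the hoisted form is hand-written for illustration, not actual compiler output.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's unsynchronized flag.
static bool flag = false;

inline bool getFlag() { return flag; }

// The loop as written in the source: the getter is called on every iteration.
void waitAsWritten() {
    while (getFlag() == false) {
    }
}

// One form an optimizer may produce: the getter is inlined, the flag is
// read once into a local, and the loop never looks at memory again.
void waitAsHoisted() {
    const bool seen = flag;  // single load
    if (!seen) {
        for (;;) {           // never re-reads `flag`
        }
    }
}

// Hypothetical single-threaded check: with the flag already set,
// both forms return immediately.
void checkBothReturnWhenSet() {
    flag = true;
    waitAsWritten();
    waitAsHoisted();
    assert(flag);
}
```

Seen from a single thread, the two functions behave identically, which is why the as-if rule permits the rewrite. The difference only shows up when another thread writes `flag` without synchronization, and that case is already undefined behavior whether or not the getter was inlined.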

You need to use atomic operations or mutexes/locks (or another synchronization mechanism) so that a modification of `done` becomes visible to other threads. With atomic operations, the compiler generates the proper memory fences, does not keep `done` in a register across iterations, and/or produces instructions that synchronize the communication bus (on x86_64, for example).
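A minimal sketch of the atomic variant, assuming the question's code is changed only by making `done` a `std::atomic<bool>`. The `runDemo` driver is hypothetical and exists only to exercise the loop; the default (sequentially consistent) memory order is used for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Assumed replacement for the question's plain `static bool done`.
static std::atomic<bool> done{false};

inline void setDone(bool newState) {
    done.store(newState);  // atomic store, visible to other threads
}

inline bool getDone() {
    return done.load();    // atomic load, performed on every call
}

void someWorkerThreadJob() {
    // The load cannot be hoisted out of the loop, inlined or not.
    while (getDone() == false) {
        std::this_thread::yield();
    }
}

// Hypothetical driver: start the worker, set the flag, wait for it to exit.
bool runDemo() {
    setDone(false);
    assert(!getDone());
    std::thread worker(someWorkerThreadJob);
    setDone(true);
    worker.join();
    return getDone();
}
```

With this change the getter can still be inlined, but the loop is guaranteed to observe the store eventually, so `runDemo()` returns once the worker has seen `done == true`.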

For more information, take a look at cache-coherence protocols such as MOESI and at how x86 handles atomic instructions.

In this case, mainstream compilers (like GCC and Clang) actually optimize the loop down to a no-op (which is entirely legal under the C++ standard), mainly because the static and inline keywords help the compiler. This does not happen with std::atomic.

edited May 07 '21 at 08:47

answered May 07 '21 at 08:18


Thank you, but I wasn't asking about multi-threaded programming. I know things like memory barriers, etc. If I didn't I wouldn't even think the example code would run into an infinite loop in the first place. I wondered how compilers would deal with situations like this with the inline keyword. Yes, I know, this is a useless question, but I was really curious. – Jenix May 07 '21 at 08:28

Ok. I updated the answer to add this information (at the end). This is *not* a useless question, as many people do this without being aware that it can actually happen on modern platforms (it was rare to see such an issue a decade ago, although the problem was theoretically there). Such an issue is trivial to solve in your specific case but tricky in huge parallel applications calling legacy code with hidden global variables ;). – Jérôme Richard May 07 '21 at 08:52
I concluded that my question was wrong in the first place. As I said in the comments above, I didn't think much about the 'undefined behavior', meaning even the substitution process can differ from compiler to compiler, which also means there's no answer to this. I was a little hesitant to accept yours because it wasn't answering my question. But I did because it's useful to others. – Jenix May 07 '21 at 09:04

score 2 · Answer 2 · answered May 07 '21 at 09:08

In the situation you've described, inlining adds no problems beyond the usual ones of accessing shared data without any synchronization. Inlining getDone() to just done does not somehow change the properties of done, such as making it non-static. If you had written done directly, as in your second snippet, there would be a risk of an infinite loop, but that is true with or without inlining, because the C++ standard leaves this sort of unprotected multithreaded access to a variable as (I believe) undefined behavior. You might get an infinite loop, a loop that eventually stops after a different thread updates done, or a loop that stops almost immediately after the update, as if the variable were protected. Weird things happen under undefined behavior, so there's no easy way to answer what you've asked (AFAIK; I haven't made a thorough investigation of the C++11 spec on this particular point).

Properly implemented inlining doesn't change what is being executed in an unsafe manner because that's sort of the definition of a correct inline. If inlining a method altered its behavior such that it stopped working in a multi-threaded environment, that would be a bug in the compiler that would need to be fixed.

There is, of course, a lot of room for error in compilers and such a bug may very well exist. In practice, an inlining compiler detects situations where inlining might cause a problem and either avoids inlining or fixes the problem. For the most part, inlining is safe and the reason you don't inline is because of ambiguity over what code to inline (in case there's overloading/inheritance/virtual calls going on), recursion (you can't inline a function into itself infinitely, but you can up to a limit), or performance considerations (typically increased code size, but also causing code blocks to become so large that other optimizations are disabled).
