Why does this race condition occur only with -O3 and only with a certain, seemingly equivalent code order?

Question

I have implemented an immutable hash map and accompanying STM container inspired by clojure's atom, which is to say, something akin to C++'s std::unique_ptr, in that it manages (but not necessarily owns) another object via a pointer, which can be passed around and swapped out, and only decrementing the reference count of the managed object instead of outright destructing it. When running some test code compiled with -O3, I've started noticing incorrect results.

The code for the compare-and-swap looks something like this:

hashmap *update(stm *stm, update_callback cb, void *user_args)
{
    while (1) {
        lock(stm->mutex);
        hashmap *current = stm->reference;
        increment_ref_count(current); // Line 6
        unlock(stm->mutex);

        hashmap *aspirant = cb(current, user_args); // returns a new, thread local instance
        increment_ref_count(aspirant);

        // Position 1

        lock(stm->mutex);
        if (current == stm->reference) { // success, no modification while we were busy
            stm->reference = aspirant;
            increment_ref_count(aspirant); // stm now has a reference
            unlock(stm->mutex);

            // Position 2.1
            decrement_ref_count(current); // release reference acquired at line 6

            decrement_ref_count(current); // stm no longer has a reference
            return aspirant;

        } else { // reference was modified, loop and try again
            unlock(stm->mutex);

            // Position 2.2
            decrement_ref_count(current); // release reference acquired at line 6

            decrement_ref_count(aspirant); // ref_count now at zero, aspirant is free'd
        }
    }
}

increment_ & decrement_ref_count in-/decrease the reference count of the hashmap atomically. If the count drops to zero as the result of a decrement, the hasmap will be free'd shortly after.

The challenge with designing an STM container for a reference counted pointer was mostly about making aquiring-the-reference-and-incrementing-the-counter atomic, which is why I'm using locks here.

As a test, I'm using the hashmap+STM to count occurences in a word list.

If I run the code as posted here, no race conditions occur. Now comes my problem: if I move the decrement_ref_count(current); // for line 6 out of the if/else, away from Positions 2.1/2.2 (right after the second locked region) and put it at Position 1 (right before the second locked region), suddenly the word counts start being incorrect, and I have no idea why.

My argument is that a) I don't make any use of current during the second critical region, and b) therefore it shouldn't matter if I release the reference before or after the compare-and-swap.

Obviously, I have a workaround/solution for the problem; simply leaving the decrement's where they are now, but I would really like to understand why this happens.

Compiled with: gcc -Wall -Wcast-align -Wswitch-enum -Wswitch-default -Winit-self -pedantic -O3 -DNDEBUG -std=gnu11 on the "Windows Subsystem for Linux".

@tadman `current` is the hash map instance that `stm` currently points to. If no thread is currently inside this function (`update`), and no thread has acquired the reference for reading, it's reference count is 1, because `stm` is the only thing that points to that hash map instance. I'm not well versed in C++, but what I mean by "STM" is similar to a thread safe `std::unique_ptr`. — ammut, Apr 15 '20 at 18:33
Random code suddenly breaking or working when you shuffle various statements around tends to happen when you have undefined behavior somewhere in your code. Usually not at all related to the code you moved around. Alternatively a stack overflow, which doesn't look likely here. But what you've posted here is just a bunch of function calls - to find the bug, you need to dig deeper inside those functions. — Lundin, Apr 15 '20 at 18:49
Alternatively, you are using [clang, which apparently goes bats**t when you use while(1)](https://stackoverflow.com/questions/59925618/how-do-i-make-an-infinite-empty-loop-that-wont-be-optimized-away). So which compiler is this? — Lundin, Apr 15 '20 at 18:52
@Lundin Oh boy... Do you have any pointers for me on where or how to start looking for UB? I am compiling with a lot of `-W` flags (see updated question) - and with gcc - and the output is clean. — ammut, Apr 15 '20 at 19:09
@Lundin: `while(1)` is not a problem as long as there's a reachable exit condition or side effects within the loop. Both of those are the case here. — R.. GitHub STOP HELPING ICE, Apr 15 '20 at 20:02
Fishy type conversions, fishy use of function pointers, uninitialized variables, dangling pointers or handles etc etc. Could be anything really. You could also try Wextra and then Wconversion (good but might come with misc false positives). Also -fno-strict-aliasing. Oh and `-pedantic` with `-std=gnu11` doesn't really make any much sense together. Do you actually need something in gnu11? — Lundin, Apr 15 '20 at 20:03
@R..GitHubSTOPHELPINGICE Printing to the console is a side effect which clang up to trunk doesn't handle correctly. The OP isn't using clang anyway. — Lundin, Apr 15 '20 at 20:07
@Lundin I found the reason why my seemingly innocuous code reordering lead to errors, see my answer below. In the end it had nothing to do with UB in my other code, although I did find another, unrelated bug while looking for potential culprits. Thanks a lot for your tips! — ammut, Apr 20 '20 at 14:31
@ammut Well, accessing "dangling pointers" is UB. Not sure if it applies to your case, but initializing all pointers to NULL and then also always setting them to NULL after every `free` makes the code more rugged in general. — Lundin, Apr 20 '20 at 19:01

score 1 · Answer 1 · answered Apr 20 '20 at 14:28

I found the problem - and even have a "lesson learnt". First, let me repost the broken version of the code so it's easier to talk about it:

 1 | hashmap *update(stm *stm, update_callback cb, void *user_args)
 2 | {
 3 |     while (1) {
 4 |         lock(stm->mutex);
 5 |         hashmap *current = stm->reference;
 6 |         increment_ref_count(current); // Line 6
 7 |         unlock(stm->mutex);
 8 | 
 9 |         hashmap *aspirant = cb(current, user_args); // returns a new, thread local instance
10 |         increment_ref_count(aspirant);
11 | 
12 |         decrement_ref_count(current); // current might get free'd here
13 | 
14 |         lock(stm->mutex);
15 |         if (current == stm->reference) { // success, no modification while we were busy
16 |             stm->reference = aspirant;
17 |             increment_ref_count(aspirant); // stm now has a reference
18 |             unlock(stm->mutex);
19 | 
20 |
21 |
22 | 
23 |             decrement_ref_count(current); // stm no longer has a reference
24 |             return aspirant;
25 | 
26 |         } else { // reference was modified, loop and try again
27 |             unlock(stm->mutex);
28 | 
29 |
30 |
31 | 
32 |             decrement_ref_count(aspirant); // ref_count now at zero, aspirant is free'd
33 |         }
34 |     }
35 | }

The problem basically is, that on line 15 I make use of an address, over which I have given up control. I assumed that if another thread was to modify stm->reference, it would do so with a different address. Instead, the following happened: While my thread was waiting for the lock at line 14, the current hash map got deallocated and a new hash map got allocated at the same memory address and successfully stored to stm->reference. Then my thread would continue, find that current == stm->reference, erroneously conclude that no change to stm->reference had occured since acquiring the reference and preform the swap, thus losing some updates.

The lesson here is: Never use pointers that have been free'd, even if only to compare addresses!

Why does this race condition occur only with -O3 and only with a certain, seemingly equivalent code order?

1 Answers1