27

I'm currently using a library that uses code like

T& being_a_bad_boy()
{
    return *reinterpret_cast<T*>(0);
}

to make a reference to a T without there actually being a T. This is undefined behavior, specifically noted to be unsupported by the standard, but it's not an unheard-of pattern.

I am curious if there are any examples or platforms or usages that show that in practice this can cause problems. Can anyone provide some?

TemplateRex
Mike Graham
  • I thought dereferencing a null pointer caused an access violation? – Ell Feb 21 '12 at 19:17
  • 2
    @Ell, No, it is undefined behavior. In practice, on most (all?) platforms, you won't actually crash anything until you try to *use* the dereferenced NULL. – Mike Graham Feb 21 '12 at 19:19
  • 1
    I'd be really curious to find out an answer, too, but my obvious follow-up question to you would be "why?" Is it just curiosity, or do you have a practical reason? – Sergey Kalinichenko Feb 21 '12 at 19:22
  • @dasblinkenlight, This pattern is in wide use in the wild. I want to know a) how worried this should make me and b) if there is a motivating practical example that justifies the pain of refactoring code not to work this way. – Mike Graham Feb 21 '12 at 19:26
  • 1
    this question is not constructive as _Thou Shall Not Dereference A NULL Pointer!_ – moooeeeep Feb 21 '12 at 19:32
  • 2
    An optimizing compiler can assume that your code's behavior is defined, and perform transformations that depend on that assumption. I don't have a concrete example, but if it were going to misbehave that would be the most likely reason. – Keith Thompson Feb 21 '12 at 19:40
  • 5
    I find your "This pattern is in wide use in the wild." comment disturbing. If that really is your perception, I recommend you find a different "wild" to work in -- if for no other reason, just because you can't ever be sure if `T a = being_a_bad_boy();` is going to crash... – mcmcc Feb 24 '12 at 03:39
  • I had the same question but they used it for function arguments: http://stackoverflow.com/q/657964/79455 – rve Feb 26 '12 at 11:04
  • 2
    @mcmcc, Your commitment to sane practices is well-taken, but your assessment of the C++ ecosystem strikes me as optimistic (or flat out naïve). – Mike Graham Mar 07 '12 at 22:19
  • 1
    @MikeGraham: Well, I've been a professional C++ developer for >15 years so "naive" probably doesn't apply. I don't work in anything like an ivory tower and I never have. I'm giving you this advice not because I don't believe what you're suggesting happens -- I'm telling you this because such a programming environment is not healthy for your mind or your career. There's dumb code and then there's insanity. being_a_bad_boy() qualifies as the latter for what I hope are obvious reasons. It is so not right, it's _not even wrong_. FWIW... – mcmcc Mar 08 '12 at 04:28
  • 1
    There are a number of usages "in the wild" that really should be killed off. Like assuming memory that was `malloc`ed can be `delete`d, or that memory that was `new`ed can be `free`d. Yes, I've seen that assumption show up in a highly respected open source library. – Max Lybbert Apr 12 '12 at 17:21

6 Answers

89

Classically, compilers treated "undefined behavior" as simply an excuse not to check for various types of errors and merely "let it happen anyway." But contemporary compilers are starting to use undefined behavior to guide optimizations.

Consider this code:

int table[5];
bool does_table_contain(int v)
{
    for (int i = 0; i <= 5; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

A classical compiler wouldn't notice that the loop limit was written incorrectly and that the last iteration reads off the end of the array. It would just try to read off the end of the array anyway, and return true if the value one past the end of the array happened to match.

A post-classical compiler on the other hand might perform the following analysis:

  • The first five times through the loop, the function might return true.
  • When i = 5, the code performs undefined behavior. Therefore, the case i = 5 can be treated as unreachable.
  • The case i = 6 (loop runs to completion) is also unreachable, because in order to get there, you first have to do i = 5, which we have already shown was unreachable.
  • Therefore, all reachable code paths return true.

The compiler would then simplify this function to

bool does_table_contain(int v)
{
    return true;
}

Another way of looking at this optimization is that the compiler mentally unrolled the loop:

bool does_table_contain(int v)
{
    if (table[0] == v) return true;
    if (table[1] == v) return true;
    if (table[2] == v) return true;
    if (table[3] == v) return true;
    if (table[4] == v) return true;
    if (table[5] == v) return true;
    return false;
}

And then it realized that the evaluation of table[5] is undefined, so everything past that point is unreachable:

bool does_table_contain(int v)
{
    if (table[0] == v) return true;
    if (table[1] == v) return true;
    if (table[2] == v) return true;
    if (table[3] == v) return true;
    if (table[4] == v) return true;
    /* unreachable due to undefined behavior */
}

and then observed that all reachable code paths return true.

A compiler which uses undefined behavior to guide optimizations would see that every code path through the being_a_bad_boy function invokes undefined behavior, and therefore the being_a_bad_boy function can be reduced to

T& being_a_bad_boy()
{
    /* unreachable due to undefined behavior */
}

This analysis can then back-propagate into all callers of being_a_bad_boy:

void playing_with_fire(bool match_lit, T& t)
{
    kindle(match_lit ? being_a_bad_boy() : t);
} 

Since we know that being_a_bad_boy is unreachable due to undefined behavior, the compiler can conclude that match_lit must never be true, resulting in

void playing_with_fire(bool match_lit, T& t)
{
    kindle(t);
} 

And now everything is catching fire regardless of whether the match is lit.

You may not see this type of undefined-behavior-guided optimization in current-generation compilers much, but like hardware acceleration in Web browsers, it's only a matter of time before it starts becoming more mainstream.

chikuba
Raymond Chen
  • 2
    Generally, if compilers detect undefined behavior, right now they at least issue a warning about it. I doubt they'll intentionally hide this from the user. – bdow Feb 28 '12 at 15:55
  • 12
    @bdow Check out [part 3 of the series I linked to](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html), specifically the part titled "Why can't you warn when optimizing based on undefined behavior?" – Raymond Chen Feb 28 '12 at 17:00
  • 1
    You're right; because the compiler performs optimizations on code that may have been mangled by previous optimizations, it can't just throw up its arms there. What I was trying to say is that hopefully the compiler will detect and flag undefined behavior in code -as originally written-. Maybe compilers aren't as good at that as I thought. – bdow Feb 28 '12 at 22:20
  • 7
    @bdow this is old, but a big problem is that C and C++ programmers are very accustomed to writing code that has undefined behavior but is dynamically unreachable (which is perfectly legal), especially in macros, templates, etc...a compiler can't reliably filter out those warnings without solving the halting problem – Stephen Lin Mar 17 '13 at 19:22
  • Here's an example of [a program that broke as a result of this optimization](http://stackoverflow.com/questions/40911714/undefined-pointer-behaviour) – Raymond Chen Dec 01 '16 at 13:33
  • @RaymondChen: This answer is very interesting. Unfortunately, the link that you posted in your previous comment seems to no longer be valid. Do you happen to have any other examples of programs breaking due to this optimization? – Andreas Wenzel Mar 10 '20 at 23:43
  • The link is valid if you have permission to view deleted questions. I'm sure there are plenty of other examples, but I don't have links to them offhand. – Raymond Chen Mar 11 '20 at 01:45
19

The largest problem with this code isn't that it's likely to break - it's that it defies an implicit assumption programmers have about references: that they will always be valid. This is just asking for trouble when someone unfamiliar with the "convention" runs into this code.

There's a potential technical glitch too. Since a reference may only legally refer to a valid object, and no object has the address NULL, an optimizing compiler is allowed to optimize out any check of a reference's address for nullness. I haven't actually seen this done, but it is possible.

T &bad = being_a_bad_boy();
if (&bad == NULL)  // this could be optimized away!

Edit: I'm going to shamelessly steal from a comment by @mcmcc and point out that this common idiom is likely to crash because it's using an invalid reference. According to Murphy's Law it will be at the worst possible moment, and of course never during testing.

T bad2 = being_a_bad_boy();

I also know from personal experience that the effects of an invalid reference can propagate far from where the reference was generated, making debugging pure hell.

T &bad3 = being_a_bad_boy();
bad3.do_something();

void T::do_something()
{
    use_a_member_of_T();
}

void T::use_a_member_of_T()
{
    member = get_unrelated_value(); // crash occurs here, leaving you wondering what happened in get_unrelated_value
}
Mark Ransom
  • I'm going to camp on with Mark. There's nothing technically 'wrong' with the code. The reference is an 'alias' to the NULL. There is no dereference, there is no problem. However this practice invariably leads to code like this: 'if (&refVariable == NULL )'. At this point one has to ask 'man, I thought references were supposed to get me away from all this pointer checking!'. – Joe Mar 04 '12 at 23:52
  • @Joe, especially when you consider that the pointer check is guaranteed to work while the reference check might not! – Mark Ransom Mar 05 '12 at 03:01
  • 1
    "... implicit assumption programmers have about references that they will always be valid." I hope I'm not playing language lawyer (I'm not qualified), but a reference cannot be `NULL`, which it appears to be in this case. That surely violates the semantics of a reference. I don't see how a `NULL` reference can be valid. – jww May 21 '13 at 05:26
  • @noloader, you're right - a valid reference can't be NULL. But an invalid one can. And my point was that both programmers and the compiler should be able to assume they'll never see an invalid reference, or very bad things can happen. – Mark Ransom May 21 '13 at 05:41
  • @Joe _"The reference is an 'alias' to the NULL. There is no dereference, there is no problem"_ That is a false and dangerous statement. You're conflating two things: the logical "dereference" in this abstract program written in C++, which has undefined behaviour, period; and the "physical" dereference performed by the executing computer and written into the machine code to actually locate data -- this is unlikely to be reached, but it doesn't matter, because you've already got UB to contend with (see accepted answer). In the context of C++, this absolutely _is_ a dereference, and it's invalid. – Lightness Races in Orbit Jun 17 '19 at 23:39
1

I would expect that on most platforms, the compiler will convert all references into pointers. If that assumption is true, then this will be identical to passing around a NULL pointer, which is fine as long as you never use it. The question, then, is whether there are any compilers that handle references in some way other than converting them to pointers. I don't know of any such compilers, but I suppose it's possible that they exist.

Edward Loper
  • 1
    Your first sentence is correct, but the remainder of your comment doesn't really follow from it. Consider something like `for(int i = 0; i >= 0; ++i)`. On most platforms, overflow of `++i` will cause `i` to become negative, exiting the loop -- *except* that an optimizing compiler might notice that `i` is strictly increasing, and remove the unnecessary test for `i >= 0`. (Optimizing compilers really do notice this sort of thing, and really do make these sorts of optimizations that depend on programs' not invoking undefined behavior.) Similarly with references: a valid program with [continued] – ruakh Feb 22 '12 at 00:18
  • [continued] references might be converted into an equivalent program with pointers, but that doesn't mean that the same applies to invalid programs that use references to invoke undefined behavior. – ruakh Feb 22 '12 at 00:19
1

Use the NullObject pattern.

class Null_T : public T
{
public:
    // implement virtual functions to do whatever
    // you'd expect in the null situation
};

T& doing_the_right_thing()
{
    static Null_T null;
    return null;
}
Peter Wood
  • This can so often be slow and buggy. (The original illegal code is fast and buggy.) – Mike Graham Aug 21 '13 at 12:33
  • What's slow and buggy about this? Your original code assumes `T&` won't be used. I don't know what the performance of creating the static variable would be on your system. What do you mean by *often*? Also, thanks for replying after a year and a half (c:. – Peter Wood Aug 21 '13 at 12:57
  • The slow part is that everything in `T` has to be `virtual` to allow `Null_T` to override it. – Davis Herring Oct 12 '18 at 16:41
  • @DavisHerring Depend on abstract types. – Peter Wood Oct 12 '18 at 19:29
1

The important thing to remember is that you have a contract with your users. If you're trying to return a reference to a null pointer, undefined behavior is now part of your function's interface. If your users are all prepared to accept this, then that's on them... but I would try to avoid it if at all possible.

If your code can result in an invalid object, then either return a pointer (preferably a smart pointer, but that's another discussion), use the null object pattern mentioned above, use something like boost::optional, or throw an exception.

bdow
1

I don't know if this is problem enough for you, or close enough to your use case, but this crashes for me with gcc (on x86_64):

int main( )
{
    volatile int* i = 0;
    *i;
}

That said, we should keep in mind that this is always UB, and compilers might change their minds later, so what works today may not work tomorrow.

Another not-so-obvious bad thing happens when you call a virtual function through a null pointer (because virtual calls are usually implemented via a vptr to a vtable), and of course the same applies to the null reference (which doesn't exist in standard C++).

Btw. I've even heard that architectures exist where merely copying around a pointer to invalid memory will trap; maybe some also make a distinction between pointers and references.

PlasmaHH