55

Why does the following code crash both on Visual Studio and GCC?

For it to crash it requires the range-based for loop, std::map, std::string and taking a reference to the string. If I remove any one of them it will work.

#include <iostream>
#include <string>
#include <map>
using namespace std;

struct S
{
    map<string, string> m;

    S()
    {
        m["key"] = "b";
    }

    const string &func() const
    {
        return m.find("key")->second;
    }
};

int main()
{
    for (char c : S().func())
        cout << c;

    return 0;
}

Ideone link: http://ideone.com/IBmhDH

isanae
  • 3,007
  • 1
  • 19
  • 41
kynnysmatto
  • 3,242
  • 20
  • 24

2 Answers2

65

The range initialization line of a for(:) loop does not extend lifetime of anything but the final temporary (if any). Any other temporaries are discarded prior to the for(:) loop executing.

Now, do not despair; there is an easy fix to this problem. But first a walk through of what is going wrong.

The code for(auto x:exp){ /* code */ } expands to, basically:

{
  auto&& __range=exp;
  auto __it=std::begin(__range);
  auto __end=std::end(__range);
  for(; __it!=__end;++__it){
    auto x=*__it;
    /* code */
  }
}

(With a modest lies on the __it and __end lines, and all variables starting with __ have no visible name. Also I am showing C++17 version, because I believe in a better world, and the differences do not matter here.)

Your exp creates a temporary object, then returns a reference to within it. The temporary dies after that line, so you have a dangling reference in the rest of the code.

Fixing it is relatively easy. To fix it:

std::string const& func() const& // notice &
{
    return m.find("key")->second;
}
std::string func() && // notice &&
{
    return std::move(m.find("key")->second);
}

do rvalue overloads and return moved-into values by value when consuming temporaries instead of returning references into them.

Then the

auto&& __range=exp;

line does reference lifetime extension on the by-value returned string, and no more dangling references.

As a general rule, never return a range by reference to a parameter that could be an rvalue.


Appendix: Wait, && and const& after methods? rvalue references to *this?

C++11 added rvalue references. But the this or self parameter to functions is special. To select which overload of a method based on the rvalue/lvalue-ness of the object being invoked, you can use & or && after the end of the method.

This works much like the type of a parameter to a function. && after the method states that the method should be called only on non-const rvalues; const& means it should be called for constant lvalues. Things that don't exactly match follow the usual precidence rules.

When you have a method that returns a reference into an object, make sure you catch temporaries with a && overload and either don't return a reference in those cases (return a value), or =delete the method.

Community
  • 1
  • 1
Yakk - Adam Nevraumont
  • 235,777
  • 25
  • 285
  • 465
  • Wasn't there a DR to address the lifetime issue of the range-based `for` loop? – Morwenn Dec 04 '16 at 15:36
  • @Morwenn: This combination (returning a temporary that was passed in as a parameter, even `this`) causes a lifetime problem in any context, the ranged-for is not related. – Ben Voigt Dec 04 '16 at 16:47
  • @cat where did you get the idea that undefined behaviour was defined to crash? Morwenn claims to be aware of a DR in the area, I do not know of it. – Yakk - Adam Nevraumont Dec 04 '16 at 20:20
  • 2
    @Yakk: It would be educational if you could link to somewhere explaining the qualifiers after the method parameters. Most people (like me) know about `const`-qualified methods - but not about `const&`- or `&&`-qualified methods. – einpoklum Dec 04 '16 at 22:09
  • @Yakk There's an EWG issue and a couple papers. – T.C. Dec 05 '16 at 19:32
  • @BenVoigt It does not cause a lifetime problem in *any* context. It is perfectly valid to return a reference into a temporary when calling a pure function. Consider the difference between calling the function `foo` in this example and using range-based for. It seems to me that the range-based for should act more like `foo`, with all the caveats and hang-ups therein. http://coliru.stacked-crooked.com/a/c8b8b511c438f028 – caps Apr 24 '17 at 23:01
  • @BenVoigt This is certainly very much about range-based-for, in my view. Things like `foo(std::string("abc").c_str())` work fine; but `for (char c : std::string("abc").c_str()) {}` do not. Some people, myself included, find this difference surprising. – Don Hatch Apr 16 '20 at 14:57
  • 1
    @DonHatch: It'll fail just as fast in the traditional for: `for( const char* p = std::string("abc").c_str(); *p; ++p )` It is not specific to range-based for. – Ben Voigt Apr 16 '20 at 15:39
  • 2
    @BenVoigt This is indeed all about range based for: https://en.cppreference.com/w/cpp/language/range-for#Temporary_range_expression "If range_expression returns a temporary, its lifetime is extended until the end of the loop, ... but beware that the lifetime of any temporary within range_expression is not extended." Please read issue 900 (and dup 1498) in http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html which describes the issue well. My interpretation is that extending the lifetime of only the top-level temporary was a goof; hopefully it will be corrected some day. – Don Hatch Apr 16 '20 at 18:19
  • 1
    @DonHatch: "extending the lifetime of only the top-level temporary" is (1) inaccurate, in this example no temporary lifetime is being extended at all and (2) not a ranged-for choice; it follows directly from the "ranged-for is syntactic sugar for this other code". All reference declarations extend the lifetime of a temporary only if that temporary is the object that's directly bound. Ranged-for is syntactic sugar for a reference declaration plus some additional code, therefore it behaves as all reference declarations do. – Ben Voigt Apr 16 '20 at 18:33
  • @BenVoigt (1) You're right that in this case the top-level thing returned isn't a temporary (it's a reference); I don't think I've claimed otherwise, so I'm not following what you say is inaccurate in what I've said. (2) Yes, we know range-based-for can be viewed as syntactic sugar for , but that particular piece of code isn't sacred. The point of Daniel Frey's note on issue 900 is that we'd get arguably less surprising semantics if it was instead syntactic sugar for . – Don Hatch Apr 17 '20 at 03:08
  • @benv technically `std::string("hello").c_str()` returns a temporary pointer. Regardless, `[&](auto&& __range){ auto __begin = begin_expr(range); auto __end = end_expr(range); for (; __begin!=__end; ++__begin){ init_expr = *__begin; /* body */ } }( range_expr )` with magic pass-through return would give a less bug prone result and break very little valid code. – Yakk - Adam Nevraumont Apr 17 '20 at 19:04
32
S().func()

This constructs a temporary object, and invokes a method that returns a reference to a std::string that's owned (indirectly) by the temporary object (the std::string is in the container that's a part of the temporary object).

After obtaining the reference, the temporary object gets destroyed. This also destroys the std::string that was owned (indirectly) by the temporary object.

After that point, any further usage of the referenced object becomes undefined behavior. Such as iterating over its contents.

This is a very common pitfall, when it comes to using range iteration. Yours truly is also guilty of getting tripped over this.

Sam Varshavchik
  • 84,126
  • 5
  • 57
  • 106
  • 1
    Are there conventions about **not** returning a `const T&` then? Or is it the case this is *usually* efficient for complex class members and just happens to wreck iterators? – Jack Deeth Dec 04 '16 at 02:01
  • Ok, that's interesting. I wonder why the temporary object gets destroyed, though? If I say cout << S().func()[0]; then the temporary S doesn't get destroyed, does it? What makes the range-based for loop different? – kynnysmatto Dec 04 '16 at 02:03
  • 1
    Because temporaries get destroyed at the end of the expression. For `cout << ... ;`, the semicolon is the end of the expression. For range iteration, it's the closing `)` (roughly speaking). There are also some situations where temporaries would also be destroyed in the middle of the expression, but that's neither here, nor there. – Sam Varshavchik Dec 04 '16 at 02:05
  • 1
    ... and the nice thing about conventions, is that everyone has their own conventions. The convention here, basically, is to avoid anything that smells like a temporary object, when setting up a range iteration. – Sam Varshavchik Dec 04 '16 at 02:07
  • Ok now I think I get it. So the range-based for loop probably basically works in a way such that it stores the result of the S().func() into some internal variable, i.e. const string &value = S().func(); and then holds onto "value", at which point the S() part gets destroyed because the statement of saving its value has already ended. – kynnysmatto Dec 04 '16 at 02:13
  • 1
    Correct. The first part of the "Explanation" section from http://en.cppreference.com/w/cpp/language/range-for basically gives the equivalent code that range iteration produces. In my case, when I tripped over this, I tripped over this so badly, that I never forgot it. – Sam Varshavchik Dec 04 '16 at 02:17
  • 2
    @kynnysmatto: The value of the range expression is bound to a reference. If the value happens to be a prvalue, you get lifetime extension, but if it's not, then you don't. – Kerrek SB Dec 04 '16 at 02:23