1087

I have the following code.

#include <iostream>

int * foo()
{
    int a = 5;
    return &a;
}

int main()
{
    int* p = foo();
    std::cout << *p;
    *p = 8;
    std::cout << *p;
}

And the code is just running with no runtime exceptions!

The output was 58

How can it be? Isn't the memory of a local variable inaccessible outside its function?

SuperWill
Avi Shukron
  • 14
    this won't even compile as is; if you fix the nonforming business, gcc will still warn `address of local variable ‘a’ returned`; valgrind shows `Invalid write of size 4 [...] Address 0xbefd7114 is just below the stack ptr` – sehe Jun 22 '11 at 14:34
  • 4
    In some platforms/compilers (especially old compilers for DOS) you can even write through NULL pointer and everything seems OK until you overwrite something important (like the code being executed). :) – Serge Dundich Jun 22 '11 at 14:43
  • 1
    @Serge that's because most OS-es these days have a write-protected zero-page however not all of them do! – Jasper Bekkers Jun 22 '11 at 15:26
  • 79
    @Serge: Back in my youth I once worked on some kinda tricky zero-ring code that ran on the Netware operating system that involved cleverly moving around the stack pointer in a way not exactly sanctioned by the operating system. I'd know when I'd made a mistake because often the stack would end up overlapping the screen memory and I could just watch the bytes get written right onto the display. You can't get away with that sort of thing these days. – Eric Lippert Jun 23 '11 at 04:23
  • 3
    Ah man this makes me miss my C++ / DCOM / VB days. We had a home-grown red-black tree that had invalid pointer access issues. I had the distinct pleasure of debugging it. – xanadont Jun 23 '11 at 04:48
  • @Jasper Bekkers: "that's because most OS-es these days have a write-protected zero-page however not all of them do!" Yea. I know. – Serge Dundich Jun 23 '11 at 05:09
  • 4
    @Xeo - I think you misunderstood me... I know it is unsafe, thats for sure! I thought it would be *impossible*. I guess i should get used to the freedom that C++ gives the developer.. – Avi Shukron Jun 23 '11 at 06:21
  • 23
    lol. I needed to read the question and some answers before I even understood where the problem is. Is that actually a question about variable's access scope? You don't even use 'a' outside your function. And that is all there is to it. Throwing around some memory references is a totally different topic from variable scope. – erikbwork Jun 23 '11 at 06:23
  • 2
    @Tomalak please provide a dupe link and I'm happy to vote for close. We can ask a moderator to merge with the question that this one is a dupe of. – Johannes Schaub - litb Jun 23 '11 at 11:29
  • 10
    Dupe answer doesn't mean dupe question. A lot of the dupe questions that people proposed here are completely different questions that happen to refer to the same underlying symptom... but the questioner has no way of knowing that so they should remain open. I closed an older dupe and merged it into this question which should stay open because it has a very good answer. – Joel Spolsky Jun 23 '11 at 14:36
  • 16
    @Joel: If the answer here is good, it should be ___merged into older questions___, of which this is a dupe, not the other way around. And this ___question___ is indeed a dupe of the other questions proposed here and then some (even though some of the proposed are a better fit than others). Note that I think Eric's answer is good. (In fact, I flagged this question for merging the answers into one of the older questions in order to salvage the older questions.) – sbi Jun 23 '11 at 15:20
  • 5
    @Joel dupe means (quote) "This question covers exactly the same ground as earlier questions on this topic;", not "This question covers exactly the same ground as a newer question on this topic;". Either your merge or the "close" popup has it backwards. – Johannes Schaub - litb Jun 23 '11 at 16:49
  • 1
    But this way people don't manually have to click the forward link... so it may have been a good idea. But still the merge was backwards. Trying to justify by saying it was the right way around won't work. – Johannes Schaub - litb Jun 23 '11 at 16:56
  • 2
    Weird question w/ so much love, and I used to think that C developers MUST understand how the hardware works; the stack allocation has been the same forever. – bestsss Jun 24 '11 at 07:57
  • 1
    @Maxpm, zero page on 8086 (and 0000:0000 too) has its usages - interrupt vectors, etc, so addressing it was quite normal. Back in the day viruses (and anti-viruses) used to overwrite quite a bit of it. – bestsss Jun 24 '11 at 08:01
  • So memory is overwritten. Otherwise you would get '55' – Martin York Jun 24 '11 at 21:57
  • i mean it is not overwritten after i exit a function foo. And i can output it even if the local variable were destroyed. – Stals Jun 24 '11 at 21:59
  • @Stals Undefined behaviour is undefined. You shouldn't use it, and it's not productive to reason about it. Of course the compiler doesn't waste cycles zeroing out memory that belonged to something that is out of scope. You still can't write code that uses something outside of its scope, as defined by the language. If you don't, your code is invalid, whether or not it happens to produce the 'expected' result. – underscore_d Jul 07 '20 at 14:20

21 Answers

4874

How can it be? Isn't the memory of a local variable inaccessible outside its function?

You rent a hotel room. You put a book in the top drawer of the bedside table and go to sleep. You check out the next morning, but "forget" to give back your key. You steal the key!

A week later, you return to the hotel, do not check in, sneak into your old room with your stolen key, and look in the drawer. Your book is still there. Astonishing!

How can that be? Aren't the contents of a hotel room drawer inaccessible if you haven't rented the room?

Well, obviously that scenario can happen in the real world no problem. There is no mysterious force that causes your book to disappear when you are no longer authorized to be in the room. Nor is there a mysterious force that prevents you from entering a room with a stolen key.

The hotel management is not required to remove your book. You didn't make a contract with them that said that if you leave stuff behind, they'll shred it for you. If you illegally re-enter your room with a stolen key to get it back, the hotel security staff is not required to catch you sneaking in. You didn't make a contract with them that said "if I try to sneak back into my room later, you are required to stop me." Rather, you signed a contract with them that said "I promise not to sneak back into my room later", a contract which you broke.

In this situation anything can happen. The book can be there -- you got lucky. Someone else's book can be there and yours could be in the hotel's furnace. Someone could be there right when you come in, tearing your book to pieces. The hotel could have removed the table and book entirely and replaced it with a wardrobe. The entire hotel could be just about to be torn down and replaced with a football stadium, and you are going to die in an explosion while you are sneaking around.

You don't know what is going to happen; when you checked out of the hotel and stole a key to illegally use later, you gave up the right to live in a predictable, safe world because you chose to break the rules of the system.

C++ is not a safe language. It will cheerfully allow you to break the rules of the system. If you try to do something illegal and foolish like going back into a room you're not authorized to be in and rummaging through a desk that might not even be there anymore, C++ is not going to stop you. Safer languages than C++ solve this problem by restricting your power -- by having much stricter control over keys, for example.

UPDATE

Holy goodness, this answer is getting a lot of attention. (I'm not sure why -- I considered it to be just a "fun" little analogy, but whatever.)

I thought it might be germane to update this a bit with a few more technical thoughts.

Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.

The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage -- that is, the period of time when it is validly associated with some program variable -- cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.

The second method is to have a “short-lived” storage area where the lifetime of each byte is well known. Here, the lifetimes follow a “nesting” pattern. The longest-lived of these short-lived variables will be allocated before any other short-lived variables, and will be freed last. Shorter-lived variables will be allocated after the longest-lived ones, and will be freed before them. The lifetime of these shorter-lived variables is “nested” within the lifetime of longer-lived ones.

Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.

For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.

It's like the hotel decides to only rent out rooms sequentially, and you can't check out until everyone with a room number higher than you has checked out.

So let's think about the stack. In many operating systems you get one stack per thread and the stack is allocated to be a certain fixed size. When you call a method, stuff is pushed onto the stack. If you then pass a pointer to the stack back out of your method, as the original poster does here, that's just a pointer to the middle of some entirely valid million-byte memory block. In our analogy, you check out of the hotel; when you do, you just checked out of the highest-numbered occupied room. If no one else checks in after you, and you go back to your room illegally, all your stuff is guaranteed to still be there in this particular hotel.

We use stacks for temporary stores because they are really cheap and easy. An implementation of C++ is not required to use a stack for storage of locals; it could use the heap. It doesn't, because that would make the program slower.

An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can come back for it later illegally; it is perfectly legal for the compiler to generate code that turns back to zero everything in the "room" that you just vacated. It doesn't because again, that would be expensive.

An implementation of C++ is not required to ensure that when the stack logically shrinks, the addresses that used to be valid are still mapped into memory. The implementation is allowed to tell the operating system "we're done using this page of stack now. Until I say otherwise, issue an exception that destroys the process if anyone touches the previously-valid stack page". Again, implementations do not actually do that because it is slow and unnecessary.

Instead, implementations let you make mistakes and get away with it. Most of the time. Until one day something truly awful goes wrong and the process explodes.

This is problematic. There are a lot of rules and it is very easy to break them accidentally. I certainly have many times. And worse, the problem often only surfaces when memory is detected to be corrupt billions of nanoseconds after the corruption happened, when it is very hard to figure out who messed it up.

More memory-safe languages solve this problem by restricting your power. In "normal" C# there simply is no way to take the address of a local and return it or store it for later. You can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends. In order to take the address of a local and pass it back, you have to put the compiler in a special "unsafe" mode, and put the word "unsafe" in your program, to call attention to the fact that you are probably doing something dangerous that could be breaking the rules.

Callum Watkins
Eric Lippert
  • 8
    If the hotel were about to be replaced by a football stadium, wouldn't you notice the lack of people? Or the monstrous army of giant bulldozers outside? – Mateen Ulhaq Jun 23 '11 at 03:34
  • 60
    @muntoo: Unfortunately it's not like the operating system sounds a warning siren before it decommits or deallocates a page of virtual memory. If you're mucking around with that memory when you don't own it anymore the operating system is perfectly within its rights to take down the entire process when you touch a deallocated page. Boom! – Eric Lippert Jun 23 '11 at 03:41
  • 6
    I like the analogy, but nearly all hotels use programmable key cards that get locked out at a specified time, or when a new key is issued for that room, whichever comes first. And I would imagine the very few hotels that do not use such a system would be very insistent that you return your key at checkout. – Kyle Cronin Jun 23 '11 at 04:50
  • 13
    That's a great analogy, but bashing C++ at the end is not OK. C++ doesn't impose too many restrictions, but that lack of restrictions normally pays back in measurable performance gains. – cyberguijarro Jun 23 '11 at 05:31
  • 86
    @Kyle: Only safe hotels do that. The unsafe hotels get measurable profit gains from not having to waste time on programming keys. – Alexander Torstling Jun 23 '11 at 05:35
  • 28
    @cyberguijarro I don't think he is bashing C++ at the end. C++ is not safe, and as you say, this is a good thing in a lot of situations. Likewise safer languages are less powerful, but can be easier to use. They're just different. – Edd Jun 23 '11 at 06:50
  • 512
    @cyberguijarro: That C++ is not memory safe is simply a fact. It's not "bashing" anything. Had I said, for example, "C++ is a horrid mishmash of under-specified, overly-complex features piled on top of a brittle, dangerous memory model and I am thankful every day I no longer work in it for my own sanity", that would be bashing C++. Pointing out that it's not memory safe is *explaining* why the original poster is seeing this issue; it's answering the question, not editorializing. – Eric Lippert Jun 23 '11 at 07:27
  • 33
    @Eric: C# (really, .NET) isn't "safe" either in that respect. I can combine `Math.Random`, `IntPtr`, and `Marshal.Copy` and cause total chaos (no `unsafe` keyword nor `/unsafe` compiler switch needed). Safety comes from adhering to the contract, not from language design (although a language can and should make coding in a style that adheres to the contract as easy as possible, and provide warning when the contract is violated as much as possible.) – Ben Voigt Jun 23 '11 at 08:33
  • 1
    Nice explanation Eric. Quick question! Which language would you say is a safer language?! – Bitmap Jun 23 '11 at 08:33
  • 31
    @Bitmap: LOGO is quite safe. – Ben Voigt Jun 23 '11 at 08:34
  • 8
    @Ben well duh, obviously there are ways to become unsafe, which include library functions marked as such (so permissions kick in if required). If someone did a LOGO implementation with a library function allowing intptr moral equivalents then it would stop being safe by your metric too. – ShuggyCoUk Jun 23 '11 at 09:30
  • 54
    Strictly speaking the analogy should mention that the receptionist at the hotel was quite happy for you to take the key with you. "Oh, do you mind if I take this key with me?" "Go ahead. Why would I care? I only work here". It doesn't become illegal until you try to use it. – philsquared Jun 23 '11 at 12:24
  • 2
    @PhilNash: It breaks down a little there, as the key is usually the property of the hotel. – Lightness Races in Orbit Jun 23 '11 at 13:22
  • 14
    @Ben: @ShuggyCoUk is right; that there are library functions that do horrible things if you misuse them is a property of those library functions, not the C# language. C# *the language* is both memory safe and type safe provided that you don't have "unsafe" code blocks in there. If you do, then it is every bit as memory-unsafe as C++. The point is to isolate areas of memory unsafety to areas that can be easily identified and thoroughly reviewed. – Eric Lippert Jun 23 '11 at 13:50
  • 10
    @Kyle Cronin your point only furthers the analogy. Back when C++ was invented, programmable card keys for hotels were less common or even nonexistent. Newer hotels have naturally adopted safer practices, as have newer languages. Even older hotels have been retrofitted with new locks, as has C++ (smart pointers anybody?) – Mark Ransom Jun 23 '11 at 14:58
  • C++ not being memory safe makes it pragmatic. Some tricks can be used, and those hacks would not be there if C++ was too safe. – Thaddee Tyl Jun 23 '11 at 16:50
  • 24
    @Thaddee: First off, there are plenty of pragmatic languages that are memory safe. However, the problem with C++ is not that it is unsafe. The problem is that it is *so easy* to *accidentally* do something massively unsafe and not realize that you're doing so until you crash the end-user's machine. Memory-unsafe languages often are quite useful, I agree, but there should be a way of isolating that unsafeness to specifically those "tricky, hacky" bits of code that really need it. – Eric Lippert Jun 23 '11 at 16:58
  • 6
    Eric: This question might be getting traffic because it was the top post on Hacker News: http://news.ycombinator.com/item?id=2686580. Regardless, 1100 upvotes in 24 hours?! That must be a record, by far. – Steve Tjoa Jun 23 '11 at 19:06
  • 145
    Please, please at least consider writing a book one day. I would buy it even if it was just a collection of revised and expanded blog posts, and I'm sure so would a lot of people. But a book with your original thoughts on various programming-related matters would be a great read. I know that it's incredibly hard to find the time for it, but please consider writing one. – Dyppl Jun 24 '11 at 06:43
  • 17
    @Dyppl: Thanks for the kind words. Having written a couple of books already I am well aware of how much work it is! I have considered turning the blog into a book and I might at some point if I can find both the time and a willing publisher. – Eric Lippert Jun 24 '11 at 16:19
  • Actually there are three basic techniques for managing memory in C and C++. There are the two that you mentioned plus `static` memory where the variables have process lifetime. And if you don't mind getting very technical there is also register file storage, but this is apparently ignored by current compilers. – ThomasMcLeod Jun 05 '13 at 20:26
  • So what exactly is the frequency of finding that book every time I sneak into the same room? Also, on what factors does this frequency depend on? – chosentorture Jun 27 '13 at 06:49
  • 6
    @chosentorture answer your question with science. Get a few hundred c compilers and try a few hundred different configurations of each and soon you will have excellent empirical data. Anything else is guessing. – Eric Lippert Jun 27 '13 at 12:59
  • I have written a lot of code in a lot of different languages. My least favorite language has been C++. I have no issues with unsafe languages - what I have issues with is a language so poorly designed (and then hacked upon to cover up those design flaws) that it is FAR TOO EASY to create code that segfaults. I'm dealing with an issue right now that was SUPPOSED to fix memory leaks. Now it segfaults. Fun. -_- – Lloyd Sargent Sep 30 '13 at 23:05
  • Does the same also apply to local variables inside the same function, that are declared in a different scope? I'm asking since in my experience both GCC and MSVC do a) not warn (even with -Wextra) about using a pointer to variable in a different scope, and b) create assembly that suggests they track the usage of every variable through pointers even beyond the variable's scope. example: void foo(void){ int i, *px; for (i=0;i<10;i+=*px) {int x=i+1; px=&x;} printf("*px=%i\n", *px); } – Timo Nov 04 '13 at 13:40
  • 3
    @timo you are required to never use an address to a local whose lifetime has ended. If you do and it happens to work, well, again, the runtime is not required to fail when you break the rules. Unsafe code is marked unsafe for a reason. – Eric Lippert Nov 04 '13 at 15:23
  • @EricLippert I think this is best analogy that I've ever seen related this topic but I have one confusion you wrote that "yours could be in the hotel's furnace" it means my value could be there in other location in the system or something else you try to explain ? – Vikas Verma Jan 25 '14 at 18:38
  • 7
    @VikasVerma: Some memory managers deliberately shred memory when it is no longer usable. The debug version of the Microsoft C runtime, for example, sets unused memory to `0xCC` because (1) it is very easy to see in the debugger memory window that a particular block of memory is now no longer valid, and (2) that is the "break into the debugger" instruction code; if the shredded memory ever gets *executed* then the debugger will be activated. – Eric Lippert Jan 26 '14 at 05:57
  • I actually disagree with this as 'answer' after being sent the link to it, in that it didn't answer the clear cause of confusion for the poster. He clearly thinks that, similar to an object, a function contains its own local storage and therefore doesn't exist after the function is "destroyed". Only it's not actually destroyed. Unlike classes, they aren't a container for the variables used in them, they're items stored on the stack or in a register. However, this answer is a great analogy on how manipulated access of the stack can (and can't) work. Technical answer is more important, though. – Deji Apr 22 '14 at 15:11
  • 11
    @Deji: Your psychic powers are much stronger than mine; I have no idea what the original poster was thinking. – Eric Lippert Apr 22 '14 at 15:40
  • 2
    @EricLippert It's not psychic powers, but more familiarity with the confusion, the example he's using and the actual question he asked. He asked if the memory was inaccessible, which means he likely thinks the memory doesn't exist any more. Both are untrue, the memory is accessible and it does exist. The reason is the difference in the way these are stored, which is why I think the 'answer' ought to focus on that on a technical level. – Deji Apr 22 '14 at 15:47
  • 4
    @VikasVerma It means that your drawer is being reconstructed. – joey rohan Aug 13 '14 at 18:24
  • 2
    @EricLippert: Thanks for the great answer. What is your opinion about C++11 & C++14 way of free store management using smart pointers? Can I now say that modern C++ is safe language because there is no need to use delete operator – Destructor Mar 02 '15 at 15:39
  • 2
    @meet: I am not an expert by any means on what has been added to C++ 11 and 14, though in talking with people who are experts, it sounds to me like there is a lot of good stuff in there. More generally I am happy to see that the C++ committee is willing to be both bold and active as they move the language towards something more modern and less error-prone. – Eric Lippert Mar 02 '15 at 16:20
  • 2
    @EricLippert: Ok. But question is why C++ won't stop me if I do something foolish? Wouldn't it be very nice if compiler gives me error when I attempt to take address of local variable? Why C++ provides so much freedom to the programmer? Or these are the problems C++ inherited from C? Your help will be appreciated. – Destructor Mar 02 '15 at 16:35
  • 9
    @meet: you should ask these questions of someone who is an expert in the design of C++; I would not care to speculate as to the motives of the C++ language designers. I would note that "stop the user from doing something foolish" does not appear to have been too high on the list of traits considered admirable by the designers of C. – Eric Lippert Mar 02 '15 at 16:37
  • 2
    @Meet Remember that C++ is a general-purpose language. Total memory control is needed for a wide range of applications (cracking tools). – Edwin Rodríguez Aug 27 '15 at 17:04
  • 2
    @EricLippert: C++ doesn't define this behaviour, true. But I think that if you are running on x86, the architecture guarantees that you can safely write and read up to 128 bytes above the stack pointer (`esp`), without risking that the memory changes. Now, if the compiler doesn't compile any instructions that actively modify that memory (which is probably the case, since it will just increase esp and jump back when leaving the function), I think you could technically say that on x86, this is defined behaviour. Is this true? Any thoughts on that? – Martijn Courteaux Aug 27 '15 at 21:16
  • 2
    @MartijnCourteaux: Who says that the compiler is required to use esp to determine the locations of local variables? If a particular compiler vendor defines the behaviour for a particular implementation, then the behaviour is *implementation defined*. – Eric Lippert Aug 27 '15 at 22:58
  • @EricLippert: I don't see where you are going. Could it be true that on a specific architecture/compiler combo, this results in consistent behaviour, even if the local variables now sit above the stackpointer (`&var < esp`)? For example: i tried x86_64 with gcc -O0, and it produces code that I think will give consistent results. – Martijn Courteaux Aug 28 '15 at 11:11
  • @MartijnCourteaux: Undefined behavior can do *anything*. Consistent behavior is a subset of *anything*, so yes, that is possible. Where I am going is: you are asking whether *behaviour that is defined by a particular implementation* is a kind of *implementation-defined behavior*. Yes, it is. – Eric Lippert Aug 28 '15 at 14:39
  • In the last paragraph before your update you say something to the effect of _"C++ is not a safe language[...] Safer languages like C++[....]"_ Did you mean to say C# is safer? – canon Jan 29 '16 at 15:54
  • It would be nice if the answer at least once mentioned word undefined behaviour – gmoniava Jan 29 '16 at 20:31
  • @GiorgiMoniava: Comment noted. Consider writing an answer you like better; that way the whole site is improved. – Eric Lippert Jan 29 '16 at 21:06
  • @EricLippert Your answer is already pretty good, I don't attempt to criticize it, just wrote my opinion. Thanks. It won't be easy/realistic to write something better than current answers here. – gmoniava Jan 30 '16 at 10:37
  • 8
    I have to agree with @Dyppl that I would like to read a book written by you. Along with some of the blog posts/answers written by the jOOQ crew, or Josh Bloch/Goetz, your answers provide a really detailed and easily understandable material on behind the scenes/under the hood details regarding programming languages – Abdul Dec 15 '16 at 16:11
  • 1
    "C++ is not a safe language". And chainsaws aren't safe either but, as long as you use them properly, they're *so* much better than the alternative :-) Unless the alternative is Python, of course. – paxdiablo Sep 04 '20 at 01:39
  • 1
    @Destructor "But question is why C++ won't stop me if I do something foolish? Wouldn't it be very nice if compiler gives me error when I attempt to take address of local variable? Why C++ provides so much freedom to the programmer?" Let the designer of C++ answer that: https://www.stroustrup.com/bs_faq.html#unsafe – Jerry Jeremiah Nov 23 '20 at 01:48
279

What you're doing here is simply reading and writing to memory that used to be the address of a. Now that you're outside of foo, it's just a pointer to some random memory area. It just so happens that in your example, that memory area does exist and nothing else is using it at the moment. You don't break anything by continuing to use it, and nothing else has overwritten it yet. Therefore, the 5 is still there. In a real program, that memory would be re-used almost immediately and you'd break something by doing this (though the symptoms may not appear until much later!)

When you return from foo, the program gives up its claim on that memory, and it can be reused for something else (typically the stack frame of the next function call; the OS itself is usually not involved). If you're lucky and it never does get reused, and nothing catches you using it again, then you'll get away with the lie. Chances are, though, that you'll end up writing over whatever else ends up at that address.

Now if you're wondering why the compiler doesn't complain, it's probably because foo got eliminated by optimization. It usually will warn you about this sort of thing. C assumes you know what you're doing though, and technically you haven't violated scope here (there's no reference to a itself outside of foo), only memory access rules, which only triggers a warning rather than an error.

In short: this won't usually work, but sometimes will by chance.

chue x
Rena
154

Because the storage space wasn't stomped on just yet. Don't count on that behavior.

msw
  • 1
    Man, that was the longest wait for a comment since, "What is truth? said jesting Pilate." Maybe it was a Gideon's Bible in that hotel drawer. And what happened to them, anyway? Notice they are no longer present, in London at least. I guess that under the Equalities legislation, you would need a library of religious tracts. – Rob Kent Jul 16 '15 at 16:47
  • I could have sworn that I wrote that long ago, but it popped up recently and found my response wasn't there. Now I have to go figure out your allusions above as I expect I'll be amused when I do >. – msw Jul 17 '15 at 03:17
  • 1
    Haha. Francis Bacon, one of Britain's greatest essayists, whom some people suspect wrote Shakespeare's plays, because they can't accept that a grammar school kid from the country, son of a glover, could be a genius. Such is the English class system. Jesus said, 'I am the Truth'. http://oregonstate.edu/instruct/phl302/texts/bacon/bacon_essays.html – Rob Kent Jul 17 '15 at 11:00
88

A little addition to all the answers:

If you do something like this:

#include <stdio.h>
#include <stdlib.h>
int * foo(){
    int a = 5;
    return &a;
}
void boo(){
    int a = 7;

}
int main(){
    int * p = foo();
    boo();
    printf("%d\n",*p);
}

the output probably will be: 7

That is because after returning from foo() the stack is freed and then reused by boo(). If you disassemble the executable you will see it clearly.

Michael
  • 2
    Simple, but great example to understand the underlying stack theory.Just one test addition, declaring "int a = 5;" in foo() as "static int a = 5;" can be used to understand the scope and life time of a static variable. – control May 07 '13 at 19:02
  • 17
    -1 "for will **probably be 7**". The compiler might enregister a in boo. It might remove it because it's unnecessary. There is a good chance that *p will **not be 5**, but that doesn't mean that there is any particularly good reason why it will **probably be 7**. – Matt Oct 09 '13 at 19:16
  • 2
    It is called undefined behavior! – Francis Cugler Mar 27 '15 at 08:24
  • why and how `boo` reuses the `foo` stack ? aren't function stacks separated from each other, also I get garbage running this code on Visual Studio 2015 – ampawd Aug 22 '16 at 10:07
  • 1
    @ampawd it's almost a year old, but no, "function stacks" are not separated from each other. A CONTEXT has a stack. That context uses its stack to enter main, then descends into `foo()`, exits, then descends into `boo()`. `Foo()` and `Boo()` both enter with the stack pointer at the same location. This isn't, however, behavior that should be relied upon. Other 'stuff' (like interrupts, or the OS) can use the stack between the call of `boo()` and `foo()`, modifying its contents... – Russ Schultz Jul 16 '17 at 01:07
72

In C++, you can access any address, but it doesn't mean you should. The address you are accessing is no longer valid. It works because nothing else scrambled the memory after foo returned, but it could crash under many circumstances. Try analyzing your program with Valgrind, or even just compiling it optimized, and see...

Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
Charles Brunet
  • 18,389
  • 21
  • 77
  • 120
  • 5
    You probably mean you can attempt to access any address. Because most of the operating systems today will not let any program access any address; there are tons of safeguards to protect the address space. This is why there will not be another LOADLIN.EXE out there. – v010dya May 12 '15 at 13:21
68

You never throw a C++ exception by accessing invalid memory. You are just giving an example of the general idea of referencing an arbitrary memory location. I could do the same like this:

unsigned int q = 123456;

*(double*)(q) = 1.2;

Here I am simply treating 123456 as the address of a double and writing to it. Any number of things could happen:

  1. q might in fact genuinely be a valid address of a double, e.g. double p; q = &p;.
  2. q might point somewhere inside allocated memory and I just overwrite 8 bytes in there.
  3. q points outside allocated memory and the operating system's memory manager sends a segmentation fault signal to my program, causing the runtime to terminate it.
  4. You win the lottery.

The way you set it up it is a bit more reasonable that the returned address points into a valid area of memory, as it will probably just be a little further down the stack, but it is still an invalid location that you cannot access in a deterministic fashion.

Nobody will automatically check the semantic validity of memory addresses like that for you during normal program execution. However, a memory debugger such as valgrind will happily do this, so you should run your program through it and witness the errors.

martijnn2008
  • 3,236
  • 3
  • 27
  • 38
Kerrek SB
  • 428,875
  • 83
  • 813
  • 1,025
  • 10
    I'm just going to write a program now that keeps on running this program so that `4) I win the lottery` – Aidiakapi Mar 11 '15 at 00:37
29

Did you compile your program with the optimiser enabled? The foo() function is quite simple and might have been inlined or replaced in the resulting code.

But I agree with Mark B that the resulting behavior is undefined.

Alec
  • 6,521
  • 7
  • 23
  • 48
gastush
  • 1,006
  • 6
  • 17
  • That's my bet. Optimizer dumped the function call. – Erik Aronesty Jun 22 '11 at 17:34
  • 9
    That is not necessary. Since no new function is called after foo(), the functions local stack frame is simply not yet overwritten. Add another function invocation after foo(), and the `5` will be changed... – Tomas Jun 23 '11 at 11:02
  • I ran the program with GCC 4.8, replacing cout with printf (and including stdio). Rightfully warns "warning: address of local variable ‘a’ returned [-Wreturn-local-addr]". Outputs 58 with no optimization and 08 with -O3. Strangely P does have an address, even though its value is 0. I expected NULL (0) as address. – Kevin May 04 '17 at 03:55
23

Your problem has nothing to do with scope. In the code you show, the function main does not see the names in the function foo, so you can't access a in foo directly with this name outside foo.

The problem you are having is why the program doesn't signal an error when referencing illegal memory. This is because the C++ standard does not specify a clear run-time boundary between illegal and legal memory. Referencing something in a popped stack frame sometimes causes an error and sometimes does not. It depends. Don't count on this behavior. When you write code, assume it will always result in an error; when you debug, assume it will never signal one.

Chang Peng
  • 1,002
  • 8
  • 14
  • I recall from an old copy of *Turbo C Programming for the IBM*, which I used to play around with some way back when, how directly manipulating the graphics memory, and the layout of the IBM's text mode video memory, was described in great detail. Of course then, the system that the code ran on clearly defined what writing to those addresses meant, so as long as you didn't worry about portability to other systems, everything was fine. IIRC, pointers to void were a common theme in that book. – user Jun 23 '11 at 08:05
  • @Michael Kjörling: Sure! People like to do some dirty work once in a while ;) – Chang Peng Jun 23 '11 at 08:47
18

You are just returning a memory address. That is allowed, but it is probably an error.

Yes, if you try to dereference that memory address you will have undefined behavior.

#include <iostream>

int * ref () {

 int tmp = 100;
 return &tmp;
}

int main () {

 int * a = ref();
 //Up until this point there are defined results
 //You can even print the address returned
 // but yes probably a bug

 std::cout << *a << std::endl;//Undefined results
}
Brian R. Bondy
  • 314,085
  • 114
  • 576
  • 619
  • I disagree: There is a problem before the `cout`. `*a` points to unallocated (freed) memory. Even if you don't dereference it, it is still dangerous (and likely bogus). – ereOn May 19 '10 at 07:15
  • @ereOn: I clarified more what I meant by problem, but no it is not dangerous in terms of valid c++ code. But it is dangerous in terms of likely the user made a mistake and will do something bad. Maybe for example you are trying to see how the stack grows, and you only care about the address value and will never dereference it. – Brian R. Bondy May 19 '10 at 13:02
18

Pay attention to all warnings. Do not only fix errors.
GCC shows this warning:

warning: address of local variable 'a' returned

This is the power of C++: you have to care about memory yourself. With the -Werror flag, this warning becomes an error, and you then have to fix it before the code compiles.

Gary
  • 11,083
  • 14
  • 43
  • 68
sam
  • 1,186
  • 1
  • 18
  • 27
18

That's classic undefined behaviour that's been discussed here not two days ago -- search around the site for a bit. In a nutshell, you were lucky, but anything could have happened and your code is making invalid access to memory.

Kerrek SB
  • 428,875
  • 83
  • 813
  • 1,025
18

This behavior is undefined, as Alex pointed out--in fact, most compilers will warn against doing this, because it's an easy way to get crashes.

For an example of the kind of spooky behavior you are likely to get, try this sample:

#include <iostream>
using namespace std;

int *a()
{
   int x = 5;
   return &x;
}

void b( int *c )
{
   int y = 29;
   *c = 123;
   cout << "y=" << y << endl;
}

int main()
{
   b( a() );
   return 0;
}

This prints out "y=123", but your results may vary (really!). Your pointer is clobbering other, unrelated local variables.

AHelps
  • 1,742
  • 11
  • 16
17

It works because the stack has not been altered (yet) since a was put there. Call a few other functions (which are also calling other functions) before accessing a again and you will probably not be so lucky anymore... ;-)

Adrian Grigore
  • 31,759
  • 32
  • 127
  • 205
16

You actually invoked undefined behaviour.

Returning the address of a temporary works, but as temporaries are destroyed at the end of a function the results of accessing them will be undefined.

So you did not modify a but rather the memory location where a once was. This difference is very similar to the difference between crashing and not crashing.

Alexander Gessler
  • 42,787
  • 5
  • 78
  • 120
14

In typical compiler implementations, you can think of the code as "print out the value of the memory block with the address that used to be occupied by a". Also, if you add a call to a function that contains a local int, there's a good chance that the value of a (or rather, of the memory address that a used to occupy) changes. This happens because the stack will be overwritten with a new frame containing different data.

However, this is undefined behaviour and you should not rely on it to work!

larsmoa
  • 11,450
  • 5
  • 54
  • 82
  • 3
    "print out the value of the memory block with address that *used to be* occupied by a" isn't quite right. This makes it sound like his code has some well-defined meaning, which is not the case. You are right that this is probably how most compilers would implement it, though. – Brennan Vincent Jun 22 '11 at 21:23
  • @BrennanVincent: While the storage was occupied by `a`, the pointer held the address of `a`. Although the Standard does not require that implementations define the behavior of addresses after the lifetime of their target has ended, it also recognizes that on some platforms UB is processed in a documented manner characteristic of the environment. While the address of a local variable won't generally be of much use after it has gone out of scope, some other kinds of addresses may still be meaningful after the lifetime of their respective targets. – supercat Jun 26 '18 at 22:21
  • @BrennanVincent: For example, while the Standard may not require that implementations allow a pointer passed to `realloc` to be compared against the return value, nor allow pointers to addresses within the old block to be adjusted to point to the new one, some implementations do so, and code which exploits such a feature may be more efficient than code which has to avoid any action--even comparisons--involving pointers to the allocation that was given to `realloc`. – supercat Jun 26 '18 at 22:30
14

It can, because a is a variable allocated temporarily for the lifetime of its scope (foo function). After you return from foo the memory is free and can be overwritten.

What you're doing is described as undefined behavior. The result cannot be predicted.

littleadv
  • 19,072
  • 2
  • 31
  • 46
12

The seemingly correct (?) console output can change dramatically if you use ::printf instead of cout. You can step through the code below in a debugger (tested on x86, 32-bit, MS Visual Studio):

#include <cstdio>
#include <cstring>

char* foo() 
{
  char buf[10];
  ::strcpy(buf, "TEST");
  return buf;           // deliberately returns the address of a local array
}

int main() 
{
  char* s = foo();    //place breakpoint & check 's' variable here
  ::printf("%s\n", s); 
}
Mykola
  • 11
  • 1
  • 4
5

After a function returns, its identifiers are destroyed, and without an identifier we normally cannot locate the values they named. The values themselves are not erased, though: the memory location may still contain whatever the previous function stored there.

So, here function foo() is returning the address of a and a is destroyed after returning its address. And you can access the modified value through that returned address.

Let me take a real world example:

Suppose a man hides money at a location and tells you where it is. Some time later, the man who told you the location dies. But you still have access to the hidden money.

Ghulam Moinul Quadir
  • 1,448
  • 1
  • 9
  • 16
4

It's a 'dirty' way of using memory addresses. When you return an address (pointer), nothing records whether it belongs to the local scope of a function. It's just an address. When you invoked the 'foo' function, the address (memory location) of 'a' was already allocated in the (safely, for now at least) addressable memory of your application (process). After 'foo' returned, the address of 'a' can be considered 'dirty', but it is still there, neither cleaned up nor disturbed/modified by expressions in other parts of the program (in this specific case at least). A C/C++ compiler doesn't stop you from such 'dirty' access (it might warn you though, if you care). You can use (update) any memory location that is in the data segment of your program instance (process) unless you protect the address by some means.

Ayub
  • 879
  • 8
  • 12
1

Your code is very risky. You are creating a local variable (which is considered destroyed after the function ends) and returning the address of that variable's memory after it has been destroyed.

That means the memory address may or may not be valid, and your code will be vulnerable to memory access problems (for example, a segmentation fault).

In other words, you are doing a very bad thing, because you are handing a pointer a memory address that cannot be trusted at all.

Consider this example, instead, and test it:

int * foo()
{
   int *x = new int;
   *x = 5;
   return x;
}

int main()
{
    int* p = foo();
    std::cout << *p << "\n"; //better to put a new-line in the output, IMO
    *p = 8;
    std::cout << *p;
    delete p;
    return 0;
}

Unlike your example, with this example you are:

  • allocating memory for an int inside the function
  • that memory address is still valid after the function returns (it is not deleted by anyone)
  • the memory address is trustworthy (the block is not considered free, so it will not be overwritten until it is deleted)
  • the memory should be deleted when it is no longer used (see the delete at the end of the program)
Nobun
  • 141
  • 1
  • 1
  • 10
  • Did you add something not already covered by the existing answers? And please don't use raw pointers/`new`. – Lightness Races in Orbit May 02 '19 at 10:20
  • 1
    The asker used raw pointers. I wrote an example which mirrors theirs exactly, so that they can see the difference between an untrustworthy pointer and a trustworthy one. Actually there is another answer similar to mine, but it uses strcpy which, IMHO, could be less clear to a novice coder than my example that uses new. – Nobun May 02 '19 at 10:28
  • They didn't use `new`. You're teaching them to use `new`. But you shouldn't use `new`. – Lightness Races in Orbit May 02 '19 at 10:34
  • So in your opinion it is better to pass the address of a local variable which is destroyed in a function than to actually allocate memory? This makes no sense. Understanding the concept of allocating and deallocating memory is important, imho, mainly if you are asking about pointers (the asker didn't use new, but used pointers). – Nobun May 02 '19 at 10:36
  • When did I say that? No, it is better to use smart pointers to properly indicate ownership of the referenced resource. Don't use `new` in 2019 (unless you're writing library code) and don't teach newcomers to do so either! Cheers. – Lightness Races in Orbit May 02 '19 at 10:45
  • Smart pointers surely are better than `new`, I agree with you. But I used `new` since it is simpler than smart pointers to use and to understand for a novice. I think that complexity should be scaled. The first thing is to understand what allocation means before reaching the next step (which could be... how could I use allocation in a better way? -> smart pointers, and other tools in std::) - I admit, however, that I am a very old-school c++ auto-learner :D so -> my fault :P – Nobun May 02 '19 at 10:50
  • Object management with smart pointers is what should be taught. `new` and `delete` is an advanced topic that can be taught later. :) – Lightness Races in Orbit May 02 '19 at 11:08
0

That depends on the language. In C & C++/Cpp, YES, you technically could because it has very weak checks of whether any given pointer actually points somewhere valid or not. The compiler will report an error if you attempt to access the variable itself when it’s out of scope, but it won’t likely be smart enough to know if you intentionally copy a pointer to that variable’s location to some other variable that will still be in scope later.

However, modifying that memory once the variable is out of scope will have totally undefined effects. You’ll probably be corrupting the stack, which may have reused that space for new variables.

More modern languages such as Java or C# frequently go to great lengths to avoid the programmer needing to have access to the actual addresses of variables in the first place, as well as bounds-checking array access, keeping reference counts of variables that point to objects in the heap so they don’t get deallocated prematurely, and so on. All of this is meant to help keep the programmer from doing something unintentionally insecure and/or out of bounds of the in-scope variables.

Numan Gillani
  • 338
  • 3
  • 13