Practical explanation of C++ functions with pointers

Question

I am relatively new to C++...

I am learning and coding but I am finding the idea of pointers to be somewhat fuzzy. As I understand it * points to a value and & points to an address...great but why? Which is byval and which is byref and again why?

And while I feel like I am learning and understanding the idea of stack vs heap, runtime vs design time etc, I don't feel like I'm fully understanding what is going on. I don't like using coding techniques that I don't fully understand.

Could anyone please elaborate on exactly what and why the pointers in this fairly "simple" function below are used, esp the pointer to the function itself.. [got it]

Just asking how to clean up (delete[]) the str... or if it just goes out of scope.. Thanks.

char *char_out(AnsiString ansi_in)
{
    // allocate memory for char array
    char *str = new char[ansi_in.Length() + 1];

    // copy contents of string into char array
    strcpy(str, ansi_in.c_str());
    return str;
}

Questions like these are too broad for SO. Also what is the purpose of the code you posted (except to see that it returns an invalid pointer)? — UnholySheep, Mar 18 '17 at 18:23
[The Definitive C++ Book Guide and List](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) — molbdnilo, Mar 18 '17 at 18:26
The pointer indicates that you are returning a pointer to a character array. It is not a pointer to the function, it is a type specifier for what is being returned. Think of it as: char* char_out(), with the space before the function name, and it becomes a little clearer than it returns a char*, just like int foobar() would return an int. — ScottK, Mar 18 '17 at 18:35
@unholy ok well I guess the delete[] is wrong. what is the correct way to clean up the used pointer after the function is called? — devdude, Mar 18 '17 at 18:38
@devdude The best way is to use a class, such as `std::string`. — PaulMcKenzie, Mar 18 '17 at 18:46
Possible duplicate of [Why should I use a pointer rather than the object itself?](http://stackoverflow.com/questions/22146094/why-should-i-use-a-pointer-rather-than-the-object-itself) — , Mar 18 '17 at 18:46

Koby Duck · Answer 1 · 2017-08-20T02:02:07.447

Revision 3

TL;DR:

AnsiString appears to be a class type, and ansi_in is passed by value to that function. The pointer variable str itself lives on the stack.

A new array of (ansi_in.Length() + 1) chars is created on the heap, and a pointer to it is stored in str. The +1 is needed because C-style strings end with a null terminator, a special character ('\0') that marks the end of the string when scanning through it.
ansi_in.c_str() is called, which returns a pointer to the string's internal buffer; that pointer is held in an unnamed temporary.
str and the temporary pointer are passed to strcpy, which copies the string (including the null terminator) from the temporary's buffer into the array str points to.
str is returned to the caller, who now owns the array and must eventually release it with delete[].
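
The hand-off to the caller in the last step can be sketched as follows. This is only a minimal sketch, not the original code: std::string stands in for AnsiString (which is not part of standard C++), and use_and_release is a hypothetical caller added purely for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: std::string stands in for AnsiString, so length() replaces Length().
char* char_out(const std::string& ansi_in)
{
    // Allocate room for every character plus the null terminator.
    char* str = new char[ansi_in.length() + 1];
    std::strcpy(str, ansi_in.c_str());
    return str; // the caller now owns this array
}

// Hypothetical caller: uses the buffer, then frees it with delete[].
std::size_t use_and_release(const std::string& text)
{
    char* str = char_out(text);
    std::size_t length = std::strlen(str); // "do something with str"
    delete[] str;                          // pairs with new char[...]
    return length;
}
```

The array does not go out of scope by itself; only the pointer variable does. Whoever receives the pointer has to call delete[] exactly once.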

Long answer:

You appear to be struggling to understand stack vs heap, and pointers vs non-pointers. I'll break them down for you and then answer your question.

The stack is a concept where a fixed region of memory is allocated for each thread before it starts and before any user code runs. Ignoring lower level details such as calling conventions and compiler optimizations, you can reason that the following happens when you call a function:

Arguments are pushed onto the stack. This reserves part of the stack for use of the arguments.
The function performs some job, using and copying the arguments as needed.
The function pops the arguments off the stack and returns. This frees the space reserved for the arguments.

This isn't limited to function calls. When you declare objects and primitives in a function's body, space for them is reserved via pushing. When they're out of scope, they're automatically cleaned up by calling destructors and popping.
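
To watch scope-based cleanup happen, here is a small sketch. Tracer and run_scopes are hypothetical names invented for this illustration; the type simply records when its constructor and destructor run.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: logs its own construction and destruction.
struct Tracer
{
    std::vector<std::string>& log;
    std::string name;

    Tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n))
    {
        log.push_back("ctor " + name);
    }

    ~Tracer() { log.push_back("dtor " + name); }
};

// Returns the order in which the two Tracer objects were created and destroyed.
std::vector<std::string> run_scopes()
{
    std::vector<std::string> log;
    {
        Tracer outer(log, "outer");
        {
            Tracer inner(log, "inner");
        } // inner's destructor runs here
        log.push_back("between");
    } // outer's destructor runs here
    return log;
}
```

The returned log reads: ctor outer, ctor inner, dtor inner, between, dtor outer. No delete appears anywhere; leaving the braces is what triggers the cleanup.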

When your program runs out of stack space and starts using the space outside of it, you'll typically encounter an error. Regardless of what the actual error is, it's known as a stack overflow because you're going past it and therefore "overflowing".

The heap is a different concept where the remaining unused memory of the system is available for you to manually allocate and deallocate from. This is primarily used when you have a large data set that's too big for the stack, or when you need data to persist across arbitrary functions.

C++ is a difficult beast to master, but if you can wrap your head around the core concepts it becomes easier to understand.

Suppose we wanted to model a human:

struct Human
{
    const char* Name;
    int Age;
};

int main(int argc, char** argv)
{
    Human human;
    human.Name = "Edward";
    human.Age = 30;
    return 0;
}

This allocates at least sizeof(Human) bytes on the stack for storing the 'human' object. Right before main() returns, the space for 'human' is freed.

Now, suppose we wanted an array of 10 humans:

int main(int argc, char** argv)
{
    Human humans[10];
    humans[0].Name = "Edward";
    humans[0].Age = 30;
    // ...
    return 0;
}

This allocates at least (sizeof(Human) * 10) bytes on the stack for storing the 'humans' array. This too is automatically cleaned up.

Note uses of ".". When using anything that's not a pointer, you access their contents using a period. This is direct memory access if you're not using a reference.

Here's the single object version using the heap:

int main(int argc, char** argv)
{
    Human* human = new Human();
    human->Name = "Edward";
    human->Age = 30;
    delete human;
    return 0;
}

This allocates sizeof(Human*) bytes on the stack for the pointer 'human', and at least sizeof(Human) bytes on the heap for storing the object it points to. The object that 'human' points to is not automatically cleaned up; you must call delete to free it. Note uses of "a->b". When using pointers, you access their contents using the "->" operator. This is indirect memory access, because you're accessing memory through an address stored in a variable.

It's sort of like mail. When someone wants to mail you something they write an address on an envelope and submit it through the mail system. A mailman takes the mail and moves it to your mailbox. For comparison, the pointer is the address written on the envelope, the memory management unit (MMU) is the mail system, the electrical signals being passed down the wire are the mailman, and the memory location the address refers to is the mailbox.
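
As a rough illustration of that indirection, the sketch below reads the same member in two equivalent ways and also takes the address of a stack object with "&". The function names are hypothetical and exist only for this example; Human is the struct from above.

```cpp
struct Human
{
    const char* Name;
    int Age;
};

// Reads the member through the pointer with "->".
int age_with_arrow(const Human* human) { return human->Age; }

// The same access spelled out: dereference with "*", then use ".".
int age_with_deref(const Human* human) { return (*human).Age; }

// A stack object can be reached indirectly too, through its address.
int age_of_stack_object()
{
    Human human;
    human.Name = "Edward";
    human.Age = 30;
    const Human* pointer = &human; // "&" yields the address of 'human'
    return age_with_arrow(pointer);
}
```

In an expression, "&x" produces an address (a pointer value) and "*p" follows an address back to the object, which may help with the "* vs &" confusion in the question.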

Here's the array version using the heap:

int main(int argc, char** argv)
{
    Human* humans = new Human[10];
    humans[0].Name = "Edward";
    humans[0].Age = 30;
    // ...
    delete[] humans;
    return 0;
}

This allocates sizeof(Human*) bytes on the stack for pointer 'humans', and (sizeof(Human) * 10) bytes on the heap for storing the array it points to. 'humans' is also not automatically cleaned up; you must call delete[] to free it.

Note uses of "a[i].b" rather than "a[i]->b". The "[]" operator (indexer) is really just syntactic sugar for "*(a + i)": it already dereferences the pointer, so "a[i]" is an ordinary Human object rather than a pointer, and its members are reached with ".".
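
A short sketch of that equivalence, using hypothetical function names and the Human struct from above. Both functions walk the same array and should always agree.

```cpp
#include <cstddef>

struct Human
{
    const char* Name;
    int Age;
};

// Uses the indexer: humans[i] is a Human, so "." reaches its members.
int total_age_indexed(const Human* humans, std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += humans[i].Age;
    }
    return total;
}

// Same loop with the sugar removed: pointer arithmetic plus an explicit dereference.
int total_age_pointer(const Human* humans, std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += (*(humans + i)).Age;
    }
    return total;
}
```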

In both of the above heap examples, if you didn't write delete/delete[], the memory that the pointers point to would leak. (A leak is not the same as a dangling pointer, which is a pointer that still refers to memory that has already been freed.) Leaks are bad because, if left unchecked, they could eat through all your available memory, eventually crashing when there isn't enough or when the OS decides other apps are more important than yours.

Using the stack is usually the wiser choice as you get automatic lifetime management via scope (the mechanism RAII builds on) and better data locality. The only "drawback" to this approach is that because of scoped lifetime you can't directly access your stack variables once the scope has exited. In other words you can only use stack variables within the scope they're declared. Despite this, C++ allows you to copy pointers and references to stack variables and indirectly use them outside the scope they're declared in. Do note, however, that using them after that scope has exited is undefined behavior, and handing such pointers around is easy to get wrong. Don't do it unless you really know what you're doing; I can't stress this enough.
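
To make the distinction concrete, here is a minimal sketch with hypothetical names. Handing a pointer to a stack variable down to a function you call is fine, because the variable outlives the call; returning such a pointer is the dangerous case and is shown only as a comment.

```cpp
// The callee writes through a pointer into the caller's stack variable.
void reset(int* value) { *value = 0; }

int safe_use()
{
    int local = 5;
    reset(&local); // fine: 'local' is still alive during the call
    return local;  // returns a copy of the value, not an address
}

// NOT safe, so it is left commented out: 'local' is destroyed when the
// function returns, and the caller would receive a dangling pointer.
// int* dangling()
// {
//     int local = 5;
//     return &local;
// }
```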

Passing an argument by-ref means pushing a copy of a pointer or reference to the data on the stack. At the machine level, compilers typically implement references as pointers, although the language treats the two differently. This is a very lightweight mechanism, but you typically need to check for null in functions receiving pointers.

Pointer variant of an integer adding function:

#include <stdexcept> // std::invalid_argument

int add(const int* firstIntPtr, const int* secondIntPtr)
{
    if (firstIntPtr == nullptr) {
        throw std::invalid_argument("firstIntPtr cannot be null.");
    }
    if (secondIntPtr == nullptr) {
        throw std::invalid_argument("secondIntPtr cannot be null.");
    }
    return *firstIntPtr + *secondIntPtr;
}

Note the null checks. If the function didn't verify its arguments, they could be null or point to memory the app doesn't have access to (a null check only catches the first case). Attempting to read such values via dereferencing (*firstIntPtr/*secondIntPtr) is undefined behavior and, if you're lucky, results in a segmentation fault (aka access violation on Windows), crashing the program. When this happens and your program doesn't crash, there are deeper issues with your code that are out of the scope of this answer.
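
A sketch of how a caller might use the pointer variant. The add function is repeated so the block stands alone; add_rejects_null is a hypothetical helper added only to show the exception path.

```cpp
#include <stdexcept>

int add(const int* firstIntPtr, const int* secondIntPtr)
{
    if (firstIntPtr == nullptr) {
        throw std::invalid_argument("firstIntPtr cannot be null.");
    }
    if (secondIntPtr == nullptr) {
        throw std::invalid_argument("secondIntPtr cannot be null.");
    }
    return *firstIntPtr + *secondIntPtr;
}

// Hypothetical helper: reports whether add() refuses a null argument.
bool add_rejects_null()
{
    int value = 1;
    try {
        add(&value, nullptr);
    } catch (const std::invalid_argument&) {
        return true; // the null check fired before any dereference
    }
    return false;
}
```

A caller passes addresses with "&", for example add(&a, &b) for two int variables a and b.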

Reference variant of an integer adding function:

int add(const int& firstInt, const int& secondInt)
{
    return firstInt + secondInt;
}

Note the lack of null checks. By design C++ limits how you can acquire references, so you're not supposed to be able to pass a null reference, and therefore no null checks are required. That said, it's still possible to end up with one by dereferencing a null pointer to form a reference, but that is already undefined behavior; if you're converting a pointer without checking for null first, you have a bug in your code.

Passing an argument by-val means pushing a copy of it on the stack. You almost always want to pass small data structures by value. You don't have to check for null when passing values because you're passing the actual data itself and not a pointer to it.

For example:

int add(int firstInt, int secondInt)
{
    return firstInt + secondInt;
}

No null checks are required because values, not pointers are used. Values can't be null.
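
Since the question asks which is by-val and which is by-ref, here is a side-by-side sketch with hypothetical function names. All three try to increment the caller's variable; only the pointer and reference versions can, because the by-value version works on its own copy.

```cpp
// By value: 'number' is a copy, so the caller's variable is untouched.
void increment_by_value(int number) { ++number; }

// By pointer: the caller passes an address; null must be considered.
void increment_by_pointer(int* number)
{
    if (number != nullptr) {
        ++*number;
    }
}

// By reference: 'number' is another name for the caller's variable.
void increment_by_reference(int& number) { ++number; }
```

Starting from int x = 1, calling increment_by_value(x) leaves x at 1, increment_by_pointer(&x) makes it 2, and increment_by_reference(x) makes it 3. Note that "&" in a parameter type (int&) declares a reference, while "&x" in an expression takes an address.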

Assuming you're interested in learning about all this, I highly suggest you use std::string for all your string needs and std::unique_ptr for managing pointers.

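For example, either of the following would work. They are alternatives: since they differ only in return type, they cannot both be declared in the same program.

#include <cstring> // strcpy
#include <memory>  // std::unique_ptr
#include <string>  // std::string

// Option 1: return a std::string, which manages its own memory.
std::string char_out(AnsiString ansi_in)
{
    return std::string(ansi_in.c_str());
}

// Option 2: keep a char array, but let std::unique_ptr call delete[] for you.
std::unique_ptr<char[]> char_out(AnsiString ansi_in)
{
    std::unique_ptr<char[]> str(new char[ansi_in.Length() + 1]);
    strcpy(str.get(), ansi_in.c_str());
    return str; // std::move(str) if you're using an older C++11 compiler.
}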
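
On the calling side, the unique_ptr variant might be used roughly like this. This is a sketch under stated assumptions: std::string stands in for AnsiString, and legacy_control_set_text is a hypothetical stand-in for a control that only accepts a C-style string.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Assumption: std::string replaces AnsiString so the sketch is standard C++.
std::unique_ptr<char[]> char_out(const std::string& ansi_in)
{
    std::unique_ptr<char[]> str(new char[ansi_in.length() + 1]);
    std::strcpy(str.get(), ansi_in.c_str());
    return str;
}

// Hypothetical control API: takes a raw pointer and does not keep it.
std::size_t legacy_control_set_text(const char* text)
{
    return std::strlen(text);
}

std::size_t send_to_control(const std::string& text)
{
    std::unique_ptr<char[]> buffer = char_out(text);
    return legacy_control_set_text(buffer.get()); // get() lends the raw pointer
} // buffer's destructor calls delete[] here, so nothing leaks
```

If the control only reads the text during the call, as assumed here, no manual delete[] is needed anywhere.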

Thanks for all the detailed info.. I still don't know how to recover the memory in the original example...once the value is returned from the function (garbage collection)... One of the main problems I have simple passing/converting std::strings to controls which only accept ansi strings... once passed the arrays are no longer needed. — devdude, Mar 20 '17 at 18:40
Call delete[] on the pointer you receive from char_out to recover the memory. i.e. char* str = char_out(...); /* do something with str */ delete[] str; — Koby Duck, Mar 21 '17 at 00:04
