What is the difference between how references and Box are represented in memory?

Question

I am trying to understand how references and Box<T> work. Let's consider a code example:

fn main() {
    let x = 5;
    let y = &x;

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

In my imagination, Rust saves the value in memory as:

Consider this second code snippet with Box<T>:

fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

How is x going to be stored in Box? What does the memory look like?

The examples above are from Treating Smart Pointers Like Regular References with the Deref Trait. For the second example, the book explains it as:

The only difference between Listing 15-7 and Listing 15-6 is that here we set y to be an instance of a box pointing to the value in x rather than a reference pointing to the value of x.

Does it mean that y in the box points directly to value 5?

https://doc.rust-lang.org/std/boxed/index.html, "Does it mean here, y in the box points directly to value 5?" no x is moved into the box — Stargateur, Jan 24 '20 at 13:40

Shepmaster · Accepted Answer · 2020-01-24T14:55:54.147

Your diagram for the simple case is fine, although it may be unclear as you use 5 for both the value and the address. I've moved y in my diagram to prevent any confusion.

What does memory look like for a `Box<T>`?

The equivalent diagram for Box would look similar, but with the addition of the heap:

    Stack

     ADDR                    VALUE
    +------------------------------+
x = |0x0001|                     5 |
y = |0x0002|                0xFF01 |
    |0x0003|                       |
    |0x0004|                       |
    |0x0005|                       |
    +------------------------------+

    Heap

     ADDR                    VALUE
    +------------------------------+
    |0xFF01|                     5 |
    |0xFF02|                       |
    |0xFF03|                       |
    |0xFF04|                       |
    |0xFF05|                       |
    +------------------------------+

(See the pedantic notes below about this diagram)

Box has allocated enough space in the heap for us, here at address 0xFF01. The value is then moved from the stack onto the heap.
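To make the diagram concrete, here is a minimal sketch that compares where `x` and the boxed value live. The helper name `addresses` is made up for this illustration, and the actual addresses differ between runs and platforms, so none are shown.

```rust
/// Returns (address of `x` on the stack, address of the boxed value on the heap).
/// `addresses` is a hypothetical helper used only for this illustration.
fn addresses() -> (usize, usize) {
    let x = 5;
    let y = Box::new(x);

    let stack_addr = &x as *const i32 as usize;
    // `&*y` borrows the i32 that lives inside the heap allocation.
    let heap_addr = &*y as *const i32 as usize;

    assert_eq!(x, *y); // same value, stored in two different places
    (stack_addr, heap_addr)
}

fn main() {
    let (stack_addr, heap_addr) = addresses();
    println!("x  lives at {:#x}", stack_addr);
    println!("*y lives at {:#x}", heap_addr);
}
```

The two addresses are never equal, because the box holds its own copy of the value rather than pointing back at `x`.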

Does it mean that y in the box points directly to value 5?

Not to the 5 stored in x. y holds the pointer to the data allocated by the Box, which is a separate copy of the value on the heap. It must hold this pointer in order to be able to free the allocated memory when the Box goes out of scope.

The point of the chapter you are reading is that Rust will transparently dereference the Box for you, so you don't usually need to concern yourself with this fact.
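A short sketch of what "transparently dereference" looks like in practice. The function `add_one` is a hypothetical helper, not something from the book.

```rust
/// Hypothetical helper that only knows about plain references.
fn add_one(n: &i32) -> i32 {
    *n + 1
}

fn main() {
    let x = 5;
    let r = &x;
    let b = Box::new(x);

    // Explicit dereference reads the same for a reference and a Box.
    assert_eq!(*r, *b);

    // Deref coercion: `&Box<i32>` is converted to `&i32` at the call site.
    assert_eq!(add_one(r), 6);
    assert_eq!(add_one(&b), 6);
}
```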

What's the difference in memory?

This might bend your brain a little bit!

Looking at the stack for both examples, there isn't really a difference between the two cases: both the reference and the Box are stored on the stack as a pointer. The only difference is in the code, which knows to treat the value on the stack differently depending on whether it's a reference or a Box.

In fact, this is true for everything in Rust! To the computer, it's all just bits, and the structure encoded in the program binary is the only thing that distinguishes one blob of bytes from another.
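One way to check the "both are just a pointer" claim is to compare sizes. This is a minimal sketch; `pointer_sizes` is a hypothetical helper name, and it only covers the case where `T` is a sized type such as `i32`.

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes in bytes of `&i32` and `Box<i32>`.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&i32>(), size_of::<Box<i32>>())
}

fn main() {
    let (ref_size, box_size) = pointer_sizes();

    // Both are one machine word wide: nothing but an address.
    assert_eq!(ref_size, size_of::<usize>());
    assert_eq!(box_size, size_of::<usize>());

    println!("&i32: {} bytes, Box<i32>: {} bytes", ref_size, box_size);
}
```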

Why is `x` still on the stack after being moved to the `Box`?

Observant readers will note that I left the value 5 for x on the stack. There are two relevant reasons why:

1. That's actually what happens in memory. Programs don't usually "reset" values they are done with as it would be unneeded overhead. Rust avoids problems by marking the variable as moved and disallowing access to the moved-from variable.
2. In this case, i32 implements Copy, which means that it's OK to access the value after it's been moved. The compiler will actually allow us to continue accessing x. This wouldn't be true if x were a type that didn't implement Copy, such as a String or a Box.
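The Copy versus move distinction can be sketched as follows. Both helper functions (`box_copy`, `box_move`) are hypothetical names introduced for this example.

```rust
/// Hypothetical helper: boxes an i32 and returns the original alongside it.
fn box_copy(x: i32) -> (i32, Box<i32>) {
    let y = Box::new(x); // i32 is Copy, so the bits are copied into the box
    (x, y) // `x` is still usable here
}

/// Hypothetical helper: boxes a String, which moves it.
fn box_move(s: String) -> Box<String> {
    let boxed = Box::new(s); // String is not Copy, so `s` is moved
    // println!("{}", s); // would not compile: error[E0382], borrow of moved value
    boxed
}

fn main() {
    let (x, y) = box_copy(5);
    assert_eq!(5, x);
    assert_eq!(5, *y);

    let boxed = box_move(String::from("hello"));
    assert_eq!(boxed.as_str(), "hello");
}
```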

Pedantic diagram notes

- This diagram is not to scale. An i32 takes 4 bytes and a pointer or reference takes a platform-dependent number of bytes, but it's simpler to assume everything is the same size.
- The stack typically starts at a high address and grows downward, while the heap starts at a low address and grows upward.

Note that with the box, if you don't use `x` after copying it into the box, the optimizer may decide to re-use the stack space for `y`. — Jmb, Jan 24 '20 at 14:43

Hauleth · Answer 2 · 2020-01-24T14:32:10.663

The general rule is exactly the same as in the answer to "What are the differences between Rust's `String` and `str`?", but I'm answering here as well.

A Rust reference is (almost) exactly what you have described: a pointer to a value somewhere in memory. (Not always, though. For example, references to slices also contain a length and references to trait objects also contain a pointer to a v-table. These are called fat pointers.)

At the start, the Box<T> is a value, like any other value in Rust, so the difference is obvious - one is a reference to a place in memory and the second is a value somewhere in memory. The confusion is that Box<T> internally contains a pointer to memory as well, but the memory it points to is allocated on the heap instead of the stack. The difference between these two is that the stack is function local and is quite small (on my macOS it is at most 8192 KiB).
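The thin versus fat pointer distinction can be observed through sizes. This is a sketch that holds on current compilers; `words` is a hypothetical helper, and the exact layout of fat pointers is an implementation detail rather than something to rely on.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical helper: size of `T` measured in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    assert_eq!(words::<&i32>(), 1); // thin: address only
    assert_eq!(words::<&[i32]>(), 2); // address + length
    assert_eq!(words::<&str>(), 2); // address + length in bytes
    assert_eq!(words::<&dyn Debug>(), 2); // address + v-table pointer
    assert_eq!(words::<Box<[i32]>>(), 2); // a Box can be fat too
}
```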

For example, you cannot do something like this for a few reasons:

fn foo() -> &u32 {
    let a = 5;

    &a
}

The most important reason is that a will not be there after foo() returns. Its stack memory may be reused (it is not necessarily wiped out right away), so it is possible that it will be changed to another value soon. This is undefined behavior in C and C++ and a compile-time error in Rust, which does not allow any undefined behavior (in code that does not use unsafe).

On the other hand, if you do:

fn foo() -> Box<u32> {
    let a = Box::new(5);

    a
}

A few things relevant to us will happen:

- memory will be allocated on the heap. This memory is totally independent of the current function scope, which means that it needs to be freed once it is no longer needed
- we will move the value, so there are no lifetimes involved
- ownership of a will be moved to the caller
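The three points above can be sketched with a type that records when it is dropped. `Tracked` and `make` are hypothetical names used only for this illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical type that sets a shared flag when it is dropped.
struct Tracked {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Allocates on the heap and hands ownership to the caller.
fn make(flag: Rc<Cell<bool>>) -> Box<Tracked> {
    let a = Box::new(Tracked { dropped: flag });
    a // moved out; nothing is freed when `make` returns
}

fn main() {
    let flag = Rc::new(Cell::new(false));
    {
        let _b = make(Rc::clone(&flag));
        assert!(!flag.get()); // the value outlived the function that created it
    } // `_b` goes out of scope: the value is dropped and the heap memory freed
    assert!(flag.get());
}
```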

For convenience, Box<T> will behave like a reference in many cases, as these two can often be used interchangeably. For comparison, here is a C program that provides functionality similar to the second example:

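#include <stdlib.h>

int* foo(void) {
  int* a = malloc(sizeof(int));
  if (a == NULL) {
    return NULL; /* allocation failed; the caller must check for this */
  }
  *a = 5;

  return a; /* the caller is responsible for calling free() */
}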

As you can see, the pointer is used to store the address of the memory and this is passed further.
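For comparison, a rough Rust analogue of the C function can be written with raw pointers. This is only a sketch to show that a Box is an owned heap address underneath; `foo_raw` is a hypothetical name, and ordinary code would simply return the `Box<u32>`.

```rust
/// Hypothetical analogue of the C `foo`: returns a raw heap pointer.
fn foo_raw() -> *mut u32 {
    // `Box::into_raw` gives up ownership and returns the address the box held.
    Box::into_raw(Box::new(5))
}

fn main() {
    let p = foo_raw();

    // SAFETY: `p` came from `Box::into_raw` above and has not been freed.
    let b = unsafe { Box::from_raw(p) };
    assert_eq!(*b, 5);
    // `b` is dropped here, which frees the allocation (the job of `free` in C).
}
```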

@Stargateur fixed. Without `unsafe` the UB are almost non-existent without bugs in libraries. — Hauleth, Jan 24 '20 at 14:16
(also worth saying people should always check malloc return Box does the allocation verification) — Stargateur, Jan 24 '20 at 14:26
@Shepmaster yeah, right, I mixed heap and stack again… It should be in reverse. — Hauleth, Jan 24 '20 at 14:30
*the difference is obvious - one is a reference to a place in memory and the second is a value somewhere in memory* This seems incorrect: `&T` and `Box` are both values and both refer to a place in memory (at least notionally). Also, both `&T` and `Box` may be considered to be "values somewhere in memory" (although both are likely to be put in registers in a running program). I don't understand the distinction you are making here. — trentcl, Jan 24 '20 at 14:39
