What is the order of operations when creating a struct in memory (under the hood)?

Question

Say I have a simple system in C:

#include <cstddef>

typedef struct Point { 
  Point *a;
  Point *b;
  int x;
  int y;
} Point; 

int main() { 
  Point p1 = {NULL, NULL, 3, 5};
  return 0; 
}

Godbolt compiles to:

main:
  push rbp
  mov  rbp, rsp
  mov  QWORD PTR [rbp-32], 0
  mov  QWORD PTR [rbp-24], 0
  mov  DWORD PTR [rbp-16], 3
  mov  DWORD PTR [rbp-12], 5
  mov  eax, 0
  pop  rbp
  ret

A tiny step further and we have:

int main() { 
  Point v = {NULL, NULL, 3, 5};
  Point m = {NULL, NULL, 7, 9};
  Point s = {&v, &s, 11, 12};
  return 0; 
}

Compiled to:

main:
  push rbp                    ; save the base pointer to the stack.
  mov  rbp, rsp               ; put the previous stack pointer into the base pointer.
  mov  QWORD PTR [rbp-32], 0
  mov  QWORD PTR [rbp-24], 0
  mov  DWORD PTR [rbp-16], 3
  mov  DWORD PTR [rbp-12], 5
  mov  QWORD PTR [rbp-64], 0
  mov  QWORD PTR [rbp-56], 0
  mov  DWORD PTR [rbp-48], 7
  mov  DWORD PTR [rbp-44], 9
  mov  QWORD PTR [rbp-96], 0
  mov  QWORD PTR [rbp-88], 0
  mov  QWORD PTR [rbp-80], 0
  mov  DWORD PTR [rbp-80], 11
  mov  DWORD PTR [rbp-76], 12
  lea  rax, [rbp-32]
  mov  QWORD PTR [rbp-96], rax
  lea  rax, [rbp-96]
  mov  QWORD PTR [rbp-88], rax
  mov  eax, 0
  pop  rbp
  ret

I can't exactly tell what's going on yet, but this helps (a little). Could one explain what is happening in the last example? I don't quite understand what the base pointer is, I know what the stack pointer is. I am not sure what QWORD PTR [...] does, but it's saying it's a quad-word size and a pointer/address. But why is it picking those specific offsets from rbp? I don't understand why it chose that.

Then the second part is the lea rax, [rbp-32]. It looks like it's handling the part where I did {&v, &s}.

So my question is:

What is the QWORD/DWORD PTR loading into? Is this loading into the heap, the stack, or something else?
Why is it choosing to be an offset of rbp?
Does the order of operations always go from the smallest object (most primitive object) to the most complex object? Or can you think of a case where the assembly code would first construct the complex object and then construct the more primitive objects?

I am wondering because I'm trying to wrap my head around how to create a tree in assembly. In functional programming or in JavaScript, you have a(b(c(), d(), e(f(g(), h()), ...))). The deepest functions get evaluated first, then a gets evaluated last, passed in the arguments. But I'm having a hard time visualizing how this would look in assembly.

More specifically, I am trying to create like a simple key/value store in assembly, to get a deeper understanding of how "objects" are created at this low level. It's easy in JavaScript:

db[key] = value

But this is because value already exists somewhere in memory. The question I have is, should I be creating this directly in the key-value store up-front? Or do you always create it in a random free spot in memory (like the offsets from rbp) and then later move them to the correct position (or point them to the right places)? I keep thinking I should be creating the tree leaf node directly on the branch, like I am pasting a leaf on the branch (visually). But the leaf already exists! Where does it exist before it is on the branch!? Can it ever exist on the branch before it is constructed elsewhere? I am getting confused.

So, start with a leaf.

Paste it on a branch.

   
  /  
\ | |
 \|/
  |
  |

Where is the leaf being created in the first place? That's what I was trying to see with the assembly example.

Basically I'm wondering how it looks to directly create something on the heap, rather than the stack.

A database object with overloaded C++ `operator[]` (thus actually just a function call) is hardly a good starting point for understanding basics! First learn some basics of asm syntax and how stack-frames are used for automatic storage. Then have a look at [How do objects work in x86 at the assembly level?](//stackoverflow.com/q/33556511). Asm doesn't have key/value stuff in hardware, you have to implement a hash table or tree yourself. (Unless your keys are just integers and your table is just a big-enough array, so it's just array indexing.) — Peter Cordes, Dec 10 '19 at 08:46
Yes I have looked through that stuff and understand the basics to some degree, I am implementing the tree myself. I'll read through that objects link, that looks nice. Thank you! — Lance Pollard, Dec 10 '19 at 08:50
If you still haven't understood for sure that `mov QWORD PTR [rbp-32], 0` is an 8-byte store (of an immediate `0`) into the function's local variables (automatic storage on the stack, relative to RBP as the frame pointer), you don't understand the basics yet! Matt Godbolt's CppCon2017 talk [“What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”](https://youtu.be/bSkpMdDe4g4) might be a good start, also [How to remove "noise" from GCC/clang assembly output?](//stackoverflow.com/a/38552509) — Peter Cordes, Dec 10 '19 at 08:54
C doesn't specify an order for initializing the elements of an aggregate, so the compiler can use any order to initialize the elements of your structures. By the as-if rule the compiler can even initialize the different structures in a different order than they're initialized in your program, because your program doesn't in any way depend on the order. In fact the compiler can completely omit allocating and initializing these structures because you never use them. Also `<cstddef>` is a C++ header that doesn't exist in C. — Ross Ridge, Dec 10 '19 at 09:02

Some programmer dude · Answer 1 · 2019-12-10T08:52:42.583

2

Most compilers use the stack for local variables.

Space on the stack is usually managed by two pointers: the stack pointer, and a "base" pointer that points to the base of the "allocated" memory on the stack.

Also worth noting is that the stack on almost all systems grows downward, which is why there are negative offsets from the base pointer (register rbp in your generated code).

The amount of space reserved is calculated by the compiler, which adds code to initialize the two pointers either inside the function or before the function is called (it depends on the calling convention).

When the function returns the pointers are reset, which is a very simple way to "free" the memory for the local variables.

Somewhat illustrated, it looks like this:

base pointer ---> +---------------------+
                  | Space for variables |
                  | ...                 |
                  | ...                 |
                  | ...                 |
stack pointer --> +---------------------+
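
As a rough illustration of "each call gets its own space" (a sketch, not taken from the answer; the function name `frames_are_distinct` is made up for this example), the following C function passes the address of its own local variable to the next recursive call, which checks that its own local lives somewhere else:

```c
#include <stddef.h>

/* Returns 1 if, at every recursion level, the callee's local variable
   lives at a different address than the caller's local.
   (Hypothetical helper, for illustration only.) */
int frames_are_distinct(int depth, const int *caller_local) {
    int my_local = depth;   /* automatic storage: belongs to this call only */

    if (caller_local != NULL && caller_local == &my_local)
        return 0;           /* would mean two live locals share storage */
    if (depth == 0)
        return 1;
    /* The caller's local is still alive while the callee runs. */
    return frames_are_distinct(depth - 1, &my_local);
}
```

Calling `frames_are_distinct(5, NULL)` returns 1: while the caller is still running, its local and the callee's local are two live objects, so they cannot overlap. Once a call returns, its space is free to be reused by the next call.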

edited Dec 10 '19 at 08:52

answered Dec 10 '19 at 08:48


So but then does it always create objects first on the stack, then move them to the heap? Or can you show me an example where it puts it directly on the heap from the get go? – Lance Pollard Dec 10 '19 at 08:52
@LancePollard No compiler I know of puts local variables on the heap; they are allocated on the stack and on the stack they stay. If you want objects on the heap you need to dynamically allocate them using `malloc` et al. – Some programmer dude Dec 10 '19 at 08:53
@LancePollard: some managed languages don't distinguish dynamic vs. automatic storage the way C does. e.g. in C# or Java you can always return a reference to a local variable. It's up to the compiler to do "escape analysis" to find out if a reference to a variable is visible outside the function, and if so to actually allocate it on "the heap", otherwise it can optimize it away or just use the stack. In C, returning a pointer to a local variable doesn't work; the object doesn't exist after leaving the function's scope. You can do it but dereferencing the pointer is UB. – Peter Cordes Dec 11 '19 at 02:42
@Someprogrammerdude: Right; normal implementations on ISAs that have an easy-to-use call stack (like x86 or any other modern ISA) just use it to implement C's "automatic" storage class (non-static local vars). But if you don't need to support recursion or reentrancy, some implementations use static storage. e.g. IBM S/360 had a "SAVEAREA" for every subroutine. [How was C ported to architectures that had no hardware stack?](//retrocomputing.stackexchange.com/q/7197) But if we're talking about x86, that's just an extra complication. – Peter Cordes Dec 11 '19 at 02:46
@LancePollard: you can't have a named variable "on the heap" in C. You can only have *pointers* to dynamically-allocated storage. (Where the pointer variable itself is either local or global, automatic or static storage, but the value it holds can be a pointer to the return value of `malloc`) – Peter Cordes Dec 11 '19 at 02:48

score 2 · Accepted Answer · answered Dec 11 '19 at 02:54

Basically I'm wondering how it looks to directly create something on the heap, rather than the stack.

You can't have a named variable "on the heap" in C. You can only have pointers to dynamically-allocated storage. (Where the pointer variable itself is either local or global, automatic or static storage, but the value it holds can be a pointer to the return value of malloc)

e.g. int *buffer = malloc(100*sizeof(*buffer)); inside a function: buffer is a local variable (automatic storage, which means stack space or just a register on "normal" C implementations on mainstream ISAs).

*buffer is the first int of that block of dynamic storage.

Some managed languages don't distinguish dynamic vs. automatic storage the way C does. e.g. in C# or Java you can always return a reference to a local variable. It's up to the compiler to do "escape analysis" to find out if a reference to a variable is visible outside the function, and if so to actually allocate it on "the heap", otherwise it can optimize it away or just use the stack.

In C, returning a pointer to a local variable doesn't work; the object doesn't exist after leaving the function's scope. You can do it without compile errors (just warnings) but dereferencing the pointer is UB.

e.g.

#include <stdlib.h>  // malloc, free

int *bad_return_local() {
    int buf[100];    // on the stack; destroyed when the function returns
    return buf;      // caller can't use this pointer to out-of-scope automatic storage
}

int *good_return_dynamic() {
    int *buf = malloc(100*sizeof(*buf));  // on "the heap"
    if (!buf) /* error: couldn't allocate memory */;
    return buf;      // caller must manually free() the return value at some point
}


int *return_static() {
    static int buf[100];   // static storage, e.g. in the BSS, same as global scope
    return buf;            // return the same pointer to the same storage every call
}
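
To connect this back to the question's tree (a hedged sketch; `Node`, `node_new`, `tree_insert`, `tree_find` and `tree_free` are names invented here, not from any library), a node can be initialized directly in the block `malloc` returns. There is no separate stack copy of the node: only the pointer to it lives in a local variable or register, and "pasting the leaf on the branch" is just storing that pointer into the parent's `left` or `right` slot.

```c
#include <stdlib.h>

typedef struct Node {
    struct Node *left;
    struct Node *right;
    int key;
    int value;
} Node;

/* Allocate a node in dynamic storage and fill it in place. */
Node *node_new(int key, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;               /* allocation failed */
    n->left = NULL;
    n->right = NULL;
    n->key = key;
    n->value = value;
    return n;
}

/* Insert or update key; returns 1 on success, 0 if allocation failed. */
int tree_insert(Node **root, int key, int value) {
    Node **slot = root;            /* the "branch" slot where a leaf could attach */
    while (*slot != NULL) {
        if (key == (*slot)->key) {
            (*slot)->value = value;   /* key already present: overwrite */
            return 1;
        }
        slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = node_new(key, value);  /* create the leaf, then store its address */
    return *slot != NULL;
}

/* Return a pointer to the stored value, or NULL if key is absent. */
const int *tree_find(const Node *root, int key) {
    while (root != NULL) {
        if (key == root->key)
            return &root->value;
        root = (key < root->key) ? root->left : root->right;
    }
    return NULL;
}

/* Free every node (children first, then the node itself). */
void tree_free(Node *root) {
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

Starting from `Node *db = NULL;`, a call like `tree_insert(&db, key, value)` plays the role of `db[key] = value`, and `tree_find(db, key)` does the lookup.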

score 1 · Answer 3 · answered Dec 10 '19 at 17:02

main:
  push rbp                    ; save the base pointer to the stack.
  mov  rbp, rsp               ; put the previous stack pointer into the base pointer.
  mov  QWORD PTR [rbp-32], 0  ; Write 0 (NULL) to v.a
  mov  QWORD PTR [rbp-24], 0  ; Write 0 (NULL) to v.b
  mov  DWORD PTR [rbp-16], 3  ; Write 3 to v.x
  mov  DWORD PTR [rbp-12], 5  ; Write 5 to v.y
  mov  QWORD PTR [rbp-64], 0  ; Write 0 (NULL) to m.a
  mov  QWORD PTR [rbp-56], 0  ; Write 0 (NULL) to m.b
  mov  DWORD PTR [rbp-48], 7  ; Write 7 to m.x
  mov  DWORD PTR [rbp-44], 9  ; Write 9 to m.y
  mov  QWORD PTR [rbp-96], 0  ; Write 0 (NULL) to s.a
  mov  QWORD PTR [rbp-88], 0  ; Write 0 (NULL) to s.b
  mov  QWORD PTR [rbp-80], 0  ; Write 0 to s.x and s.y (one 8-byte store covers both ints)
  mov  DWORD PTR [rbp-80], 11 ; Write 11 to s.x
  mov  DWORD PTR [rbp-76], 12 ; Write 12 to s.y
  lea  rax, [rbp-32]          ; Load effective address of v.a into rax
  mov  QWORD PTR [rbp-96], rax ; Write address of v.a into s.a
  lea  rax, [rbp-96]          ; Load effective address of s.a into rax
  mov  QWORD PTR [rbp-88], rax ; Write address of s.a (i.e. &s) into s.b
  mov  eax, 0                 
  pop  rbp
  ret

In a function (typically), parameters and local variables are organized into a stack frame (along with the address of the previous frame and the address of the next instruction) and are referenced via an offset from a base (or frame) pointer. rbp stores the address of the stack frame, and you reference objects by offsetting from that address. Why not just offset from the stack pointer (rsp)? Depending on what you do in the function, the stack pointer can change (not so much in compiled code, more in hand-hacked assembly). The base or frame pointer gives you a stable, unchanging reference point for doing the offsets. So what

mov QWORD PTR [rbp-32], 0

means is "Write the value of the immediate operand 0, expanded to a QWORD (8 bytes), to the address computed from rbp-32". If rbp is 0xdeadbeef, then that means zero out the 8 bytes starting at 0xdeadbeef - 0x20, or 0xdeadbecf.
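
The particular offsets come from the struct's layout. A small sketch (the helper names `point_offsets` and `first_member_shares_address` are invented for this example, and the struct is spelled the way C requires, with `struct Point *` members) that reports where each member sits, and checks that a struct and its first member start at the same address, which is why `lea rax, [rbp-32]` can serve as `&v`:

```c
#include <stddef.h>

typedef struct Point {
    struct Point *a;
    struct Point *b;
    int x;
    int y;
} Point;

/* Store the byte offsets of a, b, x, y (in that order) into out. */
void point_offsets(size_t out[4]) {
    out[0] = offsetof(Point, a);
    out[1] = offsetof(Point, b);
    out[2] = offsetof(Point, x);
    out[3] = offsetof(Point, y);
}

/* Returns 1 if the struct and its first member start at the same address. */
int first_member_shares_address(const Point *p) {
    return (const void *)p == (const void *)&p->a;
}
```

On a typical x86-64 target with 8-byte pointers and 4-byte `int`, the offsets are 0, 8, 16 and 20. With `v` starting at `rbp-32`, that gives `rbp-32`, `rbp-24`, `rbp-16` and `rbp-12`, matching the stores in the listing. Other ABIs may lay the struct out differently.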

There is some weirdness in the generated code - not sure why it's zeroing out s.x (and s.y) before writing the real values. Also not sure why it's bothering to zero out s.a and s.b before copying the addresses of v and s into them (the address of a struct object and the address of its first member are always the same). Turning on optimization may fix that.

This is how one compiler does it. Different compilers may do something different - for example, this is output from `gcc` (which is really clang/LLVM) on a Mac:

        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 10, 14    sdk_version 10, 14
        .globl  _main                   ## -- Begin function main
        .p2align        4, 0x90
_main:                                  ## @main
        .cfi_startproc
## %bb.0:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        xorl    %eax, %eax
        movl    $0, -4(%rbp)
        movq    l___const.main.v(%rip), %rcx
        movq    %rcx, -32(%rbp)
        movq    l___const.main.v+8(%rip), %rcx
        movq    %rcx, -24(%rbp)
        movq    l___const.main.v+16(%rip), %rcx
        movq    %rcx, -16(%rbp)
        movq    l___const.main.s(%rip), %rcx
        movq    %rcx, -56(%rbp)
        movq    l___const.main.s+8(%rip), %rcx
        movq    %rcx, -48(%rbp)
        movq    l___const.main.s+16(%rip), %rcx
        movq    %rcx, -40(%rbp)
        leaq    -32(%rbp), %rcx
        movq    %rcx, -80(%rbp)
        leaq    -56(%rbp), %rcx
        movq    %rcx, -72(%rbp)
        movl    $11, -64(%rbp)
        movl    $12, -60(%rbp)
        popq    %rbp
        retq
        .cfi_endproc
                                        ## -- End function
        .section        __TEXT,__const
        .p2align        3               ## @__const.main.v
l___const.main.v:
        .quad   0
        .quad   0
        .long   3                       ## 0x3
        .long   5                       ## 0x5

        .p2align        3               ## @__const.main.s
l___const.main.s:
        .quad   0
        .quad   0
        .long   7                       ## 0x7
        .long   9                       ## 0x9


.subsections_via_symbols

Different syntax, different approach, same end result.

You can get Intel-syntax from clang/LLVM just like from GCC. https://godbolt.org/ has both installed (although Linux not Mac), and defaults to passing `-masm=intel`. Weird that your Mac clang didn't use SSE2 for 16-byte copies if it's going to copy from static storage instead of immediates. Anyway, GCC's zeroing before storing the real value looks like the same missed optimization as [Why does GCC aggregate initialization of an array fill the whole thing with zeros first, including non-zero elements?](//stackoverflow.com/q/59022176) — Peter Cordes, Dec 11 '19 at 03:00
