How is x=x+1 evaluated by the compiler, and how is it represented in assembly?

Question

I'm trying to understand how the compiler "sees" the i+1 part of the expression i=i+1. I understand that i=3 means putting the value 3 in the memory location of variable i.

My guess about i=i+1 is that the compiler expects a value on the right side of the "=" operator, so it reads the value from the memory location of variable i (which is 3 after the assignment), adds 1 to it, and stores the final result of the "i+1" expression (3+1=4) back into the memory location of variable i. Is that correct?

If it is, does that mean that any variable or combination of variables and literals on the right side of an "=" operator is always replaced with the values stored in them, that those values can be added, subtracted, etc. with the values of other variables and literals (as in the x+1 expression), and that the final result of those calculations is also a literal value (e.g. 5, a literal string, etc.) that is stored as a value in the single variable on the left side of the "=" operator?

I'm also curious how this code looks in assembly, and what the main operations of this increment of i (i = i + 1) are:

#include <stdio.h>
int main()
{
    int i = 3;
    i = i + 1; // i should have the value of 4 stored back in it;
    return 0;
}

This isn't shrouded in dark mystery. Your compiler will happily give you assembly-style output of the compiled code. The method to get this varies from compiler to compiler. [Example with `clang`](https://stackoverflow.com/questions/10990018/how-to-generate-assembly-code-with-clang-in-intel-syntax). — tadman, Nov 02 '19 at 00:51
As a note, it's best not to think in terms of "memory" when dealing with the actual code your compiler emits. With optimizations turned on a lot of things will be stored in other places, like registers, or boiled down to other values based on observed patterns. — tadman, Nov 02 '19 at 00:54
@tadman, I tried to use disassembly in Code::Blocks, and the line is no different from the int i=3 initialization... that's why I posted here. Also, I don't know how to use other compilers... How about the rest of the text, is it the right conclusion? — painkiller, Nov 02 '19 at 00:54
What happens in theory and what happens in practice are often two different things. You'll need to learn more about common [compiler optimizations](https://en.wikipedia.org/wiki/Optimizing_compiler) and how they play out in your code to have a better sense of what actually happens. Even then compilers today are very complicated pieces of code that do all sorts of sometimes seemingly bizarre things to squeeze out maximal performance. The C standard does not dictate how things are stored, nor what the resulting machine code should look like, so there's a lot of latitude. — tadman, Nov 02 '19 at 00:56
The compiler will likely create an abstract syntax tree representing the operations specified by your code. At some point, those will represent operations like "load the value of memory address X (labelled 'i')", "add 1 to that value", "store that value back in memory address X", etc. How the compiler chooses to implement those operations is up to it; there are lots of different instruction sequences that get there. In fact, the compiler is also free to notice that you never use i, and eliminate all that code entirely, only implementing the "return 0". — Lee Daniel Crocker, Nov 02 '19 at 00:57
"Disassembly" tries to turn machine code back into C code, so it's no use here. What you want is to look at the actual assembly representation of the machine code, or the assembler output from the compiler itself, which is often easier to follow along with, especially as it often includes the source lines mixed in as comments. — tadman, Nov 02 '19 at 00:57
Ok, I understand that. But how about the conclusion with the i+1 expression ? Did I get it right ? — painkiller, Nov 02 '19 at 01:00

klutt · Accepted Answer · 2019-11-02T01:12:43.637

This is not answerable for the general case. It depends on the target platform. If you want to inspect the assembly, you can do so with the -S option of gcc (for example, gcc -S main.c, which writes main.s). When I did that to your code, it gave me this:

/tmp$ cat main.s 
    .file   "main.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $3, -4(%rbp)
    addl    $1, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
    .section    .note.GNU-stack,"",@progbits

A brief explanation of what is happening here. First we push the caller's base pointer (%rbp) onto the stack, so that it can be restored before we return.

.cfi_startproc
pushq   %rbp

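Then we set up the stack frame by copying the stack pointer into %rbp. Local variables are then addressed relative to %rbp. The .cfi_* lines are directives that describe the frame for debuggers and stack unwinding; they do not produce instructions.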

.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp

Then we have this. Comments are mine.

movl    $3, -4(%rbp) # i = 3;
addl    $1, -4(%rbp) # i = i + 1;

Lastly, we return from the main function

movl    $0, %eax # Store 0 in the "return register"
popq    %rbp     # Restore the caller's base pointer
.cfi_def_cfa 7, 8
ret              # return

Note that there is no one-to-one relationship between lines of C and lines of assembly, not even for very simple lines.

Please also note that C imposes requirements on the observable behavior of the program and not on the generated assembly. So for instance, a compiler might remove the whole body of the main function because the variable i is not used in an observable way. And it will if you use optimization. When I recompiled your code with -O3 I got this instead:

/tmp/$ cat main.s
    .file   "main.c"
    .text
    .section    .text.startup,"ax",@progbits
    .p2align 4
    .globl  main
    .type   main, @function
main:
.LFB11:
    .cfi_startproc
    xorl    %eax, %eax
    ret
    .cfi_endproc
.LFE11:
    .size   main, .-main
    .ident  "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
    .section    .note.GNU-stack,"",@progbits

Notice how much got removed from main. It is also interesting that movl $0, %eax has changed to xorl %eax, %eax. If you think about it, it's pretty obvious that this is a "set to zero" operation, since XORing a value with itself always gives 0. One could reasonably ask why anyone would write it like that. Well, the optimizer certainly does not optimize for readability. There are a few reasons why it is better. You can read about them here: What is the best way to set a register to zero in x86 assembly: xor, mov or and?

I see that there is a difference between the i=3 and i=i+1 statements. One uses movl and the other addl. From the little assembly I've picked up while learning C, movl is a step where a value is copied from a register into a memory location. I don't quite understand what addl does exactly... could you explain its function, please? Thanks. — painkiller, Nov 02 '19 at 01:11
@painkiller I was editing my post when you commented. More details added. — klutt, Nov 02 '19 at 01:14
So from what I understand... "-4(%rbp)" is the memory location of variable i? And addl differs from movl because within "i = i+1" the compiler recognizes i on the right side and only adds 1 to "-4(%rbp)", and not 3+1? — painkiller, Nov 02 '19 at 01:20
@painkiller I'm not sure exactly how the assembly instructions work in detail. You will have to find and read the documentation for information about such things. — klutt, Nov 02 '19 at 01:22
But the -4 is an offset into the current stack frame. When inspecting assembly, each variable typically (when not using the optimizer) has its own offset. — klutt, Nov 02 '19 at 01:26
Ok, thank you very much. Also, do you agree with my conclusion about i=i+1 and how the compiler "reads" i+1? (not in assembly code, in general) — painkiller, Nov 02 '19 at 01:28
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/201750/discussion-between-painkiller-and-klutt). — painkiller, Nov 02 '19 at 01:32
