Sorry for the vagueness of this question's title, but I'm not sure how to ask this exactly.
The following code works fine when executed on an Arduino (C++ compiled for an ATmega328 microcontroller). Return values are shown in comments in the code:
// Return the index of the first semicolon in a string
int detectSemicolon(const char* str) {
    int i = 0;
    Serial.print("i = ");
    Serial.println(i); // prints "i = 0"
    while (i <= strlen(str)) {
        if (str[i] == ';') {
            Serial.print("Found at i = ");
            Serial.println(i); // prints "Found at i = 2"
            return i;
        }
        i++;
    }
    Serial.println("Error"); // Does not execute
    return -999;
}

void setup() {
    Serial.begin(250000);
    Serial.println(detectSemicolon("TE;ST")); // Prints "2"
}

void loop() {}
This outputs "2" as the position of the first semicolon, as expected.
However, if I change the first line of the detectSemicolon function to "int i;" (i.e. without the explicit initialisation), I get problems. Specifically, the output is "i = 0" (good), "Found at i = 2" (good), "-999" (bad!).
So the function is returning -999 despite having executed the print statement immediately before the "return i;" line (where i is 2), and despite never executing the print statement immediately before the "return -999;" line.
Can someone help me understand what's happening here? I understand that uninitialised local variables in C/C++ can theoretically contain any old junk, but here I'm specifically checking with a print statement that this hasn't happened, and yet...
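For comparison, a possible alternative that sidesteps the uninitialised-index problem entirely is to let strchr do the scan. This is a swapped-in technique, not the code from the question, and it is only a sketch: it keeps the same -999 sentinel, and it assumes the C headers <string.h> and <assert.h>, since the AVR Arduino toolchain typically ships avr-libc rather than a full C++ standard library.

```cpp
#include <assert.h>
#include <string.h>

// Return the index of the first semicolon in a string, or -999 if there is
// none (the same sentinel the original function uses).
// strchr does the scan, so there is no hand-maintained loop index that
// could be left uninitialised.
int detectSemicolon(const char* str) {
    assert(str != nullptr);             // precondition: a valid C string
    const char* hit = strchr(str, ';'); // pointer to the first ';', or nullptr
    if (hit == nullptr) {
        return -999;
    }
    return static_cast<int>(hit - str); // offset from the start of str
}
```

On the board it would be called exactly as before, e.g. Serial.println(detectSemicolon("TE;ST")); which prints 2.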
EDIT: Thanks to everyone who's chipped in, and particularly to underscore_d for their great answer. It seems like undefined behaviour is indeed causing the compiler to just skip anything involving i. Here's some of the assembly with the Serial.print calls within detectSemicolon commented out:
void setup() {
Serial.begin(250000);
Serial.println(detectSemicolon("TE;ST")); // Prints "2"
d0: 4a e0 ldi r20, 0x0A ; 10
d2: 50 e0 ldi r21, 0x00 ; 0
d4: 69 e1 ldi r22, 0x19 ; 25
d6: 7c ef ldi r23, 0xFC ; 252
d8: 82 e2 ldi r24, 0x22 ; 34
da: 91 e0 ldi r25, 0x01 ; 1
dc: 0c 94 3d 03 jmp 0x67a ; 0x67a <_ZN5Print7printlnEii>
It looks like the compiler is completely disregarding the while loop and concluding that the output will always be "-999", so it doesn't even bother calling the function, instead hard-coding 0xFC19 (which is -999 as a 16-bit int) as the value passed to println. I'll have another look with the Serial.print calls enabled so that the function still gets called, but I think this is a strong pointer.
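The register values in the disassembly above can be decoded by hand. Under the avr-gcc calling convention, Print::println(int, int) (the demangled form of _ZN5Print7printlnEii) receives the implicit this pointer in r24:r25, the value in r22:r23 and the base in r20:r21, each with the low byte in the lower-numbered register. The host-side sketch below reassembles those pairs and checks at compile time that 0xFC19 really is -999; the helper names regPair and asSigned16 are hypothetical, invented for this illustration.

```cpp
#include <stdint.h>

// Hypothetical helper: combine an AVR register pair into a 16-bit value.
// avr-gcc stores the low byte in the lower-numbered register.
constexpr uint16_t regPair(uint8_t lo, uint8_t hi) {
    return static_cast<uint16_t>((static_cast<unsigned>(hi) << 8) | lo);
}

// Hypothetical helper: read a 16-bit pattern as a two's-complement int16.
// Done with plain arithmetic so the result is well-defined on any compiler.
constexpr int32_t asSigned16(uint16_t raw) {
    return raw >= 0x8000u ? static_cast<int32_t>(raw) - 0x10000
                          : static_cast<int32_t>(raw);
}

// Register contents copied from the ldi instructions in the listing.
constexpr uint16_t thisPointer  = regPair(0x22, 0x01);             // r24:r25
constexpr int32_t  printedValue = asSigned16(regPair(0x19, 0xFC)); // r22:r23
constexpr int32_t  printedBase  = asSigned16(regPair(0x0A, 0x00)); // r20:r21

// Compile-time checks: println is handed the constant -999, in base 10,
// on the object at address 0x0122 (the Serial object it is called on).
static_assert(printedValue == -999, "0xFC19 is -999 as a 16-bit int");
static_assert(printedBase == 10, "second argument is the base (DEC)");
static_assert(thisPointer == 0x0122, "implicit this pointer");
```

The arithmetic is simply 0xFC19 = 64537 and 64537 - 65536 = -999, i.e. the low byte 0x19 and high byte 0xFC are the two's-complement encoding of -999.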
EDIT 2:
For those who really care, here's a link to the disassembled code exactly as shown above (in the UB case):
If you look carefully, the compiler seems to be designating register 28 as the location of i and "initialising" it to zero at line d8. This register gets treated as if it contains i throughout the while loop, the if statements, etc., which is why the code appears to work and the print statements output as expected (e.g. line 122, where "i" gets incremented).
However, when it comes to returning this pseudo-variable, this is a step too far for our tried and tried-upon compiler; it draws the line and dumps us to the other return statement (line 120 jumps to line 132, loading "-999" into registers 24 and 25 before returning to main()).
Or at least, that's as far as I can get with my limited grasp of assembly. The moral of the story: weird stuff happens when your code's behaviour is undefined.