unsigned char fun ( unsigned char a, unsigned char b )
{
return(a+b);
}
Disassembly of section .text:
0000000000000000 <fun>:
0: 8d 04 3e lea (%rsi,%rdi,1),%eax
3: c3 retq
Disassembly of section .text:
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e20000ff and r0, r0, #255 ; 0xff
8: e12fff1e bx lr
Disassembly of section .text:
00000000 <fun>:
0: 1840 adds r0, r0, r1
2: b2c0 uxtb r0, r0
4: 4770 bx lr
Disassembly of section .text:
00000000 <fun>:
0: 952e add x10,x10,x11
2: 0ff57513 andi x10,x10,255
6: 8082 ret
Different targets (x86-64, ARM, Thumb, RISC-V), all from gcc.
This is a compiler design choice, so the people to ask are the compiler authors, not Stack Overflow. The compiler has to functionally implement the high-level language, and all of these targets have GPRs of 32 bits or more, so there are a few options: mask after every operation, or at least before the value is left to be used later; assume the register is dirty and mask it before you use it; or design around architectural features, such as eax being accessible in smaller parts (ax, al). So long as it functionally works, any of these solutions is perfectly fine.
One compiler may choose al for 8-bit operations, another may choose eax (which is likely more efficient from a performance perspective; there is material you can read up on that topic). In both cases the design has to account for the remaining bits in the rax/eax/ax register and must not slip up later by using the larger register as if those bits were clean.
Where you don't have the option of partial register access, you pretty much have to implement the code functionally, and the easy way is to mask. That matches the C code in this case, and one could argue that the x86 code is buggy because it uses eax but doesn't clip, so what it leaves in eax is not an unsigned char.
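To make the "mask thing" concrete, here is a minimal sketch in C of what the ARM, Thumb and RISC-V output above does: add in a full-width register, then clip back to 8 bits. The helper name add_u8_masked is hypothetical (it is not in the original code), and the sketch assumes an 8-bit unsigned char.

```c
#include <stdint.h>

/* Hypothetical helper: do the add in a 32-bit "register", then mask
   the result back to 8 bits, like the and/uxtb/andi instructions. */
uint32_t add_u8_masked(uint32_t a, uint32_t b)
{
    uint32_t wide = a + b;   /* may carry into bit 8 */
    return wide & 0xFFu;     /* clip to the unsigned char range */
}

/* The function from the listing above, for comparison. */
unsigned char fun(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

For any two 8-bit inputs both functions should agree, for example 200 + 100 gives 44 either way.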
Make it signed though:
signed char fun ( signed char a, signed char b )
{
return(a+b);
}
Disassembly of section .text:
0000000000000000 <fun>:
0: 8d 04 3e lea (%rsi,%rdi,1),%eax
3: c3 retq
Disassembly of section .text:
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e1a00c00 lsl r0, r0, #24
8: e1a00c40 asr r0, r0, #24
c: e12fff1e bx lr
Same story: one compiler design clearly handles the variable size somewhere else, and the other handles it right there and then.
Force it to deal with the size in this function:
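The lsl #24 / asr #24 pair in the ARM output is a sign extension of the low 8 bits. A rough C equivalent is sketched below; sign_extend_8 is a hypothetical name, and the xor/subtract form is used instead of shifts only to stay clear of implementation-defined shift behaviour on signed values.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 8 bits of a 32-bit value.
   Same effect as lsl #24 followed by asr #24. */
int32_t sign_extend_8(uint32_t wide)
{
    uint32_t low = wide & 0xFFu;            /* keep bits 7..0 */
    return (int32_t)(low ^ 0x80u) - 0x80;   /* bit 7 becomes the sign */
}
```

For instance, a register holding 100 + 100 = 200 comes back as -56, which is what a signed char return of that sum would hold on these targets.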
signed char fun ( signed char a, signed char b )
{
if((a+b)>200) return(1);
return(0);
}
Disassembly of section .text:
0000000000000000 <fun>:
0: 40 0f be f6 movsbl %sil,%esi
4: 40 0f be ff movsbl %dil,%edi
8: 01 f7 add %esi,%edi
a: 81 ff c8 00 00 00 cmp $0xc8,%edi
10: 0f 9f c0 setg %al
13: c3 retq
Disassembly of section .text:
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e35000c8 cmp r0, #200 ; 0xc8
8: d3a00000 movle r0, #0
c: c3a00001 movgt r0, #1
10: e12fff1e bx lr
Because the arm design knows the values passed in are already clipped, and this was a greater-than compare, they chose not to clip the sum, possibly because I left this as signed. In the case of x86, though, because they don't clip on the way out, they clip on the way into the operation.
unsigned char fun ( unsigned char a, unsigned char b )
{
if((a+b)>200) return(1);
return(0);
}
Disassembly of section .text:
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e35000c8 cmp r0, #200 ; 0xc8
8: d3a00000 movle r0, #0
c: c3a00001 movgt r0, #1
10: e12fff1e bx lr
Now that I would disagree with at first, because for example in 8 bits 0xFF + 0x01 = 0x00, and that is not greater than 200, but this code would pass it through as greater than 200. They also used the signed less-than and greater-than conditions on what looks like an unsigned compare.
unsigned char fun ( unsigned char a, unsigned char b )
{
if(((unsigned char)(a+b))>200) return(1);
return(0);
}
00000000 <fun>:
0: e0800001 add r0, r0, r1
4: e20000ff and r0, r0, #255 ; 0xff
8: e35000c8 cmp r0, #200 ; 0xc8
c: 93a00000 movls r0, #0
10: 83a00001 movhi r0, #1
14: e12fff1e bx lr
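Ahh, there you go: a C language promotion thing. Without the cast, a and b are promoted to int before the add, so 0xFF + 0x01 is 0x100 and a signed compare is correct (just like float f; f=f+1.0; vs f=f+1.0F;).
And that changes the x86 results as well: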
Disassembly of section .text:
0000000000000000 <fun>:
0: 01 fe add %edi,%esi
2: 40 80 fe c8 cmp $0xc8,%sil
6: 0f 97 c0 seta %al
9: c3 retq
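The difference the cast makes can be checked directly in C. This is a small sketch, assuming an 8-bit unsigned char and an int wider than that; the function names gt200_promoted and gt200_truncated are made up for the illustration.

```c
/* Without the cast: a and b are promoted to int, so the sum
   can be as large as 510 and is compared as an int. */
int gt200_promoted(unsigned char a, unsigned char b)
{
    return (a + b) > 200;
}

/* With the cast: the sum is truncated to 8 bits before the compare. */
int gt200_truncated(unsigned char a, unsigned char b)
{
    return ((unsigned char)(a + b)) > 200;
}
```

With a = 0xFF and b = 0x01 the first returns 1 (256 > 200) and the second returns 0 (0 is not > 200), which matches the two different code sequences the compiler generated.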
Why does GCC use EAX instead of AL?
And why does djgpp use AL only?
Is it performance issues?
These are compiler design choices, not issues, and not necessarily about performance; they are about the overall compiler design for implementing the high-level language with the target's instruction set. Each compiler is free to do that however it wishes. There is no reason to expect gcc, clang, djgpp and others to make the same design choices, and no reason to expect gcc version x.x.x and y.y.y to make the same choices either. If you go far enough back perhaps it was done differently, perhaps not (and if it was, then maybe the commit explains the "why", and/or developer group emails from that time would cover it).