
For the sake of curiosity I'm trying to read the flag register and print it out in a nice way.

I've tried reading it using gcc's asm keyword, but I can't get it to work. Any hints on how to do it? I'm running an Intel Core 2 Duo and Mac OS X. The following code is what I have. I hoped it would tell me if an overflow happened:

#include <stdio.h>

int main (void){
  int a=10, b=0, bold=0;
  printf("%d\n",b);
  while(1){
    a++;
  __asm__ ("pushf\n\t"
   "movl 4(%%esp), %%eax\n\t"
   "movl %%eax , %0\n\t"
   :"=r"(b)      
   :         
   :"%eax"        
   ); 
  if(b!=bold){ 
    printf("register changed \n %d\t to\t %d",bold , b);
  }
  bold = b;
  }
}

This gives a segmentation fault. When I run gdb on it I get this:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000005fbfee5c
0x0000000100000eaf in main () at asm.c:9
9       asm ("pushf \n\t"
Benedikt Wutzi

6 Answers

5

You can use the PUSHF/PUSHFD/PUSHFQ instruction (see http://siyobik.info/main/reference/instruction/PUSHF%2FPUSHFD for details) to push the flag register onto the stack. From there on you can interpret it in C. Otherwise you can test directly (against the carry flag for unsigned arithmetic or the overflow flag for signed arithmetic) and branch.

(to be specific, to test for the overflow bit you can use JO (jump if set) and JNO (jump if not set) to branch -- it's bit #11 (0-based) in the register)

About the EFLAGS bit layout: http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture#EFLAGS_Register

A very crude Visual C syntax test (just wham-bam / some jumps to debug flow), since I don't know about the GCC syntax:

int test2 = 2147483647; // max 32-bit signed int (0x7fffffff)
unsigned int flags_w_overflow, flags_wo_overflow;
__asm
{
    mov ebx, test2 // ebx = test value

    // test for no overflow
    xor eax, eax // eax = 0
    add eax, ebx // add ebx
    jno no_overflow // jump if no overflow

testoverflow:
    // test for overflow
    xor ecx, ecx // ecx = 0
    inc ecx // ecx = 1
    add ecx, ebx // overflow!
    pushfd // store flags (32 bits)
    jo overflow // jump if overflow
    jmp done // jump if not overflown :(

no_overflow:
    pushfd // store flags (32 bits)
    pop edx // edx = flags w/o overflow
    jmp testoverflow // back to next test

overflow:
    jmp done // yeah we're done here :)

done:
    pop eax // eax = flags w/overflow
    mov flags_w_overflow, eax // store
    mov flags_wo_overflow, edx // store
}

if (flags_w_overflow & (1 << 11)) __asm int 0x3 // overflow bit set correctly
if (flags_wo_overflow & (1 << 11)) __asm int 0x3 // overflow bit set incorrectly

return 0;
nielsj
  • I must admit I can't really read GCC inline syntax but make sure you place the pushf instruction *after* the assignment. I'll test it in VCC. Oh also, use PUSHFD for full 32-bits. – nielsj Jul 29 '11 at 17:01
  • I can't use PUSHFD: it's not supported in 64-bit mode. At least that's what gcc tells me ;) . After which assignment should I push? – Benedikt Wutzi Jul 29 '11 at 17:05
  • I wouldn't know about that but there is also a 64-bit variant: PUSHFQ. Typically you put the push instruction *after* an arithmetic operation that would affect the bit you're after, as soon as possible. Just like with a conditional branch like JO I mentioned. Check my edit. – nielsj Jul 29 '11 at 17:12
  • @BenediktWutzi: move your increment into the inline asm block, written in assembly, if you want to do these hacks. – ninjalj Jul 29 '11 at 17:13
  • the siyobik link is dead – hlitz Feb 21 '15 at 00:30
5

The compiler can reorder instructions, so you cannot rely on your lahf being next to the increment. In fact, there may not be an increment at all. In your code, you don't use the value of a, so the compiler can completely optimize it out.

So, either write the increment + check in assembler, or write it in C.

Also, lahf loads only ah (8 bits) from eflags, and the Overflow flag is outside of that. Better use pushf; pop %eax.

Some tests:

#include <stdio.h>

int main (void){
    int a=2147483640, b=0, bold=0;
    printf("%d\n",b);
    while(1){
            a++;
            __asm__ __volatile__ ("pushf \n\t"
                            "pop %%eax\n\t"
                            "movl %%eax, %0\n\t"
                            :"=r"(b)
                            :
                            :"%eax"
                    );
            if((b & 0x800) != (bold & 0x800)){
                    printf("register changed \n %x\t to\t %x\n",bold , b);
            }
            bold = b;
    }
}


$ gcc -Wall  -o ex2 ex2.c
$ ./ex2  # Works by sheer luck
0
register changed
 200206  to      200a96
register changed
 200a96  to      200282

$ gcc -Wall -O -o ex2 ex2.c
$ ./ex2  # Doesn't work, the compiler hasn't even optimized yet!
0
ninjalj
  • Much more applicable answer, nice :) I stupidly overlooked the LAHF (never really used that instruction either). – nielsj Jul 29 '11 at 17:15
  • I think `#define overflow32(a,b,c) ( ( ((a)>>31)==((b)>>31) ) && ( ((a)>>31)!=((c)>>31) ) )` works for an overflow check after addition in C. – ninjalj Jul 29 '11 at 17:22
  • Looks okay. The whole issue makes me think: what kind of situation would you explicitly want to check this in? I mean sure it can happen but generally logic should strike preemptively to prevent this if not desired. Then again there are some very strict pieces of software (avionics et cetera) but they already have suites that generate C code after strenuous testing against situations such as these. – nielsj Jul 29 '11 at 17:28
  • @nj: well, I took that from a MIPS emulation core I was working on, back in the jurassic times when I had spare time (MIPS has signed arithmetic that raise exceptions on overflow). – ninjalj Jul 29 '11 at 17:41
  • Yeah that makes good sense then - never wrote MIPS assembler nor much MIPS-targeted code anyway :) And x86 assembler has been 4 years at least as well. – nielsj Jul 29 '11 at 18:02
  • There's no point using a separate `mov` and hard-coding `%%eax`, just `pop %0`. But note that this isn't safe in ABIs with a red-zone (x86-64 System V), because you can't tell the compiler that you clobber memory below `%rsp`. So a 64-bit port of this would need to `sub $128, %rsp` first. **Or if you only want to check the overflow condition, `seto %0` with an 8-bit output register.** Or in GCC6 syntax, use a flag output condition. Of course this is all pointless because there's no guarantee of how `a++` compiled; GCC might have used LEA or optimized it away / into something else. – Peter Cordes May 21 '19 at 10:52
4

This may be a case of the XY problem. To check for overflow you do not need to read the hardware overflow flag, because the flag can be calculated easily from the sign bits:

An illustrative example is what happens if we add 127 and 127 using 8-bit registers. 127+127 is 254, but using 8-bit arithmetic the result would be 1111 1110 binary, which is -2 in two's complement, and thus negative. A negative result out of positive operands (or vice versa) is an overflow. The overflow flag would then be set so the program can be aware of the problem and mitigate this or signal an error. The overflow flag is thus set when the most significant bit (here considered the sign bit) is changed by adding two numbers with the same sign (or subtracting two numbers with opposite signs). Overflow never occurs when the sign of two addition operands are different (or the sign of two subtraction operands are the same).

Internally, the overflow flag is usually generated by an exclusive or of the internal carry into and out of the sign bit. As the sign bit is the same as the most significant bit of a number considered unsigned, the overflow flag is "meaningless" and normally ignored when unsigned numbers are added or subtracted.

https://en.wikipedia.org/wiki/Overflow_flag

So the C implementation is

int add(int a, int b, int* overflowed)
{
    // do an unsigned addition to prevent UB due to signed overflow
    unsigned int r = (unsigned int)a + (unsigned int)b;

    // if a and b have the same sign and the result's sign is different from a and b
    // then the addition overflowed
    *overflowed = !!((~(a ^ b) & (a ^ r)) & 0x80000000);
    return (int)r;
}

This way it works portably on any architecture, unlike your solution, which only works on x86. Smart compilers may recognize the pattern and use the overflow flag where possible. On most RISC architectures like MIPS or RISC-V there is no flag, and all signed/unsigned overflow must be checked in software by analyzing the sign bits like that.

Some compilers have intrinsics for checking overflow, like __builtin_add_overflow in Clang and GCC. With that intrinsic you can also easily see how the overflow is calculated on non-flag architectures. For example, on ARM (AArch64 registers shown) it's done like this:

add     w3, w0, w1  # r = a + b
eon     w0, w0, w1  # a = a ^ ~b
eor     w1, w3, w1  # b = b ^ r
str     w3, [x2]    # store sum ([x2] = r)
and     w0, w1, w0  # a = a & b = (a ^ ~b) & (b ^ r)
lsr     w0, w0, 31  # overflowed = a >> 31
ret

which is just a variation of what I've written above

For unsigned int it's much easier

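unsigned int result = a + b;     // a and b are unsigned int; the sum wraps modulo 2^N, no UB
int overflowed = (result < a);   // a wrapped sum is always smaller than either operand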
Community
phuclv
  • The double negation in `*overflowed = !!((~(a ^ b)` is intentional or a typo? – Crouching Kitten May 26 '20 at 02:03
  • @CrouchingKitten it's intentional. It's the standard way to coerce a value to 0 and 1 and is used a lot in Linux [What does !!(x) mean in C (esp. the Linux kernel)?](https://stackoverflow.com/q/2527086/995714) – phuclv May 26 '20 at 03:38
3

You can't assume anything about how GCC implemented the a++ operation, or whether it even did the computation before your inline asm, or before a function call.

You could make a an (unused) input to your inline asm, but gcc could still have chosen to use lea to copy-and-add instead of inc or add, or constant-propagation after inlining could have turned it into a mov-immediate.

And of course gcc could have done some other computation that writes FLAGS right before your inline asm.

There is no way to make a++; asm(...) safe for this

Stop now, you're on the wrong track. If you insist on using asm, you need to do the add or inc inside the asm so you can read the flags output. If you only care about the overflow flag, use SETCC, specifically seto %0, to create an 8-bit output value. Or better, use GCC6 flag-output syntax to tell the compiler that a boolean output result is in the OF condition in FLAGS at the end of your inline asm.

Also, signed overflow in C is undefined behaviour, so actually causing overflow in a++ is already a bug. It usually won't manifest itself if you somehow detect it after the fact, but if you use a as an array index or something gcc may have widened it to 64-bit to avoid redoing sign-extension.

GCC has builtins for add with overflow detection, since gcc5

There are builtins for signed/unsigned add, sub, and mul (see the GCC manual) that avoid signed-overflow UB and tell you whether there was overflow.

  • bool __builtin_add_overflow (type1 a, type2 b, type3 *res) is the generic version
  • bool __builtin_sadd_overflow (int a, int b, int *res) is the signed int version
  • bool __builtin_saddll_overflow (long long int a, long long int b, long long int *res) is the signed 64-bit long long version.

The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc.

There's a saddl version in case you want the operation for whatever size long is on the target platform. (For x86-64 gcc, int is always 32-bit, long long is always 64-bit, but long depends on Windows vs. non-Windows. For platforms like AVR, int would be 16-bit, and only long would be 32-bit.)

#include <stdbool.h>

int checked_add_int(int a, int b, bool *of) {
    int result;
    *of = __builtin_sadd_overflow(a, b, &result);
    return result;
}

compiles with gcc -O3 for x86-64 System V to this asm, on Godbolt

checked_add_int:
        mov     eax, edi
        add     eax, esi             # can't use the normal lea eax, [rdi+rsi]
        seto    BYTE PTR [rdx]
        and     BYTE PTR [rdx], 1    # silly compiler, it's already 0/1
        ret

ICC19 uses setcc into an integer register and then stores that, same difference as far as uops, but worse code-size.

After inlining to a caller that did if(of) {} it should just jo or jno instead of actually using setcc to create an integer 0/1; in general this should inline efficiently.


Also, since gcc7, there's a builtin to ask if an addition (after promotion to a given type) would overflow, without returning the value.

#include <stdbool.h>
int overflows(int a, int b) {
    bool of = __builtin_add_overflow_p(a, b, (int)0);
    return of;
}

compiles with gcc -O3 for x86-64 System V to this asm, also on Godbolt

overflows:
        xor     eax, eax
        add     edi, esi
        seto    al
        ret

See also Detecting signed overflow in C/C++

Peter Cordes
2

Others have offered good alternate code and reasons why what you're trying to do probably doesn't give the result you want, but the actual bug in your code is that you corrupted the stack state by pushing without popping. I would rewrite the asm as:

pushf
pop %0

Or you could just add $4,%%esp at the end of your asm to fix the stack pointer if you prefer the inefficient way.

Peter Cordes
R.. GitHub STOP HELPING ICE
  • pushfd seems to be in order here? and yeah I'm going to do a bit of research on how to read that GCC syntax; it comes across as an eyesore but there's probably some good sense to it. – nielsj Jul 30 '11 at 09:38
  • @nielsj: `pushf` assembles as `pushfd` in 32-bit mode. No need to make it explicit. You'd need to write `pushfw` if you wanted a 16-bit push (with an operand-size prefix in the machine code) in any mode where that wasn't the default (32 or 64-bit mode). – Peter Cordes May 21 '19 at 10:57
0

The following C program will read the FLAGS register when compiled with GCC on any x86 or x86_64 machine following a calling convention in which integers are returned in %eax. You may need to pass the -zexecstack argument to the compiler.

#include<stdio.h>
#include<stdlib.h>

int(*f)()=(void*)L"\xc3589c";

int main( int argc, char **argv ) {
  if( argc < 3 ) {
    printf( "Usage: %s <augend> <addend>\n", *argv );
    return 0;
  }
  int a=atoi(argv[1])+atoi(argv[2]);
  int b=f();
  printf("%d CF %d PF %d AF %d ZF %d SF %d TF %d IF %d DF %d OF %d IOPL %d NT %d RF %d VM %d AC %d VIF %d VIP %d ID %d\n", a, b&1, b/4&1, b>>4&1, b>>6&1, b>>7&1, b>>8&1, b>>9&1, b>>10&1, b>>11&1, b>>12&3, b>>14&1, b>>16&1, b>>17&1, b>>18&1, b>>19&1, b>>20&1, b>>21&1 );
}


The funny looking string literal disassembles to

0x0000000000000000:  9C    pushfq 
0x0000000000000001:  58    pop    rax
0x0000000000000002:  C3    ret    
ceilingcat
  • Why is `f` a non-const function pointer? That means callers will have to actually make an indirect `call [QWORD PTR f[rip]]` instead of optimizing to `call .LC1`. https://godbolt.org/z/5Eu-pS Also, you left it unprototyped (unknown args), instead of `f(void)`, so x86-64 System V requires the caller to set AL=0. Compilers typically do this with `xor eax,eax` destroying the flag-result of `add`. (Which by coincidence gcc with default tuning actually uses to compute the `+` right before calling `f()`. ICC chooses to compute `a` later, just saving both atoi results until after the `call .LC1`.) – Peter Cordes May 21 '19 at 11:11
  • Putting the machine code in its own non-inline function does work around clobbering the red-zone, though. But anyway, with optimization disabled so compilers use `mov eax,0` instead of `xor eax,eax`, your code happens to work. With my changes some compilers will still work with optimization enabled for *this* case, **but in general you need to do the add inside inline asm as well** for it to be safe. Some versions of some compilers might always choose `lea` for addition, e.g. gcc with `-mtune=atom`, and of course the `+` can be optimized into part of something else or done after the call. – Peter Cordes May 21 '19 at 11:15