I am trying to debug an x86-64 assembly program. According to gdb, it crashes at the following instruction:

xorpd  0x1770(%rip),%xmm12        # 0x40337c <S_0x403230>

However, it seems to me that memory 0x40337c is perfectly normal:

(gdb) x /10x 0x40337c
0x40337c <S_0x403230>:  0x00000000      0x80000000      0x00000000      0x00000000
0x40338c <S_0x403240>:  0xf149f2ca      0x00000000      0x746e7973      0x203a7861
0x40339c <S_0x403248+8>:        0x206d626c      0x6d69743c

Another weird thing is that this code crashes every time I run it from the command line, and also inside gdb. However, when I run it under valgrind, it does not crash!

☁  src [master] ⚡ valgrind ./a.out 20 reference.dat 0 1 100_100_130_cf_a.of
==18329== Memcheck, a memory error detector
==18329== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==18329== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==18329== Command: ./a.out 20 reference.dat 0 1 100_100_130_cf_a.of
==18329==
MAIN_printInfo:
    grid size      : 100 x 100 x 130 = 1.30 * 10^6 Cells
    nTimeSteps     : 20
    result file    : reference.dat
    action         : nothing
    simulation type: channel flow
    obstacle file  : 100_100_130_cf_a.of

LBM_showGridStatistics:
  nObstacleCells:  498440 nAccelCells:       0 nFluidCells:  801560
  minRho:   1.0000 maxRho:   1.0000 mass: 1.300000e+06
  minU: 0.000000e+00 maxU: 0.000000e+00

LBM_showGridStatistics:
  nObstacleCells:  498440 nAccelCells:       0 nFluidCells:  801560
  minRho:   1.0000 maxRho:   1.0431 mass: 1.300963e+06
  minU: 0.000000e+00 maxU: 1.272361e-02

==18329==
==18329== HEAP SUMMARY:
==18329==     in use at exit: 0 bytes in 0 blocks
==18329==   total heap usage: 4 allocs, 4 frees, 428,801,136 bytes allocated
==18329==
==18329== All heap blocks were freed -- no leaks are possible
==18329==
==18329== For counts of detected and suppressed errors, rerun with: -v
==18329== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I uploaded the binary here for your reference, if you are interested. The assembly program is actually produced by a binary rewriter, and I used to be able to produce bug-free code with it. So I believe this is probably not a hard-to-debug pointer dereference issue, but something easy to fix. However, I really have no idea what is going wrong, since the memory looks perfectly normal in gdb (x/10x 0x40337c).

So here are my questions:

  1. Given the debug info (x/10x 0x40337c), what could possibly be going wrong?

  2. Why does the binary not crash under valgrind?

lllllllllllll

1 Answer

Valgrind runs your code on a simulated x86 CPU. Apparently it doesn't simulate alignment checking.


16-byte memory operands to legacy SSE instructions must be naturally aligned (i.e. to a 16B boundary for a 16B load like this one); otherwise the instruction faults. The exceptions are the explicitly unaligned move instructions such as MOVUPS (_mm_loadu_ps), MOVUPD, and MOVDQU.

0x40337c is not 16B aligned, only 4B. (So it's quite weird that you're using it with XORPD, since I'd expect at least 8B alignment for a double).
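A quick way to check the alignment of an address is to mask off its low bits. The sketch below is illustrative only; `misalignment` is a hypothetical helper name, not something from the original program.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes by which addr overshoots the previous multiple of
   alignment; 0 means addr is aligned. alignment must be a power of two. */
uint64_t misalignment(uint64_t addr, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

For the faulting address, `misalignment(0x40337c, 16)` gives 12 and `misalignment(0x40337c, 4)` gives 0, which matches the "4B but not 16B" observation.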

AVX instructions are the reverse: the default is no alignment required, but VMOVAPS does require alignment.
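To illustrate the difference, here is a minimal C sketch using SSE2 intrinsics (assuming an x86-64 compiler such as GCC or Clang). It places a sign-bit mask 12 bytes past a 16B boundary, mimicking the layout at 0x40337c, and reads it with the unaligned load `_mm_loadu_pd`. The function names and the buffer layout are made up for illustration; they are not taken from the rewritten binary.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR the two doubles in v with a 16-byte mask read
   from mask_ptr. _mm_loadu_pd is the unaligned load (MOVUPD), so mask_ptr
   may have any alignment. */
void xor_with_mask_unaligned(const void *mask_ptr, double v[2]) {
    __m128d mask = _mm_loadu_pd((const double *)mask_ptr);
    __m128d val = _mm_loadu_pd(v);
    _mm_storeu_pd(v, _mm_xor_pd(val, mask));
}

/* Returns -x by XORing with a sign mask stored 12 bytes past a 16-byte
   boundary, i.e. 4-byte aligned but not 16-byte aligned. */
double negate_via_misaligned_mask(double x) {
    _Alignas(16) unsigned char buf[32] = {0};
    /* Low qword as seen in the gdb dump; the high qword stays zero. */
    const uint64_t sign_bit = 0x8000000000000000ULL;
    memcpy(buf + 12, &sign_bit, sizeof sign_bit);
    assert(((uintptr_t)(buf + 12) & 15u) == 12); /* deliberately misaligned */

    double v[2] = {x, 0.0};
    xor_with_mask_unaligned(buf + 12, v);
    return v[0];
}
```

Swapping `_mm_loadu_pd` for the aligned `_mm_load_pd` on the mask pointer would be expected to fault in the same way as the XORPD memory operand in the question when compiled for plain SSE2.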

Peter Cordes
  • Thank you, Peter. Now I understand. Do you have any idea how I should fix this? Since this pointer (`S_0x403230`) refers to a location in the `.rodata` section, maybe forcing 16B alignment for the `.rodata` section would work? – lllllllllllll Nov 15 '16 at 05:24
  • If I remember correctly, about 1.5 years ago, when I used this binary rewriter on this exact piece of code, I had to add some alignment to the generated assembly, but I don't remember the details anymore. I will try to figure it out. Thank you! – lllllllllllll Nov 15 '16 at 05:26
  • @computereasy: Yeah, do whatever you need to do so that your vector constants are aligned, rather than changing your program to use unaligned loads into a tmp register. It sounds like a bug in your rewriting software if it's not preserving at least the alignment that sections specify. You can see that with `readelf -S foo.o`. The `.rodata` section should be part of the text segment, which is usually at least 16B aligned. `S_0x403230` sounds like the original address was aligned, since the last hex digit is 0. – Peter Cordes Nov 15 '16 at 05:45
  • Adding `.align 16` at the beginning of the `.rodata` and `.data` sections fixed the above issue. Thank you so much for your help! – lllllllllllll Nov 15 '16 at 14:21
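
For code written in C rather than emitted as assembly, a rough analogue of the `.align 16` fix is to request 16B alignment on the constant itself. The sketch below assumes C11 (`_Alignas`) and SSE2 intrinsics; `SIGN_MASK` and `negate_via_aligned_mask` are hypothetical names.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sign-bit mask for the low double. _Alignas(16) asks the compiler to
   place it on a 16-byte boundary, much like .align 16 does in assembly. */
static _Alignas(16) const uint64_t SIGN_MASK[2] = {0x8000000000000000ULL, 0};

/* Returns -x using an aligned 16-byte load of the mask (MOVDQA). */
double negate_via_aligned_mask(double x) {
    assert(((uintptr_t)SIGN_MASK & 15u) == 0); /* aligned load is safe */
    __m128d mask = _mm_castsi128_pd(_mm_load_si128((const __m128i *)SIGN_MASK));
    __m128d val = _mm_set_sd(x); /* low lane = x, high lane = 0.0 */
    return _mm_cvtsd_f64(_mm_xor_pd(val, mask));
}
```

With the constant aligned, the compiler is free to use the mask directly as an XORPD memory operand without risking an alignment fault.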