
I'm trying to compile WebRTC on 32-bit CentOS 6 and get the following error, but the same build works well on 64-bit CentOS. Can someone help me?

row_gcc.cc:3574:4: error: ‘asm’ operand has impossible constraints

The compile command:

CXX obj/chromium/src/third_party/libyuv/source/libyuv.row_gcc.o
FAILED: obj/chromium/src/third_party/libyuv/source/libyuv.row_gcc.o 
c++ -MMD -MF obj/chromium/src/third_party/libyuv/source/libyuv.row_gcc.o.d 
-DV8_DEPRECATION_WARNINGS -D_FILE_OFFSET_BITS=64 -DCHROMIUM_BUILD 
-DUI_COMPOSITOR_IMAGE_TRANSPORT -DUSE_AURA=1 -DUSE_PANGO=1 -DUSE_CAIRO=1 
-DUSE_DEFAULT_RENDER_THEME=1 -DUSE_LIBJPEG_TURBO=1 -DUSE_X11=1 
-DUSE_CLIPBOARD_AURAX11=1 -DENABLE_WEBRTC=1 -DENABLE_MEDIA_ROUTER=1 
-DENABLE_PEPPER_CDMS -DENABLE_NOTIFICATIONS -DENABLE_TOPCHROME_MD=1 
-DUSE_UDEV -DFIELDTRIAL_TESTING_ENABLED -DENABLE_TASK_MANAGER=1 
-DENABLE_EXTENSIONS=1 -DENABLE_PDF=1 -DENABLE_PLUGINS=1 
-DENABLE_SESSION_SERVICE=1 -DENABLE_THEMES=1 -DENABLE_PRINTING=1 
-DENABLE_BASIC_PRINTING=1 -DENABLE_PRINT_PREVIEW=1 -DENABLE_SPELLCHECK=1 
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 
-DENABLE_SETTINGS_APP=1 -DENABLE_SUPERVISED_USERS=1 -DENABLE_MDNS=1 
-DENABLE_SERVICE_DISCOVERY=1 -DV8_USE_EXTERNAL_STARTUP_DATA 
-DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL 
-DHAVE_JPEG -DUSE_LIBPCI=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-DDYNAMIC_ANNOTATIONS_ENABLED=1 -DWTF_USE_DYNAMIC_ANNOTATIONS=1 
-D_DEBUG -Igen 
-I../../chromium/src/third_party/libyuv/include 
-I../../chromium/src/third_party/libyuv 
-I../../chromium/src/third_party/libjpeg_turbo -fstack-protector 
--param=ssp-buffer-size=4  -pthread -fno-strict-aliasing -Wall -Wno-extra 
-Wno-unused-parameter -Wno-missing-field-initializers -fvisibility=hidden 
-pipe -fPIC 
-B/home/test/webrtc-checkout/src/third_party/binutils/Linux_ia32/Release/bin 
-Wno-unused-local-typedefs -msse2 -mfpmath=sse -mmmx -m32 -O0 -g 
-funwind-tables -fno-exceptions -fno-rtti -fno-threadsafe-statics 
-fvisibility-inlines-hidden 
-std=gnu++11 -Wno-narrowing  
-c ../../chromium/src/third_party/libyuv/source/row_gcc.cc 
-o  obj/chromium/src/third_party/libyuv/source/libyuv.row_gcc.o
../../chromium/src/third_party/libyuv/source/row_gcc.cc: 
In function ‘void libyuv::BlendPlaneRow_SSSE3(const uint8*, 
const uint8*, const uint8*, uint8*, int)’:
../../chromium/src/third_party/libyuv/source/row_gcc.cc:3574:4: 
error: ‘asm’ operand has impossible constraints

gcc version: 4.8.5

as version: GNU assembler (GNU Binutils) 2.26.20160125

third_party/binutils/Linux_ia32/Release/bin/as --version
GNU assembler (GNU Binutils) 2.26.20160125
Copyright (C) 2015 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `i686-pc-linux-gnu'.

The code in chromium/src/third_party/libyuv/source/row_gcc.cc:

void BlendPlaneRow_SSSE3(const uint8* src0, const uint8* src1,
                         const uint8* alpha, uint8* dst, int width) {
  asm volatile (
    "pcmpeqb    %%xmm5,%%xmm5                  \n"
    "psllw      $0x8,%%xmm5                    \n"
    "mov        $0x80808080,%%eax              \n"
    "movd       %%eax,%%xmm6                   \n"
    "pshufd     $0x0,%%xmm6,%%xmm6             \n"
    "mov        $0x807f807f,%%eax              \n"
    "movd       %%eax,%%xmm7                   \n"
    "pshufd     $0x0,%%xmm7,%%xmm7             \n"
    "sub        %2,%0                          \n"
    "sub        %2,%1                          \n"
    "sub        %2,%3                          \n"

    // 8 pixel loop.
    LABELALIGN
    "1:                                          \n"
    "movq       (%2),%%xmm0                    \n"
    "punpcklbw  %%xmm0,%%xmm0                  \n"
    "pxor       %%xmm5,%%xmm0                  \n"
    "movq       (%0,%2,1),%%xmm1               \n"
    "movq       (%1,%2,1),%%xmm2               \n"
    "punpcklbw  %%xmm2,%%xmm1                  \n"
    "psubb      %%xmm6,%%xmm1                  \n"
    "pmaddubsw  %%xmm1,%%xmm0                  \n"
    "paddw      %%xmm7,%%xmm0                  \n"
    "psrlw      $0x8,%%xmm0                    \n"
    "packuswb   %%xmm0,%%xmm0                  \n"
    "movq       %%xmm0,(%3,%2,1)               \n"
    "lea        0x8(%2),%2                     \n"
    "sub        $0x8,%4                        \n"
    "jg        1b                              \n"
  : "+r"(src0),       // %0
    "+r"(src1),       // %1
    "+r"(alpha),      // %2
    "+r"(dst),        // %3
    "+r"(width)       // %4
  :: "memory", "cc", "eax", "xmm0", "xmm1", "xmm2", "xmm5", "xmm6", "xmm7"
  );
}
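For reference, here is a plain C++ sketch of what the loop computes per pixel, worked out by reading the asm rather than taken from libyuv (the name BlendPlaneRow_Scalar is hypothetical). It also explains the odd-looking constants: pmaddubsw forms a*(src0-128) + (255-a)*(src1-128), and adding 0x807f = 255*128 + 255 cancels that -128*255 bias and adds a rounding term before the shift by 8.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar equivalent of BlendPlaneRow_SSSE3, derived from the asm.
// alpha = 255 selects src0, alpha = 0 selects src1.
void BlendPlaneRow_Scalar(const uint8_t* src0, const uint8_t* src1,
                          const uint8_t* alpha, uint8_t* dst, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const int a = alpha[x];
    // pmaddubsw + paddw 0x807f: a*src0 + (255-a)*src1 + 255 (fits in 16 bits)
    const int sum = a * src0[x] + (255 - a) * src1[x] + 255;
    dst[x] = static_cast<uint8_t>(sum >> 8);  // psrlw $8, then packuswb
  }
}
```

Note that the asm has no tail loop: it always handles 8 pixels per iteration, so presumably callers pass a width that is a multiple of 8 (or pad the buffers).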
  • I don't immediately see anything wrong with this. However, you are using a lot of registers. Given that this works on x64, I'm wondering if you are just running out? For an experiment, what happens if you `push eax` before the $0x80808080 move, pop it after the xmm7 move, and remove it from the clobbers? That would get gcc 1 more general purpose register. Something to try, anyway. – David Wohlferd May 28 '16 at 01:48
  • As an alternative (or perhaps in addition), you can also change width from "+r" to "+m". – David Wohlferd May 28 '16 at 01:54
  • David is correct, and specifically libyuv is assuming there are 5 registers available. Normally these general purpose registers are available: _EAX_, _EBX_, _ECX_, _EDX_, _ESI_, _EDI_, and _EBP_. _EAX_ is in the clobber list so it is unusable. Since you aren't omitting stack frame pointers, _EBP_ isn't available. Because it is being built with `-fPIC`, _EBX_ isn't available. In essence the only available registers are _ECX_, _EDX_, _ESI_, _EDI_ in this 32-bit build. 5 registers are necessary, 4 are available. Might try recompiling it all again with `-fomit-frame-pointer` or even `-O3` instead of `-O0` – Michael Petch May 28 '16 at 02:35
  • Nothing wrong with David's suggestion about `width` using constraint `+rm`. I'm concerned there might be other code that may present a problem (running out of registers), thus my attempt to solve it with compiler options (if possible). – Michael Petch May 28 '16 at 02:37
  • @DavidWohlferd: That's a good suggestion to test the theory for 32bit, but you can't safely push/pop in 64bit inline asm in the SysV ABI: [it steps on the red zone.](http://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm/34522750#34522750). It's also obviously worse code compared to letting the compiler be smart about spilling/reloading. – Peter Cordes May 28 '16 at 03:56
  • @MichaelPetch: I compiled with option '-O2' and it works. – bian xuegong May 28 '16 at 14:11
  • This code would be better if it used a single-register addressing mode for the store, rather than one of the loads, [so the store can micro-fuse](http://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes/31027695#31027695). – Peter Cordes May 28 '16 at 15:57

As David Wohlferd suggests, the problem is that you've run out of registers. There are 8 general-purpose registers, one of which is ESP, the stack pointer, which can never be used to satisfy a constraint. Two more registers, EBP and EBX, also can't be used to satisfy a constraint because of the options you're compiling with. Since you're compiling without optimization, the EBP register is reserved for the frame pointer, and because you're using the -fPIC flag, the EBX register is reserved to access the GOT (Global Offset Table).

So that leaves five registers that can be used to satisfy register constraints: EAX, ECX, EDX, ESI, and EDI. However, your asm statement clobbers EAX, so it can't be used, leaving only four registers. The asm statement has five operands with register constraints, meaning that it needs five separate registers, but only four are available for the compiler to use. So the constraints are impossible to satisfy.
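To make that count explicit, here is a toy model of it in C++. BuildConfig, AllocatableGprs and ConstraintsSatisfiable are hypothetical names, and this is not how GCC's register allocator actually works; it only encodes the reservations described above for this particular 32-bit -O0 -fPIC build.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of which i386 GPRs remain for "r" operands.
struct BuildConfig {
  bool frame_pointer;  // -O0 here: EBP holds the frame pointer
  bool pic_uses_ebx;   // -fPIC with this 32-bit GCC: EBX holds the GOT pointer
};

std::vector<std::string> AllocatableGprs(
    const BuildConfig& cfg, const std::vector<std::string>& clobbers) {
  const std::vector<std::string> all = {"eax", "ebx", "ecx", "edx",
                                        "esi", "edi", "ebp", "esp"};
  std::vector<std::string> free_regs;
  for (const std::string& reg : all) {
    if (reg == "esp") continue;                      // stack pointer
    if (reg == "ebp" && cfg.frame_pointer) continue;  // frame pointer
    if (reg == "ebx" && cfg.pic_uses_ebx) continue;   // GOT pointer
    if (std::find(clobbers.begin(), clobbers.end(), reg) != clobbers.end())
      continue;                                      // clobbered by the asm
    free_regs.push_back(reg);
  }
  return free_regs;
}

// The asm statement needs one distinct register per "+r" operand.
bool ConstraintsSatisfiable(const BuildConfig& cfg,
                            const std::vector<std::string>& clobbers,
                            std::size_t r_operands) {
  return AllocatableGprs(cfg, clobbers).size() >= r_operands;
}
```

With the original clobber list the model gives four registers for five "+r" operands. Dropping the EAX clobber, or freeing EBP (which is presumably what the optimized -O2 build reported in the comments does), brings it to five.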

A simple solution would be to free a register by removing the use of EAX from the asm statement. This register is only used to load two different constant values into two different XMM registers. Instead, it's possible to load the constants directly from memory. For example:

void BlendPlaneRow_SSSE3(const uint8* src0, const uint8* src1,
                         const uint8* alpha, uint8* dst, int width) {
  // movaps faults on a memory operand that is not 16-byte aligned,
  // so the constants need aligned(16), not aligned(4).
  static unsigned const __attribute__((aligned(16))) xmm6[4] = {
    0x80808080, 0x80808080, 0x80808080, 0x80808080
  };
  static unsigned const __attribute__((aligned(16))) xmm7[4] = {
    0x807f807f, 0x807f807f, 0x807f807f, 0x807f807f
  };

  asm volatile (
    "pcmpeqb    %%xmm5,%%xmm5                  \n"
    "psllw      $0x8,%%xmm5                    \n"
    "movaps     %5,%%xmm6                      \n"
    "movaps     %6,%%xmm7                      \n"
    "sub        %2,%0                          \n"
    "sub        %2,%1                          \n"
    "sub        %2,%3                          \n"

    // 8 pixel loop.
    LABELALIGN
    "1:                                          \n"
    "movq       (%2),%%xmm0                    \n"
    "punpcklbw  %%xmm0,%%xmm0                  \n"
    "pxor       %%xmm5,%%xmm0                  \n"
    "movq       (%0,%2,1),%%xmm1               \n"
    "movq       (%1,%2,1),%%xmm2               \n"
    "punpcklbw  %%xmm2,%%xmm1                  \n"
    "psubb      %%xmm6,%%xmm1                  \n"
    "pmaddubsw  %%xmm1,%%xmm0                  \n"
    "paddw      %%xmm7,%%xmm0                  \n"
    "psrlw      $0x8,%%xmm0                    \n"
    "packuswb   %%xmm0,%%xmm0                  \n"
    "movq       %%xmm0,(%3,%2,1)               \n"
    "lea        0x8(%2),%2                     \n"
    "sub        $0x8,%4                        \n"
    "jg        1b                              \n"
  : "+r"(src0),       // %0
    "+r"(src1),       // %1
    "+r"(alpha),      // %2
    "+r"(dst),        // %3
    "+r"(width)       // %4
  : "m"(xmm6),        // %5
    "m"(xmm7)         // %6
  : "memory", "cc", "xmm0", "xmm1", "xmm2", "xmm5", "xmm6", "xmm7"
  );
}

A better solution would be to rewrite this to use intrinsics and let the compiler do all the register allocation.
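A minimal sketch of such a rewrite, assuming an x86 target and a compiler that supports the GCC/Clang `target("ssse3")` function attribute; the name BlendPlaneRow_SSSE3_Intrin is hypothetical. It mirrors the asm instruction by instruction, but every register is left to the compiler, so there is no register constraint to satisfy.

```cpp
#include <cassert>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3 intrinsics (also pulls in SSE2)

// Hypothetical intrinsics version of BlendPlaneRow_SSSE3.
// Assumes an SSSE3-capable CPU and width a multiple of 8, like the asm.
__attribute__((target("ssse3")))
void BlendPlaneRow_SSSE3_Intrin(const uint8_t* src0, const uint8_t* src1,
                                const uint8_t* alpha, uint8_t* dst,
                                int width) {
  assert(width % 8 == 0);
  const __m128i kHighMask = _mm_set1_epi16(static_cast<short>(0xFF00));  // xmm5
  const __m128i kBias = _mm_set1_epi8(static_cast<char>(0x80));          // xmm6
  const __m128i kRound = _mm_set1_epi16(static_cast<short>(0x807F));     // xmm7
  for (int x = 0; x < width; x += 8) {
    // Each 16-bit lane becomes the byte pair [alpha, 255 - alpha].
    __m128i a = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(alpha + x));
    a = _mm_xor_si128(_mm_unpacklo_epi8(a, a), kHighMask);
    // Interleave [src0, src1] and shift both into the signed range.
    const __m128i s0 =
        _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src0 + x));
    const __m128i s1 =
        _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src1 + x));
    const __m128i s = _mm_sub_epi8(_mm_unpacklo_epi8(s0, s1), kBias);
    // pmaddubsw: unsigned bytes of a times signed bytes of s, pairwise sum.
    __m128i v = _mm_maddubs_epi16(a, s);
    v = _mm_srli_epi16(_mm_add_epi16(v, kRound), 8);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + x),
                     _mm_packus_epi16(v, v));
  }
}
```

At -O0 this will spill to the stack and run slowly, but it compiles regardless of how many registers are free. Older GCC releases such as the 4.8.5 in the question may reject SSSE3 intrinsics unless the file is built with -mssse3 instead of relying on the target attribute.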

Ross Ridge
  • Or just ask for the constants to be in xmm regs. You can use an `"x"` constraint with a scalar integer value, and the compiler will `movd` or `movq` it for you. – Peter Cordes May 28 '16 at 04:08
  • @PeterCordes Didn't know that, though I figured loading the constant from memory would be faster than the three instructions it's replacing anyways. – Ross Ridge May 28 '16 at 04:33
  • Probably fine, yeah. The risk of a cache miss is balanced by the reduction in uop-cache footprint. However, the `0x80808080` could be [generated on the fly with 2 more instructions](http://stackoverflow.com/questions/35085059/what-are-the-best-instruction-sequences-to-generate-vector-constants-on-the-fly): SSSE3 `pabsb %xmm5, %xmm6` (0x010101...), then left-shift xmm6 to put those single set bits at the high position. – Peter Cordes May 28 '16 at 13:07
  • Actually, `0x8080...` can also be generated from 0xFF00... with `movdqa %xmm5, %xmm6` / `packsswb %xmm6,%xmm6`. (0xFF00 saturates to 0x80). Saves one ALU instruction outside the loop (assuming mov-elimination), and maybe 2 bytes of code, but reduces ILP. – Peter Cordes May 28 '16 at 13:14
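The two register-only ways of building the 0x80 constant described in these comments can be checked with intrinsics. This is a sketch with a hypothetical helper name, MakeBias80, assuming an x86 target with SSSE3 (for pabsb); pabsb is applied to the all-ones value from pcmpeqb, before the psllw that turns xmm5 into 0xFF00. With intrinsics the compiler may simply fold both results into a constant; the point is only to confirm the bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <tmmintrin.h>  // SSSE3 for _mm_abs_epi8 (pabsb)

// Hypothetical helper: builds 0x80 bytes two ways and stores both results.
__attribute__((target("ssse3")))
void MakeBias80(uint8_t out_pack[16], uint8_t out_abs[16]) {
  const __m128i zero = _mm_setzero_si128();
  const __m128i ones = _mm_cmpeq_epi8(zero, zero);  // pcmpeqb: all 0xFF
  const __m128i ff00 = _mm_slli_epi16(ones, 8);     // psllw $8: xmm5
  // packsswb: each 0xFF00 word is -256, which saturates to int8 -128 = 0x80.
  const __m128i via_pack = _mm_packs_epi16(ff00, ff00);
  // pabsb turns each 0xFF (-1) byte into 0x01; psllw $7 moves it to bit 7.
  const __m128i via_abs = _mm_slli_epi16(_mm_abs_epi8(ones), 7);
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out_pack), via_pack);
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out_abs), via_abs);
}
```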