Questions tagged [x86]

x86 is an architecture derived from the Intel 8086 CPU. The x86 family includes the 32-bit IA-32 and 64-bit x86-64 architectures, as well as legacy 16-bit architectures. Questions about the latter should be tagged [x86-16] and/or [emu8086]. Use the [x86-64] tag if your question is specific to 64-bit x86-64. For the x86 FPU, use the tag [x87]. For SSE1/2/3/4 / AVX*, also use [sse], and any of [avx] / [avx2] / [avx512] that apply.

The x86 family of CPUs contains 16-, 32-, and 64-bit processors from several manufacturers, with backward-compatible instruction sets, going back to the Intel 8086 introduced in 1978.

There is an [x86-64] tag for things specific to that architecture, but most of the info here applies to both, so it makes more sense to collect everything here. Questions can be tagged with either or both. Questions specific to features only found in the x86-64 architecture, like RIP-relative addressing, clearly belong in x86-64. Questions like "how to speed up this code with vectors or any other tricks" are fine for x86, even if the intention is to compile for 64-bit.

Related tags with tag wikis:

  • wiki (some good SIMD guides), and (not much there)
  • wiki for guides specific to interfacing with a compiler that way.
  • wiki and wiki have more details about the differences between the two major x86 assembly syntaxes. And for Intel, how to spot which flavour of Intel syntax it is, like NASM vs. MASM/TASM.

Learning resources

Guides for performance tuning / optimisation:

Instruction set / asm syntax references:

OS-specific stuff: ABIs and system-call tables:

  • 16-bit interrupt list: PC BIOS system calls (int 10h / int 16h / etc, AH=callnumber), DOS system calls (int 21h/AH=callnumber), and more.

Memory ordering:

Specific behaviour of specific implementations

Q&As with good links, or directly useful answers:

FAQs / canonical answers:

If you have a problem involving one of these issues, don't ask a new question until you've read and understood the relevant Q&A.

(TODO: find better question links for these. Ideally questions that make a good duplicate target for new dups. Also, expand this.)

How to get started / Debugging tools + guides

Find a debugger that will let you single-step through your code, and display registers while that happens. This is essential. We get many questions on here that are something like "why doesn't this code work" that could have been solved with a debugger.

On Windows, Visual Studio has a built-in debugger. See Debugging ASM with Visual Studio - Register content will not display. And see Assembly programming - WinAsm vs Visual Studio 2017 for a walk-through of setting up a Visual Studio project for a MASM 32-bit or 64-bit Hello World console application.

On Linux: A widely-available debugger is gdb. See Debugging assembly for some basic stuff about using it on Linux. Also How can one see content of stack with GDB?

There are various GDB front-ends, including GDBgui. Also guides for vanilla GDB:

With layout asm and layout reg enabled, GDB will highlight which registers changed since the last stop. Use stepi to single-step by instructions. Use x to examine memory at a given address (useful when trying to figure out why your code crashed while trying to read or write at a given address). In a binary without symbols (or even sections), you can use starti instead of run to stop before the first instruction. (On older GDB without starti, you can use b *0 as a hack to get GDB to stop on an error.) Use help x (or help followed by any other command name) for help on that command.
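As a rough illustration of the GDB commands above, here is a small practice target in C. The function name `sum_bytes`, the file name `demo.c`, and the build line are assumptions made for this sketch, not something from the original wiki; it also assumes GCC or Clang on Linux and a `main()` of your own that calls the function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical practice target: sums the bytes of a buffer.
 *
 * Assumed build line (debug info, no optimisation, so the asm stays
 * close to the source):
 *     gcc -g -O0 demo.c -o demo
 *
 * Then, inside `gdb ./demo`:
 *     layout asm        show the disassembly window
 *     layout reg        add a register window; changed registers are highlighted
 *     break sum_bytes   stop when the function is entered
 *     run
 *     stepi             execute exactly one machine instruction
 *     x/8xb buf         examine 8 bytes, in hex, at the address held in buf
 */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];    /* step through this loop with stepi */
    return total;
}
```

Repeating `stepi` through the loop and watching which registers get highlighted is usually enough to see how each instruction maps to the source line.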

GNU tools have an Intel-syntax mode that's similar to MASM, which is nice to read but is rarely used for hand-written source (NASM/YASM is nice for that if you want to stick with open-source tools but avoid AT&T syntax):
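One way to try the Intel-syntax mode of the GNU tools is to compile a tiny C function and look at the result. This is only a sketch: the function `scale_and_add` and the file names are made up for illustration, and the commands in the comment assume GCC and GNU binutils are installed.

```c
/* Hypothetical example function; any small function would do.
 *
 * Assuming this is saved as demo.c:
 *     gcc -O2 -S -masm=intel demo.c -o demo.s    compiler output in Intel syntax
 *     gcc -O2 -c demo.c -o demo.o
 *     objdump -d -Mintel demo.o                  disassembly in Intel syntax
 *
 * Inside GDB, `set disassembly-flavor intel` switches its disassembler
 * to Intel syntax as well.
 */
int scale_and_add(int a, int b)
{
    /* Compilers often turn this into a single lea, but that is not guaranteed. */
    return a * 4 + b;
}
```

Dropping `-masm=intel` (or `-Mintel`) shows the same code in the default AT&T syntax, which makes for a quick side-by-side comparison.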

Another key tool for debugging is tracing system calls. For example, on a Unix system, strace ./a.out will show you the args and return values of all the system calls your code makes. It knows how to decode the args into symbolic values like O_RDWR, so it's much more convenient (and more likely to catch brain-farts or wrong values for constants) than using a debugger to look at registers before/after an int or syscall instruction. Note that it doesn't work correctly on Linux int 0x80 32-bit ABI system calls in 64-bit processes: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
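To see what strace's decoding looks like in practice, a small C helper that makes a few easily recognisable system calls can serve as a test subject. The function name `traced_write_demo` and the file path used with it are assumptions for this sketch; it relies on POSIX `open`, `write`, and `close`, and needs a `main()` of your own to call it.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: makes a few system calls that are easy to spot
 * in strace output. Returns the number of bytes written, or -1 on failure. */
long traced_write_demo(const char *path)
{
    static const char msg[] = "hello\n";

    /* strace decodes the flags symbolically (O_RDWR|O_CREAT|O_TRUNC)
     * instead of showing a raw integer. On current glibc this call
     * typically appears as openat rather than open. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The buffer contents and the return value both show up in the trace. */
    ssize_t n = write(fd, msg, strlen(msg));

    if (close(fd) != 0)
        return -1;
    return (long)n;
}
```

Calling this from `main()` and running the binary as `strace ./a.out` should show the open, write, and close calls with their arguments and return values. A wrong flag constant or a bad pointer in hand-written asm stands out the same way.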

To debug boot or kernel code, boot it in Bochs, QEMU, or maybe even DOSBox, or any other virtual machine / simulator / emulator. Use the debugging facilities of the VM to get far better information than the usual "it locks up" you will experience with buggy privileged code.

Bochs is generally recommended for debugging real-mode bootloaders, especially ones that switch to protected mode; its built-in debugger understands segmentation (unlike GDB), and can parse a GDT, IDT, and page tables to make sure you got the fields right.

14860 questions
7 answers

Limitations of Intel Assembly Syntax Compared to AT&T

To me, Intel syntax is much easier to read. If I go traipsing through assembly forest concentrating only on Intel syntax, will I miss anything? Is there any reason I would want to switch to AT&T (outside of being able to read others' AT&T assembly)?…
3 answers

Using gdb to single-step assembly code outside specified executable causes error "cannot find bounds of current function"

I'm outside gdb's target executable and I don't even have a stack that corresponds to that target. I want to single-step anyway, so that I can verify what's going on in my assembly code, because I'm not an expert at x86 assembly. Unfortunately,…
2 answers

What does "rep; nop;" mean in x86 assembly? Is it the same as the "pause" instruction?

What does rep; nop mean? Is it the same as pause instruction? Is it the same as rep nop (without the semi-colon)? What's the difference to the simple nop instruction? Does it behave differently on AMD and Intel processors? (bonus) Where is the…
Denilson Sá Maia
3 answers

Double cast to unsigned int on Win32 is truncating to 2,147,483,648

Compiling the following code: double getDouble() { double value = 2147483649.0; return value; } int main() { printf("INT_MAX: %u\n", INT_MAX); printf("UINT_MAX: %u\n", UINT_MAX); printf("Double value: %f\n", getDouble()); …
8 answers

What are IN & OUT instructions in x86 used for?

I've encountered these two instructions IN & OUT while reading "Understanding Linux Kernel" book. I've looked up reference manual. 5.1.9 I/O Instructions These instructions move data between the processor’s I/O ports and a register or…
3 answers

Argument order to std::min changes compiler output for floating-point

I was fiddling in Compiler Explorer, and I found that the order of arguments passed to std::min changes the emitted assembly. Here's the example on Godbolt Compiler Explorer double std_min_xy(double x, double y) { return std::min(x,…
3 answers

Where is the lock for a std::atomic?

If a data structure has multiple elements in it, the atomic version of it cannot (always) be lock-free. I was told that this is true for larger types because the CPU can not atomically change the data without using some sort of lock. for…
7 answers

Is it possible to tell the branch predictor how likely it is to follow the branch?

Just to make it clear, I'm not going for any sort of portability here, so any solutions that will tie me to a certain box is fine. Basically, I have an if statement that will 99% of the time evaluate to true, and am trying to eke out every last…
Andy Shulman
2 answers

Which variable size to use (db, dw, dd) with x86 assembly?

I am a beginner to assembly and I don't know what all the db, dw, dd, things mean. I have tried to write this little script that does 1+1, stores it in a variable and then displays the result. Here is my code so far: .386 .model flat, stdcall…
4 answers

What does the "lock" instruction mean in x86 assembly?

I saw some x86 assembly in Qt's source: q_atomic_increment: movl 4(%esp), %ecx lock incl (%ecx) mov $0,%eax setne %al ret .align 4,0x90 .type q_atomic_increment,@function .size …
3 answers

Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?

LOOP (Intel ref manual entry) decrements ecx / rcx, and then jumps if non-zero. It's slow, but couldn't Intel have cheaply made it fast? dec/jnz already macro-fuses into a single uop on Sandybridge-family; the only difference being that that sets…
Peter Cordes
8 answers

Is using double faster than float?

Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats? That is, are double operations just as fast or faster than float operations for +, -, *, and /? Does the answer change for 64-bit…
Brent Faust
6 answers

Enhanced REP MOVSB for memcpy

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy. ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and STOSB operation (ERMSB)" in the Intel optimization…
Z boson
2 answers

Why is std::fill(0) slower than std::fill(1)?

I have observed on a system that std::fill on a large std::vector was significantly and consistently slower when setting a constant value 0 compared to a constant value 1 or a dynamic value: 5.8 GiB/s vs 7.5 GiB/s However, the results are…
3 answers

How to generate assembly code with clang in Intel syntax?

As this question shows, with g++, I can do g++ -S -masm=intel test.cpp. Also, with clang, I can do clang++ -S test.cpp, but -masm=intel is not supported by clang (warning argument unused during compilation: -masm=intel). How do I get intel syntax…
Jesse Good