Questions tagged [x86]

x86 is an architecture derived from the Intel 8086 CPU. The x86 family includes the 32-bit IA-32 and 64-bit x86-64 architectures, as well as legacy 16-bit architectures. Questions about the latter should be tagged [x86-16] and/or [emu8086]. Use the [x86-64] tag if your question is specific to 64-bit x86-64. For the x86 FPU, use the tag [x87]. For SSE1/2/3/4 / AVX* also use [sse], and any of [avx] / [avx2] / [avx512] that apply

The x86 family of CPUs contains 16-, 32-, and 64-bit processors from several manufacturers, with backward-compatible instruction sets, going back to the Intel 8086 introduced in 1978.

There is an tag for things specific to that architecture, but most of the info here applies to both. It makes more sense to collect everything here. Questions can be tagged with either or both. Questions specific to features only found in the x86-64 architecture, like RIP-relative addressing, clearly belong in x86-64. Questions like "how to speed up this code with vectors or any other tricks" are fine for x86, even if the intention is to compile for 64bit.

Related tag with tag-wikis:

  • wiki (some good SIMD guides), and (not much there)
  • wiki for guides specific to interfacing with a compiler that way.
  • wiki and wiki have more details about the differences between the two major x86 assembly syntaxes. And for Intel, how to spot which flavour of Intel syntax it is, like NASM vs. MASM/TASM.

Learning resources


Guides for performance tuning / optimisation:

Instruction set / asm syntax references:

OS-specific stuff: ABIs and system-call tables:






  • 16bit interrupt list: PC BIOS system calls (int 10h / int 16h / etc, AH=callnumber), DOS system calls (int 21h/AH=callnumber), and more.

memory ordering:

Specific behaviour of specific implementations

Q&As with good links, or directly useful answers:


FAQs / canonical answers:

If you have a problem involving one of these issues, don't ask a new question until you've read and understood the relevant Q&A.

(TODO: find better question links for these. Ideally questions that make a good duplicate target for new dups. Also, expand this.)


How to get started / Debugging tools + guides

Find a debugger that will let you single-step through your code, and display registers while that happens. This is essential. We get many questions on here that are something like "why doesn't this code work" that could have been solved with a debugger.

On Windows, Visual Studio has a built-in debugger. See Debugging ASM with Visual Studio - Register content will not display. And see Assembly programming - WinAsm vs Visual Studio 2017 for a walk-through of setting up a Visual Studio project for a MASM 32-bit or 64-bit Hello World console application.

On Linux: A widely-available debugger is gdb. See Debugging assembly for some basic stuff about using it on Linux. Also How can one see content of stack with GDB?

There are various GDB front-ends, including GDBgui. Also guides for vanilla GDB:

With layout asm and layout reg enabled, GDB will highlight which registers changes since the last stop. Use stepi to single-step by instructions. Use x to examine memory at a given address (useful when trying to figure out why your code crashed while trying to read or write at a given address). In a binary without symbols (or even sections), you can use starti instead of run to stop before the first instruction. (On older GDB without starti, you can use b *0 as a hack to get gdb to stop on an error.) Use help x or whatever for help on any command.

GNU tools have an Intel-syntax mode that's similar to MASM, which is nice to read but is rarely used for hand-written source (NASM/YASM is nice for that if you want to stick with open-source tools but avoid AT&T syntax):

Another key tool for debugging is tracing system calls. e.g. on a Unix system, strace ./a.out will show you the args and return values of all the system calls your code makes. It knows how to decode the args into symbolic values like O_RDWR, so it's much more convenient (and likely to catch brain-farts or wrong values for constants) than using a debugger to look at registers before/after an int or syscall instruction. Note that it doesn't work correctly on Linux int 0x80 32-bit ABI system calls in 64-bit processes: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.

To debug boot or kernel code, boot it in a bochs, qemu, or maybe even DOSBOX, or any other virtual machine / simulator / emulator. Use the debugging facilities of the VM to get way better information than the usual "it locks up" you will experience with buggy privileged code.

BOCHS is generally recommended for debugging real-mode bootloaders, especially ones that switch to protected mode; BOCHS's built-in debugger understands segmentation (unlike GDB), and can parse a GDT, IDT, and page tables to make sure you got the fields right.

14860 questions
2323
votes
10 answers

Why are elementwise additions much faster in separate loops than in a combined loop?

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This loop is executed 10,000 times via another outer…
Johannes Gerer
  • 24,320
  • 5
  • 24
  • 33
1499
votes
11 answers

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. The Benchmark #include #include…
gexicide
  • 35,369
  • 19
  • 80
  • 136
873
votes
11 answers

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement identical brute force approach for testing the Collatz conjecture. The assembly solution was assembled with: nasm -felf64 p14.asm && gcc p14.o -o p14 The C++…
rosghub
  • 7,772
  • 4
  • 21
  • 31
740
votes
16 answers

What's the purpose of the LEA instruction?

For me, it just seems like a funky MOV. What's its purpose and when should I use it?
user200557
  • 7,559
  • 3
  • 16
  • 7
342
votes
16 answers

How can I determine if a .NET assembly was built for x86 or x64?

I've got an arbitrary list of .NET assemblies. I need to programmatically check if each DLL was built for x86 (as opposed to x64 or Any CPU). Is this possible?
Judah Gabriel Himango
  • 55,559
  • 37
  • 152
  • 206
332
votes
4 answers

Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your assignment is the opposite of our first lab assignment,…
Cowmoogun
  • 2,367
  • 4
  • 10
  • 17
300
votes
12 answers

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working: I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally I tensorflow/stream_executor/dso_loader.cc:125]…
GabrielChu
  • 5,418
  • 8
  • 23
  • 35
266
votes
4 answers

How to run a program without an operating system?

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive and it runs the program that is on the CPU?
user2320609
  • 1,659
  • 3
  • 11
  • 6
258
votes
10 answers

What does multicore assembly language look like?

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX" register, etc. With modern CPUs that have 4 cores (or even more), at the machine code level does…
Paul Hollingsworth
  • 11,954
  • 12
  • 48
  • 66
258
votes
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel1 will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
BeeOnRope
  • 51,419
  • 13
  • 149
  • 309
243
votes
7 answers

What is exactly the base pointer and stack pointer? To what do they point?

Using this example coming from wikipedia, in which DrawSquare() calls DrawLine(), (Note that this diagram has high addresses at the bottom and low addresses at the top.) Could anyone explain me what ebp and esp are in this context? From what I see,…
devoured elysium
  • 90,453
  • 117
  • 313
  • 521
229
votes
5 answers

How does the ARM architecture differ from x86?

Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile? What are the key differences between the two?
user1922878
  • 2,477
  • 3
  • 11
  • 7
193
votes
3 answers

What Every Programmer Should Know About Memory?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
Framester
  • 27,123
  • 44
  • 121
  • 183
185
votes
3 answers

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(int i) { int mantissa, exponent, sign, r; …
orlp
  • 98,226
  • 29
  • 187
  • 285
183
votes
4 answers

What happens when a computer program runs?

I know the general theory but I can't fit in the details. I know that a program resides in the secondary memory of a computer. Once the program begins execution it is entirely copied to the RAM. Then the processor retrive a few instructions (it…
gaijinco
  • 2,040
  • 4
  • 16
  • 16
1
2 3
99 100