Questions tagged [x86]

x86 is an architecture derived from the Intel 8086 CPU. The x86 family includes the 32-bit IA-32 and 64-bit x86-64 architectures, as well as legacy 16-bit architectures. Questions about the latter should be tagged [x86-16] and/or [emu8086]. Use the [x86-64] tag if your question is specific to 64-bit x86-64. For the x86 FPU, use the tag [x87]. For SSE1/2/3/4 / AVX*, also use [sse] and any of [avx] / [avx2] / [avx512] that apply.

The x86 family of CPUs contains 16-, 32-, and 64-bit processors from several manufacturers, with backward-compatible instruction sets, going back to the Intel 8086 introduced in 1978.

There is an x86-64 tag for things specific to that architecture, but most of the info here applies to both, so it makes more sense to collect everything here. Questions can be tagged with either or both. Questions about features found only in x86-64, like RIP-relative addressing, clearly belong in x86-64. Questions like "how to speed up this code with vectors or any other tricks" are fine for x86, even if the intention is to compile for 64-bit.

Related tags with tag wikis:

sse wiki (some good SIMD guides), and avx (not much there)
inline-assembly wiki for guides specific to interfacing with a compiler that way.
intel-syntax wiki and att wiki have more details about the differences between the two major x86 assembly syntaxes. And for Intel, how to spot which flavour of Intel syntax it is, like NASM vs. MASM/TASM.

Learning resources

Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” has a gentle introduction to x86 asm itself for asm beginners who know C or C++, as well as a very useful guide to looking at compiler output.

If you don't know how to do something in asm, write a simple C function that does it and see what an optimizing compiler does. e.g. int foo(char *p) { return *p; } shows you how to use movsx. See also How to remove "noise" from GCC/clang assembly output?
Short x86 Assembly Guide targeting 32-bit mode and the MASM assembler, but brief and target-agnostic enough to be used as a starting point for any "Intel" syntax dialect assembler (NASM, YASM, FASM, ...).
Suggestions on how to learn asm, with a recommendation against 16bit DOS. Questions should use the x86-16, emu8086, and/or dos tags if applicable, as well as x86 (which includes all platforms.)
To learn assembly - should I start with 32 bit or 64 bit?
OSdev.org: a great resource if you want to understand / modify OS internals or make your own toy OS. Not useful for writing / debugging normal programs that run under existing OSes.
General Tips for Bootloader Development. (Using legacy BIOS, not UEFI).
Working example of a legacy BIOS int 10h bootloader that loads a "kernel" and calls a C main function in it, in 32-bit protected mode. Includes instructions on how to build and link it with NASM, gcc -m32, and ld (with a linker script). And how to make a disk image and run it on QEMU.
the inline-assembly tag wiki. (But see also https://gcc.gnu.org/wiki/DontUseInlineAsm - inline asm is more complicated than writing stand-alone asm functions you call from C, so it's not good for learning asm.)
Using GNU C/C++ inline ASM. The bottom of that answer has a collection of links to info on how to write inline asm that's efficient and correct. The first part of the answer explains why it's not a good way to learn asm in the first place. Don't try to "get your feet wet" with asm by using inline asm. You have to understand everything to write correct input/output operand constraints and clobbers.
Understanding Carry vs. Overflow conditions/flags, normally relevant for unsigned vs. signed respectively.
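To make the carry vs. overflow distinction concrete, here is a minimal C sketch that models what an 8-bit add does to the two flags. The struct and the function name `add8` are made up for illustration; they are not from any of the linked answers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an 8-bit addition plus a model of the two flags. */
struct add8_result {
    uint8_t sum;    /* low 8 bits of a + b */
    bool carry;     /* CF: the unsigned result did not fit in 8 bits */
    bool overflow;  /* OF: the signed result did not fit in 8 bits */
};

/* Hypothetical helper: models an 8-bit `add` and its CF / OF outputs. */
struct add8_result add8(uint8_t a, uint8_t b)
{
    struct add8_result r;
    unsigned wide = (unsigned)a + (unsigned)b;  /* exact sum, 0..510 */

    r.sum = (uint8_t)wide;
    r.carry = wide > 0xFF;
    /* Signed overflow: both inputs have the same sign bit and the
       sign bit of the result differs from it. */
    r.overflow = ((a ^ r.sum) & (b ^ r.sum) & 0x80) != 0;
    return r;
}
```

For example, 0x7F + 0x01 sets overflow but not carry (127 + 1 does not fit in a signed byte), while 0xFF + 0x01 sets carry but not overflow (255 + 1 wraps as unsigned, but -1 + 1 = 0 is fine as signed).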

Quick guide to what's different in x86-64. AT&T syntax. NASM and YASM behave differently (from each other) in choice of encoding for mov rax, 1, and don't use a separate movabs mnemonic for the 64bit-immediate form.
Introduction to x64 Assembly (published by Intel). Uses MASM syntax. Spends a bit of time talking about the Windows calling convention and MSVC-specific toolchain issues (like no MSVC inline asm in 64-bit mode), as you might expect from using "x64" in the article title instead of x86-64. But it looks like it has some good generally-applicable material that isn't OS-specific. For some bizarre reason, it suggests using the slow LOOP instruction, so it's not perfect.
Encoding Real x86 Instructions: a tutorial (course material) on how instructions are encoded into machine code. Lots of diagrams.
x86 on Wikipedia
x86 Assembly wikibook
Assembly Language for x86 Processors (website for Kip Irvine's book)
Programming from the Ground Up, a free (GFDL) book by Jonathan Bartlett. Errata for the book. Available as a small (1MB) PDF from the "download" link on that page, or as HTML chapters. It uses 32-bit x86 asm with AT&T syntax on Linux, and has some good material about how to "think like a computer" to figure out how to get things done in asm. It covers essential operating-system topics like virtual memory that are necessary to understand what's going on, as well as assembly / machine language itself.
x86-64 Assembly Language Programming with Ubuntu, a free book using YASM (NASM syntax) for GNU/Linux. The PDF is CC-BY-NC-SA. Unfortunately no mention of default rel or [rel x] RIP-relative addressing so it's missing some stuff that's essential in practice. But does have some introductory stuff about basics like data representation, bits and bytes in memory vs. registers, and other background beyond just what each instruction does.
Assembly tutorial - Dr. Paul Carter
Windows Assembly Programming Tutorial
Why do functions have to save some registers, but not others? See below for links to guides & docs for specific calling conventions.
How to trace what a function does: figure out the inputs and the outputs, then figure out what it does with them.
Linux x86 Program Start Up or - How the heck do we get to main()
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
What do the register-names like esi mean, and what special purposes do they have. They're all acronyms, like Counter register, or Source Index.
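The tip earlier in this list (write a small C function and look at what an optimizing compiler does with it) can be tried with a pair like the following. This is only a sketch: the function names are invented here, and the types are spelled out as `signed char` / `unsigned char` because the signedness of plain `char` is implementation-defined.

```c
/* Loads a signed byte and widens it to int.
   Compilers targeting x86 typically use movsx for this. */
int load_signed(const signed char *p)
{
    return *p;
}

/* Loads an unsigned byte and widens it to int.
   Compilers targeting x86 typically use movzx for this. */
int load_unsigned(const unsigned char *p)
{
    return *p;
}
```

Compiling with something like gcc -O2 -S -masm=intel and comparing the two function bodies shows the sign-extending vs. zero-extending load side by side.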

Guides for performance tuning / optimisation:

Agner Fog's optimization guides and resources. Includes latency/throughput tables for P5 onwards. Also much qualitative discussion of how to go about making your code faster. Also has a good guide to the different calling conventions across OSes, and covers linking / symbols / relocation.
Intel's Sandybridge microarchitecture family can't micro-fuse indexed addressing modes in the out-of-order core, only in the decoders and uop-cache. Also: Haswell's dedicated store-address unit on port7 only works with simple effective addresses. Complex effective addresses need the AGU on a load port.
Enhanced REP MOVSB for memcpy: single-threaded bandwidth vs. aggregate bandwidth on desktop vs. many-core CPUs, RFO vs. non-RFO stores. (Modern CPUs have more DRAM / L3 bandwidth than a single core can use; there are other bottlenecks especially in many-core chips).
What Every Programmer Should Know About Memory by Ulrich Drepper. (Originally posted as a series of LWN articles, Ulrich published the PDF later). How DRAM and caches work, their behaviour, and how to optimize software for cache locality. Includes some charts with real microbenchmark data to illustrate points, and a cache-blocked SSE2 matrix multiply example. See a 2017 review of what's outdated, e.g. the P4 software prefetch stuff is mostly obsolete.
Why xor same,same is better than mov reg, 0 for zeroing a register: there are several reasons, some simple and some subtle (e.g. avoiding partial-register stalls on P6/SnB family).
Serializing RDTSC with LFENCE vs. CPUID for benchmarking short sequences within a program.
How to get the CPU cycle count in x86_64 from C++? (including a bunch of info on what rdtsc measures, exactly, and caveats for using it, with links to even more details).
What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?: intro to static performance analysis.
Intel's IACA (Intel Architecture Code Analyzer): analyze marked sections of code for throughput (e.g. cycles per iteration) or latency of the critical path. Assumes perfect cache, and other simplifications, and isn't always correct, but can be useful. Was stalled, but updated again for Skylake-X (AVX512). See What is IACA and how do I use it? for a tutorial.
Haswell microarchitecture, Bulldozer microarchitecture. David Kanter's analysis. He's also done writeups on earlier uarches, like Sandybridge and Nehalem.
Modern Microprocessors A 90-Minute Guide!: from in-order pipelined to super-scalar out-of-order. And brainiac (PPro) vs. speed demon (Pentium 4), and Pentium 4 hitting the "power wall" in CPU design.
A whirlwind introduction to dataflow graphs: how to analyze dependency chains for throughput and latency.
http://www.uops.info/ very detailed uop / execution port testing on Intel CPUs, finding some things that repeating a large block of the same instruction (like Agner Fog's testing) sometimes misses.
New CPUs will usually have AIDA64 InstLatx64 results before Agner Fog can test and publish updated tables. For example, Skylake-avx512, and see also https://github.com/InstLatx64/InstLatx64 for a mirror + a spreadsheet of Skylake-AVX512 port assignments (compiled from IACA-2.3 output). BDW vs. SKL points out some of the interesting changes in SKL (more throughput for more instructions, different FP latency).
2015 IDF slides from the Skylake power management talk. Unfortunately, the main site (http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65D7520025A8/7/5), which had video (of slides + audio), is offline now.
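Several of the links above (static performance analysis, dataflow graphs) are about dependency chains: latency vs. throughput. A small C sketch of the idea, with hypothetical function names, is to sum an array with one accumulator versus several independent ones. Unsigned integers are used so both versions give exactly the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the previous add,
   forming a single loop-carried dependency chain. */
uint32_t sum_one_chain(const uint32_t *v, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Four accumulators: four independent chains that an out-of-order
   core can overlap. Unsigned wraparound makes the final result
   identical to the single-chain version. */
uint32_t sum_four_chains(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

Whether this helps in practice depends on the compiler and the data type. An optimizing compiler may already unroll or vectorize an integer sum like this, while for floating point it generally can't reassociate the additions unless told it may (e.g. with -ffast-math), so the manual version matters more there.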

Instruction set / asm syntax references:

Intel's vector intrinsics finder/search (very good): search by asm mnemonic or C intrinsic name
x86/x64 SIMD Instruction List (SSE to AVX512) Beta: A nice compact table listing instruction mnemonics and their intrinsics, broken down by type and element-size. Detailed pages with graphical data-movement diagrams for each instruction.
SIMD guides in the SSE tag wiki, focusing on how to actually make good use of SIMD in general, not just what the available instructions are.
Intel's manuals, including instruction set reference manual. Extremely detailed description of everything every instruction does to the architectural state. Big, but has a decent index / table of contents. Also on that page: Intel's optimization manual. Some of the same advice as Agner Fog's guides, but sometimes without explaining exactly why in terms of microarch execution ports and other under-the-hood reasons. Also sometimes obsolete, for example recommending against inc/dec long after P4 is irrelevant.
AMD's x86 manuals, including instruction-set reference and optimization manuals.
HTML version of Intel's insn set reference, auto-generated from the PDF. One page per instruction, great for linking in answers.
Another HTML extract, including AVX512, CLFLUSHOPT, etc. This makes it more cluttered, and harder to find what you need, if you're not targeting AVX512. (But note that CLFLUSH has changed to being strongly-ordered, but felixcloutier.com's HTML extract still has the old documentation. There may be other inaccuracies in the old docs, even for old instructions.)
https://sandpile.org - CPUID maps, instruction encoding, register diagrams, opcode map, miscellaneous other technical details.
x86 Instruction Reference including when introduced (8086, 186, 586, etc) - NASM appendix B. Includes undocumented instructions, and Cyrix-only MMX instructions, and stuff like that.

A fork of an older version includes English descriptions. The original had some errors in which generation introduced each form of each insn but this version keeps the nice formatting while fixing those. Handy for people still developing for x86-16. The similar wikipedia page doesn't mention that 386 is required for the faster 2-operand form of imul r16, r/m16 that doesn't have to calculate the upper half of the result.
x86 Opcode reference guide, sorted by opcode or by mnemonic. 32, 64, or both in one table. The "geek" version includes non-standard / undocumented opcodes, the "coder" one includes columns showing which if any flags are read and written.
Original 8086 errata / anomalies, such as mov ss, src not properly disabling interrupts until the end of the next instruction. Also see the parent directory for some errata, undocumented instructions, and stuff for 186/286/386.
Simply FPU: x87 tutorial. Helpful for understanding old x87 code, esp. the early sections about how the register stack works. (Use SSE for new code.)
fsin's precision is far worse than 1ulp for inputs close to pi, contrary to Intel's previous documentation. The other FP articles in Bruce Dawson's series are also excellent (index in this one on FP comparisons).
GNU as manual, aka gas manual
The NASM manual
YASM manual: describes YASM syntax and macros. Excellent register diagram showing partial registers, with their machine-code encodings, and a reminder on zero-extending vs. unmodified upper parts. (Another simpler register-subset diagram for a single reg).

Possible canonical duplicates for register subsets: Assembly registers in 64-bit architecture includes some calling-convention / usage stuff. How do AX, AH, AL map onto EAX? is a good one for bugs where AL and RAX were used for different things, corrupting each other.
MASM Reference Documentation, and an old MASM 6.1 manual from 1996. Confusing brackets in MASM32 shows that MASM surprisingly ignores brackets around symbolic immediates.
MASM syntax as used by JWasm. JWasm is a portable assembler.
FASM manual
table of AT&T(GNU) vs. NASM syntax for addressing modes and indirect jmp/call
All the available addressing modes (32/64-bit) (Intel syntax, with a note about NASM vs. MASM for mov reg, symbol), with links to further guides.
AT&T addressing-mode syntax
16-bit addressing modes.
TODO: find a good link for AMD's XOP instruction set. (Not recommended for general use; even AMD is dropping XOP support in their Zen architecture.)
Cheat sheet PDF
Win32-specific cheat sheet

OS-specific stuff: ABIs and system-call tables:

x86 ABIs (wikipedia): calling conventions for functions, including x86-64 Windows and System V (Linux). See also Agner Fog's nice calling convention guide
32-bit absolute addresses no longer allowed in x86-64 Linux? (PIE executables are now the default on most distros, with gcc configured with --enable-default-pie.)
Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array (OS X's image base is above the low 32, unlike Linux position-dependent executables). Also mentions 2 known bugs in some NASM versions with macho64 and RIP-relative or 64-bit absolute addressing.

System V ABI summary on osdev: i386 and x86-64, with links to random copies of the per-architecture supplement for various architectures, and the generic gABI that all the processor-specific supplement (psABI) documents expand on.
System V psABI official standard current revisions for x86-64 and i386 (wiki page on github, kept up to date by H.J. Lu). Direct link to x86-64 revision 1.0. Also links to the official forum for ABI discussion by maintainers/contributors.
clang/gcc sign/zero extend narrow args to 32bit, even though the System V ABI as written doesn't (yet?) require it. Clang-generated code also depends on it.
System V 32bit (i386) psABI (official standard, rev 1.1 Dec2015), used by Linux and Unix. (Some OSes don't require 16-byte stack alignment for 32-bit code; GNU/Linux does)
(Historical: very old SCO version of the i386 SysV ABI, before 16B stack alignment was required).

OS X 32bit x86 calling convention, with links to the others. The 64bit calling convention is System V. Apple's site just links to a FreeBSD pdf for that.

Windows x86-64 __fastcall calling convention
Windows __vectorcall: documents the 32bit and 64bit versions
Windows 32bit __stdcall: used to call Win32 API functions. That page links to the other calling convention docs (e.g. __cdecl).
ABI cheat sheet: x86 vs. x64 vectorcall and non-vectorcall, vs. SysV. SysV section is incomplete.
Why does Windows64 use a different calling convention from all other OSes on x86-64?: some interesting history, esp. for the SysV ABI where the mailing list archives are public and go back before AMD's release of first silicon.
MSVC's 32bit CRT startup code sets the x87 FPU precision to 53 (double). That entire series of articles (table of contents in this one) is excellent, including asm output from MSVC in some examples.

The Definitive Guide to Linux System Calls (on x86). Examples of how to use int 0x80, 32-bit sysenter, and 64-bit syscall, and how to call through the vDSO for gettimeofday, and has some info about glibc's syscall wrappers. Lots of details, and also some background info / basics for beginners.
Linux system call tables. 64bit syscall numbers, with parameter->register mapping (derived from the kernel source code, and the standard rule for order of args).
FreeBSD system calls: question has FreeBSD syscalls, answer has Linux and others.
What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64: Note that 32bit int 0x80 restores all registers (including flags) except eax, while 64bit syscall also clobbers rcx and r11 as well as putting the return value in rax.
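As a rough illustration of the system-call interface these links describe, the sketch below makes a write system call from C through glibc's generic syscall() function rather than the usual write() wrapper. It assumes Linux with glibc; the name `raw_write` is made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: invokes the write system call by number.
   syscall() puts the call number and arguments in the registers the
   kernel ABI expects. On error it returns -1 and sets errno. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

On x86-64 Linux, the underlying syscall instruction takes the call number in rax and the first three arguments in rdi, rsi, and rdx. Running a program that uses this under strace shows the decoded call and its return value.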

16bit interrupt list: PC BIOS system calls (int 10h / int 16h / etc, AH=callnumber), DOS system calls (int 21h/AH=callnumber), and more.

memory ordering:

Weak vs. Strong Memory Models: what it means when people say x86 has a "strongly ordered memory model". See also the c++ info page for many good links if you're using C11/C++11 atomics.
Memory Reordering Caught in the Act: A test case that demonstrates memory reordering in practice on a multicore x86 CPU.
A better x86 memory model: x86-TSO (extended version) A formal definition of the x86 memory model which hopefully matches how real hardware behaves.
Why isn't add dword [num], 1 atomic, even though it's a single instruction? Also asks about compiling num++ in C++. See also Atomicity on x86: What does it mean for a load or store to be atomic, and how is it implemented internally?
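For the num++ case, a minimal C11 sketch of the difference between a plain increment and an atomic read-modify-write might look like this. The variable and function names are illustrative only.

```c
#include <stdatomic.h>

int plain_num;            /* ordinary variable */
atomic_int atomic_num;    /* C11 atomic variable, zero-initialized */

/* Plain increment: a load, an add, and a store as far as other
   threads are concerned. Concurrent calls are a data race. */
void increment_plain(void)
{
    plain_num++;
}

/* Atomic read-modify-write: safe to call from multiple threads.
   On x86, compilers typically implement this with a lock-prefixed
   instruction. Returns the new value. */
int increment_atomic(void)
{
    return atomic_fetch_add_explicit(&atomic_num, 1,
                                     memory_order_relaxed) + 1;
}
```

memory_order_relaxed is used here because the sketch only needs the counter itself to be correct, not ordering with respect to other memory operations. Code that uses the counter to publish other data would need a stronger ordering.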

Specific behaviour of specific implementations

TLB and Pagewalk Coherence in x86 Processors. Many x86 microarchitectures, especially Intel's, provide stronger ordering guarantees than the ISA requires for modifying a page-table entry that's not already cached in the TLB. Win95 even depended on this. (Don't write new code that depends on this.)
Measuring Reorder Buffer Capacity Another experimental test that demonstrates the capabilities and limits of out-of-order execution in real hardware.
What are the exhaustion characteristics of RDRAND on Ivy Bridge? With an answer from David Johnston (Intel RNG HW designer and librdrand author).

Q&As with good links, or directly useful answers:

Using GNU C/C++ inline ASM. (Same link from the learning-resources section, but worth repeating here.)
What are the best instruction sequences to generate vector constants on the fly?
Parallel programming using Haswell architecture
Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs. Has a long answer including some introductory computer-architecture stuff as well as details of what can stall a Haswell pipeline.
INC instruction vs ADD 1: Does it matter?
How can I run this assembly code on OS X?: OS X getting-started guide. (Symbol names are prepended with _ on OS X, unlike for Linux ELF systems.)
add/sub/LEA can be used with garbage in high bits, so LEA eax, [rdi + rsi*2 - 15] to compute a + 2*b - 15 works fine, even if a and b are only supposed to be 8 or 16 bits.
TODO: find a question about how to use a profiler to measure uops and stuff. perf comes with most Linux distros, and ocperf.py is a wrapper for it that provides more symbolic names for stuff like micro-arch-specific uop counters.
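The add/sub/LEA item in the list above can be checked in C: the low bits of a + 2*b - 15 depend only on the low bits of the inputs, because carries only propagate from low bits to high bits. The function name below is hypothetical.

```c
#include <stdint.h>

/* Computes a + 2*b - 15 on full 32-bit values, then keeps only the
   low 8 bits. Garbage in bits 8..31 of a or b cannot affect the
   result, since add, subtract, and left-shift never move information
   from high bits down into low bits. */
uint8_t combine_low8(uint32_t a, uint32_t b)
{
    return (uint8_t)(a + 2u * b - 15u);
}
```

For instance, combine_low8(0xDEADBE07, 0xCAFE0009) and combine_low8(7, 9) both return 10. The same is not true of right shifts or division, where high bits do affect the low bits of the result.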

FAQs / canonical answers:

If you have a problem involving one of these issues, don't ask a new question until you've read and understood the relevant Q&A.

(TODO: find better question links for these. Ideally questions that make a good duplicate target for new dups. Also, expand this.)

My program crashes / segfaults: You need to use a debugger to find what instruction is crashing (see the bottom of this tag wiki for GDB and Visual Studio tips). Most buggy asm programs crash, so without more info this is not useful. Reasons can include clobbering registers or stack memory you shouldn't have, leaving esp pointing to the wrong place before a ret, or many many other reasons besides the following other common problems.
external assembly file in visual studio - VS mixed-source x64 project, for asm files as part of a C/C++ program.
Also Assembly programming - WinAsm vs Visual Studio 2017 for a pure asm project.
Building 32bit code on a 64bit system (with the GNU toolchain). gcc example.s makes a binary that runs in 64bit mode, which will crash if the code was written for 32bit mode. Related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
Building an executable from asm source that defines _start vs. source that defines main, with gcc/as/ld and/or NASM. With or without libc, and static vs. dynamic executable.
Falling off the end of _start without making a sys_exit. ret doesn't work. What happens if there is no exit system call in an assembly program?

Execution just keeps going if there's no jump or ret. What if there is no return statement in a CALLed block of code in assembly programs and Why is no value returned if a function does not explicity use 'ret'.
Code executes condition wrong? fall through from the if into the else body in an if/else. Nicely explains that labels aren't magic and execution falls through them.
Segmentation fault when using DB (define byte) inside a function Putting data where it's executed as code. (Assembly (x86): <label> db 'string',0 does not get executed unless there's a jump instruction for legacy BIOS bootloaders with data at the top.)
idiv / div problems: Zero edx first, or sign-extend eax into it. 32-bit div faults with #DE if the 64b/32b => 32b quotient doesn't actually fit in 32b. (On POSIX systems including Linux, this raises SIGFPE).

8-bit operand size like div dl is the special case where dx isn't involved, just AX and AH/AL. It still faults if the quotient overflows 8 bits.
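The div fault conditions described above can be modelled in C. This is a sketch with an invented function name, covering only the unsigned 32-bit form, where the dividend is edx:eax.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models unsigned 32-bit `div`: the dividend is hi:lo (edx:eax).
   Returns false in the cases where the real instruction would raise
   #DE, and true with the quotient and remainder otherwise. */
bool div32_model(uint32_t hi, uint32_t lo, uint32_t divisor,
                 uint32_t *quotient, uint32_t *remainder)
{
    /* Divide by zero, or a quotient that would not fit in 32 bits.
       For unsigned division the quotient fits exactly when
       hi < divisor. */
    if (divisor == 0 || hi >= divisor)
        return false;

    uint64_t dividend = ((uint64_t)hi << 32) | lo;
    *quotient = (uint32_t)(dividend / divisor);
    *remainder = (uint32_t)(dividend % divisor);
    return true;
}
```

Calling it with hi = 0 corresponds to zeroing edx first. Calling it with a large leftover value in hi, as happens when edx is forgotten, almost always lands in the faulting case.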
No output from printf when I pipe the output, or print something without a newline? When you use the exit system call.
Calling printf in x86_64 using GNU assembler calling convention, stack alignment, and working example. Related NASM-syntax version Segfault while calling C function (printf) from Assembly

Canonical duplicate for scanf segfaulting on misaligned stack in modern Linux builds of glibc: glibc scanf Segmentation faults when called from a function that doesn't align RSP
Library functions modify registers / which registers do my functions need to save and restore? This is specified by the calling convention (part of the ABI) for the platform you're targeting. Search for those terms on this page. What registers must be preserved by an x86 function? is a decent canonical duplicate.
mismatched push/pop: if the stack pointer isn't pointing at the return address when you ret, you crash.
How do I handle multi-digit numbers? Linux, Windows, OS X, and DOS system calls for handling user input/output give you ASCII (or UTF-8) characters, or strings of characters. (Canonical Q&A for single-digit failure to do sub al, '0'). You normally need to convert between strings and binary integers to do math on them, like the C functions atoi or sprintf(buf, "%d", number). None of the common system-call APIs for major OSes that run on x86 provide these functions for you; only as libraries.

string-to-integer (32-bit NASM, algorithm works everywhere). (multiply by 10 for place value) Also includes an int-to-string loop.

Printing integers: 16-bit code to print 16 or 32-bit integers (in dx:ax) (1 digit at a time with MS-DOS int 21h, but could be adapted to store into a string or use a different output method.) Another example for unsigned 16b numbers in DOS that calculates digits and stores them into a string in memory.

2-digit decimal numbers (00-99), using BIOS int 10h for each digit: Displaying Time in Assembly. (Just a special case of the general algorithm, not looping.)

NASM x86-64 function to convert and print a 32-bit unsigned integer (using a single Linux write system call on a buffer). Other answers on the same question show printing one character at a time. AT&T version of the same function, also showing a 5x faster version that uses a multiplicative inverse instead of div to divide by the compile-time constant 10.

How to convert a binary integer number to a hex string? (32-bit NASM code. Scalar, SSE2, SSSE3, AVX512F, and AVX512VBMI versions.)
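The string/integer conversions discussed above follow the same algorithm in any language. A minimal C sketch of both directions is below; the function names are invented for this example, and the parser deliberately does no overflow or sign handling.

```c
#include <stddef.h>
#include <stdint.h>

/* String to integer: for each digit, total = total * 10 + digit.
   Stops at the first non-digit character (e.g. a newline). */
uint32_t parse_uint(const char *s)
{
    uint32_t total = 0;
    while (*s >= '0' && *s <= '9') {
        total = total * 10 + (uint32_t)(*s - '0');
        s++;
    }
    return total;
}

/* Integer to string: repeated division by 10 yields digits from
   least significant to most significant, so they are collected in a
   temporary buffer and copied out in reverse.
   buf must have room for at least 11 bytes. Returns the digit count. */
size_t format_uint(uint32_t n, char *buf)
{
    char tmp[10];            /* a 32-bit value has at most 10 digits */
    size_t len = 0;

    do {
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}
```

The do/while form makes sure that 0 produces the single digit "0" instead of an empty string. An asm version would usually store digits backwards from the end of the buffer instead of using a second copy loop.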
Loading pointers into registers vs. loading data into registers: Make sure you understand the difference between mov reg, symbol and mov reg, [symbol] (NASM syntax), or MASM syntax: mov reg, OFFSET symbol vs. mov reg, symbol. Many beginner questions are caused by mistakes in dereferencing addresses, or not dereferencing. This is the same as pointers in C.
Invalid combination of opcode and operands error on mov [msg], [ebp+8]? You can't use two memory operands to one instruction. (Why IA32 does not allow memory to memory mov?)
Bit-shifts and rotates need the count in cl, not any other register, or as an immediate constant. shl eax, ebx is impossible, shl eax, 2 is fine, and so is shl eax, cl
Call an absolute pointer in x86 machine code or jmp to an absolute address. With examples in NASM and AT&T syntax.
Why do most x86-64 instructions zero the upper part of a 32 bit register? In fact, all instructions that write a 32bit register zero the upper 32 of the full 64bit register, so mov eax, 1234 is more efficient than mov rax, 1234, but equivalent. This is not the case for writing to 8 and 16bit registers, like al/ah/ax, so you need movzx or movsx if the upper bits might hold garbage and you need to clear them (e.g. before using as part of a memory address).
Using LEA on values that aren't addresses / pointers? It's just a shift-and-add ALU instruction that uses memory-operand syntax and machine encoding.
How to tell the length of an x86 instruction? – with an overview over the x86 instruction encoding
Reversing a string? This well-commented answer uses 16-bit ms-dos system calls to read the string, but the actual loop over the string works the same for 32 or 64-bit code.
Indexing an array without scaling the index by the element width, resulting in overlapping loads or stores. Declaring and indexing an integer array of qwords in assembly (x86-64 AT&T syntax)
How do I do X in assembly: usually the same way you would in another programming language, like C. Figure out what needs to happen to the data before you get bogged down in writing instructions to make it happen.
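As one example of thinking in C first, the array-indexing mistake from the list above (forgetting to scale the index by the element width) can be reproduced with explicit byte offsets. This is a sketch with hypothetical function names.

```c
#include <stdint.h>
#include <string.h>

/* Correct: byte offset = index * element size, like an addressing
   mode of the form [base + idx*8] for qword elements. */
uint64_t load_scaled(const uint64_t *base, size_t idx)
{
    uint64_t v;
    memcpy(&v, (const unsigned char *)base + idx * sizeof(uint64_t),
           sizeof v);
    return v;
}

/* Buggy: the unscaled index is used as a byte offset, like
   [base + idx]. The 8-byte load overlaps two adjacent elements. */
uint64_t load_unscaled(const uint64_t *base, size_t idx)
{
    uint64_t v;
    memcpy(&v, (const unsigned char *)base + idx, sizeof v);
    return v;
}
```

With an array whose elements are 0x1111111111111111, 0x2222222222222222, and so on, load_scaled(a, 1) returns the second element, while load_unscaled(a, 1) returns a mix of bytes from the first and second elements. In C, a[i] does the scaling for you; in asm you have to write it yourself.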

How to get started / Debugging tools + guides

Find a debugger that will let you single-step through your code, and display registers while that happens. This is essential. We get many questions on here that are something like "why doesn't this code work" that could have been solved with a debugger.

On Windows, Visual Studio has a built-in debugger. See Debugging ASM with Visual Studio - Register content will not display. And see Assembly programming - WinAsm vs Visual Studio 2017 for a walk-through of setting up a Visual Studio project for a MASM 32-bit or 64-bit Hello World console application.

On Linux: A widely-available debugger is gdb. See Debugging assembly for some basic stuff about using it on Linux. Also How can one see content of stack with GDB?

There are various GDB front-ends, including GDBgui. Also guides for vanilla GDB:

With layout asm and layout reg enabled, GDB will highlight which registers changed since the last stop. Use stepi to single-step by instructions. Use x to examine memory at a given address (useful when trying to figure out why your code crashed while trying to read or write at a given address). In a binary without symbols (or even sections), you can use starti instead of run to stop before the first instruction. (On older GDB without starti, you can use b *0 as a hack to get gdb to stop on an error.) Use help x or whatever for help on any command.

GNU tools have an Intel-syntax mode that's similar to MASM, which is nice to read but is rarely used for hand-written source (NASM/YASM is nice for that if you want to stick with open-source tools but avoid AT&T syntax):

clang or gcc -Wall -O3 -masm=intel foo.c -fverbose-asm -S -o- | less (affects inline-asm)
GDB: set disassembly-flavor intel (can go in your ~/.gdbinit)
objdump -drwC -Mintel
perf report -Mintel

Another key tool for debugging is tracing system calls. e.g. on a Unix system, strace ./a.out will show you the args and return values of all the system calls your code makes. It knows how to decode the args into symbolic values like O_RDWR, so it's much more convenient (and likely to catch brain-farts or wrong values for constants) than using a debugger to look at registers before/after an int or syscall instruction. Note that it doesn't work correctly on Linux int 0x80 32-bit ABI system calls in 64-bit processes: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.

To debug boot or kernel code, boot it in Bochs, QEMU, or maybe even DOSBox, or any other virtual machine / simulator / emulator. Use the debugging facilities of the VM to get way better information than the usual "it locks up" you will experience with buggy privileged code.

BOCHS is generally recommended for debugging real-mode bootloaders, especially ones that switch to protected mode; BOCHS's built-in debugger understands segmentation (unlike GDB), and can parse a GDT, IDT, and page tables to make sure you got the fields right.

14860 questions

183

votes

10 answers

What is the difference between Trap and Interrupt?

What is the difference between Trap and Interrupt? If the terminology is different for different systems, then what do they mean on x86?

x86 operating-system kernel interrupt cpu-architecture

asked Jun 30 '10 at 12:23

David

172

votes

4 answers

An expensive jump with GCC 5.4.0

I had a function which looked like this (showing only the important part): double CompareShifted(const std::vector& l, const std::vector &curr, int shift, int shiftY) { ... for(std::size_t i=std::max(0,-shift);i

c++ gcc x86 conditional-statements branch-prediction

asked Dec 06 '16 at 09:22

Jakub Jůza

170

votes

5 answers

The point of test %eax %eax

Possible Duplicate: x86 Assembly - ‘testl’ eax against eax? I'm very very new to assembly language programming, and I'm currently trying to read the assembly language generated from a binary. I've run across test %eax,%eax or test %rdi,…

assembly x86 att

asked Oct 25 '12 at 08:43

pauliwago

159

votes

3 answers

How do you use gcc to generate assembly code in Intel syntax?

The gcc -S option will generate assembly code in AT&T syntax, is there a way to generate files in Intel syntax? Or is there a way to convert between the two?

gcc x86 gnu intel assembly

asked Oct 14 '08 at 03:52

hyperlogic

152

votes

12 answers

What is the difference between MOV and LEA?

I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]

assembly x86 instruction-set

asked Nov 09 '09 at 08:32

naveen

145

votes

7 answers

How does this milw0rm heap spraying exploit work?

I usually do not have difficulty to read JavaScript code but for this one I can’t figure out the logic. The code is from an exploit that has been published 4 days ago. You can find it at milw0rm. Here is the code:

javascript x86 exploit assembly

asked Dec 19 '08 at 14:56

Patrick Desjardins

143

votes

4 answers

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register. The Intel documentation (3.4.1.1 General-Purpose Registers in 64-Bit Mode…

assembly x86 x86-64 cpu-registers zero-extension

asked Jun 24 '12 at 11:40

Nubok

141

votes

1 answer

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax

performance assembly optimization x86 micro-optimization

asked Nov 12 '15 at 07:55

balajimc55

141

votes

5 answers

Header files for x86 SIMD intrinsics

Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.

x86 header-files sse simd intrinsics

asked Jun 27 '12 at 14:44

fredoverflow

134

votes

3 answers

What is the meaning of "non temporal" memory accesses in x86

This is a somewhat low-level question. In x86 assembly there are two SSE instructions: MOVDQA xmmi, m128 and MOVNTDQA xmmi, m128 The IA-32 Software Developer's Manual says that the NT in MOVNTDQA stands for Non-Temporal, and that otherwise…

x86 sse assembly

asked Aug 31 '08 at 20:18

Nathan Fellman

133

votes

3 answers

What does `dword ptr` mean?

Could someone explain what this means? (Intel Syntax, x86, Windows) and dword ptr [ebp-4], 0

assembly x86 dword pointers

asked Jun 07 '10 at 08:00

小太郎

132

votes

6 answers

Why does integer overflow on x86 with GCC cause an infinite loop?

The following code goes into an infinite loop on GCC: #include <iostream> using namespace std; int main(){ int i = 0x10000000; int c = 0; do{ c++; i += i; cout << i << endl; }while (i > 0); cout << c <<…

c++ c gcc x86 undefined-behavior

asked Oct 07 '11 at 02:24

Mysticial

132

votes

7 answers

What is the purpose of XORing a register with itself?

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient that mov eax, 0? 012B1002 in al,dx 012B1003 push ecx int i = 5; 012B1004 mov dword…

assembly x86

asked Sep 08 '09 at 21:54

devoured elysium

128

votes

5 answers

Purpose of ESI & EDI registers?

What is the actual purpose and use of the EDI & ESI registers in assembler? I know they are used for string operations for one thing. Can someone also give an example?

assembly x86

asked Dec 06 '09 at 19:26

Tony The Lion

122

votes

8 answers

`testl` eax against eax?

I am trying to understand some assembly. The assembly as follows, I am interested in the testl line: 000319df 8b4508 movl 0x08(%ebp), %eax 000319e2 8b4004 movl 0x04(%eax), %eax 000319e5 85c0 testl %eax, %eax …

assembly x86 instructions

asked Sep 29 '08 at 01:22

maxpenguin
