Questions tagged [assembly]

Assembly language questions. Please also tag the processor and/or instruction set you are using, as well as the assembler; a valid set looks like this: (assembly, x86, gas). Note that you should use the .net-assembly tag instead for .NET assemblies, and the java-bytecode-asm tag for Java bytecode.

Assembly is a family of very low-level programming languages, just above machine code. In assembly, each instruction statement generally corresponds to a single machine code instruction. These instructions are represented as mnemonics in the given assembly language and are converted into executable machine code by a utility program called an assembler; the conversion process is referred to as assembly, or assembling the code.
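
The following is a minimal sketch in C of what an assembler does at its simplest. It is a toy, table-driven assembler. It assumes a tiny subset of x86: only a handful of single-byte instructions that take no operands, so each mnemonic maps directly to one opcode byte. The names `assemble` and `opcode_table` are hypothetical and invented for this illustration. A real assembler also parses operands, resolves labels, and processes directives.

```c
#include <stddef.h>
#include <string.h>

/* One entry per mnemonic: a toy opcode table covering only a few
   single-byte x86 instructions that take no operands (an assumption
   made to keep the example small). */
struct opcode_entry {
    const char *mnemonic;
    unsigned char opcode;
};

static const struct opcode_entry opcode_table[] = {
    { "nop", 0x90 },  /* no operation */
    { "ret", 0xC3 },  /* near return */
    { "hlt", 0xF4 },  /* halt */
    { "clc", 0xF8 },  /* clear carry flag */
    { "stc", 0xF9 },  /* set carry flag */
    { "cld", 0xFC },  /* clear direction flag */
    { "std", 0xFD },  /* set direction flag */
};

/* "Assemble" a program given as an array of mnemonics: each statement
   is looked up in the table and emits exactly one machine-code byte.
   Returns the number of bytes written, or -1 if a mnemonic is unknown
   or the output buffer is too small. */
int assemble(const char *const *lines, size_t n_lines,
             unsigned char *out, size_t out_size)
{
    const size_t n_entries = sizeof opcode_table / sizeof opcode_table[0];

    if (n_lines > out_size)
        return -1;
    for (size_t i = 0; i < n_lines; i++) {
        size_t j = 0;
        while (j < n_entries && strcmp(lines[i], opcode_table[j].mnemonic) != 0)
            j++;
        if (j == n_entries)
            return -1; /* unknown mnemonic */
        out[i] = opcode_table[j].opcode;
    }
    return (int)n_lines;
}
```

Under these assumptions, the three-line program `cld`, `nop`, `ret` becomes the bytes FC 90 C3.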

Language design

Basic elements

There is a large degree of diversity in the way that assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of statements that are used to define program operations:

Opcode mnemonics
Data sections
Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand. For example, the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.
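
The System/360 extended mnemonics can be modelled as a lookup table that supplies the implied mask, sketched here in C. This is illustrative only, and the table and function names are hypothetical. BR and NOPR, the register-form counterparts of B and NOP built on BCR, are included alongside the two mnemonics from the text. The encoder assumes BCR's 2-byte RR format: opcode 0x07, with the mask in the high nibble of the second byte and the branch-address register in the low nibble.

```c
#include <stdint.h>
#include <string.h>

/* A few System/360 branch extended mnemonics: each one is just a base
   conditional-branch instruction with a fixed, implied mask. */
struct extended_mnemonic {
    const char *name;  /* extended mnemonic the programmer writes */
    const char *base;  /* underlying machine instruction */
    unsigned mask;     /* 4-bit condition mask supplied implicitly */
};

static const struct extended_mnemonic ext_table[] = {
    { "B",    "BC",  15 }, /* mask 15 matches every condition code: always branch */
    { "NOP",  "BC",   0 }, /* mask 0 matches no condition code: never branch */
    { "BR",   "BCR", 15 }, /* register form of B */
    { "NOPR", "BCR",  0 }, /* register form of NOP */
};

/* Look up an extended mnemonic; returns NULL if it is not in the table. */
const struct extended_mnemonic *expand_extended(const char *name)
{
    for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++)
        if (strcmp(name, ext_table[i].name) == 0)
            return &ext_table[i];
    return NULL;
}

/* Encode the 2-byte RR-format BCR instruction: opcode 0x07, then the
   mask in the high nibble and the register number in the low nibble. */
uint16_t encode_bcr(unsigned mask, unsigned reg)
{
    return (uint16_t)(0x0700u | ((mask & 0xFu) << 4) | (reg & 0xFu));
}
```

With this encoding, BR 14 (branch to the address in register 14) assembles to 07FE, and NOPR 0 assembles to 0700.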

Extended mnemonics are often used to support specialized uses of instructions, sometimes for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. On 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode that encodes the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.
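
To see why nop and xchg ax,ax are the same instruction on the 8086, consider the one-byte short form of xchg with ax. Its encoding is 0x90 plus the 3-bit number of the other 16-bit register. The C sketch below encodes that form and decodes it the way a disassembler that recognizes the idiom might, printing nop for 0x90. The helper names are hypothetical, and the sketch covers only 16-bit registers.

```c
#include <stdio.h>
#include <string.h>

/* 16-bit general registers in the order the 8086 encodes them (0-7). */
static const char *const reg16[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Map a register name to its 3-bit encoding, or -1 if unknown. */
int reg16_index(const char *name)
{
    for (int i = 0; i < 8; i++)
        if (strcmp(name, reg16[i]) == 0)
            return i;
    return -1;
}

/* Encode "xchg ax,<name>" using the one-byte short form 0x90 + reg.
   Returns -1 for an unknown register. "xchg ax,ax" yields 0x90, the
   same byte an assembler emits for nop. */
int encode_xchg_ax(const char *name)
{
    int r = reg16_index(name);
    return r < 0 ? -1 : 0x90 + r;
}

/* Decode a byte in the range 0x90-0x97 into text, preferring "nop"
   for 0x90. Returns 0 on success, -1 if the byte is outside the range. */
int decode_xchg_ax(unsigned char byte, char *buf, size_t size)
{
    if (byte < 0x90 || byte > 0x97)
        return -1;
    if (byte == 0x90)
        snprintf(buf, size, "nop"); /* xchg ax,ax changes nothing */
    else
        snprintf(buf, size, "xchg ax,%s", reg16[byte - 0x90]);
    return 0;
}
```

For example, `encode_xchg_ax("bx")` gives 0x93, and decoding 0x90 prints `nop` rather than `xchg ax,ax`.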

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.
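
A built-in macro such as the Z80 ld hl,bc pseudo-instruction can be sketched in C as an expansion into two real 8-bit loads. This sketch assumes the one-byte encoding of ld r,r', which is binary 01 ddd sss with register codes B=0, C=1, D=2, E=3, H=4, L=5. Following the example in the text, it emits the low half first. The helper names are hypothetical, and the order and the set of supported pairs may differ between assemblers.

```c
#include <stddef.h>
#include <string.h>

/* Z80 8-bit register codes used in the "ld r,r'" encoding. */
enum { REG_B = 0, REG_C = 1, REG_D = 2, REG_E = 3, REG_H = 4, REG_L = 5 };

/* A 16-bit register pair and the two 8-bit halves it is made of. */
struct reg_pair {
    const char *name;
    int high, low;
};

static const struct reg_pair pairs[] = {
    { "bc", REG_B, REG_C },
    { "de", REG_D, REG_E },
    { "hl", REG_H, REG_L },
};

static const struct reg_pair *find_pair(const char *name)
{
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (strcmp(name, pairs[i].name) == 0)
            return &pairs[i];
    return NULL;
}

/* Encode the one-byte "ld dst,src" instruction: 01 ddd sss. */
static unsigned char encode_ld_r_r(int dst, int src)
{
    return (unsigned char)(0x40 | (dst << 3) | src);
}

/* Expand the pseudo-instruction "ld dst,src" on register pairs into two
   real 8-bit loads (low half first, then high half), writing 2 bytes to
   out. Returns the number of bytes emitted, or -1 for an unknown pair. */
int expand_ld_pair(const char *dst, const char *src, unsigned char out[2])
{
    const struct reg_pair *d = find_pair(dst);
    const struct reg_pair *s = find_pair(src);

    if (d == NULL || s == NULL)
        return -1;
    out[0] = encode_ld_r_r(d->low, s->low);   /* e.g. ld l,c */
    out[1] = encode_ld_r_r(d->high, s->high); /* e.g. ld h,b */
    return 2;
}
```

For ld hl,bc this yields the bytes 69 (ld l,c) and 60 (ld h,b).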

Tag use

Use the assembly tag for assembly language programming questions, on any processor. You should also use a tag for your processor or instruction set architecture (arm, avr, mips, x86, x86-64, etc). Consider a tag for your assembler as well (gas, masm, nasm, et cetera).

If your question is about inline assembly in C or other programming languages, see inline-assembly. For questions about .NET assemblies, use .net-assembly instead and for .NET's Common Intermediate Language, use cil. For Java ASM, use the tag java-bytecode-asm.

Resources

Beginner's resources

Professional Assembly Language - Richard Blum
Assembly Language Step-by-Step: Programming with Linux - Jeff Duntemann
Assembly primer - Write your own OS
Introduction to Assembly Language - Dandamudi

Assembly language tutorials, guides, and reference material

The x86 tag wiki has a large collection of links, including beginner material and reference docs.
GNU C inline asm docs/guides/info (at the bottom of that answer): how to use GNU C inline asm well, to make efficient code without a lot of wasted instructions.
OSdev.org: Pretty much everything you need to write your own OS (toy or otherwise). Mostly x86, but some mention of ARM.
x86 Assembly (Wikibooks)
Programming from the Ground Up (PDF)
Paul Carter's Tutorial on x86 Assembly
Software optimization resources by Agner Fog
A whirlwind introduction to dataflow graphs: how to analyze dependency chains for throughput and latency.

37939 questions

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

2194 votes, 12 answers

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow,…

gcc assembly floating-point compiler-optimization fast-math

asked Jun 21 '11 at 18:49 by xis

Is < faster than <=?

1645 votes, 14 answers

Is if (a < 901) faster than if (a <= 900)? Not exactly as in this simple example, but there are slight performance changes on loop complex code. I suppose this has to do something with generated machine code in case it's even true.

c++ performance assembly relational-operators

asked Aug 27 '12 at 02:10 by snoopy

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

1499 votes, 11 answers

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. The Benchmark #include #include…

c++ performance assembly x86 compiler-optimization

asked Aug 01 '14 at 10:33 by gexicide

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

873 votes, 11 answers

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement identical brute force approach for testing the Collatz conjecture. The assembly solution was assembled with: nasm -felf64 p14.asm && gcc p14.o -o p14 The C++…

c++ performance assembly optimization x86

asked Nov 01 '16 at 06:12 by rosghub

What's the purpose of the LEA instruction?

740 votes, 16 answers

For me, it just seems like a funky MOV. What's its purpose and when should I use it?

assembly x86 x86-64 x86-16

asked Nov 01 '09 at 20:57 by user200557

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

667 votes, 4 answers

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…

c++ assembly x86-64 cpu-architecture flops

asked Dec 05 '11 at 17:54 by user1059432

When is assembly faster than C?

503 votes, 40 answers

One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. However, I've also heard it stated many times…

c performance assembly

asked Feb 23 '09 at 13:03 by Adam Bellaire

How do you get assembler output from C/C++ source in gcc?

423 votes, 17 answers

How does one do this? If I want to analyze how something is getting compiled, how would I get the emitted assembly code?

c++ c debugging gcc assembly

asked Sep 26 '08 at 00:10 by Doug T.

Why does Java switch on contiguous ints appear to run faster with added cases?

281 votes, 5 answers

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying double variables by 10 raised to arbitrary…

java performance assembly compiler-construction switch-statement

asked Mar 25 '13 at 17:28 by Andrew Bissell

Using GCC to produce readable assembly?

277 votes, 10 answers

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able to find a way with GCC. I am trying to re-write…

c gcc assembly

asked Aug 17 '09 at 19:22 by James

Is it possible to "decompile" a Windows .exe? Or at least view the Assembly?

272 votes, 16 answers

A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in Assembly or attach a debugger? Edit to say it is not…

debugging winapi assembly decompiling

asked Nov 07 '08 at 18:44 by swilliams

How to run a program without an operating system?

266 votes, 4 answers

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive and it runs the program that is on the CPU?

assembly x86 operating-system bootloader osdev

asked Feb 26 '14 at 22:13 by user2320609

What does multicore assembly language look like?

258 votes, 10 answers

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX register", etc. With modern CPUs that have 4 cores (or even more), at the machine code level does…

assembly x86 cpu multicore smp

asked Jun 11 '09 at 13:16 by Paul Hollingsworth

What is a retpoline and how does it work?

258 votes, 3 answers

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…

security assembly x86 cpu-architecture

asked Jan 04 '18 at 05:52 by BeeOnRope

Is 'switch' faster than 'if'?

253 votes, 12 answers

Is a switch statement actually faster than an if statement? I ran the code below on Visual Studio 2010's x64 C++ compiler with the /Ox flag: #include #include #include #define MAX_COUNT (1 << 29) size_t counter =…

c performance switch-statement assembly jump-table

asked Jul 24 '11 at 05:00 by user541686
