Questions tagged [assembly]

Assembly language questions. Please tag the processor and/or the instruction set you are using, as well as the assembler, a valid set should be like this: (assembly, x86, gnu) Note that you should use the ".net-assembly" tag instead for .NET assembly languages, and for Java bytecode, use the tag java-bytecode-asm instead.

Assembly is a family of very low-level programming languages, just above machine code. In assembly, each statement corresponds to a single machine code instruction. These instructions are represented as mnemonics in the given assembly language and are converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code.

Language design

Basic elements

There is a large degree of diversity in the way that assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:

  • Opcode mnemonics
  • Data sections
  • Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand. For example, the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPU's do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.

Tag use

Use the tag for assembly language programming questions, on any processor. You should also use a tag for your processor or instruction set architecture (, , , , , etc). Consider a tag for your assembler as well (, , , et cetera).

If your question is about inline assembly in C or other programming languages, see . For questions about .NET assemblies, use instead and for .NET's Common Intermediate Language, use . For Java ASM, use the tag .

Resources

Beginner's resources

Assembly language tutorials, guides, and reference material

37939 questions
2194
votes
12 answers

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow,…
xis
  • 22,592
  • 8
  • 39
  • 55
1645
votes
14 answers

Is < faster than <=?

Is if (a < 901) faster than if (a <= 900)? Not exactly as in this simple example, but there are slight performance changes on loop complex code. I suppose this has to do something with generated machine code in case it's even true.
snoopy
  • 14,122
  • 3
  • 22
  • 49
1499
votes
11 answers

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. The Benchmark #include #include…
gexicide
  • 35,369
  • 19
  • 80
  • 136
873
votes
11 answers

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement identical brute force approach for testing the Collatz conjecture. The assembly solution was assembled with: nasm -felf64 p14.asm && gcc p14.o -o p14 The C++…
rosghub
  • 7,772
  • 4
  • 21
  • 31
740
votes
16 answers

What's the purpose of the LEA instruction?

For me, it just seems like a funky MOV. What's its purpose and when should I use it?
user200557
  • 7,559
  • 3
  • 16
  • 7
667
votes
4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…
user1059432
  • 7,158
  • 3
  • 17
  • 16
503
votes
40 answers

When is assembly faster than C?

One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. However, I've also heard it stated many times…
Adam Bellaire
  • 99,441
  • 19
  • 144
  • 160
423
votes
17 answers

How do you get assembler output from C/C++ source in gcc?

How does one do this? If I want to analyze how something is getting compiled, how would I get the emitted assembly code?
Doug T.
  • 59,839
  • 22
  • 131
  • 193
281
votes
5 answers

Why does Java switch on contiguous ints appear to run faster with added cases?

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying double variables by 10 raised to arbitrary…
277
votes
10 answers

Using GCC to produce readable assembly?

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able to find a way with GCC. I am trying to re-write…
James
  • 3,272
  • 3
  • 20
  • 21
272
votes
16 answers

Is it possible to "decompile" a Windows .exe? Or at least view the Assembly?

A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in Assembly or attach a debugger? Edit to say it is not…
swilliams
  • 44,959
  • 24
  • 94
  • 129
266
votes
4 answers

How to run a program without an operating system?

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive and it runs the program that is on the CPU?
user2320609
  • 1,659
  • 3
  • 11
  • 6
258
votes
10 answers

What does multicore assembly language look like?

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX" register, etc. With modern CPUs that have 4 cores (or even more), at the machine code level does…
Paul Hollingsworth
  • 11,954
  • 12
  • 48
  • 66
258
votes
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel1 will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
BeeOnRope
  • 51,419
  • 13
  • 149
  • 309
253
votes
12 answers

Is 'switch' faster than 'if'?

Is a switch statement actually faster than an if statement? I ran the code below on Visual Studio 2010's x64 C++ compiler with the /Ox flag: #include #include #include #define MAX_COUNT (1 << 29) size_t counter =…
user541686
  • 189,354
  • 112
  • 476
  • 821
1
2 3
99 100