
I am trying to learn assembly language as a hobby, and I frequently use gcc -S to produce assembly output. This is pretty straightforward, but I fail to assemble the output. I was just curious whether this can be done at all. I tried both the standard (AT&T) assembly output and Intel syntax via -masm=intel. Neither can be assembled with nasm and linked with ld.

Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled.

To be more precise I used the following C code.

>> cat csimp.c
int main (void){
  int i,j;
  for(i=1;i<21;i++)
    j= i + 100;
  return 0;
}

I generated the assembly with gcc -S -O0 -masm=intel csimp.c, tried to assemble it with nasm -f elf64 csimp.s, and then to link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:

csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected

This is most probably due to incompatible assembly syntax. I hope to fix this without having to manually correct the output of gcc -S.


Edit:

I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce NASM-format assembly. You can see the output of objconv below. Therefore, I still need your help.

>>cat csimp.asm 
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64

global main:  ; **the ':' should be removed !!!** 


SECTION .text                                           ; section number 1, code

main:   ; Function begin
        push    rbp                                     ; 0000 _ 55
        mov     rbp, rsp                                ; 0001 _ 48: 89. E5
        mov     dword [rbp-4H], 1                       ; 0004 _ C7. 45, FC, 00000001
        jmp     ?_002                                   ; 000B _ EB, 0D

?_001:  mov     eax, dword [rbp-4H]                     ; 000D _ 8B. 45, FC
        add     eax, 100                                ; 0010 _ 83. C0, 64
        mov     dword [rbp-8H], eax                     ; 0013 _ 89. 45, F8
        add     dword [rbp-4H], 1                       ; 0016 _ 83. 45, FC, 01
?_002:  cmp     dword [rbp-4H], 20                      ; 001A _ 83. 7D, FC, 14
        jle     ?_001                                   ; 001E _ 7E, ED
        pop     rbp                                     ; 0020 _ 5D
        ret                                             ; 0021 _ C3
; main End of function


SECTION .data                                           ; section number 2, data


SECTION .bss                                            ; section number 3, bss

Apparent solution:

I made a mistake when cleaning up the output of objconv. I should have run:

sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" csimp.asm

All steps can be condensed into a bash script:

#! /bin/bash

a=$( echo $1 | sed  "s/\.c//" ) # strip the file extension .c

# compile an object file with minimal information
gcc -fno-asynchronous-unwind-tables -s -c ${a}.c

# convert the object file to nasm format
./objconv/objconv -fnasm ${a}.o

# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" ${a}.asm

# run nasm for 64-bit binary

nasm -f elf64 ${a}.asm 

# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s ${a}.o 

Running this code I get the ld warning:

 ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080 

The executable produced this way crashes with a segmentation fault. I would appreciate your help.

Alexander Cska
  • The output is intended for the gnu assembler (`as`), any particular reason you don't use that? It will "just work". `nasm` has a different syntax, unfortunately. – Jester Jan 30 '16 at 13:19
  • I didn't know this. I will try as, thank you for the answer. I am surprised that the assembly syntax is not universal. – Alexander Cska Jan 30 '16 at 13:39
  • @NateEldredge unfortunately what is written there does not work for me. I guess that `C` assembly conversion is not that straightforward. – Alexander Cska Jan 30 '16 at 19:22
  • Run `gcc -S -fverbose-asm -O csimp.c` to get a `csimp.s` assembler file for GNU `as` – Basile Starynkevitch Jan 31 '16 at 12:37
  • What do you need `nasm` for? – edmz Jan 31 '16 at 12:43
  • In your edit, your generated code is 64-bit, but your original question uses `nasm -f elf` and `ld -m elf_i386` (which targets 32-bit). So I'm venturing to guess that whatever error you are seeing with the objconv-generated code is possibly related to you mixing up 32-bit and 64-bit code. So first off, are you trying to create 64-bit code or 32-bit code? And are you on a 32-bit or 64-bit OS? – Michael Petch Feb 01 '16 at 17:59
  • @MichaelPetch Thank you for your comment. I am using both and hence the confusion. I would like to generate `64` bit binary and run it on Suse Linux running `3.16.7-21-desktop` kernel. However, the errors you see are generated by running `nasm`, so the linker is not the problem here. I will correct the error. Thank you for your feedback. – Alexander Cska Feb 01 '16 at 18:11
  • Your edit doesn't show how you are compiling and linking that OBJCONV code, but it should be compilable with nasm with something like `nasm -felf64 csimpc.asm`. If you used `nasm -felf csimpc.asm` the `-f elf` tries to generate 32-bit output. You need `-f elf64` if you are trying to assemble 64-bit code. If on a 64-bit system _LD_ will generally output 64-bit executable by default. So you should drop `-m elf_i386` from the LD command or use `ld -m elf_x86_64`. LD with `-m elf_i386` is trying to output to a 32-bit executable – Michael Petch Feb 01 '16 at 18:14
  • @MichaelPetch I found the error, I made a mistake when cleaning up the output of `objconv`. – Alexander Cska Feb 01 '16 at 18:26
  • Instead of editing the answer into the question, you should post the answer as an answer. Yes, you're allowed to answer your own question. Also, if you're going to do this, use `gcc -O3`! The default `-O0` is mostly a literal translation of C into braindead asm. You might also want `-march=native` or `-mtune=native`, depending on whether you need the code to run on CPUs older than the host, or want it tuned specifically for the host. – Peter Cordes Feb 01 '16 at 22:31
  • Also, in a bash script, you can just do `a="${1%.c}"` to strip off a trailing `.c`, without having to pipe the string through an external command. Don't forget to double-quote variables when you expand them, or else your script will break on names with spaces in them. – Peter Cordes Feb 01 '16 at 22:47
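Folding these comments into the question's script might look like the sketch below. It is an illustration under stated assumptions, not a tested recipe: it assumes x86-64 Linux, GNU sed, objconv at ./objconv/objconv, and a GCC that may default to PIE (hence -fno-pie/-no-pie, as a later comment on another answer recommends). Linking through gcc lets the C runtime's startup code supply _start and call main, which avoids both the ld warning and the crash. The helper names strip_c, clean_objconv and build are made up for this sketch, and the extra `s/\r$//` expression (for the DOS line endings one answer below reports) is an assumption.

```shell
#!/bin/bash
# Sketch only: the question's build script with the comments folded in.
# Assumptions: x86-64 Linux, GNU sed, objconv at ./objconv/objconv, nasm and gcc on PATH.
set -euo pipefail

# Strip a trailing ".c" with parameter expansion (no external sed needed).
strip_c() {
    printf '%s\n' "${1%.c}"
}

# Filter out the objconv-specific bits nasm rejects (the question's sed expressions),
# plus s/\r$// to drop DOS line endings (assumption: some objconv builds emit them).
clean_objconv() {
    sed -e 's/\r$//' -e 's/align=1//g' -e 's/[a-z]*execute//g' \
        -e 's/: *function//g' -e '/default *rel/d'
}

build() {
    local base
    base=$(strip_c "$1")

    # Non-PIE object without unwind tables, so the disassembly can be linked with -no-pie.
    gcc -fno-pie -fno-asynchronous-unwind-tables -c "${base}.c" -o "${base}.o"

    # Disassemble to NASM syntax; objconv writes "${base}.asm".
    ./objconv/objconv -fnasm "${base}.o"

    clean_objconv < "${base}.asm" > "${base}.asm.tmp"
    mv "${base}.asm.tmp" "${base}.asm"

    # Reassemble as 64-bit ELF, replacing the gcc-generated object file.
    nasm -f elf64 -o "${base}.o" "${base}.asm"

    # Link through gcc: the C runtime's startup code supplies _start and calls main.
    gcc -no-pie -o "${base}" "${base}.o"
}

# Only build when a file name is given, so the helpers can be sourced and tested.
if [[ $# -gt 0 ]]; then
    build "$1"
fi
```

Saved as, say, build.sh, it would be run as `./build.sh csimp.c` followed by `./csimp`. Quoting every expansion keeps file names with spaces intact, as suggested above.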

4 Answers

You basically can't, at least not directly. GCC can output assembly in Intel syntax, but NASM/MASM/TASM each have their own Intel syntax. These are largely based on the same conventions, but there are also differences that the assembler may not understand, so it fails to assemble GCC's output.

The closest thing is probably having objdump show the assembly in Intel format:

objdump -d $file -M intel

Peter Cordes points out in the comments that the assembler directives still target GAS, so NASM, for example, won't recognize them. They often have similar names, but GAS directives start with a ., as in .section .text (vs section .text in NASM).
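To see which lines NASM trips over, you can list the GAS directives in gcc's output. The sketch below is a heuristic, not a parser: gas_directives is a made-up helper, and it assumes a directive is a line whose first token starts with a dot and is not followed by a colon (which excludes labels such as .L2:). In the question's csimp.s, the first two lines are most likely .file "csimp.c" and .intel_syntax noprefix. NASM reads a name starting with a dot as a local label, which would explain the "attempt to define a local label before any non-local labels" errors on lines 1 and 2.

```shell
#!/bin/bash
# Heuristic: print (with line numbers) lines whose first token is a dot-prefixed
# word not followed by ':', i.e. GAS directives such as .file, .text, .globl.
set -euo pipefail

gas_directives() {
    # grep exits with status 1 when nothing matches; treat that as "no directives".
    grep -n -E '^[[:space:]]*\.[A-Za-z_][A-Za-z0-9_]*([[:space:]]|$)' || true
}

# Usage (assumes gcc on PATH, x86-64 target):
#   gcc -S -O0 -masm=intel -o - csimp.c | gas_directives
```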

edmz
  • Also see: https://stackoverflow.com/questions/8406188/does-gcc-really-know-how-to-output-nasm-assembly?lq=1 – edmz Jan 31 '16 at 12:47
  • gcc / gas Intel syntax still uses GNU assembler directives like `.align`, `.globl`, while NASM/YASM use directives like `align` and `global`. So you'd have to port by hand. – Peter Cordes Feb 01 '16 at 22:32
  • @PeterCordes Yes, that is true. GCC "limits" itself to telling GAS to switch syntax through another GAS-like directive, `.intel_syntax`. – edmz Feb 02 '16 at 12:49

I think the difficulty you hit with the entry-point warning comes from using ld on an object file whose entry point is named main, while ld looks for an entry point named _start.

There are a couple of considerations. First, if you link with the C library (for example, to use printf), the C runtime's startup code provides _start and calls your main, so your code must define main; if you link with ld alone and no C library, ld expects your code to define _start. Your script is very close, but to fully automate the process for any source file you will need some way to decide which entry point is required.

For example, here is how I converted a source file that calls printf to nasm format, using your approach with objconv:

Generate the object file:

gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj

Convert with objconv to nasm format assembly file

objconv -fnasm s3.obj

(note: my version of objconv added DOS line endings, probably because of an option I missed; I just ran the file through dos2unix)

Using a slightly modified version of your sed call, tweak the contents:

sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm

(note: if you use no standard library functions and link with ld, change main to _start by adding the following expressions to your sed call)

-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'

(there are probably more elegant expressions for this, this was just for example)

Assemble with nasm (replacing the original object file):

nasm -felf64 -o s3.obj s3.asm

Link using gcc:

gcc -o s3 s3.obj

Test

$ ./s3

 sizeof test : 40

 myint  : 0  0
 mychar : 4  4
 myptr  : 8  8
 myarr  : 16  16
 myuint : 32  32
David C. Rankin
  • I changed `main` to `_start` and the `ld` error disappeared. But the code still produces a `Segmentation fault` error. I have no `printf` in my code; in fact it is just a main and a `for` loop, but somehow it still does not run. In general, if I use `gcc` as the linker, everything runs smoothly. The problem is assembling with `nasm` and linking with `ld`. – Alexander Cska Feb 01 '16 at 21:48
  • @AlexanderCska: Of course it segfaults. It tries to `ret` from `_start`, instead of making an `exit(2)` system call. `_start` isn't called by anything: it's the real entry point. The x86-64 ABI specifies that the stack holds argc, *argv, and *envp, not a return address. It should work if you change your code to call `exit(0)` instead of `return 0`, but then you'd need to link with `libc`. So you should just link using gcc like David said. IDK if I missed it, but why are you even doing this? Are you going to start hand-modifying the asm once you get it to compile and run? – Peter Cordes Feb 01 '16 at 22:39
  • If you want to use system calls *directly*, without going through the glibc wrapper, there used to be macros like `_syscall1(type, name, type1, arg1)` that would define an inline function to make the system call. See `_syscall(2)`. Or you could modify the asm around `call` instructions to put the args in the right registers for a syscall instead of a function call, and use `syscall`. It clobbers rax, rcx, and r11: see http://stackoverflow.com/questions/2535989/what-are-the-calling-conventions-for-unix-linux-system-calls-on-x86-64 – Peter Cordes Feb 01 '16 at 22:45
  • To call `exit(2)` (aka `sys_exit`): `movq $return_code, %rdi; movq $60, %rax; syscall` or using the old interrupt interface: `movq $1, %rax; mov $rc, %rbx; int $0x80`. This way you won't have to go through libc, but it's also less portable (e.g. the 64-bit `syscall` ABI and the 32-bit `int $0x80` ABI use different system-call numbers and argument registers) – edmz Feb 02 '16 at 13:08
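If you want to keep linking with plain ld and no libc, one option consistent with these comments is to keep main and add a tiny _start that calls it and then makes the exit system call, instead of renaming main to _start. The sketch below appends such a stub (NASM syntax) to the cleaned objconv output. append_start_stub is a hypothetical helper, and it assumes x86-64 Linux, a main that takes no arguments and does not use libc, and a .asm file already cleaned with the question's sed. At process entry rsp is 16-byte aligned, so the call leaves main with the stack alignment the ABI expects at function entry.

```shell
#!/bin/bash
# Sketch: append a minimal _start stub (NASM syntax) that calls main and exits
# via the x86-64 Linux exit system call, so the file links with plain ld.
set -euo pipefail

append_start_stub() {
    cat >> "$1" <<'EOF'

SECTION .text
global _start
_start:
        xor     ebp, ebp        ; mark the outermost stack frame
        call    main            ; main's return value ends up in eax
        mov     edi, eax        ; exit status = main's return value
        mov     eax, 60         ; SYS_exit on x86-64 Linux
        syscall
EOF
}

# Usage (assumes csimp.asm was produced and cleaned as in the question):
#   append_start_stub csimp.asm
#   nasm -f elf64 csimp.asm
#   ld -m elf_x86_64 -s -o test csimp.o
```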

There are many different assembly languages: for each CPU there are often multiple syntaxes (e.g. "Intel syntax", "AT&T syntax"), with completely different directives, preprocessors, etc. on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.

GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means its output can't be used with NASM, FASM, MASM, TASM, A86/A386, etc. It only works with GAS (and possibly YASM in its "AT&T mode").

Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.
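In practice, the simplest route is to feed gcc's output to the assembler it was written for, as Jester's comment on the question suggests. A sketch, assuming gcc and binutils on an x86-64 host (build_with_gas is a hypothetical helper): the .intel_syntax noprefix directive that gcc emits tells GAS to parse the Intel-syntax output, and linking through gcc supplies _start.

```shell
#!/bin/bash
# Sketch: generate Intel-syntax asm with gcc, assemble it with GAS, link with gcc.
set -euo pipefail

build_with_gas() {
    local src=$1 base=${1%.c}
    gcc -S -O0 -masm=intel -o "${base}.s" "$src"   # Intel syntax, GAS directives
    # (hand-edit "${base}.s" here if desired)
    gcc -c -o "${base}.o" "${base}.s"             # gcc runs as(1) on the .s file
    gcc -o "${base}" "${base}.o"                  # C runtime startup code supplies _start
}

# Usage: build_with_gas csimp.c && ./csimp
```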

Brendan

I don't have enough reputation to comment, but following David C. Rankin's answer above gives me a relocation error and a suggestion to recompile with -fPIC. simp.c:

#include <stdio.h>

int main (void){
  int i,j;
  for(i=1;i<21;i++){
    j= i + 100;
    printf("got int: %d\n",j);
  }
  return(0);
}

Then I run the following:

rm *.obj *.o *.asm 
gcc -fno-asynchronous-unwind-tables -s -c simp.c -o simp.obj
objconv -fnasm simp.obj
dos2unix simp.asm 
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e 's/: *function//g' -e '/default *rel/d' simp.asm 
nasm -felf64 -o simp2.obj simp.asm
gcc -o my_simp simp2.obj

And get the following error:

/usr/bin/ld: simp2.obj: relocation R_X86_64_PC32 against symbol `printf@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
collect2: error: ld returned 1 exit status

Note: I tried adding -fPIC to the object compilation. It does add an extern _GLOBAL_OFFSET_TABLE_ entry to the NASM file generated by objconv, but the code doesn't appear to actually use it.

user4829160
  • You might need NASM `call printf wrt ..plt` to call printf in a PIE executable. `objconv` probably doesn't disassemble that way, instead assuming a non-PIE where `call printf` Just Works, with the linker adding indirection via the PLT if you dynamically link libc. Use `gcc -fno-pie -no-pie` (code-gen and linker options). Your GCC is configured with `-fPIE` as the default so adding `-fPIC` won't help. – Peter Cordes Oct 22 '19 at 19:51
  • Voilà! It works: with the link step above changed to `gcc -fno-pie -no-pie -o my_simp simp2.obj`, it links and runs. – user4829160 Oct 22 '19 at 23:22
  • See [Can't call C standard library function on 64-bit Linux from assembly (yasm) code](//stackoverflow.com/q/52126328) and [32-bit absolute addresses no longer allowed in x86-64 Linux?](//stackoverflow.com/q/43367427). And [How to print a number in assembly NASM?](//stackoverflow.com/a/32853546) also shows NASM PIE syntax. – Peter Cordes Oct 22 '19 at 23:25