
I'm trying to call native machine-language code. Here's what I have so far (it gets a bus error):

char prog[] = {'\xc3'}; // x86 ret instruction

int main()
{
    typedef double (*dfunc)();

    dfunc d = (dfunc)(&prog[0]);
    (*d)();
    return 0;
}

It does call the function and reaches the ret instruction, but executing the ret raises SIGBUS. Is it because I'm executing code on a page that is not marked executable, or something like that?

So what am I doing wrong here?

Edited by BartoszKP; asked by user5406764.
  • This probably depends on OS and compiler, so you should add that info. You may need compiler-specific stuff to tell that `prog` should be executable, just as you suspect. – hyde Oct 05 '16 at 07:09
  • Also, make the function return `void` to avoid any problems related to that. – hyde Oct 05 '16 at 07:12
  • You need to allocate a page of memory and make it executable. – David Schwartz Oct 05 '16 at 07:14
  • In case it helps anyone: I've often found that SIGBUS is indicative of bad alignment. – Doddy Oct 05 '16 at 11:18
  • @user Please post your solution as an answer instead of editing it into your question. – dandan78 Oct 05 '16 at 12:33
  • I reverted the changes to the code sample, so the question makes sense again. Please do what @dandan78 already suggested and accept an answer instead of updating your question, changing its meaning in the process. – You Oct 05 '16 at 14:27
  • Would it be more practical to use the `asm()` function? – Stavr00 Oct 05 '16 at 14:46
  • Please, please, please, please, please, please, please use `asm()` (per @Stavr00's comment and Graham's answer) rather than any of the other approaches, especially if your code has *any* possibility of ever seeing the light of day on a piece of silicon that is in any way connected to the internet or has any possibility of interacting with anyone beyond yourself. – Kyle Strand Oct 05 '16 at 21:40
  • I took the liberty of replacing "native code" with "machine code" in the title - that should avoid misunderstandings. – sleske Oct 06 '16 at 11:38

6 Answers


The first problem is probably that the memory where `prog` is stored is not executable.

On Linux at least, the resulting binary places the contents of initialized global variables in the "data" segment, which is not executable in most normal cases.

The second problem might be that the code you are invoking is invalid in some way. Calling a function in C follows a certain procedure, called the calling convention (you might be using the "cdecl" one, for example). It might not be enough for the called function to just "ret"; it might also need to do some stack cleanup, etc., otherwise the program will behave unexpectedly. This might prove an issue once you get past the first problem.
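A minimal sketch of the "allocate a page and make it executable" approach, assuming Linux (or another POSIX system with `MAP_ANONYMOUS`) on x86 or x86-64. To sidestep the calling-convention issue, the machine code here is `mov $42, %eax; ret`, which returns an `int` in EAX and reads nothing outside its own six bytes. The names `mov42_code` and `run_machine_code` are hypothetical. The page is mapped writable, filled, and only then switched to read+execute, so it is never writable and executable at the same time.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86/x86-64 machine code for:  mov $42, %eax ; ret
   The int result is returned in EAX and no memory outside these six
   bytes is touched, so a bare ret is a complete function here. */
static const unsigned char mov42_code[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov $42, %eax */
    0xc3                          /* ret           */
};

typedef int (*int_fn)(void);

/* Hypothetical helper: copies `len` bytes of machine code into a fresh
   page, makes it executable, calls it, and returns its int result.
   Returns -1 if the page cannot be set up. */
int run_machine_code(const unsigned char *code, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0 && len <= (size_t)pagesize);

    /* 1. Map an anonymous page that is writable but not executable. */
    void *page = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* 2. Copy the code in, then switch the page to read+execute. */
    memcpy(page, code, len);
    if (mprotect(page, (size_t)pagesize, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, (size_t)pagesize);
        return -1;
    }

    /* 3. Object-to-function pointer casts are not ISO C, but POSIX
          (e.g. dlsym) depends on them working. */
    int_fn fn = (int_fn)page;
    int result = fn();

    munmap(page, (size_t)pagesize);
    return result;
}
```

Calling `run_machine_code(mov42_code, sizeof mov42_code)` should return 42. On non-x86 CPUs these bytes are meaningless, and on architectures such as ARM you would also need to flush the instruction cache (for example with GCC's `__builtin___clear_cache`) before calling the code.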

Edited by Community; answered by Horia Coman.
  • [This article](http://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html) goes into remarkably complete detail on how to embed and call machine code in C. It starts with the premise of turning main() into a char array. – event44 Oct 06 '16 at 13:49

You need to call `mprotect` in order to make the page where `prog` lives executable. The following code makes this call and can then execute the bytes in `prog`.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

char prog[] = {
   0x55,             // push   %rbp
   0x48, 0x89, 0xe5, // mov    %rsp,%rbp
   0xf2, 0x0f, 0x10, 0x05, 0x00, 0x00, 0x00, 0x00,
                     // movsd  0x0(%rip),%xmm0
                     // NB: this loads 8 bytes starting at the next instruction,
                     // so the returned value is not meaningful (see comments)
   0x5d,             // pop    %rbp
   0xc3,             // retq
};

int main(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    // mprotect() needs a page-aligned start address and a length in bytes
    // that covers every page prog touches.
    uintptr_t start = (uintptr_t)prog & ~((uintptr_t)pagesize - 1);
    size_t len = (uintptr_t)prog + sizeof(prog) - start;
    if (mprotect((void *)start, len, PROT_EXEC | PROT_READ | PROT_WRITE) != 0)
    {
        perror("mprotect");   // mprotect returns -1 and sets errno on failure
        return 1;
    }
    typedef double (*dfunc)(void);

    dfunc d = (dfunc)(&prog[0]);
    double x = (*d)();
    printf("x=%f\n", x);
    fflush(stdout);
    return 0;
}
Edited by BartoszKP; answered by Rudi.
  • In this case, you can also declare the array `const` to let it be stored in the executable part of the process memory: http://stackoverflow.com/q/12446965/1025391 – moooeeeep Oct 06 '16 at 14:23
  • @moooeeeep You can't generally assume that the memory section for readonly data is executable, even if some shitty linkers do it like that. – CodesInChaos Oct 06 '16 at 14:26
  • @CodesInChaos You can't generally assume that you can execute machine code stored in an array, yet the OP asked for it. – moooeeeep Oct 06 '16 at 14:30
  • @moooeeeep: sometime in the last year or so, GNU `ld` started linking `.rodata` into its own ELF segment so it can be read-only *without* exec permission. Not part of the text segment like it used to do. So that simple trick no longer works. You could use an `__attribute__((section(".text")))` on a const array, though, in GNU C. – Peter Cordes Jun 08 '20 at 17:03
  • Your shellcode reads past the end of its array with `movsd 0x0(%rip),%xmm0`. That's an 8-byte load starting at the `0x5d` byte (the byte after the `movsd` instruction because RIP+0). x86 is little-endian so the exponent field of the `double` will be from whatever garbage comes next. It looks like you naively copied `objdump` output for compiler-generated debug-mode code for a function that returns a `double`. It will of course load that constant from `.rodata` because x86 doesn't have FP immediate operands. But you didn't put the referenced double into the shellcode. – Peter Cordes Jun 08 '20 at 17:05
  • Some FP constants can be generated on the fly in a few instructions, like in [What are the best instruction sequences to generate vector constants on the fly?](https://stackoverflow.com/q/35085059). Or just convert your example to returning an `int` because `mov $123, %eax` / `ret` is self-contained. – Peter Cordes Jun 08 '20 at 17:06
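Following up on the point that the `movsd` reads past the array: a sketch that keeps the `double` return but embeds the constant in the shellcode itself, assuming x86-64 and a system where `mprotect` may add `PROT_EXEC` to a data page (stock Linux allows this; hardened setups may refuse). The constant sits right after the `ret`, and the RIP-relative displacement is set to 1 so it points there; the `%rbp` push/pop is dropped because nothing needs it. `double_prog` and `run_embedded_double` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86-64 only. Layout of the bytes below:
     offset 0: movsd 0x1(%rip),%xmm0  (8 bytes; %rip = offset 8 here)
     offset 8: ret
     offset 9: the double 2.5 (0x4004000000000000), little-endian
   The load therefore reads offsets 9..16, all inside the array. */
unsigned char double_prog[] = {
    0xf2, 0x0f, 0x10, 0x05, 0x01, 0x00, 0x00, 0x00, /* movsd 0x1(%rip),%xmm0 */
    0xc3,                                           /* ret                   */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x40  /* 2.5 as a double       */
};

typedef double (*dfunc)(void);

/* Hypothetical helper: makes every page spanned by double_prog
   executable, then calls it. Returns -1.0 if mprotect fails. */
double run_embedded_double(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);

    uintptr_t addr  = (uintptr_t)double_prog;
    uintptr_t start = addr & ~((uintptr_t)pagesize - 1); /* round down to a page */
    size_t len = (size_t)(addr + sizeof double_prog - start);

    /* Keep PROT_WRITE: other variables may share these .data pages. */
    if (mprotect((void *)start, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1.0;

    dfunc d = (dfunc)(void *)double_prog;
    return d();
}
```

If `mprotect` succeeds, `run_embedded_double()` should return exactly 2.5, because the value is loaded from bytes the array actually contains.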

As everyone already said, you must ensure `prog[]` is executable. However, unless you're writing a JIT compiler, the proper way to do it is to put the symbol in an executable area, either by using a linker script or, if the compiler allows it, by specifying the section in the C code, e.g.:

const char prog[] __attribute__((section(".text"))) = {...}
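A complete version of the `section(".text")` idea, assuming GCC or a compatible compiler targeting ELF on x86/x86-64. The bytes are again `mov $42, %eax; ret` so the function is self-contained; `text_prog` and `call_text_prog` are hypothetical names. GCC's assembler may warn that it is ignoring changed section attributes for `.text`; the bytes should still end up in the executable text segment.

```c
#include <assert.h>

/* GNU C extension: place the bytes in .text, which the loader maps
   read+execute. x86/x86-64 only: mov $42, %eax ; ret. */
const unsigned char text_prog[] __attribute__((section(".text"))) = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov $42, %eax */
    0xc3                          /* ret           */
};

typedef int (*int_fn)(void);

/* Hypothetical helper: calls the bytes in text_prog as a function. */
int call_text_prog(void)
{
    /* Object-to-function pointer cast: not ISO C, but accepted by GCC. */
    int_fn f = (int_fn)text_prog;
    return f();
}
```

No `mprotect` call is needed here, since the loader already maps `.text` executable; the trade-off is that the bytes are fixed at compile time.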
Answered by Ismael Luceno.

Most C compilers will let you do this by embedding regular assembly language in your code. Of course it's a non-standard extension to C, but compiler writers recognise that it's often necessary. Because it's non-standard, you'll have to read your compiler manual to check how to do it, but the GCC "asm" extension is a fairly common approach.

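#include <assert.h>
#include <stdint.h>

// x86 only; GCC/Clang extended asm (spell it __asm__ under strict -std=c99/c11).
void DoCheck(uint32_t dwSomeValue)
{
    uint32_t dwRes;

    // bsf ("bit scan forward") stores the index of the lowest set bit;
    // its result is undefined for zero.
    // Assumes dwSomeValue is not zero.
    asm ("bsfl %1,%0"
         : "=r" (dwRes)        // output operand %0
         : "r" (dwSomeValue)   // input operand %1
         : "cc");              // bsf modifies the flags

    assert(dwRes > 3);
}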

Since it's easy to trash registers (or flags) the compiler is relying on, compilers often also let you declare which registers your assembler modifies, known as the clobber list (`"cc"` above declares the flags). The compiler can then avoid keeping anything it needs in those registers across the asm statement.

If you're writing the assembler code yourself, there is no good reason to set up that assembler as an array of bytes. It's not just a code smell - I'd say it is a genuine error which could only happen by being unaware of the "asm" extension which is the right way to embed assembler in your C.
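For comparison with the byte array in the question, a sketch of "return 42" written as GCC extended asm, assuming GCC or Clang on x86/x86-64. It also shows the clobber list described above: because the template writes EAX directly, `"eax"` is declared as clobbered, so the compiler keeps nothing it needs there and will not pick EAX for the output operand. A second helper wraps `bsf` with the `"cc"` clobber and a zero check. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; x86/x86-64, GCC/Clang extended asm (AT&T syntax).
   Inside extended asm, literal register names are written with %%. */

/* Writes EAX directly, so EAX goes in the clobber list. */
int forty_two_asm(void)
{
    int out;
    __asm__ ("movl $42, %%eax\n\t"
             "movl %%eax, %0"
             : "=r" (out)   /* %0: output, any general register except EAX */
             :              /* no inputs */
             : "eax");      /* EAX is overwritten by the template */
    return out;
}

/* Index of the lowest set bit. bsf leaves the destination undefined for
   a zero input, so that case is rejected up front; "cc" tells the
   compiler the flags register is modified. */
uint32_t lowest_set_bit(uint32_t value)
{
    uint32_t index;
    assert(value != 0);
    __asm__ ("bsfl %1, %0" : "=r" (index) : "rm" (value) : "cc");
    return index;
}
```

For example, `lowest_set_bit(8)` yields 3 and `lowest_set_bit(6)` yields 1. Unlike the byte array, nothing here needs executable data pages, because the instructions are emitted into the function's own code.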

Answered by Graham.
  • Good lord, how did five separate users answer this question without even *mentioning* `asm`? Bleeeeeeaaaaaaaarrrrrrrgggggghhhhh. – Kyle Strand Oct 05 '16 at 21:37
  • @KyleStrand Maybe everyone else makes the distinction between _machine language_ (what the user wants) and _assembler_. `asm` is for example less useful if you want to generate the code on-the-fly. – pipe Oct 05 '16 at 22:29
  • Also you can't use true assembly in every C compiler. E.g. MSVC will treat its `__asm` codes as code in yet another high-level language: it will try to optimize it, and won't let you emit raw bytes (like with `db` directive in MASM). – Ruslan Oct 06 '16 at 07:15
  • @pipe Except that he's setting up constant arrays with instruction byte codes. If he knows what instructions he wants, all he's doing is a complicated version of embedding an "asm" block. – Graham Oct 06 '16 at 10:33
  • @KyleStrand Very much so! Scary stuff... :) – Graham Oct 06 '16 at 10:34
  • @Graham No, he is not, it's a plain `char` array, nothing constant about that. He's _initializing_ it with a constant. – pipe Oct 06 '16 at 15:30
  • @pipe If you're going to be pedantic, then note that Graham said "constant", not "`const`". – Kyle Strand Oct 06 '16 at 18:35
  • @KyleStrand I never mentioned `const`. I'm pretty sure that someone who's trying to execute machine language from C is aware of the `asm()` construct, and I'm glad that he reduced the _example code snippet_ to the bare minimum that demonstrates the problem. – pipe Oct 06 '16 at 18:48
  • @pipe If that's definitely what he wants to do, then fine. But based on what he originally asked for, and the fact that his byte array is set by a constant initialiser and is not subsequently changed, it'd be negligent ***not*** to point out the "asm" solution as the best (and only good) solution for running a known chunk of assembler. – Graham Oct 07 '16 at 10:14

Essentially this has been clamped down on because it was an open invitation to virus writers. But you can allocate a buffer and fill it with native machine code in straight C; that's no problem. The issue is calling it. Whilst you can try setting up a function pointer with the address of the buffer and calling it, that's highly unlikely to work, and highly likely to break on the next version of the compiler if you somehow manage to coax it into doing what you want. So the best bet is simply to resort to a bit of inline assembly to set up the return and jump to the automatically generated code. But if the system protects against this, you'll have to find ways of circumventing the protection, as Rudi described in his answer (though his method is very specific to one particular system).

Answered by Malcolm McLean.

One obvious error is that `\xc3` (a bare `ret`) does not return the `double` that the function pointer type claims it returns.

Answered by MSalters.