I have been trying to loosely follow this tutorial on basic kernel dev. Currently, the target architecture is i386.

The implementation of IRQs is causing me issues: my interrupt handler reports a cascade of Invalid Opcode exceptions whenever I try to pass the registers (defined as a struct) as an argument to a function. Here is the code for the interrupt handler that raises the exception:

void interrupt_handler(registers_t all_registers) {
    // Printing exception's name
    kprint("interrupt_handler.c (l. 53) : Exception raised was:", 0xB0);
    kprint(exception_messages[(int) all_registers.int_no], 0xB0);
    kprint("\n", 0xB0);

    // Calling test_handle to display the value of some registers
    // INVALID OP CODE ================>
    test_handle(all_registers); // works as expected if this line is commented out
   
}

void test_handle(registers_t all_registers) {
    kprint("interrupt_handler.c (l. 78) : Register DS contains", 0xD0);
    kprint("to be implemented", 0xD0);
}

The structure registers_t is defined as follows (copied from the tutorial):

typedef struct {
   u32int ds;                                      /* Data segment selector */
   u32int edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* Pushed by pusha. */
   u32int int_no, err_code;                        /* Interrupt number and error code (if applicable) */
   u32int eip, cs, eflags, useresp, ss;            /* Pushed by the processor automatically */
} __attribute__((packed)) registers_t;

Trying function calls with other structs, I found that the number of fields in the struct matters: any struct that has between 5 and 16 u32int fields triggers the exception. For instance, the following structure, when initialized and passed empty to test_handle, does not raise exceptions:

// Same as registers_t with fewer fields
typedef struct {
    u32int ds;
    u32int edi, esi;
}  __attribute__((packed))  test_t;

Disassembling the .o file reveals that the generated code uses the mov instruction to pass test_t structures and movsd to pass registers_t. So my suspicion is that the compilation process is at fault: the compiler seems to generate instructions that the CPU does not recognize.
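
The two structs differ mainly in size, which is a plausible reason for the different copy sequences. A minimal sketch of a size check follows, assuming `u32int` is a 32-bit unsigned type (the typedef below is an assumption, not copied from the tutorial):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32int; /* assumption: u32int is a 32-bit unsigned type */

typedef struct {
    u32int ds;
    u32int edi, esi, ebp, esp, ebx, edx, ecx, eax;
    u32int int_no, err_code;
    u32int eip, cs, eflags, useresp, ss;
} __attribute__((packed)) registers_t;

typedef struct {
    u32int ds;
    u32int edi, esi;
} __attribute__((packed)) test_t;

/* Checked at compile time: 16 fields of 4 bytes versus 3 fields of 4 bytes. */
static_assert(sizeof(registers_t) == 64, "registers_t is 64 bytes");
static_assert(sizeof(test_t) == 12, "test_t is 12 bytes");
```

Passing either struct by value means copying all of its bytes at the call site. Which instructions the compiler picks for that copy is its own decision and can change with struct size, target, and flags.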

Here are the relevant excerpts of my Makefile:

C_FLAGS=-ffreestanding -nostartfiles -nodefaultlibs -fno-builtin -Wall -Wextra -fno-exceptions -m32 -target i386-pc-elf  -fno-rtti

# Compiling C code
%.o: %.c
    clang $(C_FLAGS) -c $< -o $@ 

# Linking
kernel/kernel.bin: $(O_FILES)
    ld -o $@ -Ttext 0x1000 $^ --oformat binary -m elf_i386

Is there anything wrong with the compilation process? Or does the problem stem from elsewhere?

Ahmad B
  • Your exception handler should receive the faulting address. Examine the instruction at that address. Also consider passing your struct via pointer. – Jester Mar 06 '21 at 20:53
  • Thanks for the suggestion! Passing as pointer works fine. I'll try to see if I can figure out the code for retrieving the address, though the `movsd` instruction is the most likely suspect, since it's the only thing that differs between the code generated by clang for `test_t` and `registers_t`. – Ahmad B Mar 06 '21 at 20:58
  • My guess is that the compiler is copying the structures with SSE instructions but you haven't enabled SSE instructions. – Ross Ridge Mar 06 '21 at 21:03
  • Ah, I didn't know about SSE! But that does seem to be it, thanks! I'll write a reply later today. – Ahmad B Mar 06 '21 at 21:10
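
The pointer-passing workaround mentioned in the comments can be sketched roughly as follows. This is a hosted-C illustration rather than the asker's actual code: `u32int` is assumed to be a 32-bit unsigned type, and `read_ds` and `interrupt_handler_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32int; /* assumption: u32int is a 32-bit unsigned type */

typedef struct {
    u32int ds;
    u32int edi, esi, ebp, esp, ebx, edx, ecx, eax;
    u32int int_no, err_code;
    u32int eip, cs, eflags, useresp, ss;
} __attribute__((packed)) registers_t;

/* A pointer is much smaller than the struct it refers to. */
static_assert(sizeof(registers_t *) < sizeof(registers_t),
              "passing a pointer avoids copying the whole struct");

/* Hypothetical helper: takes the saved registers by pointer, so this call
   needs no struct copy. */
u32int read_ds(const registers_t *regs) {
    return regs->ds;
}

/* Hypothetical stand-in for interrupt_handler: it still receives the struct
   by value, but forwards only its address to the helper. */
u32int interrupt_handler_sketch(registers_t all_registers) {
    return read_ds(&all_registers);
}
```

This only removes the struct copy at that one call. The compiler may still emit SSE instructions elsewhere unless SSE is enabled on the CPU or disabled in the build.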

1 Answer

@Ross Ridge figured it out (thanks to him!). The details below are what I learned from the OSDev wiki.

The Streaming SIMD Extensions (SSE) add some 70 instructions and eight 128-bit registers (xmm0 to xmm7) to the CPU. SSE needs to be enabled before its instructions and registers can be used; until then, executing an SSE instruction raises an Invalid Opcode exception. The compiler may generate machine code that includes SSE instructions, so the kernel has to enable SSE before running such code.

In the code above, passing the struct to the function was compiled into machine code that uses the xmm0 register, which is one of the SSE registers.

The assembly code to enable SSE is given below (adapted from the OSDev wiki). I added it to my bootloader, right after entering 32-bit protected mode and before entering the kernel. That fixed the problem!

mov eax, cr0        ; cr0 cannot be manipulated directly, manipulate eax instead
and ax, 0xFFFB      ; clear coprocessor emulation CR0.EM
or ax, 0x2          ; set coprocessor monitoring  CR0.MP
mov cr0, eax
mov eax, cr4        ; cr4 too cannot be manipulated directly
or ax, 3 << 9       ; set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
mov cr4, eax
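
For readers who prefer to see the bit manipulation spelled out, here is a minimal C sketch of the same mask arithmetic. The macro and function names are hypothetical, and the sketch only computes the new register values. Actually reading and writing CR0 and CR4 still requires privileged `mov` instructions like the ones above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used by the assembly above (names are illustrative). */
#define CR0_MP         (UINT32_C(1) << 1)   /* coprocessor monitoring */
#define CR0_EM         (UINT32_C(1) << 2)   /* coprocessor emulation */
#define CR4_OSFXSR     (UINT32_C(1) << 9)
#define CR4_OSXMMEXCPT (UINT32_C(1) << 10)

/* The two CR4 bits together are the "3 << 9" constant from the assembly. */
static_assert((CR4_OSFXSR | CR4_OSXMMEXCPT) == (UINT32_C(3) << 9),
              "CR4 mask matches 3 << 9");

/* Clear CR0.EM and set CR0.MP, leaving every other bit untouched. */
static inline uint32_t cr0_for_sse(uint32_t cr0) {
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Set CR4.OSFXSR and CR4.OSXMMEXCPT, leaving every other bit untouched. */
static inline uint32_t cr4_for_sse(uint32_t cr4) {
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

Working on the full 32-bit value, as here, corresponds to the `and eax, 0xFFFFFFFB` form suggested in the comments below rather than the 16-bit `and ax, 0xFFFB` form. Both produce the same result because the affected bits all lie in the low 16 bits.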
Ahmad B
  • `and ax, 0xFFFB` is longer and slower than `and eax, 0xFFFFFFFB`. The latter can be encoded in 3 bytes (using a sign extended one-byte immediate). Also the first one causes a decoding delay because of the two-byte immediate, since the instruction decoder is optimized for one- and four-byte immediates. And it requires merging the modified value of ax with the prior top 16 bits of eax. The same applies to `or ax, 0x02` and `or eax, 0x02`. The instructions `and al, 0xfb` and `or al, 0x2` avoid most of these problems, but still require merging the upper bits of eax. – prl Mar 07 '21 at 00:56
  • @prl: This only runs once at startup so code-size is the way to go. Also, Haswell and later don't rename AL separately from the rest of RAX, and AMD CPUs never did, so RMW instructions on AL are totally fine, except on older Intel especially P6-family where it would actually lead to a partial-reg stall later ([Why doesn't GCC use partial registers?](https://stackoverflow.com/q/41573502) / [How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent](https://stackoverflow.com/q/45660139)) – Peter Cordes Mar 07 '21 at 02:51
  • Also related: [How do I enable SSE for my freestanding bootable code?](https://stackoverflow.com/q/31563078) except there the question *wanted* to be using SSE. Here, you might want to *disable* SSE with `-mno-sse` like most kernels so you don't have to save/restore user-space FPU/SIMD state in interrupt handlers! (You could still enable SSE for use by user-space, or by the main part of your kernel if you have no user-space, though. Like maybe if there's an inverse of `__attribute__((target("sse")))`?) – Peter Cordes Mar 07 '21 at 02:53