Interrupt and Exception

This chapter is the second in a series where we replace components initially set up by UEFI with Ymir's own. This time, we'll focus on interrupts and exceptions. When UEFI transfers control to Surtr, it has already set up basic exception handlers - though they all seem to just abort. We'll replace these with proper interrupt handlers.

This chapter is based on SDM Vol.3A Chapter 6: INTERRUPT AND EXCEPTION HANDLING. If you're interested in the details, it's worth reading alongside this post.

important

The source code for this chapter is in the whiz-ymir-interrupt branch.

Overview of Interrupt and Exception
Implementation of Gate Descriptor
Initialization of Empty IDT
Common Parts of Interrupt Handlers
Vector-Specific Handler
Setting Handlers in IDT
Summary

Overview of Interrupt and Exception

A CPU can receive two types of events: interrupts and exceptions¹. Interrupts are generated asynchronously by hardware signals², while exceptions occur synchronously when the CPU detects an error during the execution of an instruction. Both types of events are handled only at instruction boundaries, so an interrupt handler will never be invoked in the middle of an instruction.

Category

Interrupts are classified by their source:

Name	Description
External Interrupts	Interrupts that originate from outside the CPU.
Software-generated Interrupts	Generated by software using the INT instruction. Allows generating an interrupt with any specified vector number.

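Exceptions are divided into three classes, depending on whether execution can resume and which address is saved as the return address:

Name	Can Resume?	Return Address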
Faults	Yes	Faulting instruction.
Traps	Yes	Next instruction.
Aborts	No	-
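The "Return Address" column decides where execution continues once the handler returns. A minimal sketch of that rule follows; ExceptionClass and returnAddress() are hypothetical names introduced only for illustration, and the instruction length is passed in by hand because this sketch does not decode instructions:

```zig
const std = @import("std");

/// Exception classes from the table above (illustrative names).
const ExceptionClass = enum { fault, trap, abort };

/// Address saved as the return address, given the address and length of the
/// instruction that raised the exception. Returns null for aborts, which
/// cannot be resumed.
fn returnAddress(class: ExceptionClass, insn_addr: u64, insn_len: u64) ?u64 {
    return switch (class) {
        .fault => insn_addr, // the faulting instruction is re-executed
        .trap => insn_addr + insn_len, // execution continues after it
        .abort => null,
    };
}

test "return address depends on the exception class" {
    try std.testing.expectEqual(@as(?u64, 0x1000), returnAddress(.fault, 0x1000, 3));
    try std.testing.expectEqual(@as(?u64, 0x1003), returnAddress(.trap, 0x1000, 3));
    try std.testing.expect(returnAddress(.abort, 0x1000, 3) == null);
}
```

This is why a #PF handler can fix up the mapping and simply return: the faulting access is retried from the same address.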

Vectors 0 through 31 are reserved for exceptions and NMI. The defined ones are:

Vector	Name	Class
0	#DE: Divide Error	Fault
1	#DB: Debug	Fault/Trap
2	NMI: Non-Maskable Interrupt	-
3	#BP: Breakpoint	Trap
4	#OF: Overflow	Trap
5	#BR: BOUND Range Exceeded	Fault
6	#UD: Invalid Opcode	Fault
7	#NM: Device Not Available	Fault
8	#DF: Double Fault	Abort
9	-	-
10	#TS: Invalid TSS	Fault
11	#NP: Segment Not Present	Fault
12	#SS: Stack-Segment Fault	Fault
13	#GP: General Protection	Fault
14	#PF: Page Fault	Fault
15	-	-
16	#MF: x87 FPU Floating-Point Error	Fault
17	#AC: Alignment Check	Fault
18	#MC: Machine Check	Abort
19	#XM: SIMD Floating-Point Exception	Fault
20	#VE: Virtualization Exception	Fault
21	#CP: Control Protection Exception	Fault

Interrupt Descriptor Table

IDT: Interrupt Descriptor Table is a table that holds handlers for interrupts and exceptions. Like the GDT, it is an array of descriptors; its entries are called Gate Descriptors³ and are 16 bytes each in 64-bit mode. There are three types of gate descriptors: Task Gate, Interrupt Gate, and Trap Gate. Task gates are used for hardware task switching, which is not available in 64-bit mode and which we will not cover in this series. Interrupt gates and trap gates differ in whether interrupts are disabled (by clearing RFLAGS.IF) when the handler is called: an interrupt gate clears IF, while a trap gate leaves it unchanged. In this series, we only use interrupt gates.

A gate descriptor has the following structure:

Figure: 64-Bit IDT Gate Descriptors (SDM Vol.3A 6.14.1, Figure 6-8)

Except for Offset, Segment Selector, and DPL, the other values are fixed. Segment Selector selects the segment where the exception or interrupt handler resides. The offset within the segment specified by the segment selector is given by the Offset.

How Handler is Called

When an interrupt occurs, the gate descriptor corresponding to its vector is retrieved from the IDT. The gate descriptor contains the Segment Selector, which specifies the segment where the handler resides - just like FS/GS segment selectors. The handler's offset within the segment is specified by Offset. In other words, the handler's linear address is calculated using the exact same method as logical to linear address translation:

SDM Vol.3A 6.12.1 Figure 6-3. Interrupt Procedure Call
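The first step, locating the gate descriptor for a vector, is plain arithmetic on the IDT base and limit. A minimal sketch, assuming 16-byte descriptors as in 64-bit mode; gateAddress() is a hypothetical helper, and the base and limit in the test are the IDTR values from the UEFI register dump shown later in this chapter:

```zig
const std = @import("std");

/// Size of one gate descriptor in 64-bit mode.
const gate_size: u64 = 16;

/// Linear address of the gate descriptor for `vector`, or null when the
/// descriptor would extend past the limit held in IDTR.
fn gateAddress(idt_base: u64, idt_limit: u16, vector: u8) ?u64 {
    const start = @as(u64, vector) * gate_size;
    const last_byte = start + gate_size - 1;
    if (last_byte > idt_limit) return null;
    return idt_base + start;
}

test "descriptor lookup by vector" {
    // Vector 13 (#GP) lives 13 * 16 = 0xD0 bytes into the table.
    try std.testing.expectEqual(@as(?u64, 0x1F53_70E8), gateAddress(0x1F53_7018, 0x0FFF, 13));
    // A limit of 0x7FF covers only vectors 0..127.
    try std.testing.expect(gateAddress(0x1F53_7018, 0x07FF, 128) == null);
}
```

The handler address itself then comes from the descriptor's Segment Selector and Offset, as described above.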

Implementation of Gate Descriptor

First, let's define a gate descriptor structure. We'll also define GateType, although in this series we only use interrupt gates:

ymir/arch/x86/idt.zig

/// Entry in the Interrupt Descriptor Table.
pub const GateDescriptor = packed struct(u128) {
    /// Lower 16 bits of the offset to the ISR.
    offset_low: u16,
    /// Segment Selector that must point to a valid code segment in the GDT.
    seg_selector: u16,
    /// Interrupt Stack Table. Not used.
    ist: u3 = 0,
    /// Reserved.
    _reserved1: u5 = 0,
    /// Gate Type.
    gate_type: GateType,
    /// Reserved.
    _reserved2: u1 = 0,
    /// Descriptor Privilege Level is the required CPL to call the ISR via the INT inst.
    /// Hardware interrupts ignore this field.
    dpl: u2,
    /// Present flag. Must be 1.
    present: bool = true,
    /// Middle 16 bits of the offset to the ISR.
    offset_middle: u16,
    /// Higher 32 bits of the offset to the ISR.
    offset_high: u32,
    /// Reserved.
    _reserved3: u32 = 0,

    pub fn offset(self: GateDescriptor) u64 {
        return @as(u64, self.offset_high) << 32 | @as(u64, self.offset_middle) << 16 | @as(u64, self.offset_low);
    }
};

pub const GateType = enum(u4) {
    Invalid = 0b0000,
    Interrupt64 = 0b1110,
    Trap64 = 0b1111,
};
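To check that the field order really produces the bit layout in the SDM figure, the descriptor can be bit-cast to an integer in a host-side test. The following is a sketch under stated assumptions: the struct is repeated only to keep the example self-contained, makeGate() is a hypothetical helper that takes a raw address instead of a function, and the handler address is an arbitrary example value:

```zig
const std = @import("std");

const GateType = enum(u4) { Invalid = 0b0000, Interrupt64 = 0b1110, Trap64 = 0b1111 };

// Same field order as GateDescriptor above.
const GateDescriptor = packed struct(u128) {
    offset_low: u16,
    seg_selector: u16,
    ist: u3 = 0,
    _reserved1: u5 = 0,
    gate_type: GateType,
    _reserved2: u1 = 0,
    dpl: u2,
    present: bool = true,
    offset_middle: u16,
    offset_high: u32,
    _reserved3: u32 = 0,
};

/// Hypothetical helper: builds a ring-0 gate from a raw handler address.
fn makeGate(handler: u64, selector: u16, gate_type: GateType) GateDescriptor {
    return GateDescriptor{
        .offset_low = @truncate(handler),
        .seg_selector = selector,
        .gate_type = gate_type,
        .dpl = 0,
        .offset_middle = @truncate(handler >> 16),
        .offset_high = @truncate(handler >> 32),
    };
}

test "interrupt gate encodes to the expected bits" {
    const gate = makeGate(0xFFFF_FFFF_8010_4760, 0x0010, .Interrupt64);
    const raw: u128 = @bitCast(gate);
    // Low qword: offset[31:16], then the P/DPL/type byte (0x8E), the IST byte,
    // the selector, and offset[15:0].
    try std.testing.expectEqual(@as(u64, 0x8010_8E00_0010_4760), @as(u64, @truncate(raw)));
    // High qword: reserved bits, then offset[63:32].
    try std.testing.expectEqual(@as(u64, 0x0000_0000_FFFF_FFFF), @as(u64, @truncate(raw >> 64)));
}
```

The byte 0x8E is the familiar encoding of a present, DPL 0, 64-bit interrupt gate.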

Next, let's define IDT. Similar to GDT, it will be an array allocated in the .data section. Unlike the GDT's NULL Descriptor, the element at index 0 can actually be used:

ymir/arch/x86/idt.zig

pub const max_num_gates = 256;
var idt: [max_num_gates]GateDescriptor align(4096) = [_]GateDescriptor{std.mem.zeroes(GateDescriptor)} ** max_num_gates;

Finally, let's create a function to add entries to IDT:

ymir/arch/x86/idt.zig

pub const Isr = fn () callconv(.Naked) void;

pub fn setGate(
    index: usize,
    gate_type: GateType,
    offset: Isr,
) void {
    idt[index] = GateDescriptor{
        .offset_low = @truncate(@intFromPtr(&offset)),
        .seg_selector = gdt.kernel_cs_index << 3,
        .gate_type = gate_type,
        .offset_middle = @truncate(@as(u64, @intFromPtr(&offset)) >> 16),
        .offset_high = @truncate(@as(u64, @intFromPtr(&offset)) >> 32),
        .dpl = 0,
    };
}

Isr refers to the function type of the interrupt handlers described later. Since handlers reside in the code segment, the segment selector uses CS (the lower 3 bits are for RPL/TI⁴, so we shift accordingly).
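The shift by 3 follows from the selector format: bits 15:3 hold the table index, bit 2 is TI, and bits 1:0 are the RPL. A small sketch of that encoding; segmentSelector() is a hypothetical helper, and index 2 is only an assumption about kernel_cs_index, inferred from the CS value 0x0010 in the dump in the Summary:

```zig
const std = @import("std");

/// Builds a segment selector: bits 15:3 index, bit 2 TI, bits 1:0 RPL.
fn segmentSelector(index: u13, ti: u1, rpl: u2) u16 {
    return (@as(u16, index) << 3) | (@as(u16, ti) << 2) | @as(u16, rpl);
}

test "selector encoding" {
    // GDT index 2, TI = 0 (GDT), RPL = 0.
    try std.testing.expectEqual(@as(u16, 0x0010), segmentSelector(2, 0, 0));
    // GDT index 3 with RPL = 3, as a ring-3 selector would look.
    try std.testing.expectEqual(@as(u16, 0x001B), segmentSelector(3, 0, 3));
}
```

With TI and RPL both zero, the expression reduces to index << 3, which is what setGate() computes.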

Initialization of Empty IDT

Before setting up proper handlers, let's initialize an empty IDT and try using it. Here's a function to initialize the IDT:

ymir/arch/x86/idt.zig

const IdtRegister = packed struct {
    limit: u16,
    base: *[max_num_gates]GateDescriptor,
};

var idtr = IdtRegister{
    .limit = @sizeOf(@TypeOf(idt)) - 1,
    .base = undefined,
};

pub fn init() void {
    idtr.base = &idt;
    am.lidt(@intFromPtr(&idtr));
}

The address of the IDT itself is set in IDTR: Interrupt Descriptor Table Register, which corresponds to GDTR for GDT. Similar to GDT, due to a bug in Zig 0.13.0, you cannot define .base = &idt directly, so the IDT address is set inside the init() function. The am.lidt() function is a thin inline-assembly wrapper that executes the LIDT instruction.
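For reference, the operand that LIDT reads is 10 bytes: a 16-bit limit followed by a 64-bit base. The sketch below models it with a plain u64 base so that it can be checked on a host, and shows one plausible shape for the wrapper. The lidt() here is an assumption about what am.lidt() might look like, not the actual Ymir code, and it is never called because LIDT is privileged:

```zig
const std = @import("std");

const gate_size = 16; // bytes per 64-bit gate descriptor
const max_num_gates = 256;

/// Stand-in for the LIDT operand: 16-bit limit, then 64-bit base.
const IdtRegister = packed struct {
    limit: u16,
    base: u64,
};

/// Hypothetical wrapper in the spirit of am.lidt(); `idtr_addr` is the
/// address of an IdtRegister. Only meaningful in ring 0.
inline fn lidt(idtr_addr: u64) void {
    asm volatile (
        \\lidt (%[idtr])
        :
        : [idtr] "r" (idtr_addr),
    );
}

test "IDTR describes a 4 KiB table" {
    const idtr = IdtRegister{ .limit = max_num_gates * gate_size - 1, .base = 0 };
    // The limit is the offset of the last valid byte, hence the -1.
    try std.testing.expectEqual(@as(u16, 0x0FFF), idtr.limit);
    try std.testing.expect(@bitSizeOf(IdtRegister) == 80);
}
```

The limit 0xFFF is the same value UEFI uses for its own IDT, as the IDTR line of the register dump below shows.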

Call it from kernelMain() to initialize:

ymir/main.zig

arch.idt.init();
log.info("Initialized IDT.", .{});

With this, the empty IDT is set up. Let's trigger an exception to test it. Since causing a #DE: Divide Error in Zig is a bit tricky, we'll generate a #GP: General Protection Fault this time:

ymir/main.zig

const ptr: *u64 = @ptrFromInt(0xDEAD_0000_0000_0000);
log.info("ptr.* = {d}", .{ptr.*});

The address 0xDEAD000000000000 is not in Canonical Form. Canonical form is the required format for virtual addresses, where the Most Significant Implemented Bit (bit 47 on CPUs with 48-bit virtual addresses) and all higher bits must be the same. In this case, bit 47 is 0, but bits 63:48 are 0xDEAD rather than 0x0000, which causes a #GP (General Protection Fault).
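The canonical-form rule can be expressed as a sign-extension check. A minimal sketch, assuming 48-bit virtual addresses; isCanonical48() is a hypothetical helper and not part of Ymir:

```zig
const std = @import("std");

/// True if `addr` is canonical for 48-bit virtual addresses,
/// i.e. bits 63:48 are all copies of bit 47.
fn isCanonical48(addr: u64) bool {
    // Move bit 47 into the sign position, then shift back arithmetically
    // so that it is replicated into bits 63:48.
    const shifted: i64 = @bitCast(addr << 16);
    const sign_extended: u64 = @bitCast(shifted >> 16);
    return sign_extended == addr;
}

test "canonical address check" {
    try std.testing.expect(!isCanonical48(0xDEAD_0000_0000_0000));
    try std.testing.expect(isCanonical48(0xFFFF_FFFF_8010_0000));
    try std.testing.expect(isCanonical48(0x0000_7FFF_FFFF_FFFF));
    try std.testing.expect(!isCanonical48(0x0000_8000_0000_0000));
}
```

The last two cases mark the boundary: 0x00007FFFFFFFFFFF is the highest canonical address in the lower half, and adding one falls into the non-canonical hole.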

When you run this, QEMU will likely exit. This is because a Triple Fault has occurred. First, the CPU tries to read from the specified address and triggers a #GP. The CPU then fetches entry 13 (the vector for #GP) from the IDT. Since the IDT is filled with zeros, the Segment Selector is interpreted as 0 (the NULL Segment Selector). Attempting memory access with a NULL segment selector causes another #GP⁵, but since this is the second consecutive #GP, a #DF: Double Fault is triggered⁶. At this point, resolving the handler's address also causes a #GP, ultimately leading to a triple fault. A triple fault shuts down the system, causing QEMU to exit.

In any case, it seems we have successfully replaced the UEFI-provided IDT with Ymir's own IDT.

note

In main.zig, if you comment out the function calls that initialize the IDT and GDT and then trigger a #GP, the UEFI-provided IDT will be used. This handler dumps the exception type along with the register state:

txt

!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP  - FFFFFFFF801003DD, CS  - 0000000000000038, RFLAGS - 0000000000010002
RAX  - DEAD000000000000, RCX - 0000000000000000, RDX - 00000000000003F8
RBX  - 000000001FE91F78, RSP - FFFFFFFF80106EC0, RBP - 000000001FE908A0
RSI  - 0000000000000030, RDI - 000000000000000A
R8   - 000000001FE8FF8C, R9  - 000000001F9EC018, R10 - 000000001FAE6880
R11  - 0000000089F90BEB, R12 - 000000001FEAFF40, R13 - 000000001FE93720
R14  - FFFFFFFF801005E0, R15 - 00000000FF000000
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000001E23A000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000001F9DC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000001F537018 0000000000000FFF,   TR - 0000000000000000

Note that if you only comment out the IDT initialization, a triple fault will still occur. You also need to comment out the GDT initialization. It might be a good exercise to think about why this happens.

Common Parts of Interrupt Handlers

From here, we'll write interrupt handlers to set in the IDT. Interrupt handlers are also called ISR: Interrupt Service Routines. Since it's shorter and easier to write, we'll refer to them as ISRs from now on.

ISRs share some common tasks regardless of the vector, such as saving and restoring registers. Therefore, in Ymir, the ISR calls a common handler first, then dispatches to the specific handler corresponding to the vector.

Let's implement the common part of the ISR. Its entry point is generated as follows:

ymir/arch/x86/isr.zig

pub fn generateIsr(comptime vector: usize) idt.Isr {
    return struct {
        fn handler() callconv(.Naked) void {
            // Clear the interrupt flag.
            asm volatile (
                \\cli
            );
            // If the interrupt does not provide an error code, push a dummy one.
            // Vectors 8, 10-14, 17, and 21 (#CP) come with a real error code.
            if (vector != 8 and !(vector >= 10 and vector <= 14) and vector != 21 and vector != 17) {
                asm volatile (
                    \\pushq $0
                );
            }
            // Push the vector.
            asm volatile (
                \\pushq %[vector]
                :
                : [vector] "n" (vector),
            );
            // Jump to the common ISR.
            asm volatile (
                \\jmp isrCommon
            );
        }
    }.handler;
}

generateIsr() takes an interrupt vector as input and generates the ISR corresponding to that vector. Although this is a common part for all ISRs, the function itself is generated separately for each vector. This is because a CPU does not save the vector number on the stack or registers when calling the ISR. If you only have one function for all ISRs, there’s no way to know which interrupt vector is currently being handled. To avoid this, we generate a separate function for each vector.

note

In Zig, returning a struct from a function is straightforward and natural. However, when returning a function, you need to create a struct first and then return a member function pointer, which is somewhat awkward. If you know a better approach, please let me know.

In an ISR, the first step is to disable interrupts using CLI instruction. When a CPU jumps to an ISR via an interrupt gate, it clears RFLAGS.IF flag automatically, but it does not do so when jumping via a trap gate. Therefore, if you want to disable interrupts within a trap gate, you need to explicitly execute CLI⁷.

Next, if an exception does not provide an Error Code, a dummy error code is pushed onto the stack. Some exceptions come with an error code that provides more details about the cause of the exception. For example, #PF: Page Fault includes information such as whether the faulting access was a read or write in its error code. To keep the stack layout consistent between exceptions that do and do not provide an error code, the ISR pushes a dummy error code for those. For details on which exceptions provide an error code and their meanings, refer to SDM Vol.3A 6.15 Exception and Interrupt Reference.
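The condition that decides whether to push a dummy error code can also be written as a small predicate, which some readers may find easier to check against the SDM. A sketch, with hasErrorCode() as a hypothetical name; it also lists vector 21 (#CP), which the SDM documents as pushing an error code:

```zig
const std = @import("std");

/// True when the CPU itself pushes an error code for this vector
/// (#DF, #TS, #NP, #SS, #GP, #PF, #AC, #CP).
fn hasErrorCode(vector: usize) bool {
    return switch (vector) {
        8, 10...14, 17, 21 => true,
        else => false,
    };
}

test "vectors that push an error code" {
    try std.testing.expect(hasErrorCode(13)); // #GP
    try std.testing.expect(hasErrorCode(14)); // #PF
    try std.testing.expect(!hasErrorCode(0)); // #DE
    try std.testing.expect(!hasErrorCode(32)); // external interrupt
}
```

Because the vector is comptime-known inside a generated ISR, a check like `if (comptime !hasErrorCode(vector))` would be resolved at compile time just like the inline condition.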

Finally, before jumping to the vector-independent common part of the ISR, the vector is pushed onto the stack. This allows the single shared function for all interrupts to identify the vector.

The common part used by all interrupts is as follows:

ymir/arch/x86/isr.zig

export fn isrCommon() callconv(.Naked) void {
    // Save the general-purpose registers.
    asm volatile (
        \\pushq %%rax
        \\pushq %%rcx
        \\pushq %%rdx
        \\pushq %%rbx
        \\pushq %%rsp
        \\pushq %%rbp
        \\pushq %%rsi
        \\pushq %%rdi
        \\pushq %%r15
        \\pushq %%r14
        \\pushq %%r13
        \\pushq %%r12
        \\pushq %%r11
        \\pushq %%r10
        \\pushq %%r9
        \\pushq %%r8
    );

    // Push the context and call the handler.
    asm volatile (
        \\pushq %%rsp
        \\popq %%rdi
        // Align stack to 16 bytes.
        \\pushq %%rsp
        \\pushq (%%rsp)
        \\andq $-0x10, %%rsp
        // Call the dispatcher.
        \\call intrZigEntry
        // Restore the stack.
        \\movq 8(%%rsp), %%rsp
    );

    // Remove general-purpose registers, error code, and vector from the stack.
    asm volatile (
        \\popq %%r8
        \\popq %%r9
        \\popq %%r10
        \\popq %%r11
        \\popq %%r12
        \\popq %%r13
        \\popq %%r14
        \\popq %%r15
        \\popq %%rdi
        \\popq %%rsi
        \\popq %%rbp
        \\popq %%rsp
        \\popq %%rbx
        \\popq %%rdx
        \\popq %%rcx
        \\popq %%rax
        \\add   $0x10, %%rsp
        \\iretq
    );
}

First, save the current register state. Then call intrZigEntry(), the function responsible for dispatching to the vector-specific handlers. At this point, the current RSP is passed as an argument by copying RSP into RDI⁸. After calling the vector-specific handler, the saved registers are restored.

Finally, return from the interrupt using the IRET instruction. IRET pops RIP, CS, and RFLAGS from the stack in that order; in 64-bit mode it then also pops RSP and SS, which the CPU pushes unconditionally. If you're wondering where these values were pushed onto the stack, the CPU pushes them automatically when entering the ISR. Just before calling the ISR, the CPU pushes the following onto the stack:

SDM Vol.3A 6.4 Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines

Since Ymir does not implement userland in this series, we only consider the case of "No Privilege-Level Change." Even if an error code is not provided, the common part of the ISR pushes a dummy error code, so the stack layout always matches the diagram. Additionally, Ymir's ISR also pushes the vector onto the stack. Therefore, before executing IRET, we remove the vector and error code from the stack by performing add $0x10, %%rsp.

note

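The x86-64 System V calling convention requires the stack to be 16-byte aligned at the point of a CALL, and compilers rely on this. For example, they may emit MOVAPS to access stack variables, and MOVAPS raises a #GP if its memory operand is not 16-byte aligned. For this reason, we explicitly align the stack to 16 bytes before calling intrZigEntry() from the ISR.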
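The push/push/and/mov sequence around the call is compact enough that it helps to trace it once. The sketch below simulates it on a tiny model of the stack; simulate() and slotIndex() are hypothetical names, and the model assumes RSP is 8-byte aligned on entry:

```zig
const std = @import("std");

const Result = struct { rsp_at_call: u64, rsp_restored: u64 };

/// Slot i models the qword at address (initial_rsp - 8 * (i + 1)).
fn slotIndex(initial_rsp: u64, addr: u64) usize {
    return @intCast((initial_rsp - addr) / 8 - 1);
}

fn simulate(initial_rsp: u64) Result {
    var slots = [_]u64{0} ** 4;
    var rsp = initial_rsp;

    // pushq %rsp : stores the value RSP had before the push.
    const before_push = rsp;
    rsp -= 8;
    slots[slotIndex(initial_rsp, rsp)] = before_push;

    // pushq (%rsp) : duplicates the qword on top of the stack.
    const top = slots[slotIndex(initial_rsp, rsp)];
    rsp -= 8;
    slots[slotIndex(initial_rsp, rsp)] = top;

    // andq $-0x10, %rsp : rounds RSP down to a 16-byte boundary.
    rsp &= ~@as(u64, 0xF);
    const rsp_at_call = rsp;

    // movq 8(%rsp), %rsp : one of the two copies always sits at RSP + 8.
    rsp = slots[slotIndex(initial_rsp, rsp + 8)];
    return .{ .rsp_at_call = rsp_at_call, .rsp_restored = rsp };
}

test "stack is aligned for the call and restored afterwards" {
    for ([_]u64{ 0x1000, 0x1008 }) |initial| {
        const r = simulate(initial);
        try std.testing.expectEqual(@as(u64, 0), r.rsp_at_call % 16);
        try std.testing.expectEqual(initial, r.rsp_restored);
    }
}
```

Pushing the saved RSP twice is what makes the restore work in both cases: the AND either leaves RSP unchanged or lowers it by 8, and in either case the slot at RSP + 8 holds the original value.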

Vector-Specific Handler

With the common part of the ISR completed, let's call the handler corresponding to each vector. The intrZigEntry() function, which was called from the common ISR earlier, is defined as follows:

ymir/arch/x86/isr.zig

const intr = @import("interrupt.zig");

pub const Context = packed struct {
    /// General purpose registers.
    registers: Registers,
    /// Interrupt Vector.
    vector: u64,
    /// Error Code.
    error_code: u64,

    // CPU status:
    rip: u64,
    cs: u64,
    rflags: u64,
};
const Registers = packed struct {
    r8: u64,
    r9: u64,
    r10: u64,
    r11: u64,
    r12: u64,
    r13: u64,
    r14: u64,
    r15: u64,
    rdi: u64,
    rsi: u64,
    rbp: u64,
    rsp: u64,
    rbx: u64,
    rdx: u64,
    rcx: u64,
    rax: u64,
};

export fn intrZigEntry(ctx: *Context) callconv(.C) void {
    intr.dispatch(ctx);
}

The Context argument represents the register state immediately after the ISR is invoked. The stack just before calling intrZigEntry() directly corresponds to the contents of Context. This is why we passed the RSP value as an argument to intrZigEntry(). Inside the vector-specific handler, you can access this Context information.
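The correspondence between the stack and Context can be checked with byte offsets. The following sketch uses extern struct instead of the packed structs above, purely so that @offsetOf yields plain C-style offsets in a host-side test; since every field is a u64, the layout is assumed to be the same. The values written into the fake stack are arbitrary:

```zig
const std = @import("std");

const Registers = extern struct {
    r8: u64, r9: u64, r10: u64, r11: u64,
    r12: u64, r13: u64, r14: u64, r15: u64,
    rdi: u64, rsi: u64, rbp: u64, rsp: u64,
    rbx: u64, rdx: u64, rcx: u64, rax: u64,
};

const Context = extern struct {
    registers: Registers,
    vector: u64,
    error_code: u64,
    rip: u64,
    cs: u64,
    rflags: u64,
};

test "Context mirrors the stack at the call to intrZigEntry" {
    // The 16 saved registers sit at the lowest addresses, pushed last.
    try std.testing.expect(@offsetOf(Context, "vector") == 16 * 8);
    try std.testing.expect(@offsetOf(Context, "error_code") == 17 * 8);
    try std.testing.expect(@offsetOf(Context, "rip") == 18 * 8);
    try std.testing.expect(@sizeOf(Context) == 21 * 8);

    // A fake stack, lowest address first, viewed through a Context pointer.
    var stack = [_]u64{0} ** 21;
    stack[0] = 0x88; // r8, the last register pushed
    stack[16] = 13; // vector
    stack[18] = 0xFFFF_FFFF_8010_4464; // rip
    const ctx: *const Context = @ptrCast(&stack);
    try std.testing.expectEqual(@as(u64, 0x88), ctx.registers.r8);
    try std.testing.expectEqual(@as(u64, 13), ctx.vector);
    try std.testing.expectEqual(@as(u64, 0xFFFF_FFFF_8010_4464), ctx.rip);
}
```

The offsets also explain the add $0x10 before IRET: vector and error_code are the two qwords between the saved registers and the frame pushed by the CPU.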

By specifying callconv(.C), we ensure that the first argument is always passed in RDI (which in this case is the RSP value). Without this, Zig might choose different registers or stack locations to pass arguments for optimization purposes. The intr.dispatch() function calls the vector-specific handler as follows:

ymir/arch/x86/interrupt.zig

pub const Handler = *const fn (*Context) void;
var handlers: [256]Handler = [_]Handler{unhandledHandler} ** 256;

pub fn dispatch(context: *Context) void {
    const vector = context.vector;
    handlers[vector](context);
}

handlers is an array holding pointers to the handlers for each vector. The dispatch() function calls the handler registered for the given vector. If no handler is registered, it calls the default unhandledHandler().
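Registering a vector-specific handler then amounts to overwriting one slot of this array. A sketch of how that could look; registerHandler() is a hypothetical function whose name and signature are assumptions, Context is reduced to the one field dispatch() needs, and vector 32 is just an arbitrary external interrupt vector:

```zig
const std = @import("std");

// Reduced stand-in for the real Context.
const Context = struct { vector: u64 };
const Handler = *const fn (*Context) void;

var unhandled_hits: usize = 0;
var timer_ticks: usize = 0;

fn unhandledHandler(_: *Context) void {
    unhandled_hits += 1;
}

fn timerHandler(_: *Context) void {
    timer_ticks += 1;
}

var handlers: [256]Handler = [_]Handler{&unhandledHandler} ** 256;

/// Hypothetical registration function: replaces the handler for one vector.
pub fn registerHandler(vector: u8, handler: Handler) void {
    handlers[vector] = handler;
}

pub fn dispatch(context: *Context) void {
    const index: usize = @intCast(context.vector);
    handlers[index](context);
}

test "registered handlers take precedence over the default" {
    const ticks_before = timer_ticks;
    const unhandled_before = unhandled_hits;

    registerHandler(32, &timerHandler);
    var timer_ctx = Context{ .vector = 32 };
    var gp_ctx = Context{ .vector = 13 };
    dispatch(&timer_ctx);
    dispatch(&gp_ctx);

    try std.testing.expectEqual(ticks_before + 1, timer_ticks);
    try std.testing.expectEqual(unhandled_before + 1, unhandled_hits);
}
```

Taking the vector as a u8 keeps every index within the 256-entry table by construction.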

ymir/arch/x86/interrupt.zig

fn unhandledHandler(context: *Context) void {
    @setCold(true);

    log.err("============ Oops! ===================", .{});
    log.err("Unhandled interrupt: {s} ({})", .{
        exceptionName(context.vector),
        context.vector,
    });
    log.err("Error Code: 0x{X}", .{context.error_code});
    log.err("RIP    : 0x{X:0>16}", .{context.rip});
    log.err("EFLAGS : 0x{X:0>16}", .{context.rflags});
    log.err("RAX    : 0x{X:0>16}", .{context.registers.rax});
    log.err("RBX    : 0x{X:0>16}", .{context.registers.rbx});
    log.err("RCX    : 0x{X:0>16}", .{context.registers.rcx});
    log.err("RDX    : 0x{X:0>16}", .{context.registers.rdx});
    log.err("RSI    : 0x{X:0>16}", .{context.registers.rsi});
    log.err("RDI    : 0x{X:0>16}", .{context.registers.rdi});
    log.err("RSP    : 0x{X:0>16}", .{context.registers.rsp});
    log.err("RBP    : 0x{X:0>16}", .{context.registers.rbp});
    log.err("R8     : 0x{X:0>16}", .{context.registers.r8});
    log.err("R9     : 0x{X:0>16}", .{context.registers.r9});
    log.err("R10    : 0x{X:0>16}", .{context.registers.r10});
    log.err("R11    : 0x{X:0>16}", .{context.registers.r11});
    log.err("R12    : 0x{X:0>16}", .{context.registers.r12});
    log.err("R13    : 0x{X:0>16}", .{context.registers.r13});
    log.err("R14    : 0x{X:0>16}", .{context.registers.r14});
    log.err("R15    : 0x{X:0>16}", .{context.registers.r15});
    log.err("CS     : 0x{X:0>4}", .{context.cs});

    ymir.endlessHalt();
}

This handler simply dumps the state saved in Context and then enters an infinite HLT loop.
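unhandledHandler() relies on exceptionName(), which is not shown in this chapter. A minimal sketch of what it might look like, assuming it simply maps the vectors from the table in the overview to strings; the exact wording of the names and the fallback string are assumptions:

```zig
const std = @import("std");

/// Human-readable name for an exception vector (sketch; strings are assumed).
pub fn exceptionName(vector: u64) []const u8 {
    return switch (vector) {
        0 => "#DE: Divide error",
        1 => "#DB: Debug",
        2 => "NMI: Non-maskable interrupt",
        3 => "#BP: Breakpoint",
        4 => "#OF: Overflow",
        5 => "#BR: BOUND range exceeded",
        6 => "#UD: Invalid opcode",
        7 => "#NM: Device not available",
        8 => "#DF: Double fault",
        10 => "#TS: Invalid TSS",
        11 => "#NP: Segment not present",
        12 => "#SS: Stack-segment fault",
        13 => "#GP: General protection fault",
        14 => "#PF: Page fault",
        16 => "#MF: x87 FPU floating-point error",
        17 => "#AC: Alignment check",
        18 => "#MC: Machine check",
        19 => "#XM: SIMD floating-point exception",
        20 => "#VE: Virtualization exception",
        21 => "#CP: Control protection exception",
        // Reserved vectors and external interrupts have no exception name.
        else => "Unknown",
    };
}

test "exception names" {
    try std.testing.expectEqualStrings("#GP: General protection fault", exceptionName(13));
    try std.testing.expectEqualStrings("Unknown", exceptionName(32));
}
```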

Setting Handlers in IDT

The common part of the ISR and the vector-specific handler dispatching are now complete. The remaining task is to set the ISR in the IDT. Let's set the ISR for all IDT entries:

ymir/arch/x86/interrupt.zig

pub fn init() void {
    inline for (0..idt.max_num_gates) |i| {
        idt.setGate(
            i,
            .Interrupt64,
            isr.generateIsr(i),
        );
    }

    idt.init();

    am.sti();
}

Using inline for, we call idt.setGate() 256 times. The inline for is necessary because isr.generateIsr() returns a function, so the loop needs to be evaluated at compile time. Each ISR generated by generateIsr(i) is set in the IDT as an interrupt gate via the previously implemented idt.setGate(). Finally, we enable interrupts by setting RFLAGS.IF with STI instruction.
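The interplay between comptime parameters and inline for can be tried outside the kernel. In the sketch below, generateStub() is a hypothetical stand-in for generateIsr() that returns an ordinary function instead of a naked one, and the table has 4 entries instead of 256:

```zig
const std = @import("std");

const Stub = *const fn () usize;
const num_vectors = 4; // small stand-in for the 256 IDT entries

/// Returns a distinct function per vector. Each one "remembers" its vector
/// because the value is baked in at compile time.
fn generateStub(comptime vector: usize) Stub {
    const S = struct {
        fn stub() usize {
            return vector;
        }
    };
    return &S.stub;
}

var table: [num_vectors]Stub = undefined;

fn init() void {
    // `i` has to be comptime-known to be passed as a comptime parameter.
    // `inline for` unrolls the loop, so each iteration sees a constant.
    inline for (0..num_vectors) |i| {
        table[i] = generateStub(i);
    }
}

test "each generated stub knows its own vector" {
    init();
    for (table, 0..) |stub, i| {
        try std.testing.expectEqual(i, stub());
    }
}
```

With a plain for loop, the index would be a runtime value and could not be passed as a comptime parameter, so the compiler would reject the call.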

Summary

In this chapter, we implemented the ISR by separating the common parts shared across all vectors from the vector-specific parts. We generated an ISR for every interrupt and registered them in the IDT. The registered ISRs call the corresponding handler for each vector. Since no vector-specific handlers have been registered yet, all interrupts are handled by unhandledHandler(). Running the earlier code that triggers #GP again results in the following:

txt

[INFO ] main    | Booting Ymir...
[INFO ] main    | Initialized GDT.
[INFO ] main    | Initialized IDT.
[ERROR] intr    | ============ Oops! ===================
[ERROR] intr    | Unhandled interrupt: #GP: General protection fault (13)
[ERROR] intr    | Error Code: 0x0
[ERROR] intr    | RIP    : 0xFFFFFFFF80104464
[ERROR] intr    | EFLAGS : 0x0000000000010202
[ERROR] intr    | RAX    : 0xDEADBEEF00000000
[ERROR] intr    | RBX    : 0x000000001FE91F78
[ERROR] intr    | RCX    : 0xFFFFFFFF80111812
[ERROR] intr    | RDX    : 0x00000000FFFF03F8
[ERROR] intr    | RSI    : 0x80108E0000104760
[ERROR] intr    | RDI    : 0x000000000000000A
[ERROR] intr    | RBP    : 0x000000001FE908A0
[ERROR] intr    | R8     : 0xFFFFFFFF801130E0
[ERROR] intr    | R9     : 0xFFFFFFFF80111080
[ERROR] intr    | R10    : 0x8010000000000000
[ERROR] intr    | R11    : 0x80108E0000105670
[ERROR] intr    | R12    : 0x000000001FEAFF40
[ERROR] intr    | R13    : 0x000000001FE93720
[ERROR] intr    | R14    : 0xFFFFFFFF80105680
[ERROR] intr    | R15    : 0x00000000FF000000
[ERROR] intr    | CS     : 0x0010

The register state is properly dumped as expected.

Note that in this series, exceptions occurring in Ymir are not expected. As will be covered in later chapters, the kernel maps the entire virtual address space during initialization, so #PF (Page Fault) will not occur, just like in Linux. On the other hand, since some interrupts will be registered, the vector-specific handler registration mechanism implemented here will be useful when that time comes.

With this, we've successfully switched from the UEFI-provided IDT to our own custom IDT. The only remaining dependency on UEFI is the page table. However, switching page tables requires a page allocator. In the next chapter, we'll implement the page allocator.

¹

In this series, we refer to both interrupts and exceptions collectively as "interrupts" whenever there is no need to distinguish between them.

²

Interrupt signals can be delivered to a CPU at any time, but it is guaranteed that the CPU will report the interrupt only at an instruction boundary. In other words, an interrupt will never occur in the middle of executing an instruction. The same applies to exceptions as well.

³

While GDT can hold up to $2^{13} = 8192$ entries, IDT only needs to support up to 256 entries because there are at most 256 interrupt vectors.

⁴

If the TI bit of the segment selector is 0, the GDT is used; if it is 1, the LDT is used.

⁵

SDM Vol.3A 5.4.1

⁶

Strictly speaking, the combinations of exceptions that cause a double fault are very limited. For more details, refer to the Intel SDM or Double Faults - Writing an OS in Rust.

⁷

By the way, since Ymir only uses interrupt gates, this CLI is actually unnecessary...

⁸

A plain movq %%rsp, %%rdi would copy RSP into RDI just as well; the PUSH/POP pair used here is equivalent.
