Interrupt and Exception
This chapter is the second in a series where we replace components initially set up by UEFI with Ymir's own. This time, we'll focus on interrupts and exceptions. When UEFI transfers control to Surtr, it has already set up basic exception handlers, though they all seem to just abort. We'll replace these with proper interrupt handlers.
This chapter is based on SDM Vol.3A Chapter 6: INTERRUPT AND EXCEPTION HANDLING. If you're interested in the details, it's worth reading alongside this post.
important
The source code for this chapter is in the whiz-ymir-interrupt branch.
Table of Contents
- Overview of Interrupt and Exception
- Implementation of Gate Descriptor
- Initialization of Empty IDT
- Common Parts of Interrupt Handlers
- Vector-Specific Handler
- Setting Handlers in IDT
- Summary
Overview of Interrupt and Exception
A CPU can receive two types of events: interrupts and exceptions [1]. Interrupts are generated asynchronously by hardware signals [2], while exceptions occur synchronously when the CPU detects an error during the execution of an instruction. It's important to note that both types of events are handled only at instruction boundaries, so an interrupt handler will never be invoked in the middle of an instruction.
Category
Interrupts are categorized into the following two types based on their interrupt source:
Name | Description |
---|---|
External Interrupts | Interrupts that originate from outside the CPU. |
Software-generated Interrupts | Generated by software using the INT instruction. Allows generating an interrupt with any specified vector number. |
Exceptions are also classified into three types based on their source, but these distinctions are rarely used in practice, so we'll skip the details here. Similar to interrupts, exceptions can be triggered using the INT instruction with any specified vector. However, one important caveat is that exceptions generated via the INT instruction do not push an Error Code onto the stack.
Exceptions are categorized into the following three classes based on whether the task that caused the exception can be resumed:
Name | Can Resume? | Return Address |
---|---|---|
Faults | Yes | Faulting instruction. |
Traps | Yes | Next instruction. |
Aborts | No | - |
Since exceptions are defined by CPU, their types depend on the architecture. On x64, exceptions are defined as follows:
Vector | Name | Class |
---|---|---|
0 | #DE: Divide Error | Fault |
1 | #DB: Debug | Fault/Trap |
2 | NMI: Non-Maskable Interrupt | - |
3 | #BP: Breakpoint | Trap |
4 | #OF: Overflow | Trap |
5 | #BR: BOUND Range Exceeded | Fault |
6 | #UD: Invalid Opcode | Fault |
7 | #NM: Device Not Available | Fault |
8 | #DF: Double Fault | Abort |
9 | - | - |
10 | #TS: Invalid TSS | Fault |
11 | #NP: Segment Not Present | Fault |
12 | #SS: Stack-Segment Fault | Fault |
13 | #GP: General Protection | Fault |
14 | #PF: Page Fault | Fault |
15 | - | - |
16 | #MF: x87 FPU Floating-Point Error | Fault |
17 | #AC: Alignment Check | Fault |
18 | #MC: Machine Check | Abort |
19 | #XM: SIMD Floating-Point Exception | Fault |
20 | #VE: Virtualization Exception | Fault |
21 | #CP: Control Protection Exception | Fault |
Interrupt and exception vectors range from 0 to 255, totaling 256. Of these, vectors 0 through 31 are reserved, so usable interrupt vectors range from 32 to 255.
Interrupt Descriptor Table
IDT: Interrupt Descriptor Table is a table that holds handlers for interrupts and exceptions. Like GDT, it is an array of descriptors; its entries are called Gate Descriptors [3], and in 64-bit mode each one is 16 bytes long. There are three types of gate descriptors: Task Gate, Interrupt Gate, and Trap Gate. Task gates are used for hardware task switching, which we will not cover in this series. The difference between interrupt gates and trap gates is whether interrupts are disabled (by clearing RFLAGS.IF) when the handler is called: an interrupt gate clears the flag, while a trap gate leaves it unchanged. In this series, we only use interrupt gates.
A gate descriptor has the following structure:
SDM Vol.3A 6.14.1 Figure 6-8. 64-Bit IDT Gate Descriptors
Except for Offset, Segment Selector, and DPL, the other values are fixed. Segment Selector selects the segment where the exception or interrupt handler resides. The offset within the segment specified by the segment selector is given by the Offset.
How Handler is Called
When an interrupt occurs, the gate descriptor corresponding to its vector is retrieved from the IDT. The gate descriptor contains the Segment Selector, which specifies the segment where the handler resides - just like FS/GS segment selectors. The handler's offset within the segment is specified by Offset. In other words, the handler's address is calculated in exactly the same way as logical-to-linear address translation:
SDM Vol.3A 6.12.1 Figure 6-3. Interrupt Procedure Call
Implementation of Gate Descriptor
First, let's define a gate descriptor structure. We'll also define GateType, although in this series we only use the interrupt gate:
/// Entry in the Interrupt Descriptor Table.
pub const GateDescriptor = packed struct(u128) {
    /// Lower 16 bits of the offset to the ISR.
    offset_low: u16,
    /// Segment Selector that must point to a valid code segment in the GDT.
    seg_selector: u16,
    /// Interrupt Stack Table. Not used.
    ist: u3 = 0,
    /// Reserved.
    _reserved1: u5 = 0,
    /// Gate Type.
    gate_type: GateType,
    /// Reserved.
    _reserved2: u1 = 0,
    /// Descriptor Privilege Level is the required CPL to call the ISR via the INT inst.
    /// Hardware interrupts ignore this field.
    dpl: u2,
    /// Present flag. Must be 1.
    present: bool = true,
    /// Middle 16 bits of the offset to the ISR.
    offset_middle: u16,
    /// Higher 32 bits of the offset to the ISR.
    offset_high: u32,
    /// Reserved.
    _reserved3: u32 = 0,

    pub fn offset(self: GateDescriptor) u64 {
        return @as(u64, self.offset_high) << 32 | @as(u64, self.offset_middle) << 16 | @as(u64, self.offset_low);
    }
};

pub const GateType = enum(u4) {
    Invalid = 0b0000,
    Interrupt64 = 0b1110,
    Trap64 = 0b1111,
};
Next, let's define IDT. Similar to GDT, it will be an array allocated in the .data section. Unlike the GDT's NULL Descriptor, the element at index 0 can actually be used:
pub const max_num_gates = 256;
var idt: [max_num_gates]GateDescriptor align(4096) = [_]GateDescriptor{std.mem.zeroes(GateDescriptor)} ** max_num_gates;
Finally, let's create a function to add entries to IDT:
pub const Isr = fn () callconv(.Naked) void;

pub fn setGate(
    index: usize,
    gate_type: GateType,
    offset: Isr,
) void {
    idt[index] = GateDescriptor{
        .offset_low = @truncate(@intFromPtr(&offset)),
        .seg_selector = gdt.kernel_cs_index << 3,
        .gate_type = gate_type,
        .offset_middle = @truncate(@as(u64, @intFromPtr(&offset)) >> 16),
        .offset_high = @truncate(@as(u64, @intFromPtr(&offset)) >> 32),
        .dpl = 0,
    };
}
Isr refers to the function type of the interrupt handlers described later. Since handlers reside in the code segment, the segment selector uses CS (the lower 3 bits are for RPL/TI [4], so we shift accordingly).
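The field arithmetic in setGate() can be checked in isolation. The following is a minimal sketch, not part of Ymir: SplitOffset, splitOffset(), joinOffset(), and gdtSelector() are hypothetical helpers introduced only for illustration, and the test address is the RIP value from the log shown later in this chapter.

```zig
const std = @import("std");

/// Hypothetical helper type: the three offset fields of a 64-bit gate descriptor.
const SplitOffset = struct {
    low: u16,
    middle: u16,
    high: u32,
};

/// Split a 64-bit handler address the same way setGate() does.
fn splitOffset(addr: u64) SplitOffset {
    return SplitOffset{
        .low = @truncate(addr),
        .middle = @truncate(addr >> 16),
        .high = @truncate(addr >> 32),
    };
}

/// Reassemble the address, mirroring GateDescriptor.offset().
fn joinOffset(s: SplitOffset) u64 {
    return @as(u64, s.high) << 32 | @as(u64, s.middle) << 16 | @as(u64, s.low);
}

/// Build a selector for GDT entry `index` with TI = 0 (GDT) and RPL = 0.
fn gdtSelector(index: u13) u16 {
    return @as(u16, index) << 3;
}

test "split and join round-trip" {
    const addr: u64 = 0xFFFF_FFFF_8010_4464;
    const s = splitOffset(addr);
    try std.testing.expect(s.low == 0x4464);
    try std.testing.expect(s.middle == 0x8010);
    try std.testing.expect(s.high == 0xFFFF_FFFF);
    try std.testing.expect(joinOffset(s) == addr);
}
```

With index 2, gdtSelector() yields 0x10, which is the CS value that appears in the register dump at the end of this chapter.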
Initialization of Empty IDT
Before setting up proper handlers, let's initialize an empty IDT and try using it. Here's a function to initialize the IDT:
const IdtRegister = packed struct {
    limit: u16,
    base: *[max_num_gates]GateDescriptor,
};

var idtr = IdtRegister{
    .limit = @sizeOf(@TypeOf(idt)) - 1,
    .base = undefined,
};

pub fn init() void {
    idtr.base = &idt;
    am.lidt(@intFromPtr(&idtr));
}
The address of the IDT itself is set in IDTR: Interrupt Descriptor Table Register, which corresponds to GDTR for GDT. Similar to GDT, due to a bug in Zig 0.13.0, you cannot define .base = &idt directly, so the IDT address is set inside the init() function. The am.lidt() function is an assembly wrapper that executes the LIDT instruction.
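As a quick sanity check of the value loaded into IDTR, the limit can be computed by hand. This is a sketch under the assumption of 16-byte gate descriptors and 256 entries; the IdtRegister here is a stand-in that stores the base as a plain integer rather than a pointer.

```zig
const std = @import("std");

const gate_size = 16; // bytes per gate descriptor in 64-bit mode
const max_num_gates = 256;

/// IDTR.limit holds the offset of the last valid byte: table size minus one.
const idt_limit: u16 = gate_size * max_num_gates - 1;

/// Stand-in for the chapter's IdtRegister, with the base as a plain integer.
const IdtRegister = packed struct {
    limit: u16,
    base: u64,
};

test "IDTR limit and operand size" {
    try std.testing.expect(idt_limit == 0x0FFF);
    // LIDT reads a 10-byte operand: a 16-bit limit followed by a 64-bit base.
    try std.testing.expect(@bitSizeOf(IdtRegister) == 80);
}
```

The same limit, 0xFFF, shows up in the IDTR line of the UEFI exception dump later in this chapter, since that table also has 256 entries.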
Call it from kernelMain() to initialize:
arch.idt.init();
log.info("Initialized IDT.", .{});
With this, the empty IDT is set up. Let's trigger an exception to test it. Since causing a #DE: Divide Error in Zig is a bit tricky, we'll generate a #GP: General Protection Fault this time:
const ptr: *u64 = @ptrFromInt(0xDEAD_0000_0000_0000);
log.info("ptr.* = {d}", .{ptr.*});
The address 0xDEAD000000000000 is not in Canonical Form. Canonical form is the required format for virtual addresses, where the Most Significant Implemented Bit (likely bit 47 on recent CPUs) and all higher bits must be the same. In this case, bit 47 is 0, but the higher bits (63:48) are 0xDEAD instead of all zeros, which causes a #GP (General Protection Fault).
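The canonical-form rule can be written as a small predicate. This is a sketch that assumes 48-bit virtual addresses (most significant implemented bit = bit 47); isCanonical48() is a hypothetical helper, not something Ymir defines.

```zig
const std = @import("std");

/// Returns true if bits 63:47 of `addr` are all zeros or all ones,
/// i.e. the address is canonical for 48-bit virtual addressing.
fn isCanonical48(addr: u64) bool {
    const top = addr >> 47; // the upper 17 bits
    return top == 0 or top == 0x1_FFFF;
}

test "canonical addresses" {
    // The address used above to trigger #GP.
    try std.testing.expect(!isCanonical48(0xDEAD_0000_0000_0000));
    // A typical higher-half kernel address.
    try std.testing.expect(isCanonical48(0xFFFF_FFFF_8010_03DD));
}
```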
When you run this, QEMU will likely exit. This is because a Triple Fault has occurred. First, the CPU tries to read from the specified address and triggers a #GP. The CPU then fetches entry 13 (the vector for #GP) from the IDT. Since the IDT is filled with zeros, the Segment Selector is interpreted as 0 (the NULL Segment Selector). Attempting memory access with a NULL segment selector causes another #GP [5], and since this is the second consecutive #GP, a #DF: Double Fault is triggered [6]. Resolving the #DF handler's address causes yet another #GP, ultimately leading to a triple fault. A triple fault shuts down the system, causing QEMU to exit.
In any case, it seems we have successfully replaced the UEFI-provided IDT with Ymir's own IDT.
note
In main.zig, if you comment out the function calls that initialize the IDT and GDT and then trigger a #GP, the UEFI-provided IDT will be used. This handler dumps the exception type along with the register state:
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP - FFFFFFFF801003DD, CS - 0000000000000038, RFLAGS - 0000000000010002
RAX - DEAD000000000000, RCX - 0000000000000000, RDX - 00000000000003F8
RBX - 000000001FE91F78, RSP - FFFFFFFF80106EC0, RBP - 000000001FE908A0
RSI - 0000000000000030, RDI - 000000000000000A
R8 - 000000001FE8FF8C, R9 - 000000001F9EC018, R10 - 000000001FAE6880
R11 - 0000000089F90BEB, R12 - 000000001FEAFF40, R13 - 000000001FE93720
R14 - FFFFFFFF801005E0, R15 - 00000000FF000000
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000001E23A000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000001F9DC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000001F537018 0000000000000FFF, TR - 0000000000000000
Note that if you only comment out the IDT initialization, a triple fault will still occur. You also need to comment out the GDT initialization. It might be a good exercise to think about why this happens.
Common Parts of Interrupt Handlers
From here, we'll write interrupt handlers to set in the IDT. Interrupt handlers are also called ISR: Interrupt Service Routines. Since it's shorter and easier to write, we'll refer to them as ISRs from now on.
ISRs share some common tasks regardless of the vector, such as saving and restoring registers. Therefore, in Ymir, the ISR calls a common handler first, then dispatches to the specific handler corresponding to the vector.
Let's implement the common part of the ISR below. The common ISR section is generated as follows:
pub fn generateIsr(comptime vector: usize) idt.Isr {
    return struct {
        fn handler() callconv(.Naked) void {
            // Clear the interrupt flag.
            asm volatile (
                \\cli
            );
            // If the interrupt does not provide an error code, push a dummy one.
            // Vectors 8, 10-14, 17, and 21 push an error code by themselves.
            if (vector != 8 and !(vector >= 10 and vector <= 14) and vector != 17 and vector != 21) {
                asm volatile (
                    \\pushq $0
                );
            }
            // Push the vector.
            asm volatile (
                \\pushq %[vector]
                :
                : [vector] "n" (vector),
            );
            // Jump to the common ISR.
            asm volatile (
                \\jmp isrCommon
            );
        }
    }.handler;
}
generateIsr() takes an interrupt vector as input and generates the ISR corresponding to that vector. Although this is a common part for all ISRs, the function itself is generated separately for each vector. This is because the CPU does not save the vector number on the stack or in registers when calling the ISR. If you only have one function for all ISRs, there's no way to know which interrupt vector is currently being handled. To avoid this, we generate a separate function for each vector.
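The per-vector generation pattern can be tried outside the kernel. The sketch below uses the same "return a member function of an anonymous struct" approach, but with the default calling convention instead of .Naked so it can run in a hosted test; VectorFn and generateGetter() are hypothetical names.

```zig
const std = @import("std");

/// Function type returned by the generator (hypothetical, for illustration).
const VectorFn = fn () usize;

/// Generate a distinct function for each comptime-known vector.
/// Each instance has its vector baked in as a constant.
fn generateGetter(comptime vector: usize) VectorFn {
    return struct {
        fn get() usize {
            return vector;
        }
    }.get;
}

test "each generated function knows its own vector" {
    const isr13 = comptime generateGetter(13);
    const isr14 = comptime generateGetter(14);
    try std.testing.expect(isr13() == 13);
    try std.testing.expect(isr14() == 14);
}
```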
note
In Zig, returning a struct from a function is straightforward and natural. However, when returning a function, you need to create a struct first and then return a member function pointer, which is somewhat awkward. If you know a better approach, please let me know.
In an ISR, the first step is to disable interrupts using the CLI instruction. When a CPU jumps to an ISR via an interrupt gate, it clears the RFLAGS.IF flag automatically, but it does not do so when jumping via a trap gate. Therefore, if you want to disable interrupts within a trap gate, you need to explicitly execute CLI [7].
Next, if an exception does not provide an Error Code, a dummy error code is pushed onto the stack. Some exceptions come with an error code that provides more details about the cause of the exception. For example, #PF: Page Fault includes information such as whether the faulting access was a read or write in its error code. To keep the stack layout consistent between exceptions that do and do not provide an error code, the ISR pushes a dummy error code for those. For details on which exceptions provide an error code and their meanings, refer to SDM Vol.3A 6.15 Exception and Interrupt Reference.
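To make the error-code rule concrete, here is a minimal sketch. hasErrorCode() and PageFaultErrorCode are hypothetical names, not part of Ymir. The vector list follows the SDM exception table, which also marks vector 21 (#CP) as pushing an error code, and only the five lowest #PF error-code bits are modeled.

```zig
const std = @import("std");

/// Whether the CPU itself pushes an error code for this exception vector.
fn hasErrorCode(vector: u8) bool {
    return switch (vector) {
        8, 10...14, 17, 21 => true,
        else => false,
    };
}

/// Low bits of the #PF error code; the remaining bits are not modeled here.
const PageFaultErrorCode = packed struct(u32) {
    /// 0: the page was not present. 1: protection violation.
    present: bool,
    /// 0: the access was a read. 1: the access was a write.
    write: bool,
    /// 1: the access originated in user mode.
    user: bool,
    /// 1: a reserved bit was set in a paging-structure entry.
    reserved_bit: bool,
    /// 1: the access was an instruction fetch.
    instruction_fetch: bool,
    _rest: u27,
};

test "decode a write to a non-present page" {
    const ec: PageFaultErrorCode = @bitCast(@as(u32, 0b00010));
    try std.testing.expect(!ec.present);
    try std.testing.expect(ec.write);
    try std.testing.expect(hasErrorCode(14));
}
```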
Finally, before jumping to the vector-independent common part of the ISR, the vector is pushed onto the stack. This allows the single shared function for all interrupts to identify the vector.
The common part used by all interrupts is as follows:
export fn isrCommon() callconv(.Naked) void {
    // Save the general-purpose registers.
    asm volatile (
        \\pushq %%rax
        \\pushq %%rcx
        \\pushq %%rdx
        \\pushq %%rbx
        \\pushq %%rsp
        \\pushq %%rbp
        \\pushq %%rsi
        \\pushq %%rdi
        \\pushq %%r15
        \\pushq %%r14
        \\pushq %%r13
        \\pushq %%r12
        \\pushq %%r11
        \\pushq %%r10
        \\pushq %%r9
        \\pushq %%r8
    );

    // Push the context and call the handler.
    asm volatile (
        \\pushq %%rsp
        \\popq %%rdi
        // Align stack to 16 bytes.
        \\pushq %%rsp
        \\pushq (%%rsp)
        \\andq $-0x10, %%rsp
        // Call the dispatcher.
        \\call intrZigEntry
        // Restore the stack.
        \\movq 8(%%rsp), %%rsp
    );

    // Remove general-purpose registers, error code, and vector from the stack.
    asm volatile (
        \\popq %%r8
        \\popq %%r9
        \\popq %%r10
        \\popq %%r11
        \\popq %%r12
        \\popq %%r13
        \\popq %%r14
        \\popq %%r15
        \\popq %%rdi
        \\popq %%rsi
        \\popq %%rbp
        \\popq %%rsp
        \\popq %%rbx
        \\popq %%rdx
        \\popq %%rcx
        \\popq %%rax
        \\add $0x10, %%rsp
        \\iretq
    );
}
First, save the current register state. Then call intrZigEntry(), the function responsible for dispatching to the vector-specific handlers. At this point, the current RSP is passed as an argument by copying RSP into RDI [8]. After calling the vector-specific handler, the saved registers are restored.
Finally, return from the interrupt using the IRET instruction. IRET pops RIP, CS, and RFLAGS from the stack in that order; in 64-bit mode, IRETQ then also pops RSP and SS. If you're wondering where these values were pushed onto the stack, good question - the CPU pushes them automatically when entering the ISR. Just before calling the ISR, the CPU pushes the following onto the stack:
SDM Vol.3A 6.4 Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines
Since Ymir does not implement userland in this series, we only consider the case of "No Privilege-Level Change." (In 64-bit mode the CPU pushes SS and RSP even in this case; they sit above RFLAGS and do not affect the part of the layout discussed here.) Even if an error code is not provided, the common part of the ISR pushes a dummy error code, so the stack layout always matches the diagram. Additionally, Ymir's ISR also pushes the vector onto the stack. Therefore, before executing IRET, we remove the vector and error code from the stack by performing add $0x10, %%rsp.
note
The x64 calling convention expects the stack to be 16-byte aligned at a function call, and compiled code may rely on that. One example is the MOVAPS instruction, which raises a #GP if its memory operand is not 16-byte aligned. When calling intrZigEntry() from the ISR, we therefore ensure the stack is aligned to 16 bytes.
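The alignment sequence in isrCommon() pushes two copies of the original RSP before masking, so that 8(%rsp) holds a copy whether or not the mask moved RSP. The following sketch models only the address arithmetic; alignDown16() and alignedRspAfterPushes() are hypothetical helpers, and the RSP values are arbitrary 8-byte-aligned examples.

```zig
const std = @import("std");

/// Effect of `andq $-0x10, %rsp`: clear the low four bits.
fn alignDown16(rsp: u64) u64 {
    return rsp & ~@as(u64, 0xF);
}

/// Model of the sequence in isrCommon(): two 8-byte pushes of the original
/// RSP, followed by the alignment mask. Returns the resulting RSP.
fn alignedRspAfterPushes(rsp0: u64) u64 {
    return alignDown16(rsp0 - 16);
}

test "8(%rsp) always points at a saved copy of the original RSP" {
    // The two copies of rsp0 live at rsp0 - 8 and rsp0 - 16.
    for ([_]u64{ 0x1000, 0x1008 }) |rsp0| {
        const aligned = alignedRspAfterPushes(rsp0);
        try std.testing.expect(aligned % 16 == 0);
        const slot = aligned + 8;
        try std.testing.expect(slot == rsp0 - 8 or slot == rsp0 - 16);
    }
}
```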
Vector-Specific Handler
With the common part of the ISR completed, let's call the handler corresponding to each vector. The intrZigEntry() function, which was called from the common ISR earlier, is defined as follows:
const intr = @import("interrupt.zig");

pub const Context = packed struct {
    /// General purpose registers.
    registers: Registers,
    /// Interrupt Vector.
    vector: u64,
    /// Error Code.
    error_code: u64,

    // CPU status:
    rip: u64,
    cs: u64,
    rflags: u64,
};

const Registers = packed struct {
    r8: u64,
    r9: u64,
    r10: u64,
    r11: u64,
    r12: u64,
    r13: u64,
    r14: u64,
    r15: u64,
    rdi: u64,
    rsi: u64,
    rbp: u64,
    rsp: u64,
    rbx: u64,
    rdx: u64,
    rcx: u64,
    rax: u64,
};

export fn intrZigEntry(ctx: *Context) callconv(.C) void {
    intr.dispatch(ctx);
}
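The Context argument holds the register state right after the ISR was invoked. The stack as it is just before calling intrZigEntry() becomes the contents of Context as-is. This is why we passed the RSP value to intrZigEntry() as an argument earlier. The vector-specific handlers can use the information in this Context.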
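Because Context is just a view of the stack, its field offsets must match the push order in the ISR. The sketch below checks them with @offsetOf. It declares the same fields as extern struct instead of the chapter's packed struct, which is an assumption made here so that byte offsets are straightforward to query.

```zig
const std = @import("std");

/// Same field order as the chapter's Registers: the last register pushed
/// (r8) sits at the lowest address.
const Registers = extern struct {
    r8: u64,
    r9: u64,
    r10: u64,
    r11: u64,
    r12: u64,
    r13: u64,
    r14: u64,
    r15: u64,
    rdi: u64,
    rsi: u64,
    rbp: u64,
    rsp: u64,
    rbx: u64,
    rdx: u64,
    rcx: u64,
    rax: u64,
};

const Context = extern struct {
    registers: Registers,
    vector: u64, // pushed by the generated ISR
    error_code: u64, // pushed by the CPU, or a dummy pushed by the ISR
    rip: u64, // pushed by the CPU
    cs: u64,
    rflags: u64,
};

test "Context mirrors the stack layout" {
    try std.testing.expect(@offsetOf(Context, "vector") == 16 * 8);
    try std.testing.expect(@offsetOf(Context, "error_code") == 17 * 8);
    try std.testing.expect(@offsetOf(Context, "rip") == 18 * 8);
}
```

The gap between vector and rip is 0x10 bytes, which is exactly what add $0x10, %%rsp skips before IRET.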
By specifying callconv(.C), we ensure that the first argument is always passed in RDI (which in this case is the RSP value). Without this, Zig might choose different registers or stack locations to pass arguments for optimization purposes. The intr.dispatch() function calls the vector-specific handler as follows:
pub const Handler = *const fn (*Context) void;
var handlers: [256]Handler = [_]Handler{unhandledHandler} ** 256;

pub fn dispatch(context: *Context) void {
    const vector = context.vector;
    handlers[vector](context);
}
handlers is an array holding pointers to the handlers for each vector. The dispatch() function calls the handler registered for the given vector. If no handler is registered, it calls the default unhandledHandler().
fn unhandledHandler(context: *Context) void {
    @setCold(true);

    log.err("============ Oops! ===================", .{});
    log.err("Unhandled interrupt: {s} ({})", .{
        exceptionName(context.vector),
        context.vector,
    });
    log.err("Error Code: 0x{X}", .{context.error_code});
    log.err("RIP : 0x{X:0>16}", .{context.rip});
    log.err("EFLAGS : 0x{X:0>16}", .{context.rflags});
    log.err("RAX : 0x{X:0>16}", .{context.registers.rax});
    log.err("RBX : 0x{X:0>16}", .{context.registers.rbx});
    log.err("RCX : 0x{X:0>16}", .{context.registers.rcx});
    log.err("RDX : 0x{X:0>16}", .{context.registers.rdx});
    log.err("RSI : 0x{X:0>16}", .{context.registers.rsi});
    log.err("RDI : 0x{X:0>16}", .{context.registers.rdi});
    log.err("RSP : 0x{X:0>16}", .{context.registers.rsp});
    log.err("RBP : 0x{X:0>16}", .{context.registers.rbp});
    log.err("R8 : 0x{X:0>16}", .{context.registers.r8});
    log.err("R9 : 0x{X:0>16}", .{context.registers.r9});
    log.err("R10 : 0x{X:0>16}", .{context.registers.r10});
    log.err("R11 : 0x{X:0>16}", .{context.registers.r11});
    log.err("R12 : 0x{X:0>16}", .{context.registers.r12});
    log.err("R13 : 0x{X:0>16}", .{context.registers.r13});
    log.err("R14 : 0x{X:0>16}", .{context.registers.r14});
    log.err("R15 : 0x{X:0>16}", .{context.registers.r15});
    log.err("CS : 0x{X:0>4}", .{context.cs});

    ymir.endlessHalt();
}
This handler simply dumps the state saved in Context and then enters an infinite HLT loop.
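The chapter does not define a registration function, but the handlers table makes one easy to imagine. The following is a hedged sketch only: registerHandler(), timerHandler(), defaultHandler(), the last_handler variable, and the choice of vector 32 are all hypothetical, and Context is reduced to the single field the sketch needs.

```zig
const std = @import("std");

/// Simplified stand-in for the chapter's Context (only the vector is kept).
const Context = struct { vector: u64 };
const Handler = *const fn (*Context) void;

/// Records which handler ran last, so the behavior can be observed in a test.
var last_handler: enum { none, default, timer } = .none;

fn defaultHandler(ctx: *Context) void {
    _ = ctx;
    last_handler = .default;
}

fn timerHandler(ctx: *Context) void {
    _ = ctx;
    last_handler = .timer;
}

/// Every vector starts out pointing at the default handler.
var handlers: [256]Handler = [_]Handler{&defaultHandler} ** 256;

/// Hypothetical registration function; not defined in this chapter.
fn registerHandler(vector: u8, handler: Handler) void {
    handlers[vector] = handler;
}

fn dispatch(ctx: *Context) void {
    const vector: usize = @intCast(ctx.vector);
    handlers[vector](ctx);
}

test "a registered vector uses its own handler" {
    registerHandler(32, &timerHandler);
    var ctx = Context{ .vector = 32 };
    dispatch(&ctx);
    try std.testing.expect(last_handler == .timer);
}
```

Taking the vector as a u8 keeps every possible argument inside the 256-entry table, so no bounds check is needed at registration time.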
Setting Handlers in IDT
The common part of the ISR and the vector-specific handler dispatching are now complete. The remaining task is to set the ISR in the IDT. Let's set the ISR for all IDT entries:
pub fn init() void {
    inline for (0..idt.max_num_gates) |i| {
        idt.setGate(
            i,
            .Interrupt64,
            isr.generateIsr(i),
        );
    }

    idt.init();

    am.sti();
}
Using inline for, we call idt.setGate() 256 times. The inline for is necessary because isr.generateIsr() returns a function, so the loop needs to be evaluated at compile time. Each ISR generated by generateIsr(i) is set in the IDT as an interrupt gate via the previously implemented idt.setGate(). Finally, we enable interrupts by setting RFLAGS.IF with the STI instruction.
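The role of inline for can be seen in a hosted example that fills a table with generated functions. This is a sketch with hypothetical names (generate(), table, initTable()) and only four entries; unlike generateIsr(), generate() returns a function pointer so the table can be an ordinary runtime array.

```zig
const std = @import("std");

const num_entries = 4; // the chapter uses 256; kept small for the sketch

/// Same idea as generateIsr(): one function per comptime-known index.
fn generate(comptime index: usize) *const fn () usize {
    return &struct {
        fn get() usize {
            return index;
        }
    }.get;
}

var table: [num_entries]*const fn () usize = undefined;

fn initTable() void {
    // `inline for` unrolls the loop, so `i` is comptime-known in each copy
    // and can be passed to a `comptime` parameter.
    inline for (0..num_entries) |i| {
        table[i] = generate(i);
    }
}

test "table entries return their own index" {
    initTable();
    try std.testing.expect(table[0]() == 0);
    try std.testing.expect(table[3]() == 3);
}
```

With a plain for, i would be a runtime value and could not be used as the comptime argument.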
Summary
In this chapter, we implemented the ISR by separating the common parts shared across all vectors from the vector-specific parts. We generated an ISR for every interrupt and registered them in the IDT. The registered ISRs call the corresponding handler for each vector. Since no vector-specific handlers have been registered yet, all interrupts are handled by unhandledHandler(). Running the earlier code that triggers #GP again results in the following:
[INFO ] main | Booting Ymir...
[INFO ] main | Initialized GDT.
[INFO ] main | Initialized IDT.
[ERROR] intr | ============ Oops! ===================
[ERROR] intr | Unhandled interrupt: #GP: General protection fault (13)
[ERROR] intr | Error Code: 0x0
[ERROR] intr | RIP : 0xFFFFFFFF80104464
[ERROR] intr | EFLAGS : 0x0000000000010202
[ERROR] intr | RAX : 0xDEADBEEF00000000
[ERROR] intr | RBX : 0x000000001FE91F78
[ERROR] intr | RCX : 0xFFFFFFFF80111812
[ERROR] intr | RDX : 0x00000000FFFF03F8
[ERROR] intr | RSI : 0x80108E0000104760
[ERROR] intr | RDI : 0x000000000000000A
[ERROR] intr | RBP : 0x000000001FE908A0
[ERROR] intr | R8 : 0xFFFFFFFF801130E0
[ERROR] intr | R9 : 0xFFFFFFFF80111080
[ERROR] intr | R10 : 0x8010000000000000
[ERROR] intr | R11 : 0x80108E0000105670
[ERROR] intr | R12 : 0x000000001FEAFF40
[ERROR] intr | R13 : 0x000000001FE93720
[ERROR] intr | R14 : 0xFFFFFFFF80105680
[ERROR] intr | R15 : 0x00000000FF000000
[ERROR] intr | CS : 0x0010
The register state is properly dumped as expected.
Note that in this series, exceptions occurring in Ymir are not expected. As will be covered in later chapters, the kernel maps the entire virtual address space during initialization, so #PF (Page Fault) will not occur, just like in Linux. On the other hand, since some interrupts will be registered, the vector-specific handler registration mechanism implemented here will be useful when that time comes.
With this, we've successfully switched from the UEFI-provided IDT to our own custom IDT. The only remaining dependency on UEFI is the page table. However, switching page tables requires a page allocator. In the next chapter, we'll implement the page allocator.
[1] In this series, we refer to both interrupts and exceptions collectively as "interrupts" whenever there is no need to distinguish between them.
[2] Interrupt signals can be delivered to a CPU at any time, but it is guaranteed that the CPU will report the interrupt only at an instruction boundary. In other words, an interrupt will never occur in the middle of executing an instruction. The same applies to exceptions as well.
[3] While GDT can hold up to \(2^{13} = 8192\) entries, IDT only needs to support up to 256 entries because there are at most 256 interrupt vectors.
[4] If the TI bit of the segment selector is 0, the GDT is used; if it is 1, the LDT is used.
[5] SDM Vol.3A 5.4.1
[6] Strictly speaking, the combinations of exceptions that cause a double fault are very limited. For more details, refer to the Intel SDM or Double Faults - Writing an OS in Rust.
[7] By the way, since Ymir only uses interrupt gates, this CLI is actually unnecessary...
[8] The RSP value is copied into RDI with a PUSH/POP pair here. Note that x64 does allow MOV with RSP as an operand, so movq %%rsp, %%rdi would work just as well.