Simple Page Table

In the previous chapter, we enabled loading the Ymir ELF image from the filesystem into memory. Ideally, we would like to load the kernel right after that, but to do so, we need to manipulate the page tables to map the virtual addresses requested by the ELF file. Since the Surtr bootloader only needs to manipulate page tables to load the kernel image, this chapter focuses on implementing the minimal necessary page table operations.

important

Source code for this chapter is in the whiz-surtr-simple_pg branch.

Creating the arch Directory

Page table structures and operations heavily depend on the CPU architecture. In this series, we only support x86-64, but even so, architecture-specific code should be organized into separate layers.

Create an arch directory with an x86 directory inside it, resulting in the following structure:

bash
> tree ./surtr
./surtr
├── arch
│   └── x86
│       └── arch.zig
├── arch.zig
├── boot.zig
└── log.zig
  • arch.zig: This file is used directly from boot.zig. It provides an API that abstracts architecture-dependent concepts as much as possible, preventing direct access to the APIs provided by files under arch/.
  • arch/x86/arch.zig: The root file that exports x86-64 specific APIs and related functionality.

In /arch.zig, the code under arch/ is exported according to the target architecture as follows:

surtr/arch.zig
const builtin = @import("builtin");
pub usingnamespace switch (builtin.target.cpu.arch) {
    .x86_64 => @import("arch/x86/arch.zig"),
    else => @compileError("Unsupported architecture."),
};

builtin.target.cpu.arch corresponds to the target CPU architecture specified by .cpu_arch in build.zig. Although we only target x86_64, this value would vary if we supported other architectures. Since it is determined at compile time, the switch statement is also evaluated at compile time, and the root file for the corresponding architecture is exported.
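
For reference, the target was fixed in build.zig back in the setup chapter. The snippet below is only a sketch with assumed variable names (b being the *std.Build instance), showing where .cpu_arch comes from:

zig
// Sketch: the architecture chosen here is what builtin.target.cpu.arch reports.
const surtr_target = b.resolveTargetQuery(.{
    .cpu_arch = .x86_64, // consumed by the comptime switch in arch.zig
    .os_tag = .uefi,
});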

info

usingnamespace is a feature that brings all the public declarations of a given struct (namespace) into the current scope. In this case, if you simply assigned the result of @import("arch/x86/arch.zig") to a named constant, the caller would have to go through that extra level of namespace as follows:

zig
// -- surtr/arch.zig --
pub const impl = @import("arch/x86/arch.zig");
// -- surtr/boot.zig --
const arch = @import("arch.zig");
arch.impl.someFunction();

By using usingnamespace, you can eliminate this extra level, making the code cleaner:

zig
// -- surtr/arch.zig --
pub usingnamespace @import("arch/x86/arch.zig");
// -- surtr/boot.zig --
const arch = @import("arch.zig");
arch.someFunction();

Although it might look somewhat like black magic, rest assured that usingnamespace does not inject the imported declarations into the identifier scope of the current file, so they cannot be referenced unqualified:

zig
usingnamespace @import("some.zig"); // someFunction() が定義されているファイル
someFunction(); // このようなことはできない

arch/x86/arch.zig re-exports the files containing architecture-dependent code that should be accessible from /arch.zig and above. Since we want to implement page tables, we will create arch/x86/page.zig and export it from arch/x86/arch.zig.

surtr/arch/x86/arch.zig
pub const page = @import("page.zig");

With this setup, you can now access x64 paging-related functionality directly from boot.zig.

Page Table Entry

From here, we will enable page table manipulation. When control is transferred from the UEFI firmware to Surtr, UEFI has already switched to 64-bit mode and constructed the initial page tables. Let's check the memory map using the vmmap command of gef. Since -s is included in the QEMU command line in build.zig, you can attach to the GDB server on port 1234. Currently, Surtr enters an infinite loop at the end of main(), so you can attach with GDB during this time:

Memory Map
sh
gef> target remote:1234
gef> vmmap
--------------------------------------- Memory map ---------------------------------------
Virtual address start-end              Physical address start-end             Total size   Page size   Count  Flags
0x0000000000000000-0x0000000000200000  0x0000000000000000-0x0000000000200000  0x200000     0x200000    1      [RWX KERN ACCESSED DIRTY]
0x0000000000200000-0x0000000000800000  0x0000000000200000-0x0000000000800000  0x600000     0x200000    3      [RWX KERN ACCESSED]
0x0000000000800000-0x0000000000a00000  0x0000000000800000-0x0000000000a00000  0x200000     0x200000    1      [RWX KERN ACCESSED DIRTY]
0x0000000000a00000-0x000000001be00000  0x0000000000a00000-0x000000001be00000  0x1b400000   0x200000    218    [RWX KERN ACCESSED]
0x000000001be00000-0x000000001c000000  0x000000001be00000-0x000000001c000000  0x200000     0x200000    1      [RWX KERN ACCESSED DIRTY]
0x000000001c000000-0x000000001e200000  0x000000001c000000-0x000000001e200000  0x2200000    0x200000    17     [RWX KERN ACCESSED]
0x000000001e200000-0x000000001ee00000  0x000000001e200000-0x000000001ee00000  0xc00000     0x200000    6      [RWX KERN ACCESSED DIRTY]
0x000000001ee00000-0x000000001f000000  0x000000001ee00000-0x000000001f000000  0x200000     0x200000    1      [R-X KERN ACCESSED DIRTY]
0x000000001f000000-0x000000001fa00000  0x000000001f000000-0x000000001fa00000  0xa00000     0x200000    5      [RWX KERN ACCESSED DIRTY]
0x000000001fa00000-0x000000001fac6000  0x000000001fa00000-0x000000001fac6000  0xc6000      0x1000      198    [RWX KERN ACCESSED DIRTY]
0x000000001fac6000-0x000000001fac7000  0x000000001fac6000-0x000000001fac7000  0x1000       0x1000      1      [RW- KERN ACCESSED DIRTY]
0x000000001fac7000-0x000000001fac8000  0x000000001fac7000-0x000000001fac8000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001fac8000-0x000000001faca000  0x000000001fac8000-0x000000001faca000  0x2000       0x1000      2      [RW- KERN ACCESSED DIRTY]
0x000000001faca000-0x000000001facb000  0x000000001faca000-0x000000001facb000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001facb000-0x000000001facd000  0x000000001facb000-0x000000001facd000  0x2000       0x1000      2      [RW- KERN ACCESSED DIRTY]
0x000000001facd000-0x000000001facf000  0x000000001facd000-0x000000001facf000  0x2000       0x1000      2      [R-X KERN ACCESSED DIRTY]
0x000000001facf000-0x000000001fad1000  0x000000001facf000-0x000000001fad1000  0x2000       0x1000      2      [RW- KERN ACCESSED DIRTY]
0x000000001fad1000-0x000000001fad2000  0x000000001fad1000-0x000000001fad2000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001fad2000-0x000000001fad4000  0x000000001fad2000-0x000000001fad4000  0x2000       0x1000      2      [RW- KERN ACCESSED DIRTY]
0x000000001fad4000-0x000000001fadb000  0x000000001fad4000-0x000000001fadb000  0x7000       0x1000      7      [R-X KERN ACCESSED DIRTY]
0x000000001fadb000-0x000000001fade000  0x000000001fadb000-0x000000001fade000  0x3000       0x1000      3      [RW- KERN ACCESSED DIRTY]
0x000000001fade000-0x000000001fadf000  0x000000001fade000-0x000000001fadf000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001fadf000-0x000000001fae2000  0x000000001fadf000-0x000000001fae2000  0x3000       0x1000      3      [RW- KERN ACCESSED DIRTY]
0x000000001fae2000-0x000000001fae3000  0x000000001fae2000-0x000000001fae3000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001fae3000-0x000000001fae6000  0x000000001fae3000-0x000000001fae6000  0x3000       0x1000      3      [RW- KERN ACCESSED DIRTY]
0x000000001fae6000-0x000000001fae7000  0x000000001fae6000-0x000000001fae7000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001fae7000-0x000000001faea000  0x000000001fae7000-0x000000001faea000  0x3000       0x1000      3      [RW- KERN ACCESSED DIRTY]
0x000000001faea000-0x000000001faeb000  0x000000001faea000-0x000000001faeb000  0x1000       0x1000      1      [R-X KERN ACCESSED DIRTY]
0x000000001faeb000-0x000000001faed000  0x000000001faeb000-0x000000001faed000  0x2000       0x1000      2      [RW- KERN ACCESSED DIRTY]
0x000000001faed000-0x000000001fc00000  0x000000001faed000-0x000000001fc00000  0x113000     0x1000      275    [RWX KERN ACCESSED DIRTY]
0x000000001fc00000-0x000000001fe00000  0x000000001fc00000-0x000000001fe00000  0x200000     0x200000    1      [R-X KERN ACCESSED DIRTY]
0x000000001fe00000-0x0000000020000000  0x000000001fe00000-0x0000000020000000  0x200000     0x1000      512    [RWX KERN ACCESSED DIRTY]
0x0000000020000000-0x0000000040000000  0x0000000020000000-0x0000000040000000  0x20000000   0x200000    256    [RWX KERN ACCESSED]
0x0000000040000000-0x0000000080000000  0x0000000040000000-0x0000000080000000  0x40000000   0x40000000  1      [RWX KERN ACCESSED]

The page tables prepared by UEFI identity-map virtual addresses to the same physical addresses. While the full-fledged page table setup will be handled by Ymir, Surtr only performs a simple paging configuration. In this series, we adopt 4-level paging. The page table entries at each level of 4-level paging have the following structure:

Formats of CR3 and Paging-Structure Entries with 4-Level Paging. SDM Vol.3A 4.5.5

There are four types of entries, which Intel refers to as PML4E, PDPTE, PDE, and PTE. The naming varies depending on the software; for example, Linux calls them PGD, PUD, PMD, and PTE. Since these names can be unintuitive, in this series we will refer to them as Lv4, Lv3, Lv2, and Lv1 respectively.

First, let's define structures representing each of the four entries. As you can see from the diagram above, all four entries share a very similar structure1. Therefore, we define a function called EntryBase() that returns a type, and then use it to define the four entries as follows:

surtr/arch/x86/page.zig
const TableLevel = enum { lv4, lv3, lv2, lv1 };

fn EntryBase(table_level: TableLevel) type {
    return packed struct(u64) {
        const Self = @This();
        const level = table_level;

        /// Present.
        present: bool = true,
        /// Read/Write.
        /// If set to false, write access is not allowed to the region.
        rw: bool,
        /// User/Supervisor.
        /// If set to false, user-mode access is not allowed to the region.
        us: bool,
        /// Page-level write-through.
        /// Indirectly determines the memory type used to access the page or page table.
        pwt: bool = false,
        /// Page-level cache disable.
        /// Indirectly determines the memory type used to access the page or page table.
        pcd: bool = false,
        /// Accessed.
        /// Indicates whether this entry has been used for translation.
        accessed: bool = false,
        /// Dirty bit.
        /// Indicates whether software has written to the 2MiB page.
        /// Ignored when this entry references a page table.
        dirty: bool = false,
        /// Page Size.
        /// If set to true, the entry maps a page.
        /// If set to false, the entry references a page table.
        ps: bool,
        /// Ignored when CR4.PGE != 1.
        /// Ignored when this entry references a page table.
        /// Ignored for level-4 entries.
        global: bool = true,
        /// Ignored
        _ignored1: u2 = 0,
        /// Ignored except for HLAT paging.
        restart: bool = false,
        /// When the entry maps a page, physical address of the page.
        /// When the entry references a page table, 4KB aligned address of the page table.
        phys: u51,
        /// Execute Disable.
        xd: bool = false,
    };
}

const Lv4Entry = EntryBase(.lv4);
const Lv3Entry = EntryBase(.lv3);
const Lv2Entry = EntryBase(.lv2);
const Lv1Entry = EntryBase(.lv1);

In Zig, functions can return types, enabling a feature somewhat similar to templates in C++. The EntryBase() function takes an enum called TableLevel, which corresponds to the entry's level, and returns a struct representing the page table entry for that level. The passed argument can be used as a constant within the returned struct. Additionally, by declaring it as a packed struct(u64), we ensure the total size of the fields is exactly 64 bits2.

Finally, by calling EntryBase() for each LvXEntry, we define the four types of page table entries. This is similar to template instantiation in C++. Since these are resolved at compile time, there is no runtime overhead.
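
As a quick sanity check (this block is not part of Surtr; it is only a sketch that could be dropped into page.zig), the guarantees described above can be verified at compile time:

zig
comptime {
    // packed struct(u64) guarantees that each entry is exactly 64 bits wide.
    if (@bitSizeOf(Lv2Entry) != 64) @compileError("entry must be 64 bits");
    // The level passed to EntryBase() is kept as a per-type constant.
    if (Lv2Entry.level != .lv2) @compileError("unexpected table level");
}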

We add a method to this struct to retrieve the physical address pointed to by the entry, which can be either the next-level page table or the physical page itself:

surtr/arch/x86/page.zig
pub const Phys = u64;
pub const Virt = u64;

pub inline fn address(self: Self) Phys {
    return @as(u64, @intCast(self.phys)) << 12;
}

Phys and Virt represent physical and virtual address types, respectively. We often risk mixing up physical and virtual addresses3 when implementing paging functions, so these types clearly indicate whether an address is physical or virtual to prevent such mistakes4. The address() function is a helper that shifts the internal phys field left by 12 bits to produce the physical address. The returned physical address is either the physical address of the mapped page if the entry maps a page (.ps == true), or the physical address of the referenced page table if the entry points to another page table (.ps == false).
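
As a hypothetical illustration of the shift (the values here are made up), an entry whose phys field holds 0x100 resolves to the physical address 0x10_0000:

zig
// Illustration only: build an Lv2 entry by hand and read back its address.
const ent = Lv2Entry{
    .rw = true,
    .us = false,
    .ps = true,
    .phys = 0x100, // stored without the lower 12 bits
};
const paddr: Phys = ent.address(); // 0x100 << 12 == 0x10_0000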

Next, define a function to create a page table entry. It is straightforward when creating an entry that maps a page:

surtr/arch/x86/page.zig
pub fn newMapPage(phys: Phys, present: bool) Self {
    if (level == .lv4) @compileError("Lv4 entry cannot map a page");
    return Self{
        .present = present,
        .rw = true,
        .us = false,
        .ps = true,
        .phys = @truncate(phys >> 12),
    };
}

To map a page, set .ps to true and specify the physical address of the page to be mapped. Note that Surtr/Ymir does not support 512 GiB pages, so if this function is called for a Lv4Entry (i.e., level == .lv4), it will result in a compile-time error.

Similarly, we define a function to create an entry that references a page table. In this case, the argument is not the physical address of a page, but a pointer to the page table one level below, i.e. an array of entries of the next-lower level. To express this, we need to define the "type of the entry one level below." Add the following constant to the struct:

surtr/arch/x86/page.zig
const LowerType = switch (level) {
    .lv4 => Lv3Entry,
    .lv3 => Lv2Entry,
    .lv2 => Lv1Entry,
    .lv1 => struct {},
};

If the entry is an Lv4Entry, then LowerType is Lv3Entry. Since there is no entry below Lv1Entry, LowerType is an empty struct in that case. Using this, the function to create an entry that references a page table looks like this:

surtr/arch/x86/page.zig
pub fn newMapTable(table: [*]LowerType, present: bool) Self {
    if (level == .lv1) @compileError("Lv1 entry cannot reference a page table");
    return Self{
        .present = present,
        .rw = true,
        .us = false,
        .ps = false,
        .phys = @truncate(@intFromPtr(table) >> 12),
    };
}

table is a pointer to the page table that this entry points to. In contrast to the previous case, if the entry itself is an Lv1Entry, this will result in a compile-time error.

Mapping a 4KiB page

With the table entries defined, let's implement the function to actually map pages. Surtr will support only 4KiB pages.

Page walking starts from the Lv4 table, whose physical address is held in the CR3 register, as shown in the diagram5. Bits [47:39] of the virtual address serve as the index into the Lv4 table. Using this index, you obtain the Lv4 entry, which contains a pointer to the Lv3 table. Repeating this process leads you down to the Lv1 entry, which maps the 4KiB page.
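
As a concrete example, the four indexes for the virtual address 0xFFFF_FFFF_DEAD_0000 (the address we will map later in this chapter) work out as follows:

zig
// Each level's index is 9 bits wide and selects one of 512 entries.
const vaddr: u64 = 0xFFFF_FFFF_DEAD_0000;
const lv4_index = (vaddr >> 39) & 0x1FF; // 0x1FF
const lv3_index = (vaddr >> 30) & 0x1FF; // 0x1FF
const lv2_index = (vaddr >> 21) & 0x1FF; // 0x0F5
const lv1_index = (vaddr >> 12) & 0x1FF; // 0x0D0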

Linear-Address Translation to a 4-KByte Page Using 4-Level Paging. SDM Vol.3A 4.5.4

First, here are the functions to retrieve the page tables at each level:

surtr/arch/x86/page.zig
const page_mask_4k: u64 = 0xFFF;
const num_table_entries: usize = 512;

fn getTable(T: type, addr: Phys) []T {
    const ptr: [*]T = @ptrFromInt(addr & ~page_mask_4k);
    return ptr[0..num_table_entries];
}
fn getLv4Table(cr3: Phys) []Lv4Entry {
    return getTable(Lv4Entry, cr3);
}
fn getLv3Table(lv3_paddr: Phys) []Lv3Entry {
    return getTable(Lv3Entry, lv3_paddr);
}
fn getLv2Table(lv2_paddr: Phys) []Lv2Entry {
    return getTable(Lv2Entry, lv2_paddr);
}
fn getLv1Table(lv1_paddr: Phys) []Lv1Entry {
    return getTable(Lv1Entry, lv1_paddr);
}

getTable() is the internal implementation that takes the type of the page table entry you want to retrieve and the physical address. Since page tables are always 4KiB-aligned, it masks out the lower 12 bits to get the base address of the page table. Then, it returns the table as a slice containing 512 entries starting from that address. The other four functions are helper functions to get the page tables of each respective level.

Next, we implement a function to get the page table entry corresponding to a given virtual address. This function takes the virtual address and the physical address of the page table as inputs:

surtr/arch/x86/page.zig
fn getEntry(T: type, vaddr: Virt, paddr: Phys) *T {
    const table = getTable(T, paddr);
    const shift = switch (T) {
        Lv4Entry => 39,
        Lv3Entry => 30,
        Lv2Entry => 21,
        Lv1Entry => 12,
        else => @compileError("Unsupported type"),
    };
    return &table[(vaddr >> shift) & 0x1FF];
}

fn getLv4Entry(addr: Virt, cr3: Phys) *Lv4Entry {
    return getEntry(Lv4Entry, addr, cr3);
}
fn getLv3Entry(addr: Virt, lv3tbl_paddr: Phys) *Lv3Entry {
    return getEntry(Lv3Entry, addr, lv3tbl_paddr);
}
fn getLv2Entry(addr: Virt, lv2tbl_paddr: Phys) *Lv2Entry {
    return getEntry(Lv2Entry, addr, lv2tbl_paddr);
}
fn getLv1Entry(addr: Virt, lv1tbl_paddr: Phys) *Lv1Entry {
    return getEntry(Lv1Entry, addr, lv1tbl_paddr);
}

The internal implementation, getEntry(), first calls the previously defined getTable() to obtain the page table as a slice. Then, following the earlier diagram, it calculates the index of the entry from the virtual address and retrieves the corresponding entry from the table. Similar to getTable(), the remaining four functions are helper functions that instantiate the generic parameter T with specific types.

With this, we are almost ready to retrieve page table entries from virtual addresses. As a helper function, add a wrapper to read the CR3 register in asm.zig:

surtr/arch/x86/asm.zig
pub inline fn readCr3() u64 {
    var cr3: u64 = undefined;
    asm volatile (
        \\mov %%cr3, %[cr3]
        : [cr3] "=r" (cr3),
    );
    return cr3;
}

Finally, the function to map a 4KiB page is shown below:

surtr/arch/x86/page.zig
const am = @import("asm.zig");

pub const PageAttribute = enum {
    /// RO
    read_only,
    /// RW
    read_write,
    /// RX
    executable,
};

pub const PageError = error{ NoMemory, NotPresent, NotCanonical, InvalidAddress, AlreadyMapped };

pub fn map4kTo(virt: Virt, phys: Phys, attr: PageAttribute, bs: *BootServices) PageError!void {
    const rw = switch (attr) {
        .read_only, .executable => false,
        .read_write => true,
    };

    const lv4ent = getLv4Entry(virt, am.readCr3());
    if (!lv4ent.present) try allocateNewTable(Lv4Entry, lv4ent, bs);

    const lv3ent = getLv3Entry(virt, lv4ent.address());
    if (!lv3ent.present) try allocateNewTable(Lv3Entry, lv3ent, bs);

    const lv2ent = getLv2Entry(virt, lv3ent.address());
    if (!lv2ent.present) try allocateNewTable(Lv2Entry, lv2ent, bs);

    const lv1ent = getLv1Entry(virt, lv2ent.address());
    if (lv1ent.present) return PageError.AlreadyMapped;
    var new_lv1ent = Lv1Entry.newMapPage(phys, true);

    new_lv1ent.rw = rw;
    lv1ent.* = new_lv1ent;
    // No need to flush TLB because the page was not present before.
}

The arguments of map4kTo() are as follows:

  • virt: Virtual address.
  • phys: Physical address.
  • attr: Attribute of the mapped page.
  • bs: UEFI Boot Services. This is used to allocate memory for page table data.

As seen in the previous figure, the function starts from CR3 and walks the tables down to the Lv1 entry. If a page table along the way does not exist, that is, lvNent.present == false, a new page table is created with allocateNewTable(). Note that this function assumes the target virtual address is not yet mapped; it is only intended to create new mappings. If a mapping already exists, it behaves as follows:

  • If the virtual address is already mapped to a 4KiB page: an AlreadyMapped error is returned.
  • If the virtual address is already mapped by a page of 2MiB or larger: the existing mapping is overwritten.

Once it reaches Lv1, it maps the page to the specified physical address using newMapPage(). When it does so, it sets the rw flag according to the attributes of the page. If rw == true, it will be read/write, otherwise it will be read-only.

Note that this function does not need to flush the TLB at the end, since it is only intended to create new mappings. If you wanted to modify an existing mapping, you would need to reload CR3 or flush the TLB using invlpg or other instructions.
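
Surtr itself never needs this, but for completeness, a per-page flush could be done with an invlpg wrapper along the following lines (a sketch only, not part of this chapter's code; it would live in asm.zig next to readCr3()):

zig
/// Flush the TLB entry for a single virtual address.
pub inline fn invlpg(vaddr: u64) void {
    asm volatile (
        \\invlpg (%[vaddr])
        :
        : [vaddr] "r" (vaddr),
        : "memory"
    );
}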

allocateNewTable(), which allocates a new page table, is implemented as follows:

surtr/arch/x86/page.zig
pub const kib = 1024;
pub const page_size_4k = 4 * kib;

fn allocateNewTable(T: type, entry: *T, bs: *BootServices) PageError!void {
    var ptr: Phys = undefined;
    const status = bs.allocatePages(.AllocateAnyPages, .BootServicesData, 1, @ptrCast(&ptr));
    if (status != .Success) return PageError.NoMemory;

    clearPage(ptr);
    entry.* = T.newMapTable(@ptrFromInt(ptr), true);
}

fn clearPage(addr: Phys) void {
    const page_ptr: [*]u8 = @ptrFromInt(addr);
    @memset(page_ptr[0..page_size_4k], 0);
}

We use Boot Services' AllocatePages() to allocate a single page, specifying BootServicesData6 as the memory type. This page table will continue to be used even after execution is transferred from Surtr to Ymir, until she creates a new mapping. In other words, this is not an area that Ymir can free and use as she likes. As described in a later chapter, by setting the memory type to BootServicesData, Surtr tells Ymir that this area is unavailable until she builds her own page tables.

The allocated page is zero-cleared with clearPage(). Finally, the physical address of the newly created page table is written to the entry.

Making Lv4 Table Writable

We can now map a 4KiB page. Let's check that it actually works. In boot.zig, map an arbitrary virtual address as follows:

surtr/boot.zig
const arch = @import("arch.zig");

arch.page.map4kTo(
    0xFFFF_FFFF_DEAD_0000,
    0x10_0000,
    .read_write,
    boot_service,
) catch |err| {
    log.err("Failed to map 4KiB page: {?}", .{err});
    return .Aborted;
};

The result looks like:

txt
[DEBUG] (surtr): Kernel ELF information:
  Entry Point         : 0x10012B0
  Is 64-bit           : 1
  # of Program Headers: 4
  # of Section Headers: 16
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000003  I:0 R:0 U:0 W:1 P:1 PK:0 SS:0 SGX:0
RIP  - 000000001E230E20, CS  - 0000000000000038, RFLAGS - 0000000000010206
RAX  - 000000001FC01FF8, RCX - 000000001E4DA103, RDX - 0007FFFFFFFFFFFF
RBX  - 0000000000000000, RSP - 000000001FE963B0, RBP - 000000001FE96420
RSI  - 0000000000000000, RDI - 000000001E284D18
R8   - 0000000000001000, R9  - 0000000000000000, R10 - 0000000000001000
R11  - 000000001FE96410, R12 - 0000000000000000, R13 - 000000001ED8D000
R14  - 0000000000000000, R15 - 000000001FEAFA20
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 000000001FC01FF8, CR3 - 000000001FC01000

A page fault has occurred 7. The faulting address is contained in CR2, in this case 0x1FC01FF8. This address lies within the page pointed to by CR3, i.e., the page containing the Lv4 table. The offset 0xFF8 is the offset of the entry corresponding to the specified virtual address 0xFFFF_FFFF_DEAD_0000: its Lv4 index is 0x1FF, and 0x1FF × 8 bytes per entry = 0xFF8. Let's take a look at the Lv4 table with the vmmap command of gef:

txt
Virtual address start-end              Physical address start-end             Total size   Page size   Count  Flags
...
0x000000001fc00000-0x000000001fe00000  0x000000001fc00000-0x000000001fe00000  0x200000     0x200000    1      [R-X KERN ACCESSED DIRTY]
...

As you can see from the Flags, the page containing the UEFI-provided Lv4 table is mapped read-only. Since we cannot rewrite entries in the Lv4 table as it stands, we need to make the Lv4 table writable.

In order to change the attributes of the page on which the Lv4 table resides, a page table entry must be modified. However, that entry itself is read-only and cannot be modified. This is a chicken-and-egg problem. The only way out is to copy the Lv4 table itself to a newly allocated, writable page. Define a function like the following to make the Lv4 table writable:

surtr/arch/x86/page.zig
pub fn setLv4Writable(bs: *BootServices) PageError!void {
    var new_lv4ptr: [*]Lv4Entry = undefined;
    const status = bs.allocatePages(.AllocateAnyPages, .BootServicesData, 1, @ptrCast(&new_lv4ptr));
    if (status != .Success) return PageError.NoMemory;

    const new_lv4tbl = new_lv4ptr[0..num_table_entries];
    const lv4tbl = getLv4Table(am.readCr3());
    @memcpy(new_lv4tbl, lv4tbl);

    am.loadCr3(@intFromPtr(new_lv4tbl.ptr));
}

Also, add a helper function loadCr3():

surtr/arch/x86/asm.zig
pub inline fn loadCr3(cr3: u64) void {
    asm volatile (
        \\mov %[cr3], %%cr3
        :
        : [cr3] "r" (cr3),
    );
}

setLv4Writable() allocates a new page using Boot Services and copies all entries of the current Lv4 table into it. Finally, loadCr3() sets CR3 to the physical address of the newly allocated Lv4 table. Since reloading CR3 flushes the TLB, the new Lv4 table is used from then on. The newly allocated page is not mapped read-only, so the Lv4 table is now writable.

Let's make the Lv4 table writable before mapping the 4KiB page in boot.zig:

surtr/boot.zig
arch.page.setLv4Writable(boot_service) catch |err| {
    log.err("Failed to set page table writable: {?}", .{err});
    return .LoadError;
};
log.debug("Set page table writable.", .{});

The hlt loop should now be reached normally, without a page fault. The page tables at this point look like this:

txt
Virtual address start-end              Physical address start-end             Total size   Page size   Count  Flags
0x0000000000000000-0x0000000000200000  0x0000000000000000-0x0000000000200000  0x200000     0x200000    1      [RWX KERN ACCESSED DIRTY]
...
0xffffffffdead0000-0xffffffffdead1000  0x0000000000100000-0x0000000000101000  0x1000       0x1000      1      [RWX KERN ACCESSED GLOBAL]

You can see that the specified virtual address 0xFFFF_FFFF_DEAD_0000 is mapped to the physical address 0x10_0000.

Summary

In this chapter, we defined structures for page table entries in 4-level paging and implemented a function to map 4KiB pages. When we actually tried to map a 4KiB page, a page fault occurred because the Lv4 table was read-only, so we also implemented a function to make the Lv4 table writable.

We are now ready to load the Ymir kernel at the virtual addresses it requests. In the next chapter, we will implement the code that actually loads the Ymir kernel.

1

Strictly speaking, some of the fields in each entry are different, but for this series we will use the same structure for simplicity.

2

Such a structure is called an integer-backed packed struct. If the total size of the fields differs from the specified backing integer size, a compile error occurs.

3

The page tables provided by UEFI are set up so that virtual and physical addresses are identical, so confusing the two still happens to work at this point.

4

After all, both are just u64 in Zig: passing a plain u64 where Phys is required causes no error, and neither does passing a Virt where a Phys is expected. They serve only as annotations that make the code easier to read.

5

Since segmentation is rarely used in x64, Linear-Address in the figure is almost synonymous with virtual address.

7

This exception handler is provided by UEFI.