Loading Kernel

In the previous chapter, we implemented support for mapping 4KiB pages. The original motivation for enabling page operations was to map virtual addresses according to the layout required by the Ymir Kernel during its loading process. In this chapter, we'll parse the Ymir Kernel's ELF file and load it into memory at the virtual addresses it expects.

important

Source code for this chapter is in the whiz-surtr-parse_kernel branch.

Table of Contents

Linker Script of Ymir

First, we'll define the virtual address layout for Ymir. There's no strict rule for how to organize the address layout. For example, Linux separates the virtual address space used by userland and the kernel. Even within the kernel, certain regions are mapped for .text, while others are direct-mapped to physical addresses1.

BitVisor, on the other hand, appears to use the following virtual address layout2:

| Virtual Address | Description |
| --- | --- |
| 0x0000000000 - 0x003FFFFFFF | Userland Process |
| 0x0040000000 - 0x007FFFFFFF | Kernel |
| 0x00F0000000 - 0x00FEFFFFFF | Dynamically allocated physical memory |
| 0x8000000000 - 0x8FFFFFFFFF | Statically allocated physical memory |

Ymir adopts a layout similar to Linux’s, though there's no particular reason for this choice. The virtual address layout for Ymir is as follows:

| Virtual Address | Description |
| --- | --- |
| 0xFFFF888000000000 - 0xFFFF88FFFFFFFFFF | Direct Map. This region is mapped to physical address 0. The heap is also located here. |
| 0xFFFFFFFF80000000 - | Kernel Base. |
| 0xFFFFFFFF80100000 - | Kernel Text. |

To achieve this layout, let's prepare the following linker script:

ymir/linker.ld
KERNEL_VADDR_BASE = 0xFFFFFFFF80000000;
KERNEL_VADDR_TEXT = 0xFFFFFFFF80100000;

SECTIONS {
    . = KERNEL_VADDR_TEXT;

    .text ALIGN(4K) : AT (ADDR(.text) - KERNEL_VADDR_BASE) {
        *(.text)
        *(.ltext)
    }

    .rodata ALIGN(4K) : AT (ADDR(.rodata) - KERNEL_VADDR_BASE) {
        *(.rodata)
    }

    .data ALIGN(4K) : AT (ADDR(.data) - KERNEL_VADDR_BASE) {
        *(.data)
        *(.ldata)
    }

    .bss ALIGN(4K) : AT (ADDR(.bss) - KERNEL_VADDR_BASE) {
        *(COMMON)
        *(.bss)
        *(.lbss)
    }
}

With this linker script, all sections are placed starting from the virtual address 0xFFFFFFFF80100000. Additionally, each section's load (physical) address is computed by subtracting 0xFFFFFFFF80000000 from its virtual address.
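As a quick sanity check of the AT() arithmetic, the translation from virtual to load (physical) address can be sketched as follows. This is an illustrative Python snippet, not part of the build; the constants mirror the values in the linker script:

```python
KERNEL_VADDR_BASE = 0xFFFFFFFF80000000
KERNEL_VADDR_TEXT = 0xFFFFFFFF80100000

def load_addr(vaddr: int) -> int:
    """Load (physical) address produced by AT(ADDR(section) - KERNEL_VADDR_BASE)."""
    return vaddr - KERNEL_VADDR_BASE

# .text is linked at 0xFFFFFFFF80100000 but loaded at physical 0x100000.
print(hex(load_addr(KERNEL_VADDR_TEXT)))  # 0x100000
```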

note

For more details on Ymir's linker script and segment layout, see the Booting Kernel chapter.

To include the linker script in the build, write build.zig as follows:

build.zig
ymir.linker_script = b.path("ymir/linker.ld");

Let's look at the segments of the generated ELF file:

bash
> readelf --segment ./zig-out/bin/ymir.elf

Elf file type is EXEC (Executable file)
Entry point 0xffffffff80100000
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000001000 0xffffffff80100000 0x0000000000100000
                 0x0000000000000003 0x0000000000000003  R E    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000001000000  RW     0x0

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01

Since the Ymir kernel currently does almost nothing, even the .text section is only 3 bytes long. However, we can confirm that the entry point is at 0xFFFFFFFF80100000, and that the segment is mapped to the physical address 0x100000. It seems everything is placed as intended.

warning

Ymir adopts a layout similar to Linux. There’s no particular reason for this choice - it just felt more intuitive for those who have some experience with Linux. However, adopting a layout similar to Linux might actually be a poor choice.

That's because with a Linux-like layout, it becomes difficult to tell whether a given address belongs to Ymir or to the guest Linux. For example, if you set a breakpoint at the address 0xFFFFFFFF80100000, it will be hit regardless of whether Ymir or the guest is running. After hitting the breakpoint, figuring out which was actually running can be a bit of a hassle. Given that, for debugging purposes it might actually be better to use a memory region that is guaranteed from the start not to overlap with Linux.

Allocating Memory for Kernel

Now that the layout of the Ymir kernel has been decided, the next step is to prepare for loading the kernel into memory. Specifically, we need to calculate how much memory is required to load the kernel and allocate that amount of memory accordingly.

The kernel size is calculated from the kernel's memory map obtained by parsing the ELF file. First, we create an iterator for the ELF segment headers (program headers). Since Zig's standard library already provides an iterator for segment headers, we will use that:

surtr/boot.zig
const Addr = elf.Elf64_Addr;
var kernel_start_virt: Addr = std.math.maxInt(Addr);
var kernel_start_phys: Addr align(page_size) = std.math.maxInt(Addr);
var kernel_end_phys: Addr = 0;

var iter = elf_header.program_header_iterator(kernel);

Prepare variables to record the minimum and maximum physical addresses where the kernel will be placed, as well as the minimum virtual address. The segment header iterator can be created using std.elf.Header.program_header_iterator(). Using this iterator, iterate through the segment headers and calculate the minimum and maximum addresses required by the kernel:

surtr/boot.zig
while (true) {
    const phdr = iter.next() catch |err| {
        log.err("Failed to get program header: {?}\n", .{err});
        return .LoadError;
    } orelse break;
    if (phdr.p_type != elf.PT_LOAD) continue;
    if (phdr.p_paddr < kernel_start_phys) kernel_start_phys = phdr.p_paddr;
    if (phdr.p_vaddr < kernel_start_virt) kernel_start_virt = phdr.p_vaddr;
    if (phdr.p_paddr + phdr.p_memsz > kernel_end_phys) kernel_end_phys = phdr.p_paddr + phdr.p_memsz;
}

When the segment type is PT_LOAD3, update the currently known minimum and maximum segment addresses accordingly.
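The bounds computation above can be illustrated outside of Zig as well. The following Python sketch runs the same scan over a hypothetical list of program headers (the tuple values are invented for illustration; only the single PT_LOAD segment matches the readelf output shown earlier):

```python
PT_LOAD = 1

# Hypothetical program headers: (p_type, p_paddr, p_vaddr, p_memsz).
phdrs = [
    (PT_LOAD, 0x100000, 0xFFFFFFFF80100000, 0x3),
    (0x6474E551, 0x0, 0x0, 0x1000000),  # GNU_STACK: not PT_LOAD, skipped
]

ADDR_MAX = (1 << 64) - 1  # analogous to std.math.maxInt(Addr)
start_phys = ADDR_MAX
start_virt = ADDR_MAX
end_phys = 0

for p_type, p_paddr, p_vaddr, p_memsz in phdrs:
    if p_type != PT_LOAD:
        continue
    start_phys = min(start_phys, p_paddr)
    start_virt = min(start_virt, p_vaddr)
    end_phys = max(end_phys, p_paddr + p_memsz)

print(hex(start_phys), hex(start_virt), hex(end_phys))
```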

Then we calculate the memory size required:

surtr/boot.zig
const pages_4kib = (kernel_end_phys - kernel_start_phys + (page_size - 1)) / page_size;
log.info("Kernel image: 0x{X:0>16} - 0x{X:0>16} (0x{X} pages)", .{ kernel_start_phys, kernel_end_phys, pages_4kib });

Calculate the number of required 4KiB pages from the difference between the minimum and maximum segment addresses. The expression (A - B + (C - 1)) / C rounds up (A - B) / C when the remainder is not zero.
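The rounding can be verified with a small sketch (Python here purely for illustration; the addresses are the ones computed above):

```python
page_size = 0x1000  # 4KiB

def pages_needed(start: int, end: int) -> int:
    """Ceiling division: (A - B + (C - 1)) / C rounds (A - B) / C up."""
    return (end - start + (page_size - 1)) // page_size

print(pages_needed(0x100000, 0x100003))  # 3 bytes still occupy 1 page
print(pages_needed(0x100000, 0x102000))  # exactly 2 pages, no rounding
```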

You'll see output like the following:

txt
[INFO ] (surtr): Initialized bootloader log.
[INFO ] (surtr): Got boot services.
[INFO ] (surtr): Located simple file system protocol.
[INFO ] (surtr): Opened filesystem volume.
[INFO ] (surtr): Opened kernel file.
[INFO ] (surtr): Parsed kernel ELF header.
[INFO ] (surtr): Kernel image: 0x0000000000100000 - 0x0000000000100003 (0x1 pages)

Currently, Ymir's segment size is only 3 bytes (for kernelEntry()), so the output might look odd at first glance. However, it matches the results obtained by reading the segment headers with readelf, confirming that the required number of pages is correctly calculated as one.

Finally, allocate memory for the number of pages calculated.

surtr/boot.zig
status = boot_service.allocatePages(.AllocateAddress, .LoaderData, pages_4kib, @ptrCast(&kernel_start_phys));
if (status != .Success) {
    log.err("Failed to allocate memory for kernel image: {?}", .{status});
    return status;
}
log.info("Allocated memory for kernel image @ 0x{X:0>16} ~ 0x{X:0>16}", .{ kernel_start_phys, kernel_start_phys + pages_4kib * page_size });

As in the previous chapter, we use Boot Services' AllocatePages() to allocate pages. However, here we specify .AllocateAddress for the first argument alloc_type to allocate memory exactly at the address given in the fourth argument. This address is the start address of the segment we calculated earlier. If allocation at the specified physical address fails, an error is returned.

Mapping Virtual Address

Since memory has been successfully allocated at the "physical address" requested by the kernel, the next step is to map the requested "virtual address" to the allocated physical address. We previously implemented a function to map 4KiB pages in the Simple Page Table chapter, so we will use that function to map the pages:

surtr/boot.zig
for (0..pages_4kib) |i| {
    arch.page.map4kTo(
        kernel_start_virt + page_size * i,
        kernel_start_phys + page_size * i,
        .read_write,
        boot_service,
    ) catch |err| {
        log.err("Failed to map memory for kernel image: {?}", .{err});
        return .LoadError;
    };
}
log.info("Mapped memory for kernel image.", .{});

Repeat the page mapping for the number of allocated 4KiB pages (pages_4kib). Ideally, the pages should be mapped with the attributes specified by the segment header (such as read-only), but for simplicity, here we map all pages with .read_write.
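The virtual-to-physical pairs produced by this loop keep a constant offset between the two address spaces. A hypothetical sketch (the page count of 3 is invented; Ymir currently needs only 1):

```python
page_size = 0x1000
kernel_start_virt = 0xFFFFFFFF80100000
kernel_start_phys = 0x100000
pages_4kib = 3  # hypothetical count for illustration

# Each 4KiB page is mapped individually, preserving the same offset.
mappings = [
    (kernel_start_virt + page_size * i, kernel_start_phys + page_size * i)
    for i in range(pages_4kib)
]
for virt, phys in mappings:
    print(hex(virt), "->", hex(phys))
```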

When you run it and check the memory map, it looks like the following:

txt
Virtual address start-end              Physical address start-end             Total size   Page size   Count  Flags
0x0000000000000000-0x0000000000200000  0x0000000000000000-0x0000000000200000  0x200000     0x200000    1      [RWX KERN ACCESSED DIRTY]
0x0000000000200000-0x0000000000800000  0x0000000000200000-0x0000000000800000  0x600000     0x200000    3      [RWX KERN ACCESSED]
...
0xffffffff80100000-0xffffffff80101000  0x0000000000100000-0x0000000000101000  0x1000       0x1000      1      [RWX KERN GLOBAL]

You can see that one page starting at virtual address 0xFFFFFFFF80100000 is mapped to the physical address 0x0000000000100000. It's as expected. Nice.

Reading and Loading Kernel Image

In the Parsing Kernel chapter, only the ELF header of Ymir was loaded into memory. Now, we will load the entire kernel segments into the prepared memory.

Loading Segments

First, just like when we calculated the required memory size earlier, we start by creating an iterator for the segment headers:

surtr/boot.zig
log.info("Loading kernel image...", .{});
iter = elf_header.program_header_iterator(kernel);
while (true) {
    const phdr = iter.next() catch |err| {
        log.err("Failed to get program header: {?}\n", .{err});
        return .LoadError;
    } orelse break;
    if (phdr.p_type != elf.PT_LOAD) continue;

    ...
}

Only the PT_LOAD segments need to be loaded; all others are skipped. Next, the segments are read from the file system into memory:

surtr/boot.zig
status = kernel.setPosition(phdr.p_offset);
if (status != .Success) {
    log.err("Failed to set position for kernel image.", .{});
    return status;
}
const segment: [*]u8 = @ptrFromInt(phdr.p_vaddr);
var mem_size = phdr.p_memsz;
status = kernel.read(&mem_size, segment);
if (status != .Success) {
    log.err("Failed to read kernel image.", .{});
    return status;
}
log.info(
    "  Seg @ 0x{X:0>16} - 0x{X:0>16}",
    .{ phdr.p_vaddr, phdr.p_vaddr + phdr.p_memsz },
);

Here, kernel is a *uefi.protocol.File created in the Parsing Kernel chapter. After seeking to the segment's start offset using setPosition(), we read the segment from the file directly into the virtual address requested by the segment header. It's very straightforward.

note

When working on paging-related code, it's very likely to confuse virtual addresses with physical addresses. However, in Surtr, you actually don't need to be overly concerned about distinguishing between the two. This is because the mappings provided by UEFI direct map virtual addresses to the same physical addresses, making virtual and physical addresses effectively identical.

Note that the memory where the kernel is being loaded has just been newly mapped. However, the direct mapping is still in place, so the same physical address can be accessed via two different virtual addresses. Try changing phdr.p_vaddr (the newly created mapping) in the code above to phdr.p_paddr (the old direct mapping); it should work without any issues.

Initialization of BSS Segment

The .bss section is a region that must be zero-initialized when loaded. Since its contents are known to be all zeros, the data is not included in the ELF file. When loading segments, you need to allocate memory for the .bss section and zero it out. Since the memory is already allocated, here we only need to perform the zero initialization:

surtr/boot.zig
const zero_count = phdr.p_memsz - phdr.p_filesz;
if (zero_count > 0) {
    boot_service.setMem(@ptrFromInt(phdr.p_vaddr + phdr.p_filesz), zero_count, 0);
}

For zero-filling, you could use Zig's @memset() function, but here I chose UEFI's SetMem(). No particular reason. With that, the initialization of the .bss section is complete4. Well, technically, the current Ymir kernel doesn't have a .bss section yet, so nothing actually happens here...
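The size and start of the region to zero follow directly from the segment header: everything between p_filesz and p_memsz has no backing data in the file. A sketch with hypothetical header values (0x800 and 0x1000 are invented for illustration):

```python
# Hypothetical PT_LOAD segment where the file holds less than the in-memory size.
p_vaddr = 0xFFFFFFFF80103000
p_filesz = 0x800   # bytes actually present in the ELF file
p_memsz = 0x1000   # bytes the segment occupies in memory

# The tail (p_memsz - p_filesz bytes, starting right after the file data)
# must be zero-filled by the loader.
zero_count = p_memsz - p_filesz
zero_start = p_vaddr + p_filesz
print(hex(zero_start), hex(zero_count))
```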

Summary

In this chapter, we calculated the amount of memory needed to load the Ymir Kernel and allocated that much physical memory. Then, we mapped the virtual addresses as required by the kernel and loaded the kernel into that mapped memory.

With this, we are finally ready to run Ymir. We could jump straight into Ymir now, but in the next chapter, we'll do some cleanup before diving into the kernel.

4

Strictly speaking, this method zeroes out not only the .bss section but also other segments. For example, if the .text segment size is 0x800 bytes, the segment size is aligned to 4KiB, leaving a 0x800 byte gap after the .text section. With the current approach, this gap is also zeroed out (which isn't necessarily a bad thing).