Loading Kernel
In the previous chapter, we implemented support for mapping 4KiB pages. The original motivation for enabling page operations was to map virtual addresses according to the layout required by the Ymir Kernel during its loading process. In this chapter, we'll parse the Ymir Kernel's ELF file and load it into memory at the virtual addresses it expects.
important
Source code for this chapter is in the whiz-surtr-parse_kernel branch.
Table of Contents
- Linker Script of Ymir
- Allocating Memory for Kernel
- Mapping Virtual Address
- Reading and Loading Kernel Image
- Summary
Linker Script of Ymir
First, we'll define the virtual address layout for Ymir. There's no strict rule for how to organize the address layout. For example, Linux separates the virtual address space used by userland and the kernel. Even within the kernel, certain regions are mapped for .text, while others are direct-mapped to physical addresses[1].
BitVisor, on the other hand, appears to use the following virtual address layout[2]:
Virtual Address | Description |
---|---|
0x0000000000 - 0x003FFFFFFF | Userland Process |
0x0040000000 - 0x007FFFFFFF | Kernel |
0x00F0000000 - 0x00FEFFFFFF | Dynamically allocated physical memory |
0x8000000000 - 0x8FFFFFFFFF | Statically allocated physical memory |
Ymir adopts a layout similar to Linux’s, though there's no particular reason for this choice. The virtual address layout for Ymir is as follows:
Virtual Address | Description |
---|---|
0xFFFF888000000000 - 0xFFFF88FFFFFFFFFF | Direct map. This region maps physical memory starting at physical address 0. The heap is also located here. |
0xFFFFFFFF80000000 - | Kernel Base. |
0xFFFFFFFF80100000 - | Kernel Text. |
To achieve this layout, let's prepare the following linker script:
KERNEL_VADDR_BASE = 0xFFFFFFFF80000000;
KERNEL_VADDR_TEXT = 0xFFFFFFFF80100000;

SECTIONS {
    . = KERNEL_VADDR_TEXT;

    .text ALIGN(4K) : AT (ADDR(.text) - KERNEL_VADDR_BASE) {
        *(.text)
        *(.ltext)
    }

    .rodata ALIGN(4K) : AT (ADDR(.rodata) - KERNEL_VADDR_BASE) {
        *(.rodata)
    }

    .data ALIGN(4K) : AT (ADDR(.data) - KERNEL_VADDR_BASE) {
        *(.data)
        *(.ldata)
    }

    .bss ALIGN(4K) : AT (ADDR(.bss) - KERNEL_VADDR_BASE) {
        *(COMMON)
        *(.bss)
        *(.lbss)
    }
}
With this linker script, all sections are placed starting from the virtual address 0xFFFFFFFF80100000. Additionally, the AT() directive gives each section a load (physical) address equal to its virtual address minus 0xFFFFFFFF80000000.
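The relationship between the two addresses is plain subtraction. As a minimal sketch (the helper name loadAddress is hypothetical and not part of Surtr or Ymir), the load address of the start of .text works out as follows:

```zig
const std = @import("std");

// Same value as KERNEL_VADDR_BASE in the linker script.
const kernel_vaddr_base: u64 = 0xFFFF_FFFF_8000_0000;

/// Hypothetical helper: the load (physical) address that
/// AT(ADDR(section) - KERNEL_VADDR_BASE) assigns to a virtual address.
fn loadAddress(vaddr: u64) u64 {
    return vaddr - kernel_vaddr_base;
}

pub fn main() void {
    const text_vaddr: u64 = 0xFFFF_FFFF_8010_0000; // KERNEL_VADDR_TEXT
    std.debug.print("0x{X} -> 0x{X}\n", .{ text_vaddr, loadAddress(text_vaddr) });
}
```

For KERNEL_VADDR_TEXT the result is 0x100000, so the kernel image is expected to sit 1MiB into physical memory.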
note
For more details on Ymir's linker script and segment layout, see the Booting Kernel chapter.
To include the linker script in the build, set it in build.zig as follows:
ymir.linker_script = b.path("ymir/linker.ld");
Let's look at the segments of the generated ELF file:
> readelf --segment ./zig-out/bin/ymir.elf
Elf file type is EXEC (Executable file)
Entry point 0xffffffff80100000
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000001000 0xffffffff80100000 0x0000000000100000
0x0000000000000003 0x0000000000000003 R E 0x1000
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000001000000 RW 0x0
Section to Segment mapping:
Segment Sections...
00 .text
01
Since the Ymir kernel currently does almost nothing, even the .text section is only 3 bytes long. However, we can confirm that the entry point is at 0xFFFFFFFF80100000, and that the segment is loaded at the physical address 0x100000. Everything is placed as intended.
warning
Ymir adopts a layout similar to Linux. There’s no particular reason for this choice - it just felt more intuitive for those who have some experience with Linux. However, adopting a layout similar to Linux might actually be a poor choice.
That's because with a layout similar to Linux, it becomes difficult to tell whether a given address belongs to Ymir or to the guest Linux. For example, if you set a breakpoint at the address 0xFFFFFFFF80100000, it will be hit regardless of whether Ymir or the guest is running. After hitting the breakpoint, figuring out which one was actually running can be a bit of a hassle. Given that, for debugging purposes it might be better to use, from the start, a memory region that is guaranteed not to overlap with Linux.
Allocating Memory for Kernel
Now that the layout of the Ymir kernel has been decided, the next step is to prepare for loading the kernel into memory. Specifically, we need to calculate how much memory is required to load the kernel and allocate that amount of memory accordingly.
The kernel size is calculated from the kernel's memory map obtained by parsing the ELF file. First, we create an iterator for the ELF segment headers (program headers). Since Zig's standard library already provides an iterator for segment headers, we will use that:
const Addr = elf.Elf64_Addr;
var kernel_start_virt: Addr = std.math.maxInt(Addr);
var kernel_start_phys: Addr align(page_size) = std.math.maxInt(Addr);
var kernel_end_phys: Addr = 0;
var iter = elf_header.program_header_iterator(kernel);
We prepare variables to record the minimum and maximum physical addresses where the kernel will be placed, as well as the minimum virtual address. The segment header iterator can be created using std.elf.Header.program_header_iterator(). Using this iterator, we iterate through the segment headers and calculate the minimum and maximum addresses required by the kernel:
while (true) {
    const phdr = iter.next() catch |err| {
        log.err("Failed to get program header: {?}\n", .{err});
        return .LoadError;
    } orelse break;
    if (phdr.p_type != elf.PT_LOAD) continue;
    if (phdr.p_paddr < kernel_start_phys) kernel_start_phys = phdr.p_paddr;
    if (phdr.p_vaddr < kernel_start_virt) kernel_start_virt = phdr.p_vaddr;
    if (phdr.p_paddr + phdr.p_memsz > kernel_end_phys) kernel_end_phys = phdr.p_paddr + phdr.p_memsz;
}
When the segment type is PT_LOAD[3], we update the minimum and maximum segment addresses seen so far. All other segment types are skipped.
Then we calculate the memory size required:
const pages_4kib = (kernel_end_phys - kernel_start_phys + (page_size - 1)) / page_size;
log.info("Kernel image: 0x{X:0>16} - 0x{X:0>16} (0x{X} pages)", .{ kernel_start_phys, kernel_end_phys, pages_4kib });
We calculate the number of required 4KiB pages from the difference between the minimum and maximum segment addresses. The expression (A - B + (C - 1)) / C rounds (A - B) / C up when the remainder is not zero.
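To see the rounding in action, here is a small standalone sketch. The function name pagesNeeded is hypothetical; only the expression itself comes from the code above.

```zig
const std = @import("std");

const page_size: u64 = 4096;

/// Number of 4KiB pages needed to cover the byte range [start, end).
fn pagesNeeded(start: u64, end: u64) u64 {
    return (end - start + (page_size - 1)) / page_size;
}

pub fn main() void {
    // 3 bytes, as in the current Ymir image: still needs one whole page.
    std.debug.print("{d}\n", .{pagesNeeded(0x100000, 0x100003)});
    // Exactly 4096 bytes: one page, nothing is rounded.
    std.debug.print("{d}\n", .{pagesNeeded(0x100000, 0x101000)});
    // One byte over a page boundary: rounds up to two pages.
    std.debug.print("{d}\n", .{pagesNeeded(0x100000, 0x101001)});
}
```

The three calls evaluate to 1, 1, and 2 pages respectively. The sketch assumes the start address is already page-aligned, which holds for the 0x100000 load address used here.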
Running it produces the following output:
[INFO ] (surtr): Initialized bootloader log.
[INFO ] (surtr): Got boot services.
[INFO ] (surtr): Located simple file system protocol.
[INFO ] (surtr): Opened filesystem volume.
[INFO ] (surtr): Opened kernel file.
[INFO ] (surtr): Parsed kernel ELF header.
[INFO ] (surtr): Kernel image: 0x0000000000100000 - 0x0000000000100003 (0x1 pages)
Currently, Ymir's segment size is only 3 bytes (for kernelEntry()), so the output might look odd at first glance. However, it matches the results obtained by reading the segment headers with readelf, confirming that the required number of pages is correctly calculated as one.
Finally, allocate memory for the number of pages calculated.
status = boot_service.allocatePages(.AllocateAddress, .LoaderData, pages_4kib, @ptrCast(&kernel_start_phys));
if (status != .Success) {
    log.err("Failed to allocate memory for kernel image: {?}", .{status});
    return status;
}
log.info("Allocated memory for kernel image @ 0x{X:0>16} ~ 0x{X:0>16}", .{ kernel_start_phys, kernel_start_phys + pages_4kib * page_size });
As in the previous chapter, we use Boot Services' AllocatePages() to allocate pages. However, here we specify .AllocateAddress for the first argument alloc_type to allocate memory exactly at the address given in the fourth argument. This address is the start address of the segment we calculated earlier. If allocation at the specified physical address fails, an error is returned.
Mapping Virtual Address
Since memory has been successfully allocated at the "physical address" requested by the kernel, the next step is to map the requested "virtual address" to the allocated physical address. We previously implemented a function to map 4KiB pages in the Simple Page Table chapter, so we will use that function to map the pages:
for (0..pages_4kib) |i| {
    arch.page.map4kTo(
        kernel_start_virt + page_size * i,
        kernel_start_phys + page_size * i,
        .read_write,
        boot_service,
    ) catch |err| {
        log.err("Failed to map memory for kernel image: {?}", .{err});
        return .LoadError;
    };
}
log.info("Mapped memory for kernel image.", .{});
We repeat the page mapping for the number of allocated 4KiB pages (pages_4kib). Ideally, the pages should be mapped with the attributes specified by the segment header (such as read-only), but for simplicity, here we map all pages with .read_write.
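For reference, choosing an attribute from the segment header could look roughly like the sketch below. Everything here is an assumption made for illustration: the PageAttribute enum with its .read_only and .executable members and the attributeFromFlags helper are hypothetical (this chapter only uses .read_write), and the flag constants are redefined locally with the values from the ELF specification. Doing this for real would also require mapping pages segment by segment instead of over the whole image at once.

```zig
const std = @import("std");

// ELF segment permission bits (p_flags), as defined by the ELF specification.
const PF_X: u32 = 0x1;
const PF_W: u32 = 0x2;
const PF_R: u32 = 0x4;

/// Hypothetical attribute set; the chapter itself only uses `.read_write`.
const PageAttribute = enum { read_only, read_write, executable };

/// Pick an attribute for a segment: writable wins, then executable,
/// and anything else falls back to read-only.
fn attributeFromFlags(p_flags: u32) PageAttribute {
    if ((p_flags & PF_W) != 0) return .read_write;
    if ((p_flags & PF_X) != 0) return .executable;
    return .read_only;
}

pub fn main() void {
    // The .text segment shown by readelf has flags "R E".
    const text_attr = attributeFromFlags(PF_R | PF_X);
    std.debug.print("{s}\n", .{@tagName(text_attr)});
}
```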
When you run it and check the memory map, it looks like the following:
Virtual address start-end Physical address start-end Total size Page size Count Flags
0x0000000000000000-0x0000000000200000 0x0000000000000000-0x0000000000200000 0x200000 0x200000 1 [RWX KERN ACCESSED DIRTY]
0x0000000000200000-0x0000000000800000 0x0000000000200000-0x0000000000800000 0x600000 0x200000 3 [RWX KERN ACCESSED]
...
0xffffffff80100000-0xffffffff80101000 0x0000000000100000-0x0000000000101000 0x1000 0x1000 1 [RWX KERN GLOBAL]
You can see that one page starting at virtual address 0xFFFFFFFF80100000 is mapped to the physical address 0x0000000000100000. It's as expected. Nice.
Reading and Loading Kernel Image
In the Parsing Kernel chapter, only the ELF header of Ymir was loaded into memory. Now, we will load all of the kernel's segments into the prepared memory.
Loading Segments
First, just like when we calculated the required memory size earlier, we start by creating an iterator for the segment headers:
log.info("Loading kernel image...", .{});
iter = elf_header.program_header_iterator(kernel);
while (true) {
    const phdr = iter.next() catch |err| {
        log.err("Failed to get program header: {?}\n", .{err});
        return .LoadError;
    } orelse break;
    if (phdr.p_type != elf.PT_LOAD) continue;

    ...
}
Only the PT_LOAD segments need to be loaded; all others are skipped. Next, the segments are read from the file system into memory:
status = kernel.setPosition(phdr.p_offset);
if (status != .Success) {
    log.err("Failed to set position for kernel image.", .{});
    return status;
}
const segment: [*]u8 = @ptrFromInt(phdr.p_vaddr);
var mem_size = phdr.p_memsz;
status = kernel.read(&mem_size, segment);
if (status != .Success) {
    log.err("Failed to read kernel image.", .{});
    return status;
}
log.info(
    "  Seg @ 0x{X:0>16} - 0x{X:0>16}",
    .{ phdr.p_vaddr, phdr.p_vaddr + phdr.p_memsz },
);
Here, kernel is a *uefi.protocol.File created in the Parsing Kernel chapter. After seeking to the segment's start offset using setPosition(), we read the segment from the file directly into the virtual address requested by the segment header. It's very straightforward.
note
When working on paging-related code, it's easy to confuse virtual addresses with physical addresses. In Surtr, however, you don't need to be overly concerned about distinguishing between the two. This is because the mappings provided by UEFI direct-map each virtual address to the same physical address, making virtual and physical addresses effectively identical.
Note that the memory where the kernel is being loaded has just been newly mapped, but the old direct mapping is still in place. Therefore, the same physical memory can be accessed via two different virtual addresses. Try changing phdr.p_vaddr (the newly created mapping) in the code above to phdr.p_paddr (the old direct mapping). It should work without any issues.
Initialization of BSS Segment
The .bss section holds data that is zero-initialized when loaded. Since its contents are known to be all zeros, the .bss section's data is not included in the ELF file, which is why a segment's size in memory (p_memsz) can be larger than its size in the file (p_filesz). When loading segments, you need to allocate memory for the .bss section and initialize it to zero. Since the memory is already allocated, here we only need to perform zero initialization:
const zero_count = phdr.p_memsz - phdr.p_filesz;
if (zero_count > 0) {
    boot_service.setMem(@ptrFromInt(phdr.p_vaddr + phdr.p_filesz), zero_count, 0);
}
For zero-filling, you could use Zig's @memset() builtin, but here I chose to use UEFI's SetMem(), for no particular reason. With this, we've completed the initialization of the .bss section[4]. Well, technically, the current Ymir kernel doesn't have a .bss section yet, so nothing actually happens here...
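For comparison, the @memset() variant can be tried outside UEFI with an ordinary buffer. This is only a sketch: the zeroTail helper and the 12-byte buffer are made up for illustration, and it assumes the two-argument form of @memset() found in recent Zig versions.

```zig
const std = @import("std");

/// Zero the part of a loaded segment that is not backed by file data.
/// `file_size` plays the role of p_filesz and `segment.len` of p_memsz.
fn zeroTail(segment: []u8, file_size: usize) void {
    @memset(segment[file_size..], 0);
}

pub fn main() void {
    // Stand-in for segment memory that still contains stale bytes.
    var segment = [_]u8{0xAA} ** 12;
    // Pretend the first 5 bytes came from the file; the rest is .bss.
    zeroTail(&segment, 5);
    std.debug.print("{any}\n", .{segment});
}
```

In Surtr the slice would have to be built from phdr.p_vaddr with @ptrFromInt(), just like the pointer passed to SetMem() above.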
Summary
In this chapter, we calculated the amount of memory needed to load the Ymir Kernel and allocated that much physical memory. Then, we mapped the virtual addresses as required by the kernel and loaded the kernel into that mapped memory.
With this, we are finally ready to run Ymir. We could jump straight into Ymir now, but in the next chapter, we'll do some cleanup before diving into the kernel.
[4] Strictly speaking, this method zeroes out not only the .bss section but also other segments. For example, if the .text segment size is 0x800 bytes, the segment size is aligned to 4KiB, leaving a 0x800 byte gap after the .text section. With the current approach, this gap is also zeroed out (which isn't necessarily a bad thing).