Parsing Kernel ELF
In this chapter, we read the host OS, Ymir, from the UEFI file system and parse its ELF file to obtain the memory layout Ymir requires. Ideally, we would load Ymir directly into memory and transfer execution to her. To do so, however, we need page tables that map the virtual addresses requested by the ELF file to physical addresses. Page table management will be implemented in the next chapter, so this chapter goes only as far as parsing the ELF header of the Ymir kernel.
important
Source code for this chapter is in the whiz-surtr-parse_kernel branch.
Building a Skeleton for Ymir
To load the Ymir kernel, we obviously need to build its ELF file. As a first step, we'll create a minimal skeleton of Ymir that does nothing, just to make sure we can successfully build the ELF file.
Create a ymir directory and add the following to ymir/main.zig:
export fn kernelEntry() callconv(.Naked) noreturn {
    while (true)
        asm volatile ("hlt");
}
The kernelEntry() function is Ymir's entry point, where Surtr transfers control. Since this function never returns, its return type is noreturn.
In Zig, you can specify a function's calling convention with callconv(). Because the UEFI calling convention is the same as the one Windows uses, we would normally use .Win64. However, this function is meant to be a trampoline that switches to the kernel's stack and then calls the actual main function, so we use .Naked here for now. The .Naked calling convention is ideal for trampoline code because the compiler emits no function prologue or epilogue and does not manage any registers for us.
Now that we've created the basic skeleton of Ymir, let's add it to the build configuration in build.zig:
const ymir_target = b.resolveTargetQuery(.{
    .cpu_arch = .x86_64,
    .os_tag = .freestanding,
    .ofmt = .elf,
});
const ymir = b.addExecutable(.{
    .name = "ymir.elf",
    .root_source_file = b.path("ymir/main.zig"),
    .target = ymir_target, // Freestanding x64 ELF executable
    .optimize = optimize, // You can choose the optimization level.
    .linkage = .static,
    .code_model = .kernel,
});
ymir.entry = .{ .symbol_name = "kernelEntry" };
b.installArtifact(ymir);
In the .target field, we specify .freestanding as the OS tag. We also set .code_model to .kernel. The code model determines how the compiler generates addresses and relocation information, and therefore which address ranges code and data may occupy. Other options include .small and .medium. As we'll cover in a later chapter, Ymir's address layout is modeled after Linux, and Ymir is placed in the top 2 GiB of the virtual address space, around 0xFFFFFFFF80000000. Without .kernel, the default code model assumes that symbols live in the low 2 GiB. Relocations against such high addresses would then not fit, and the build would fail. Defining the address layout requires a linker script, but we won't write one just yet. Finally, we specify kernelEntry() as the entry point.
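The constraint behind the .kernel code model comes down to simple address arithmetic. Roughly, the small model expects code and data in the low 2 GiB, and the kernel model expects them in the top 2 GiB. In both cases, absolute addresses fit in a 32-bit value that is extended to 64 bits. The helpers below are hypothetical names, not part of Zig's build API. They are a sketch of those ranges, assuming the usual x86-64 System V definitions:

```zig
const std = @import("std");

// Hypothetical helpers (not part of Zig's build API) that spell out the address
// ranges the two code models assume on x86-64.

/// .small model: code and data are expected in the low 2 GiB.
fn inLow2GiB(addr: u64) bool {
    return addr < 0x8000_0000;
}

/// .kernel model: code and data are expected in the top 2 GiB.
fn inTop2GiB(addr: u64) bool {
    return addr >= 0xFFFF_FFFF_8000_0000;
}

/// True if `addr` equals some 32-bit value sign-extended to 64 bits, i.e. it
/// can be encoded as a 32-bit immediate or displacement (R_X86_64_32S).
fn fitsSext32(addr: u64) bool {
    const signed: i64 = @bitCast(addr);
    return signed >= std.math.minInt(i32) and signed <= std.math.maxInt(i32);
}
```

Together, the two ranges are exactly the sign-extended 32-bit addresses. In the current build, kernelEntry() at 0x1001120 lies in the low 2 GiB. Once Ymir is linked into the higher half, only the .kernel assumptions hold.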
At this point, running zig build generates zig-out/bin/ymir.elf. Let's take a look at its header using readelf:
> readelf -h ./zig-out/bin/ymir.elf
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1001120
Start of program headers: 64 (bytes into file)
Start of section headers: 5216 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 4
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 11
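The Magic row is the 16-byte e_ident array at the very start of the file, and it already encodes several of the lines below it:

- 7f 45 4c 46 is the signature \x7fELF.
- 02 means ELF64 (Class).
- The first 01 means little endian (Data).
- The second 01 is the ELF version.

A check of those bytes might look like the following sketch, where isElf64Le is a hypothetical helper:

```zig
const std = @import("std");

/// Hypothetical helper: checks the leading e_ident bytes (the "Magic" row of
/// `readelf -h`) for an ELF64, little-endian, version-1 file.
fn isElf64Le(ident: []const u8) bool {
    if (ident.len < 7) return false;
    return std.mem.eql(u8, ident[0..4], "\x7fELF") and // EI_MAG0..EI_MAG3
        ident[4] == 2 and // EI_CLASS: ELFCLASS64
        ident[5] == 1 and // EI_DATA: ELFDATA2LSB (little endian)
        ident[6] == 1; // EI_VERSION: EV_CURRENT
}
```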
As expected, a proper 64-bit ELF file is generated. The entry point is at 0x1001120. If you examine the surrounding area with objdump, you can confirm that the kernelEntry() function defined earlier is there:
> objdump -D ./zig-out/bin/ymir.elf | grep 1001120 -n3
7:0000000001001120 <kernelEntry>:
8: 1001120: f4 hlt
9: 1001121: eb fd jmp 1001120 <kernelEntry>
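The disassembly shows how while (true) asm volatile ("hlt") compiles:

- f4 is HLT.
- eb fd is a short JMP whose 8-bit displacement is relative to the address of the next instruction.

As a signed byte, 0xfd is -3, and the JMP at 0x1001121 is 2 bytes long. Its target is therefore 0x1001123 - 3 = 0x1001120, which is back at the HLT. The helper below is a hypothetical sketch of that arithmetic:

```zig
const std = @import("std");

/// Hypothetical helper: target of a short JMP (opcode 0xEB, 2 bytes long)
/// located at `insn_addr` with 8-bit displacement `disp8`.
fn shortJmpTarget(insn_addr: u64, disp8: u8) u64 {
    const next_ip = insn_addr + 2; // displacement is relative to the next instruction
    const disp: i8 = @bitCast(disp8); // reinterpret the byte as signed
    // Sign-extend to 64 bits and add with wrapping arithmetic.
    return next_ip +% @as(u64, @bitCast(@as(i64, disp)));
}
```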
With this, the ELF file for Ymir, which will be loaded by Surtr, has been successfully generated. Let's also set up the configuration to place the generated Ymir onto the EFI file system:
const install_ymir = b.addInstallFile(
    ymir.getEmittedBin(),
    b.fmt("{s}/{s}", .{ out_dir_name, ymir.name }),
);
install_ymir.step.dependOn(&ymir.step);
b.getInstallStep().dependOn(&install_ymir.step);
This setup is almost the same as the configuration for installing Surtr. As a result, Ymir will be copied to zig-out/img/ymir.elf.
Loading Kernel Headers
To access files on the filesystem from Surtr, we use the Simple File System Protocol. Surtr is a UEFI application, and until it calls ExitBootServices(), it has access to a set of functions provided by UEFI called Boot Services. A pointer to Boot Services can be obtained from the EFI System Table, just as we previously obtained the Simple Text Output Protocol for logging:
const boot_service: *uefi.tables.BootServices = uefi.system_table.boot_services orelse {
    log.err("Failed to get boot services.", .{});
    return .Aborted;
};
log.info("Got boot services.", .{});
From the obtained Boot Services, we retrieve the Simple File System Protocol:
var fs: *uefi.protocol.SimpleFileSystem = undefined;
status = boot_service.locateProtocol(&uefi.protocol.SimpleFileSystem.guid, null, @ptrCast(&fs));
if (status != .Success) {
    log.err("Failed to locate simple file system protocol.", .{});
    return status;
}
log.info("Located simple file system protocol.", .{});
Next, using the Simple File System Protocol, we open the root directory of the filesystem:
var root_dir: *uefi.protocol.File = undefined;
status = fs.openVolume(&root_dir);
if (status != .Success) {
    log.err("Failed to open volume.", .{});
    return status;
}
log.info("Opened filesystem volume.", .{});
info
In Zig, unlike in C, you cannot declare a variable without also initializing it. Every variable must be assigned a value at the time of declaration. If you want to leave a variable uninitialized, assign it undefined. In Debug mode, a variable initialized with undefined is filled with 0xAA bytes. In other build modes, its value is truly undefined. There is no way to check whether a variable was initialized with undefined or not.
Opening a File
Next, we open the Ymir ELF file using the open() function of the File Protocol. As with the earlier logging output, the filename must be given in UCS-2. Since we'll open files with the Simple File System Protocol more than once, it's helpful to create a utility function that converts strings to UCS-2:
inline fn toUcs2(comptime s: [:0]const u8) [s.len:0]u16 {
    // One u16 code unit per ASCII character; the 0 sentinel terminates the string.
    var ucs2: [s.len:0]u16 = [_:0]u16{0} ** s.len;
    for (s, 0..) |c, i| {
        ucs2[i] = c;
    }
    return ucs2;
}
Since the filename to open is fixed at compile time, the argument is declared as comptime s. This lets us use s.len in the return type. Each element of the result is already a u16, so an ASCII string of s.len bytes becomes exactly s.len UCS-2 code units. The return type is therefore [s.len:0]u16, which occupies twice as many bytes as the input. The conversion itself is straightforward and works only for ASCII input. Widening each ASCII byte to a little-endian u16 places a \0 after each byte, just like before.
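Because the conversion runs entirely in Zig, it can be checked with an ordinary test on the host, assuming ASCII-only filenames. The sketch below restates a toUcs2() that returns one u16 code unit per character plus a 0 sentinel, so that the block is self-contained:

```zig
const std = @import("std");

// One u16 code unit per ASCII byte, followed by a 0 sentinel.
// Only valid for 7-bit ASCII input.
inline fn toUcs2(comptime s: [:0]const u8) [s.len:0]u16 {
    var ucs2: [s.len:0]u16 = [_:0]u16{0} ** s.len;
    for (s, 0..) |c, i| {
        ucs2[i] = c; // u8 widens to u16 without changing its value
    }
    return ucs2;
}
```

For filenames that may contain non-ASCII characters, Zig's standard library also provides std.unicode.utf8ToUtf16LeStringLiteral(), which converts a UTF-8 literal to UTF-16LE at compile time. That may be worth considering instead.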
Using this function, let's create a function to open a file:
fn openFile(
    root: *uefi.protocol.File,
    comptime name: [:0]const u8,
) !*uefi.protocol.File {
    var file: *uefi.protocol.File = undefined;
    const status = root.open(
        &file,
        &toUcs2(name),
        uefi.protocol.File.efi_file_mode_read,
        0,
    );
    if (status != .Success) {
        log.err("Failed to open file: {s}", .{name});
        return error.Aborted;
    }
    return file;
}
We open the file using root.open(). The third argument specifies the open mode. Since we don't need to write, read-only mode is sufficient. The fourth argument sets the file attributes, which are used only when creating a new file. We are only opening an existing file, so we pass 0.
Using this function, you can open the kernel as follows:
const kernel = openFile(root_dir, "ymir.elf") catch return .Aborted;
log.info("Opened kernel file.", .{});
Reading a File
Now that the Ymir ELF file is opened, we proceed to read it from the filesystem into memory. An ELF file always begins with an ELF Header. First, let's read just this header and parse it.
To allocate memory for reading the file, we use the AllocatePool() function provided by Memory Allocation Services in Boot Services:
var header_size: usize = @sizeOf(elf.Elf64_Ehdr);
var header_buffer: [*]align(8) u8 = undefined;
status = boot_service.allocatePool(.LoaderData, header_size, &header_buffer);
if (status != .Success) {
    log.err("Failed to allocate memory for kernel ELF header.", .{});
    return status;
}
The first argument of allocatePool() specifies the memory type [1] to allocate. Here we request LoaderData, which is intended for UEFI application data [2]. The ELF header has a fixed size equal to that of the std.elf.Elf64_Ehdr struct (64 bytes), so we allocate exactly that much. Note that header_size is declared with var because it will also receive the actual number of bytes read during the file read operation.
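The align(8) on header_buffer matters because elf.Header.parse(), used later in this chapter, takes a pointer aligned like Elf64_Ehdr. That struct contains 64-bit fields and is 8-byte aligned on x86-64. The UEFI specification states that allocations from AllocatePool() are eight-byte aligned, so header_buffer can be declared [*]align(8) u8. Slicing such a many-item pointer with comptime-known bounds yields a pointer to a fixed-size array, as this sketch shows. headerBytes is a hypothetical helper:

```zig
const std = @import("std");
const elf = std.elf;

/// Hypothetical helper: returns the first @sizeOf(Elf64_Ehdr) bytes of an
/// 8-byte-aligned buffer (such as one returned by AllocatePool()) as a pointer
/// to a fixed-size array. With comptime-known bounds starting at 0, slicing a
/// many-item pointer yields `*align(8) [64]u8`, which coerces to this type.
fn headerBytes(buf: [*]align(8) u8) *align(8) const [@sizeOf(elf.Elf64_Ehdr)]u8 {
    return buf[0..@sizeOf(elf.Elf64_Ehdr)];
}

comptime {
    // The ELF64 file header is always 64 bytes.
    std.debug.assert(@sizeOf(elf.Elf64_Ehdr) == 64);
}
```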
With the memory allocated for reading, let's proceed to actually read the file:
status = kernel.read(&header_size, header_buffer);
if (status != .Success) {
    log.err("Failed to read kernel ELF header.", .{});
    return status;
}
At this point, we have successfully read the ELF header of the Ymir kernel. The size of the data read is stored in header_size
. Please try running this on QEMU to verify that it works correctly.
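EFI_FILE_PROTOCOL.Read() treats its size argument as in/out:

- On entry, it is the buffer capacity.
- On return, it is the number of bytes actually copied, and the file position advances by that amount.

Reading past the end of a short file is truncated rather than reported as an error, so checking only the status does not guarantee that a whole header was read. The sketch below models those semantics with a hypothetical in-memory fakeRead(). It also adds a hypothetical ensureFullHeader() check that a loader could run on header_size before parsing:

```zig
const std = @import("std");
const elf = std.elf;

/// Hypothetical in-memory stand-in for EFI_FILE_PROTOCOL.Read(): on entry,
/// `size` is the buffer capacity; on return, it is the number of bytes copied.
/// `pos` plays the role of the file's current position.
fn fakeRead(data: []const u8, pos: *usize, size: *usize, buffer: [*]u8) void {
    const n = @min(size.*, data.len - pos.*);
    @memcpy(buffer[0..n], data[pos.*..][0..n]);
    pos.* += n;
    size.* = n;
}

/// Hypothetical check: fail if fewer bytes than a full ELF64 header were read.
fn ensureFullHeader(bytes_read: usize) error{ShortRead}!void {
    if (bytes_read < @sizeOf(elf.Elf64_Ehdr)) return error.ShortRead;
}
```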
Parsing ELF Header
Finally, we parse the loaded ELF header. The ELF header's structure is simple enough that you could write your own parser. However, as we saw earlier, Zig's std.elf already provides the Elf64_Ehdr struct for the raw header. It also provides elf.Header.parse(), which checks the ELF magic, class, and byte order and returns the fields in a normalized elf.Header. We use it here [3]:
const elf_header = elf.Header.parse(header_buffer[0..@sizeOf(elf.Elf64_Ehdr)]) catch |err| {
    log.err("Failed to parse kernel ELF header: {?}", .{err});
    return .Aborted;
};
log.info("Parsed kernel ELF header.", .{});
That's all it takes. Simple, isn't it? To verify that the parsing was done correctly, let's output some of the fields:
log.debug(
    \\Kernel ELF information:
    \\ Entry Point : 0x{X}
    \\ Is 64-bit : {d}
    \\ # of Program Headers: {d}
    \\ # of Section Headers: {d}
    ,
    .{
        elf_header.entry,
        @intFromBool(elf_header.is_64),
        elf_header.phnum,
        elf_header.shnum,
    },
);
The output will look like this:
[INFO ] (surtr): Initialized bootloader log.
[INFO ] (surtr): Got boot services.
[INFO ] (surtr): Located simple file system protocol.
[INFO ] (surtr): Opened filesystem volume.
[INFO ] (surtr): Opened kernel file.
[INFO ] (surtr): Parsed kernel ELF header.
[DEBUG] (surtr): Kernel ELF information:
Entry Point : 0x10012B0
Is 64-bit : 1
# of Program Headers: 4
# of Section Headers: 16
You can verify these values by comparing them with the output of readelf -h on zig-out/bin/ymir.elf. The exact values depend on your build, so they may differ from the readelf output shown earlier in this chapter.
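Since elf.Header.parse() only looks at bytes in memory, the same call can also be exercised on the host without QEMU. The sketch below assembles a minimal ELF64 header by hand, using the offsets of the standard Elf64_Ehdr layout. It then checks that parsing returns the fields we logged. makeHeader is a hypothetical helper. The sketch assumes the elf.Header.parse() signature used in this chapter's Zig version and a little-endian host:

```zig
const std = @import("std");
const elf = std.elf;

/// Hypothetical helper: builds a minimal little-endian ELF64 header in memory
/// with the given entry point and header counts. All other fields stay zero.
fn makeHeader(entry: u64, phnum: u16, shnum: u16) [@sizeOf(elf.Elf64_Ehdr)]u8 {
    var buf = [_]u8{0} ** @sizeOf(elf.Elf64_Ehdr);
    @memcpy(buf[0..4], "\x7fELF"); // EI_MAG0..EI_MAG3
    buf[4] = 2; // EI_CLASS = ELFCLASS64
    buf[5] = 1; // EI_DATA = ELFDATA2LSB (little endian)
    buf[6] = 1; // EI_VERSION = EV_CURRENT
    std.mem.writeInt(u64, buf[24..32], entry, .little); // e_entry
    std.mem.writeInt(u16, buf[56..58], phnum, .little); // e_phnum
    std.mem.writeInt(u16, buf[60..62], shnum, .little); // e_shnum
    return buf;
}
```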
Summary
In this chapter, we created a skeleton of the Ymir kernel and loaded the generated ELF file from the UEFI filesystem into memory. We also parsed the Ymir ELF header using the Zig standard library. Next, we need to parse the ELF program headers and load each segment at the virtual address specified by the ELF file. However, to map these virtual addresses to physical addresses, we must set up page tables. In the next chapter, we will implement page table management.
[2] LoaderData is also the default memory type for UEFI applications.
[3] Surtr/Ymir has no external dependencies whatsoever. However, we use what Zig provides without hesitation. If you prefer not to use even that, try writing your own ELF parser. It can be quite educational.