Parsing Kernel ELF

In this chapter, we read the host OS, Ymir, from the UEFI file system and parse its ELF file to obtain the memory layout Ymir requires. Ideally, we would load Ymir into memory right away and transfer execution to it. To do so, however, we need a page table that maps the virtual pages requested by the ELF file to physical addresses. Page table operations are implemented in the next chapter, so this chapter goes only as far as parsing the ELF header of the Ymir kernel.

important

The source code for this chapter is in the whiz-surtr-parse_kernel branch.

Building a Skeleton for Ymir

To load the Ymir kernel, we obviously need to build its ELF file. As a first step, we'll create a minimal skeleton of Ymir that does nothing, just to make sure we can successfully build the ELF file.

Create a ymir directory and add the following to ymir/main.zig:

ymir/main.zig
export fn kernelEntry() callconv(.Naked) noreturn {
    while (true)
        asm volatile ("hlt");
}

The kernelEntry() function serves as the entry point for Ymir, where control is transferred from Surtr. Since this function never returns, its return type is set to noreturn.

In Zig, you can specify a function's calling convention using callconv(). Since the UEFI calling convention is the same as that of Windows, we would normally use .Win64. However, this function is intended to be a trampoline that switches to the kernel's stack and then calls the actual main function. For that reason, we temporarily use .Naked here. The .Naked calling convention suits trampoline code because the compiler generates no function prologue or epilogue and does no register management of its own.

Now that we've created the basic skeleton of Ymir, let's set up the build configuration:

build.zig
const ymir_target = b.resolveTargetQuery(.{
    .cpu_arch = .x86_64,
    .os_tag = .freestanding,
    .ofmt = .elf,
});
const ymir = b.addExecutable(.{
    .name = "ymir.elf",
    .root_source_file = b.path("ymir/main.zig"),
    .target = ymir_target, // Freestanding x64 ELF executable
    .optimize = optimize, // You can choose the optimization level.
    .linkage = .static,
    .code_model = .kernel,
});
ymir.entry = .{ .symbol_name = "kernelEntry" };
b.installArtifact(ymir);

In the .target field, we specify .freestanding as the OS tag. We also set the .code_model to .kernel to define the code model to use. Code models affect how relocation information is generated. Other options include .small and .medium. As we’ll cover in a later chapter, Ymir’s address layout is modeled after Linux. Ymir is placed around 0xFFFF888000000000. If .kernel is not specified, the relocation information may exceed what fits in the default code model, resulting in errors. Although defining an address layout requires a linker script, we won’t write one just yet. Finally, we complete the setup by specifying kernelEntry() as the entry point.

At this point, running zig build will generate zig-out/bin/ymir.elf. Let's take a look at the headers using readelf:

sh
> readelf -h ./zig-out/bin/ymir.elf

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1001120
  Start of program headers:          64 (bytes into file)
  Start of section headers:          5216 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         4
  Size of section headers:           64 (bytes)
  Number of section headers:         13
  Section header string table index: 11

As expected, a proper 64-bit ELF file is generated. The entry point is at 0x1001120, and if you examine the surrounding area with objdump, you can confirm that the previously defined kernelEntry() function is present:

sh
> objdump -D ./zig-out/bin/ymir.elf | grep 1001120 -n3

7:0000000001001120 <kernelEntry>:
8: 1001120:     f4                      hlt
9: 1001121:     eb fd                   jmp    1001120 <kernelEntry>

With this, the ELF file for Ymir, which will be loaded by Surtr, has been successfully generated. Let's also set up the configuration to place the generated Ymir onto the EFI file system:

build.zig
const install_ymir = b.addInstallFile(
    ymir.getEmittedBin(),
    b.fmt("{s}/{s}", .{ out_dir_name, ymir.name }),
);
install_ymir.step.dependOn(&ymir.step);
b.getInstallStep().dependOn(&install_ymir.step);

This setup is almost the same as the configuration for installing Surtr. As a result, Ymir will be copied to zig-out/img/ymir.elf.

Loading Kernel Headers

To access files on the filesystem from Surtr, we use the Simple File System Protocol. Until a UEFI application such as Surtr explicitly exits them, it has access to a set of functions provided by UEFI called Boot Services. A pointer to Boot Services can be obtained from the EFI System Table, just as we did previously to get the Simple Text Output Protocol for logging:

src/boot.zig
const boot_service: *uefi.tables.BootServices = uefi.system_table.boot_services orelse {
    log.err("Failed to get boot services.", .{});
    return .Aborted;
};
log.info("Got boot services.", .{});

From the obtained Boot Services, we retrieve the Simple File System Protocol:

src/boot.zig
var fs: *uefi.protocol.SimpleFileSystem = undefined;
status = boot_service.locateProtocol(&uefi.protocol.SimpleFileSystem.guid, null, @ptrCast(&fs));
if (status != .Success) {
    log.err("Failed to locate simple file system protocol.", .{});
    return status;
}
log.info("Located simple file system protocol.", .{});

Next, using the Simple File System Protocol, we open the root directory of the filesystem:

src/boot.zig
var root_dir: *uefi.protocol.File = undefined;
status = fs.openVolume(&root_dir);
if (status != .Success) {
    log.err("Failed to open volume.", .{});
    return status;
}
log.info("Opened filesystem volume.", .{});

info

In Zig, unlike in C, you cannot declare a variable without also initializing it. Every variable must be assigned a value at the time of declaration. If you want to assign an uninitialized value, you can use undefined. When a variable is initialized with undefined, its actual contents are filled with 0xAA in Debug mode, while in other build modes, the value is truly undefined. There is no way to check whether a variable was initialized with undefined or not.
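As a small illustration of this note (a sketch, not code from Surtr), the helper below declares a buffer with undefined and gives every element a value before anything reads it. The name makeZeroed and the buffer size are arbitrary choices for this example.

```zig
const std = @import("std");

/// Hypothetical helper: declare a buffer as undefined, then fill it before use.
fn makeZeroed() [4]u8 {
    // Contents are unspecified at this point (0xAA bytes in Debug builds).
    var buf: [4]u8 = undefined;
    // Give every element a defined value before any read.
    @memset(&buf, 0);
    return buf;
}

test "buffer is fully written before it is read" {
    const buf = makeZeroed();
    try std.testing.expectEqual(@as(u8, 0), buf[3]);
}
```

The same pattern applies to out-parameters such as root_dir above: the variable starts as undefined and is only read after the call that fills it reports success.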

Opening a File

Next, we open the Ymir ELF file. To open a file, we use the open() function. The filename must be specified using UCS-2, just like with the previous logging output. Since we'll open files using the Simple File System Protocol multiple times, it's helpful to create a utility function to convert strings to UCS-2:

src/boot.zig
inline fn toUcs2(comptime s: [:0]const u8) [s.len:0]u16 {
    // One u16 per ASCII character, plus the 0 sentinel.
    var ucs2: [s.len:0]u16 = [_:0]u16{0} ** s.len;
    for (s, 0..) |c, i| {
        // Widening u8 to u16 leaves the upper byte zero.
        ucs2[i] = c;
    }
    return ucs2;
}

Since the filename to open is fixed at compile time, the argument is declared as comptime s. This allows us to use s.len in the return type. UCS-2 uses one 16-bit code unit per character, so the result has the same number of elements as the input while its byte length doubles; the return type is therefore [s.len:0]u16. Inside the function, the conversion is straightforward: each ASCII byte is widened to a u16, which on a little-endian machine has the same effect as inserting a \0 byte after each ASCII byte, just like before.
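To make the resulting layout concrete, here is a standalone sketch that stores one u16 per ASCII character and checks the result in a test. The function name widenAscii is hypothetical and exists only for this example; it assumes ASCII-only input, as the chapter does.

```zig
const std = @import("std");

/// Hypothetical helper: widen a compile-time ASCII string into a
/// 0-terminated array of 16-bit code units.
fn widenAscii(comptime s: []const u8) [s.len:0]u16 {
    var out: [s.len:0]u16 = [_:0]u16{0} ** s.len;
    for (s, 0..) |c, i| {
        out[i] = c; // u8 widens to u16; the upper byte stays zero
    }
    return out;
}

test "one 16-bit unit per character, 0-terminated" {
    const name = widenAscii("ymir.elf");
    try std.testing.expectEqual(@as(usize, 8), name.len);
    try std.testing.expectEqual(@as(u16, 'y'), name[0]);
    try std.testing.expectEqual(@as(u16, 'f'), name[7]);
    // The sentinel sits just past the last element.
    try std.testing.expectEqual(@as(u16, 0), name[name.len]);
}
```

Because the array type carries the 0 sentinel, taking its address yields a pointer that can be passed where a 0-terminated wide string is expected.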

Using this function, let's create a function to open a file:

src/boot.zig
fn openFile(
    root: *uefi.protocol.File,
    comptime name: [:0]const u8,
) !*uefi.protocol.File {
    var file: *uefi.protocol.File = undefined;
    const status = root.open(
        &file,
        &toUcs2(name),
        uefi.protocol.File.efi_file_mode_read,
        0,
    );

    if (status != .Success) {
        log.err("Failed to open file: {s}", .{name});
        return error.Aborted;
    }
    return file;
}

We open the file using root.open(). The third argument specifies the file mode; since writing is not needed, read-only mode is sufficient. The fourth argument sets the file attributes when creating a new file, but since we're only opening an existing file here, it is unused and can be set to 0.

Using this function, you can open the kernel as follows:

src/boot.zig
const kernel = openFile(root_dir, "ymir.elf") catch return .Aborted;
log.info("Opened kernel file.", .{});

Reading a File

Now that the Ymir ELF file is opened, we proceed to read it from the filesystem into memory. An ELF file always begins with an ELF Header. First, let's read just this header and parse it.

To allocate memory for reading the file, we use the AllocatePool() function provided by Memory Allocation Services in Boot Services:

src/boot.zig
var header_size: usize = @sizeOf(elf.Elf64_Ehdr);
var header_buffer: [*]align(8) u8 = undefined;
status = boot_service.allocatePool(.LoaderData, header_size, &header_buffer);
if (status != .Success) {
    log.err("Failed to allocate memory for kernel ELF header.", .{});
    return status;
}

The first argument of allocatePool() specifies the memory type [1] to allocate. In this case, we request memory of type LoaderData, which is intended for UEFI application data [2]. The size of the ELF header is fixed and matches the size of the std.elf.Elf64_Ehdr structure. We allocate memory of this size. Note that header_size is defined as a var because it will also be used later to store the actual number of bytes read during the file read operation.

With the memory allocated for reading, let's proceed to actually read the file:

src/boot.zig
status = kernel.read(&header_size, header_buffer);
if (status != .Success) {
    log.err("Failed to read kernel ELF header.", .{});
    return status;
}

At this point, we have successfully read the ELF header of the Ymir kernel. The size of the data read is stored in header_size. Please try running this on QEMU to verify that it works correctly.
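The size argument of the read call works as an in/out parameter: on input it holds the buffer capacity, and on output it holds the number of bytes actually read. The sketch below mimics only that contract with a hypothetical FakeFile type; it is not the UEFI protocol, and none of its names come from Surtr.

```zig
const std = @import("std");

/// Hypothetical stand-in for a file. It imitates the in/out size
/// contract of a UEFI-style read, nothing more.
const FakeFile = struct {
    data: []const u8,
    pos: usize = 0,

    /// On input `size.*` is the buffer capacity; on output it is the
    /// number of bytes actually copied into `buffer`.
    fn read(self: *FakeFile, size: *usize, buffer: [*]u8) void {
        const n = @min(size.*, self.data.len - self.pos);
        @memcpy(buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        size.* = n;
    }
};

test "size is updated to the number of bytes read" {
    var file = FakeFile{ .data = "\x7fELF" };
    var buf: [64]u8 = undefined;
    var size: usize = buf.len; // request up to 64 bytes
    file.read(&size, &buf);
    // Only 4 bytes were available, so the short read is reported.
    try std.testing.expectEqual(@as(usize, 4), size);
    try std.testing.expectEqualSlices(u8, "\x7fELF", buf[0..size]);
}
```

A cautious loader could compare the returned size with @sizeOf(elf.Elf64_Ehdr) after the read and treat a smaller value as an error, since a truncated file would otherwise be parsed from partly uninitialized memory.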

Parsing ELF Header

Finally, we parse the loaded ELF header. The structure of the ELF header is quite simple, so you could write your own parser. However, as we saw earlier, Zig's standard library provides the std.elf.Elf64_Ehdr struct to represent the ELF header, along with std.elf.Header.parse() to parse one from a buffer. We will use them here [3]:

src/boot.zig
const elf_header = elf.Header.parse(header_buffer[0..@sizeOf(elf.Elf64_Ehdr)]) catch |err| {
    log.err("Failed to parse kernel ELF header: {?}", .{err});
    return .Aborted;
};
log.info("Parsed kernel ELF header.", .{});

That's all it takes. Simple, isn't it? To verify that the parsing was done correctly, let's output some of the fields:

zig
log.debug(
    \\Kernel ELF information:
    \\  Entry Point         : 0x{X}
    \\  Is 64-bit           : {d}
    \\  # of Program Headers: {d}
    \\  # of Section Headers: {d}
,
    .{
        elf_header.entry,
        @intFromBool(elf_header.is_64),
        elf_header.phnum,
        elf_header.shnum,
    },
);

The output will look like this:

txt
[INFO ] (surtr): Initialized bootloader log.
[INFO ] (surtr): Got boot services.
[INFO ] (surtr): Located simple file system protocol.
[INFO ] (surtr): Opened filesystem volume.
[INFO ] (surtr): Opened kernel file.
[INFO ] (surtr): Parsed kernel ELF header.
[DEBUG] (surtr): Kernel ELF information:
  Entry Point         : 0x10012B0
  Is 64-bit           : 1
  # of Program Headers: 4
  # of Section Headers: 16

You can verify the correctness of these values by comparing them with the output of readelf -h on zig-out/bin/ymir.elf.
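For readers curious about what a hand-written parser would involve, here is a minimal sketch that extracts the same three fields from a 64-byte little-endian ELF64 header. The names MiniHeader and parseMini are hypothetical, the field offsets follow the ELF64 header layout, and the code assumes a Zig version in which std.mem.readInt takes an endianness argument (0.12 or later). It is an illustration, not a replacement for std.elf.

```zig
const std = @import("std");

/// Hypothetical summary of the fields printed in this chapter.
const MiniHeader = struct {
    entry: u64,
    phnum: u16,
    shnum: u16,
};

/// Hand-rolled parser for a little-endian ELF64 header (illustration only).
fn parseMini(buf: *const [64]u8) error{ BadMagic, Unsupported }!MiniHeader {
    if (!std.mem.eql(u8, buf[0..4], "\x7fELF")) return error.BadMagic;
    // e_ident[4] == 2 is ELFCLASS64; e_ident[5] == 1 is little endian.
    if (buf[4] != 2 or buf[5] != 1) return error.Unsupported;
    return MiniHeader{
        .entry = std.mem.readInt(u64, buf[24..32], .little), // e_entry
        .phnum = std.mem.readInt(u16, buf[56..58], .little), // e_phnum
        .shnum = std.mem.readInt(u16, buf[60..62], .little), // e_shnum
    };
}

test "parse the values shown in the readelf output" {
    // Build a synthetic header with only the fields we inspect filled in.
    var buf = [_]u8{0} ** 64;
    @memcpy(buf[0..4], "\x7fELF");
    buf[4] = 2;
    buf[5] = 1;
    std.mem.writeInt(u64, buf[24..32], 0x1001120, .little);
    std.mem.writeInt(u16, buf[56..58], 4, .little);
    std.mem.writeInt(u16, buf[60..62], 13, .little);

    const h = try parseMini(&buf);
    try std.testing.expectEqual(@as(u64, 0x1001120), h.entry);
    try std.testing.expectEqual(@as(u16, 4), h.phnum);
    try std.testing.expectEqual(@as(u16, 13), h.shnum);
}
```

The test header is synthetic; it reuses the entry point and header counts from the readelf output earlier in the chapter rather than reading a real file.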

Summary

In this chapter, we created a skeleton of the Ymir kernel and read the header of the generated ELF file from the UEFI filesystem into memory. We also parsed the Ymir ELF header using Zig's standard library. Next, we need to parse the ELF program headers and load each segment at the virtual address specified by the ELF file. However, to map these requested virtual addresses to physical addresses, we must set up page tables. In the next chapter, we will implement page table management.

[2] LoaderData is also the default memory type for UEFI applications.

[3] Surtr/Ymir has no external dependencies whatsoever. However, we use what Zig provides without hesitation. If you prefer not to use even those, try writing your own ELF parser. It can be quite educational.