Introduction

This project serves as a memory dump of my head. It contains more or less structured notes about the technology I’m learning, and its primary purpose is to remind me of details which would otherwise slip away. Although it tries to be correct, it is by no means perfect. It is subject to continuous change as I get deeper and deeper into the topics.

If you find it interesting, have found some obvious mistakes or want to contribute, feel free to drop me a line at dbognar@protonmail.com

Misconceptions about system programming

  • It’s hard to learn
  • It’s too low level
  • It’s not flexible enough
  • It’s dangerous and unsafe
  • It takes long to write a program
  • It’s targeting only one platform

Questions to answer:

  • Why does the kernel use a user-space loader (ld.so) instead of loading the shared libraries itself (like it does with the ELF binary)?
  • Why do we need argc if the end of argv is marked by a null pointer?
  • Should I write this book as a set of coding challenges?
    • Every section could start with a challenge description. E.g.: print out the CLI arguments
    • It could provide some background knowledge
    • And an implementation of mine

Building standalone binary

In this chapter we’re going to create a standalone ELF binary which depends only on the Rust core library. For that we’re going through the following steps:

  1. Create initial project
  2. Disable the Rust standard library
  3. Disable standard startup logic
  4. Implement startup logic
  5. Implement teardown logic
  6. Implement the standard library

Initialize a project

Since we only support the Linux platform, let’s call our new library linux, as it will be an interface to the Linux kernel. To get a deeper understanding of how the Rust ecosystem works, we won’t use Cargo at this point but write out the kind of commands Cargo runs to build libraries and binaries. Let’s create a simple Rust binary with:

> echo 'fn main() {}' > bin.rs
> rustc bin.rs
> ./bin
> echo $?
0

The only thing our program currently does is return a number as its exit code, but that is more than enough for a start.

Disable the Rust standard library

#![no_std]

To disable the Rust standard library we have to add the #![no_std] attribute at the top of the source file:

#![no_std]
fn main() {}

If we try to rebuild the code we get the following errors:

> rustc bin.rs
error: `#[panic_handler]` function required, but not found
error: unwinding panics are not supported without std

Panic handler

It seems the standard library provides a panic handler, which is required to compile the code. So let’s implement one by adding the following lines at the end of the bin.rs file:

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

Unwinding

But what should we do with the second error message: unwinding panics are not supported without std? What does unwinding mean? It is the process of walking back up the call stack after a panic and running the destructors of every frame on the way. We can disable unwinding support by aborting the execution in case of a panic. As a result we get another type of error message.

> rustc -C panic=abort bin.rs
error: using `fn main` requires the standard library

#![no_main]

So the main function depends on std too, but how can we start a program if there is no main function? Luckily rustc gives us nice hints on how to solve this problem. We have to disable the compiler-generated main function and implement a Linux specific version of it. One can do this by adding the #![no_main] attribute.

#![no_std]
#![no_main]

fn main() {}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

> rustc -C panic=abort bin.rs
error: linking with `cc` failed: exit status: 1
  |
  = note: /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
          (.text+0x21): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status

As you have probably expected, it doesn’t compile. (From now on I’m going to clean up the long error messages a bit to show only the information relevant to us.) But more interestingly, it doesn’t complain about the missing main function but about the missing __libc_start_main function, which is a bit weird because we’re compiling Rust and not C code.

Disable standard startup logic

To investigate the problem let’s go back to the std world and create a new binary which we can debug in gdb.

> echo 'fn main() {}' > std.rs
> rustc std.rs
> gdb ./std
(gdb) set backtrace past-main on
(gdb) set backtrace past-entry on
(gdb) break main
(gdb) run
(gdb) backtrace
#0  0x000055555555c320 in main ()
#1  0x00007ffff7d8fd90 in __libc_start_call_main (main=main@entry=0x55555555c320 <main>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe948) at ../sysdeps/nptl/libc_start_call_main.h:58
#2  0x00007ffff7d8fe40 in __libc_start_main_impl (main=0x55555555c320 <main>, argc=1, argv=0x7fffffffe948, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe938) at ../csu/libc-start.c:392
#3  0x000055555555c155 in _start ()

The standard Rust binary seems to use some libc symbols to start the main function. There is the _start function, which calls __libc_start_main_impl, which calls __libc_start_call_main, which finally calls the main function. But do we really need these symbols? Do we need a main function at all? Or can we simply use the _start function as an entry point? Let’s rewrite the code like this:

#![no_std]
#![no_main]

fn _start() {}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

and try to compile the binary without the general startup logic provided by gcc:

> rustc -C panic=abort bin.rs -C link-args='-nostartfiles -static'
> ./bin
Segmentation fault (core dumped)

Implement startup logic

It looks like we have made a step forward. We can compile our code now, we just can’t run it. To find the reason for a segfault, it’s typically a good idea to run the binary in gdb.

> gdb ./bin
(gdb) set backtrace past-main on
(gdb) set backtrace past-entry on
(gdb) run
Starting program: /home/taabodal/work/blog/blog/src/chapter-01/bin

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) backtrace
#0  0x0000000000000000 in ?? ()
#1  0x0000000000000001 in ?? ()
#2  0x00007fffffffebda in ?? ()
#3  0x0000000000000000 in ?? ()

That’s not too much information: a bunch of zeros in the backtrace and some question marks… But where is the _start function which we have defined? Let’s try another tool to print the symbols of the executable:

> nm ./bin
0000000000401000 R __bss_start
0000000000401000 R _edata
0000000000401000 R _end
                 U _start

Okay, so it has at least some data which we can read. The nm command shows the address (column 1), the type (column 2) and the name of the symbol (column 3). The R type means that the symbol is in a read-only data section of the binary, and the U type means that the symbol is undefined. So the conclusion is that the _start function which we just added to the source is undefined, which also explains why nm doesn’t show any memory address for it.

Rust has a different philosophy about public and private functions than some other popular languages. In C, every function is visible to other compilation units unless you explicitly restrict it, which you do with the static keyword. In Rust it is the other way around: everything is private until you explicitly make it public. So how can we make our _start function visible to the linker? Let’s decorate it with the #[no_mangle] attribute. This attribute has two effects on the decorated function:

  • Disables name mangling (more about that later)
  • Exports the symbol, so the function is visible outside the compilation unit

#![no_std]
#![no_main]

#[no_mangle]
fn _start() {}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

After the function is exported, the output of nm already looks much better (T means the symbol is in the .text section of the binary):

> nm bin
0000000000402000 T __bss_start
0000000000402000 T _edata
0000000000402000 T _end
0000000000401000 T _start

Implement teardown logic

We have proved that the _start function is there, so why does the segfault happen? Our function is empty, so it definitely doesn’t do any invalid memory access, or does it? Is our function really empty? Let’s check out the code generated by the compiler:

> objdump --disassemble=_start -M intel ./bin
0000000000401000 <_start>:
  401000:       c3                      ret

Even though the _start function is completely empty in the Rust source, the compiler still generates a return instruction for us. The first lines of the documentation of the ret instruction already tell us what we have missed:

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction

If the _start function is the first code which gets executed, then there is no return address on the stack to jump to after _start finishes. But what should we do if we cannot return from a function?

The answer is: tell the kernel that we’re done and the process should be destroyed without executing any further instructions. We can do that by using some assembly code in place of the ret instruction. Let’s rewrite the _start function like this:

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "mov rax,0x3c",
            "mov rdi,0x0",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

The compiler will generate the following assembly code for us:

> rustc -C panic=abort bin.rs -C link-args='-nostartfiles -static'
> objdump --disassemble=_start -M intel ./bin
0000000000401000 <_start>:
  401001:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  401008:       48 c7 c7 00 00 00 00    mov    rdi,0x0
  40100f:       0f 05                   syscall
  401011:       0f 0b                   ud2

It has replaced the return instruction with the small piece of code we provided, plus something else. So what do these lines do? mov rax,0x3c moves the integer value 60 into the rax register of the CPU. The kernel uses this value to identify the request as exit. The second instruction moves the integer value 0 into the rdi register. This will be the exit code of our program. The syscall instruction transfers execution to the kernel, and since the process is destroyed there, the last instruction, ud2, is never executed by the CPU. That is exactly what we want, because ud2 is an x86_64 instruction whose only purpose is to raise an invalid-opcode exception. This way the compiler makes sure that if the syscall ever returned, the process would fail immediately with an Illegal instruction error. This is the result of options(noreturn). I encourage you to prove it yourself by putting a ud2 instruction before the syscall instruction and letting the process crash. It looks like this:

> ./bin
Illegal instruction (core dumped)

But if you remove the ud2 instruction again, the execution of the binary gives you back 0 as return code:

> ./bin
> echo $?
0

And if you modify the value of the rdi register by replacing the 0x0 with 13 for example it gives back 13 as return code:

> ./bin
> echo $?
13
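To see the same exit code from the parent's side, here is a minimal sketch that uses the regular standard library as an observer. It assumes a POSIX sh is available, and the helper name exit_code_of is made up for this illustration; it is not part of our linux library.

```rust
use std::process::Command;

/// Runs `script` with `sh -c` and returns the exit code the parent observes.
/// Returns `None` if the child was terminated by a signal instead of exiting,
/// which is what happens in the "Illegal instruction" case.
fn exit_code_of(script: &str) -> Option<i32> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(script)
        .status()
        .expect("failed to spawn sh");
    status.code()
}
```

Calling exit_code_of("exit 13") plays the role of running ./bin followed by echo $?: the value which the child placed in rdi before the exit syscall is what the parent reads back.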

Feel free to remove the nostack option too and compare the generated assembly code with the original version. Try to figure out why the code is generated like that. (We will get back to that later on.)

Implement standard library

Until now we’ve implemented everything in a single binary, but what we’re aiming for with this project is a Linux specific standard library. So let’s move most of the code into a file called linux.rs and add a call to the main function to the _start function. The library file now looks like this:

#![no_std]
#![no_main]

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

We call main first, and as the System V ABI describes, the return value of the function is placed in the rax register. We can simply move this value from rax to rdi so the kernel can use it as the exit code of the process. After that, the source of the executable shrinks to something like this:

#![no_std]
#![no_main]

extern crate linux;

#[no_mangle]
fn main() -> u8 { 0 }

Since it’s getting tedious to type out all the rustc commands, let’s create a script to build our library and our binary. The cargo.sh script looks like this:

#!/bin/bash

clean() {
    rm -rf target
}

build() {
    mkdir -p target
    rustc -C panic=abort --crate-type=lib linux.rs -o target/liblinux.rlib
    rustc -C panic=abort -C link-args='-nostartfiles -static' -L target ./bin.rs -o target/bin
}

run() {
    build
    ./target/bin
}

case "$1" in
    clean) clean;;
    build) build;;
    run) run;;
    *) echo "Invalid argument '$1'";;
esac

After adding execute permissions to the mini cargo script it can be used like this:

> chmod +x ./cargo.sh
> ./cargo.sh run
> echo $?
0

Implementing standard streams

In this chapter we’re going to continue building the Linux standard library by going through the following steps:

  1. Syscalls in general
  2. Implement read and write syscalls
  3. Make syscalls safe
  4. Make syscalls idiomatic
  5. Abstract standard streams
  6. Implement string formatting

Syscalls in general

In chapter one we already implemented a system call named exit, but we didn’t talk much about how it works. Since system calls are the foundation of the communication between user space and kernel space, we will implement a couple of them throughout the following chapters. As a result it’s important to get a basic understanding of how they work.

System calls work quite similarly to function calls, in the sense that a couple of registers are updated with some data and the execution of the current code is interrupted to run another piece of code. That other code uses the values in the registers, does some work with them, and when it finishes, execution returns to the original point so the caller can continue with the result of the call. An important difference, though, is that executing a system call causes a context switch. This means that instead of simply jumping to another code segment of the same executable, the process is interrupted, the CPU switches to kernel mode and kernel code continues to execute. The same happens in reverse at the end of the system call: the CPU switches back to user mode and continues to execute the user-space code. To tell the CPU to make these switches there are two instructions on x86_64 called syscall and sysret. The first is used by user-space code to switch to the kernel and the second is used by the kernel to switch back to user mode.

There are many system calls defined by the Linux kernel. The ids of these system calls can be found in the kernel source tree. The table for the 64-bit version of the x86 architecture can be found for example here

If you have already done some lower level programming (for example in C/C++) you most likely already know some of these calls. The standard C library wraps these system calls into simple functions, so you can call them in your code without even realizing that a context switch is needed. Some famous examples are the following:

  • read
  • write
  • open
  • close
  • socket
  • connect
  • accept
  • exit

Since we don’t use the standard C library, we need to implement these wrappers in Rust to be able to use them in our binaries.

To pass arguments to the kernel we need to put them into specific registers. The question is: which registers should we use? The reference that describes how binary code has to be implemented and interpreted is called the Application Binary Interface (ABI). Linux uses the System V ABI specification. There is a lot of interesting stuff to read in this PDF, but the most important part for us right now is the calling conventions. It turns out that the function calling convention of the C language and the syscall interface are not the same. While function arguments are passed in the rdi, rsi, rdx, rcx, r8 and r9 registers, the syscall interface uses the rdi, rsi, rdx, r10, r8 and r9 registers. The fourth register differs because the syscall instruction itself overwrites rcx (and r11). Apart from that, it’s important that the rax register is used both to pass the syscall id and to retrieve the result of the syscall. To conform to these requirements we can implement a macro that provides a simple way of issuing a syscall. Let’s create a file called syscall.rs and add a pub mod syscall; line to the linux.rs file.

// Issues a Linux syscall on x86_64. The first argument is the syscall id
// (rax); up to six more arguments go into rdi, rsi, rdx, r10, r8 and r9.
// The `syscall` instruction itself overwrites rcx and r11, so both are
// declared as clobbered in every arm.
macro_rules! syscall {
    ($rax:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr, $rsi:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            in("rsi") $rsi,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr, $rsi:expr, $rdx:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            in("rsi") $rsi,
            in("rdx") $rdx,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr, $rsi:expr, $rdx:expr, $r10:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            in("rsi") $rsi,
            in("rdx") $rdx,
            in("r10") $r10,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr, $rsi:expr, $rdx:expr, $r10:expr, $r8:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            in("rsi") $rsi,
            in("rdx") $rdx,
            in("r10") $r10,
            in("r8") $r8,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};

    ($rax:expr, $rdi:expr, $rsi:expr, $rdx:expr, $r10:expr, $r8:expr, $r9:expr) => {{
        let rax: isize;
        core::arch::asm!(
            "syscall",
            inlateout("rax") $rax => rax,
            in("rdi") $rdi,
            in("rsi") $rsi,
            in("rdx") $rdx,
            in("r10") $r10,
            in("r8") $r8,
            in("r9") $r9,
            lateout("rcx") _,
            lateout("r11") _,
        );
        rax
    }};
}

This macro can be called with a variable number of arguments (1 to 7), which are passed in the specified registers. Once the registers are filled with the data, the syscall instruction is executed to hand over execution to the kernel. Note that the asm! macro of the Rust core library requires the operands to be listed after the assembly code itself, even though they are set before the code executes.
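To try the register convention in isolation, the following sketch issues the argument-free getpid syscall by hand from an ordinary program. It assumes x86_64 Linux, where getpid has the id 39; the function name raw_getpid is invented for this example and is not part of our library.

```rust
use core::arch::asm;

/// Id of the `getpid` syscall on x86_64 Linux.
const SYS_GETPID: isize = 39;

/// Issues `getpid` directly through the `syscall` instruction.
/// Only meaningful on x86_64 Linux.
fn raw_getpid() -> isize {
    let ret: isize;
    // SAFETY: `getpid` takes no arguments and does not touch user memory.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_GETPID => ret, // syscall id in, result out
            lateout("rcx") _,                   // overwritten by `syscall`
            lateout("r11") _,                   // overwritten by `syscall`
            options(nostack),
        );
    }
    ret
}
```

In a std program the result can be compared with std::process::id(), which is a handy sanity check that the id went in through rax and the result came back the same way.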

Read, write, exit

The simplest way to look up how the standard C library has implemented a system call wrapper is to check its manual page. For example: man read.2, man write.2, man exit.2

The function signatures written in C look like this:

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
void exit(int rc);

Let’s update our syscall.rs file with the following functions:

const SYS_READ: isize = 0;
const SYS_WRITE: isize = 1;
const SYS_EXIT: isize = 60;

pub fn read(fd: i32, buf: *mut u8, count: usize) -> isize {
    unsafe { syscall!(SYS_READ, fd, buf, count) }
}

pub fn write(fd: i32, buf: *const u8, count: usize) -> isize {
    unsafe { syscall!(SYS_WRITE, fd, buf, count) }
}

pub fn exit(rc: u8) -> ! {
    unsafe { syscall!(SYS_EXIT, rc as u32); }
    unreachable!();
}

This allows us to read some user input and write it to the stdout as follows:

#[no_mangle]
fn main() -> u8 {
    let mut buf = [0u8;1024];
    let ptr = &mut buf as *mut u8;
    linux::syscall::read(0, ptr, buf.len());
    linux::syscall::write(1, ptr, buf.len());
    0
}

But we still have a problem… Our program doesn’t compile anymore. We have just introduced an undefined reference:

> ./cargo.sh build
error: linking with `cc` failed: exit status: 1
  /blog/chapter-02/src/bin.rs:7: undefined reference to `memset'

It turns out that to be able to use the Rust syntax let mut buf = [0u8; 1024] the core library needs the memset symbol. This makes sense, since this expression fills a memory region with 1024 zeros. There are a couple of symbols the core library needs in order to work. These are typically provided by the standard C library, but since we have disabled all libraries apart from the core library we have to implement them manually. The documentation says the expected symbols are:

  • memcpy
  • memmove
  • memset
  • memcmp
  • bcmp
  • strlen

There are some other expected symbols like rust_begin_panic and rust_eh_personality, but we will implement these step by step so we can explore which functionality of the core library needs them. Let’s implement memset for now in the ffi module. We need to add a pub mod ffi; line to the linux.rs file and create ffi.rs with the following content:

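// Minimal memset for the core library. The compiler calls this symbol with
// the C ABI and the C signature, so the fill value arrives as an int.
#[no_mangle]
pub unsafe extern "C" fn memset(buffer: *mut u8, byte: i32, len: usize) -> *mut u8 {
    for idx in 0..len {
        // `add` takes a usize offset, so no fallible conversion is needed.
        unsafe { buffer.add(idx).write(byte as u8); }
    }
    buffer
}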
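The other symbols on the list follow the same pattern. As a rough sketch, memcpy and memcmp could look like the functions below. They deliberately carry made-up names (sketch_memcpy, sketch_memcmp) and no #[no_mangle], so that they can be tried in a normal std program without overriding the real libc symbols; in ffi.rs they would be exported under the real names.

```rust
/// Copies `len` bytes from `src` to `dst`; the regions must not overlap.
///
/// # Safety
/// Both pointers must be valid for `len` bytes.
unsafe extern "C" fn sketch_memcpy(dst: *mut u8, src: *const u8, len: usize) -> *mut u8 {
    for idx in 0..len {
        // Copy one byte at a time from the source to the destination.
        unsafe { dst.add(idx).write(src.add(idx).read()) };
    }
    dst
}

/// Compares `len` bytes and returns a negative, zero or positive value,
/// like C's `memcmp`.
///
/// # Safety
/// Both pointers must be valid for `len` bytes.
unsafe extern "C" fn sketch_memcmp(a: *const u8, b: *const u8, len: usize) -> i32 {
    for idx in 0..len {
        let (x, y) = unsafe { (a.add(idx).read(), b.add(idx).read()) };
        if x != y {
            // The sign of the first differing byte pair decides the result.
            return x as i32 - y as i32;
        }
    }
    0
}
```

Both functions only use pointer arithmetic from the core library, so nothing here depends on std.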

Then recompile and run the code:

> echo "hello world" | ./cargo.sh run
hello world
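One detail is easy to overlook: the main function above writes buf.len() bytes, so the whole 1024-byte buffer including the trailing zeros goes to stdout. The read call returns the number of bytes it actually filled, and only that prefix is meaningful. The sketch below shows the idea with the std::io traits as stand-ins for our raw syscalls; echo_once is a hypothetical helper, not part of the library.

```rust
use std::io::{self, Read, Write};

/// Reads once from `input` and writes only the bytes actually read to `output`.
/// Returns the number of bytes forwarded.
fn echo_once<R: Read, W: Write>(input: &mut R, output: &mut W) -> io::Result<usize> {
    let mut buf = [0u8; 1024];
    let n = input.read(&mut buf)?; // may be less than buf.len()
    output.write_all(&buf[..n])?;  // forward only the valid prefix
    Ok(n)
}
```

With our own wrappers the same idea would mean passing the return value of read, once it has been checked for errors, as the count of the following write.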

Safe syscalls

When we write unsafe code, we sign a contract with the compiler that our code is never going to be unsound. In the Rust world a code block is said to be sound if it can never cause undefined behaviour. Luckily it’s quite well defined what “undefined” means. There is a list of actions which cause undefined behaviour, and if we can be sure we never hit any item on that list, our code is said to be sound. Even though this list is quite straightforward, it’s easy to miss some small detail, just like we did in the previous paragraphs. Our code looks good, right? It has basically the same signatures as the C functions and it passes all the arguments to the kernel. It doesn’t dereference raw pointers, it doesn’t do array indexing, it doesn’t free memory, so what could go wrong? Well, let’s rewrite the main function and see what happens.

#[no_mangle]
fn main() -> u8 { 
    linux::syscall::write(1, b"x" as *const u8, 1024);
    0
}

If we run this code we experience undefined behaviour: we pass the kernel a one-byte array and a length parameter of 1024. As a result it writes 1024 bytes starting at the address of our byte array, and it is absolutely not defined what will happen in such a scenario. In our case, since the byte array lives in the read-only data section of the binary, it picks up whatever bytes come after it there.

> ./target/bin
xinternal error: entered unreachable codesyscall.rsHhzRx
A                                                          C
UAC

The conclusion is that Rust is only safe if every unsafe block is known to be sound. Our code is not sound, because safe Rust code can pass parameters to it that cause undefined behaviour. Let’s fix that by using a primitive type of the Rust core library called slice. Since a slice bundles the buffer and its length, a user of our code cannot pass a length parameter which is bigger than the size of the buffer. To be more precise, they can pass our function a slice with an invalid length, but to create such a slice one needs another unsafe block, and the author of that unsafe block has signed the same contract with the compiler: that it can never produce undefined behaviour. So you see the point. If all unsafe blocks are sound, then the whole program is safe. But if any of these blocks is unsound, the whole ecosystem is corrupted. So let’s be cautious with unsafe blocks. Here is a fix for our syscalls:

pub fn read(fd: u32, buf: &mut [u8]) -> isize {
    // The kernel writes into the buffer, so hand it a mutable pointer.
    unsafe { syscall!(SYS_READ, fd, buf.as_mut_ptr(), buf.len()) }
}

pub fn write(fd: u32, buf: &[u8]) -> isize {
    unsafe { syscall!(SYS_WRITE, fd, buf.as_ptr(), buf.len()) }
}

The main function works like this:

#[no_mangle]
fn main() -> u8 { 
    linux::syscall::write(1, b"x");
    0
}

Since there is no way to misuse this syscall, running it writes exactly one character to the screen:

> ./cargo.sh run
x
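The protection a slice gives can be seen without any syscall at all. In the small sketch below (prefix is a hypothetical helper), safe code asks for more bytes than the slice holds and simply gets None back instead of reading past the end of the buffer:

```rust
/// Returns the first `count` bytes of `buf`, or `None` if `buf` is shorter.
fn prefix(buf: &[u8], count: usize) -> Option<&[u8]> {
    // `get` checks the range against the length stored in the slice.
    buf.get(..count)
}
```

The length travels together with the pointer, so the 1024-byte request from the unsound version of main has no safe equivalent here.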

Idiomatic syscall

Although our code is now safe, it is still not really idiomatic. In C programming it’s normal to return a number which represents the result of the function. For example, all of our syscalls return a negative integer in case of an error. But in Rust we have a nicer way to handle errors, which is based on the Result enum. Let’s create an Error enum and a Result type alias to represent the result of our syscalls. The list of error codes that a syscall may return can be found in errno-base.h and errno.h. After combining the content of these files we can build a huge enum which represents these error codes:
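Before the full listing, here is a minimal sketch of the decoding step itself, assuming the usual Linux convention that a failing syscall returns the negated errno value in the range -4095..=-1. The Errno newtype and the decode function are placeholders invented for this illustration; the real code would use the Error enum instead.

```rust
/// Hypothetical stand-in for the error type: just the positive errno value.
#[derive(Debug, PartialEq, Eq)]
struct Errno(i32);

/// Largest errno value the kernel encodes in a syscall return value.
const MAX_ERRNO: isize = 4095;

/// Splits a raw syscall return value into success and failure.
fn decode(ret: isize) -> Result<usize, Errno> {
    if (-MAX_ERRNO..0).contains(&ret) {
        // Failure: the kernel returned the negated error number.
        Err(Errno(-ret as i32))
    } else {
        // Success: the value is a count, a file descriptor or similar.
        Ok(ret as usize)
    }
}
```

For example, decode(-9) yields Err(Errno(9)), which corresponds to EBADF in the enum listed next.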

#![allow(unused)]
fn main() {
use core::fmt;

pub type Result<T> = core::result::Result<T, Error>;

#[derive(Debug)]
pub enum Error {
    EPERM = 1,
    ENOENT = 2,
    ESRCH = 3,
    EINTR = 4,
    EIO = 5,
    ENXIO = 6,
    E2BIG = 7,
    ENOEXEC = 8,
    EBADF = 9,
    ECHILD = 10,
    EAGAIN = 11,
    ENOMEM = 12,
    EACCES = 13,
    EFAULT = 14,
    ENOTBLK = 15,
    EBUSY = 16,
    EEXIST = 17,
    EXDEV = 18,
    ENODEV = 19,
    ENOTDIR = 20,
    EISDIR = 21,
    EINVAL = 22,
    ENFILE = 23,
    EMFILE = 24,
    ENOTTY = 25,
    ETXTBSY = 26,
    EFBIG = 27,
    ENOSPC = 28,
    ESPIPE = 29,
    EROFS = 30,
    EMLINK = 31,
    EPIPE = 32,
    EDOM = 33,
    ERANGE = 34,
    EDEADLK = 35,
    ENAMETOOLONG = 36,
    ENOLCK = 37,
    ENOSYS = 38,
    ENOTEMPTY = 39,
    ELOOP = 40,
    // Placeholder value: in errno.h EWOULDBLOCK is an alias of EAGAIN (11) and 41 is unused.
    EWOULDBLOCK = 41,
    ENOMSG = 42,
    EIDRM = 43,
    ECHRNG = 44,
    EL2NSYNC = 45,
    EL3HLT = 46,
    EL3RST = 47,
    ELNRNG = 48,
    EUNATCH = 49,
    ENOCSI = 50,
    EL2HLT = 51,
    EBADE = 52,
    EBADR = 53,
    EXFULL = 54,
    ENOANO = 55,
    EBADRQC = 56,
    EBADSLT = 57,
    // Placeholder value: in errno.h EDEADLOCK is an alias of EDEADLK (35) and 58 is unused.
    EDEADLOCK = 58,
    EBFONT = 59,
    ENOSTR = 60,
    ENODATA = 61,
    ETIME = 62,
    ENOSR = 63,
    ENONET = 64,
    ENOPKG = 65,
    EREMOTE = 66,
    ENOLINK = 67,
    EADV = 68,
    ESRMNT = 69,
    ECOMM = 70,
    EPROTO = 71,
    EMULTIHOP = 72,
    EDOTDOT = 73,
    EBADMSG = 74,
    EOVERFLOW = 75,
    ENOTUNIQ = 76,
    EBADFD = 77,
    EREMCHG = 78,
    ELIBACC = 79,
    ELIBBAD = 80,
    ELIBSCN = 81,
    ELIBMAX = 82,
    ELIBEXEC = 83,
    EILSEQ = 84,
    ERESTART = 85,
    ESTRPIPE = 86,
    EUSERS = 87,
    ENOTSOCK = 88,
    EDESTADDRREQ = 89,
    EMSGSIZE = 90,
    EPROTOTYPE = 91,
    ENOPROTOOPT = 92,
    EPROTONOSUPPORT = 93,
    ESOCKTNOSUPPORT = 94,
    EOPNOTSUPP = 95,
    EPFNOSUPPORT = 96,
    EAFNOSUPPORT = 97,
    EADDRINUSE = 98,
    EADDRNOTAVAIL = 99,
    ENETDOWN = 100,
    ENETUNREACH = 101,
    ENETRESET = 102,
    ECONNABORTED = 103,
    ECONNRESET = 104,
    ENOBUFS = 105,
    EISCONN = 106,
    ENOTCONN = 107,
    ESHUTDOWN = 108,
    ETOOMANYREFS = 109,
    ETIMEDOUT = 110,
    ECONNREFUSED = 111,
    EHOSTDOWN = 112,
    EHOSTUNREACH = 113,
    EALREADY = 114,
    EINPROGRESS = 115,
    ESTALE = 116,
    EUCLEAN = 117,
    ENOTNAM = 118,
    ENAVAIL = 119,
    EISNAM = 120,
    EREMOTEIO = 121,
    EDQUOT = 122,
    ENOMEDIUM = 123,
    EMEDIUMTYPE = 124,
    ECANCELED = 125,
    ENOKEY = 126,
    EKEYEXPIRED = 127,
    EKEYREVOKED = 128,
    EKEYREJECTED = 129,
    EOWNERDEAD = 130,
    ENOTRECOVERABLE = 131,
    ERFKILL = 132,
    EHWPOISON = 133,
}


impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.as_str())
    }
}

impl From<Error> for isize {
    fn from(error: Error) -> Self {
        match error {
            Error::EPERM => 1,
            Error::ENOENT => 2,
            Error::ESRCH => 3,
            Error::EINTR => 4,
            Error::EIO => 5,
            Error::ENXIO => 6,
            Error::E2BIG => 7,
            Error::ENOEXEC => 8,
            Error::EBADF => 9,
            Error::ECHILD => 10,
            Error::EAGAIN => 11,
            Error::ENOMEM => 12,
            Error::EACCES => 13,
            Error::EFAULT => 14,
            Error::ENOTBLK => 15,
            Error::EBUSY => 16,
            Error::EEXIST => 17,
            Error::EXDEV => 18,
            Error::ENODEV => 19,
            Error::ENOTDIR => 20,
            Error::EISDIR => 21,
            Error::EINVAL => 22,
            Error::ENFILE => 23,
            Error::EMFILE => 24,
            Error::ENOTTY => 25,
            Error::ETXTBSY => 26,
            Error::EFBIG => 27,
            Error::ENOSPC => 28,
            Error::ESPIPE => 29,
            Error::EROFS => 30,
            Error::EMLINK => 31,
            Error::EPIPE => 32,
            Error::EDOM => 33,
            Error::ERANGE => 34,
            Error::EDEADLK => 35,
            Error::ENAMETOOLONG => 36,
            Error::ENOLCK => 37,
            Error::ENOSYS => 38,
            Error::ENOTEMPTY => 39,
            Error::ELOOP => 40,
            Error::EWOULDBLOCK => 41,
            Error::ENOMSG => 42,
            Error::EIDRM => 43,
            Error::ECHRNG => 44,
            Error::EL2NSYNC => 45,
            Error::EL3HLT => 46,
            Error::EL3RST => 47,
            Error::ELNRNG => 48,
            Error::EUNATCH => 49,
            Error::ENOCSI => 50,
            Error::EL2HLT => 51,
            Error::EBADE => 52,
            Error::EBADR => 53,
            Error::EXFULL => 54,
            Error::ENOANO => 55,
            Error::EBADRQC => 56,
            Error::EBADSLT => 57,
            Error::EDEADLOCK => 58,
            Error::EBFONT => 59,
            Error::ENOSTR => 60,
            Error::ENODATA => 61,
            Error::ETIME => 62,
            Error::ENOSR => 63,
            Error::ENONET => 64,
            Error::ENOPKG => 65,
            Error::EREMOTE => 66,
            Error::ENOLINK => 67,
            Error::EADV => 68,
            Error::ESRMNT => 69,
            Error::ECOMM => 70,
            Error::EPROTO => 71,
            Error::EMULTIHOP => 72,
            Error::EDOTDOT => 73,
            Error::EBADMSG => 74,
            Error::EOVERFLOW => 75,
            Error::ENOTUNIQ => 76,
            Error::EBADFD => 77,
            Error::EREMCHG => 78,
            Error::ELIBACC => 79,
            Error::ELIBBAD => 80,
            Error::ELIBSCN => 81,
            Error::ELIBMAX => 82,
            Error::ELIBEXEC => 83,
            Error::EILSEQ => 84,
            Error::ERESTART => 85,
            Error::ESTRPIPE => 86,
            Error::EUSERS => 87,
            Error::ENOTSOCK => 88,
            Error::EDESTADDRREQ => 89,
            Error::EMSGSIZE => 90,
            Error::EPROTOTYPE => 91,
            Error::ENOPROTOOPT => 92,
            Error::EPROTONOSUPPORT => 93,
            Error::ESOCKTNOSUPPORT => 94,
            Error::EOPNOTSUPP => 95,
            Error::EPFNOSUPPORT => 96,
            Error::EAFNOSUPPORT => 97,
            Error::EADDRINUSE => 98,
            Error::EADDRNOTAVAIL => 99,
            Error::ENETDOWN => 100,
            Error::ENETUNREACH => 101,
            Error::ENETRESET => 102,
            Error::ECONNABORTED => 103,
            Error::ECONNRESET => 104,
            Error::ENOBUFS => 105,
            Error::EISCONN => 106,
            Error::ENOTCONN => 107,
            Error::ESHUTDOWN => 108,
            Error::ETOOMANYREFS => 109,
            Error::ETIMEDOUT => 110,
            Error::ECONNREFUSED => 111,
            Error::EHOSTDOWN => 112,
            Error::EHOSTUNREACH => 113,
            Error::EALREADY => 114,
            Error::EINPROGRESS => 115,
            Error::ESTALE => 116,
            Error::EUCLEAN => 117,
            Error::ENOTNAM => 118,
            Error::ENAVAIL => 119,
            Error::EISNAM => 120,
            Error::EREMOTEIO => 121,
            Error::EDQUOT => 122,
            Error::ENOMEDIUM => 123,
            Error::EMEDIUMTYPE => 124,
            Error::ECANCELED => 125,
            Error::ENOKEY => 126,
            Error::EKEYEXPIRED => 127,
            Error::EKEYREVOKED => 128,
            Error::EKEYREJECTED => 129,
            Error::EOWNERDEAD => 130,
            Error::ENOTRECOVERABLE => 131,
            Error::ERFKILL => 132,
            Error::EHWPOISON => 133,
        }
    }
}

impl From<isize> for Error {
    fn from(number: isize) -> Self {
        match number {
            1 => Self::EPERM,
            2 => Self::ENOENT,
            3 => Self::ESRCH,
            4 => Self::EINTR,
            5 => Self::EIO,
            6 => Self::ENXIO,
            7 => Self::E2BIG,
            8 => Self::ENOEXEC,
            9 => Self::EBADF,
            10 => Self::ECHILD,
            11 => Self::EAGAIN,
            12 => Self::ENOMEM,
            13 => Self::EACCES,
            14 => Self::EFAULT,
            15 => Self::ENOTBLK,
            16 => Self::EBUSY,
            17 => Self::EEXIST,
            18 => Self::EXDEV,
            19 => Self::ENODEV,
            20 => Self::ENOTDIR,
            21 => Self::EISDIR,
            22 => Self::EINVAL,
            23 => Self::ENFILE,
            24 => Self::EMFILE,
            25 => Self::ENOTTY,
            26 => Self::ETXTBSY,
            27 => Self::EFBIG,
            28 => Self::ENOSPC,
            29 => Self::ESPIPE,
            30 => Self::EROFS,
            31 => Self::EMLINK,
            32 => Self::EPIPE,
            33 => Self::EDOM,
            34 => Self::ERANGE,
            35 => Self::EDEADLK,
            36 => Self::ENAMETOOLONG,
            37 => Self::ENOLCK,
            38 => Self::ENOSYS,
            39 => Self::ENOTEMPTY,
            40 => Self::ELOOP,
            41 => Self::EWOULDBLOCK,
            42 => Self::ENOMSG,
            43 => Self::EIDRM,
            44 => Self::ECHRNG,
            45 => Self::EL2NSYNC,
            46 => Self::EL3HLT,
            47 => Self::EL3RST,
            48 => Self::ELNRNG,
            49 => Self::EUNATCH,
            50 => Self::ENOCSI,
            51 => Self::EL2HLT,
            52 => Self::EBADE,
            53 => Self::EBADR,
            54 => Self::EXFULL,
            55 => Self::ENOANO,
            56 => Self::EBADRQC,
            57 => Self::EBADSLT,
            58 => Self::EDEADLOCK,
            59 => Self::EBFONT,
            60 => Self::ENOSTR,
            61 => Self::ENODATA,
            62 => Self::ETIME,
            63 => Self::ENOSR,
            64 => Self::ENONET,
            65 => Self::ENOPKG,
            66 => Self::EREMOTE,
            67 => Self::ENOLINK,
            68 => Self::EADV,
            69 => Self::ESRMNT,
            70 => Self::ECOMM,
            71 => Self::EPROTO,
            72 => Self::EMULTIHOP,
            73 => Self::EDOTDOT,
            74 => Self::EBADMSG,
            75 => Self::EOVERFLOW,
            76 => Self::ENOTUNIQ,
            77 => Self::EBADFD,
            78 => Self::EREMCHG,
            79 => Self::ELIBACC,
            80 => Self::ELIBBAD,
            81 => Self::ELIBSCN,
            82 => Self::ELIBMAX,
            83 => Self::ELIBEXEC,
            84 => Self::EILSEQ,
            85 => Self::ERESTART,
            86 => Self::ESTRPIPE,
            87 => Self::EUSERS,
            88 => Self::ENOTSOCK,
            89 => Self::EDESTADDRREQ,
            90 => Self::EMSGSIZE,
            91 => Self::EPROTOTYPE,
            92 => Self::ENOPROTOOPT,
            93 => Self::EPROTONOSUPPORT,
            94 => Self::ESOCKTNOSUPPORT,
            95 => Self::EOPNOTSUPP,
            96 => Self::EPFNOSUPPORT,
            97 => Self::EAFNOSUPPORT,
            98 => Self::EADDRINUSE,
            99 => Self::EADDRNOTAVAIL,
            100 => Self::ENETDOWN,
            101 => Self::ENETUNREACH,
            102 => Self::ENETRESET,
            103 => Self::ECONNABORTED,
            104 => Self::ECONNRESET,
            105 => Self::ENOBUFS,
            106 => Self::EISCONN,
            107 => Self::ENOTCONN,
            108 => Self::ESHUTDOWN,
            109 => Self::ETOOMANYREFS,
            110 => Self::ETIMEDOUT,
            111 => Self::ECONNREFUSED,
            112 => Self::EHOSTDOWN,
            113 => Self::EHOSTUNREACH,
            114 => Self::EALREADY,
            115 => Self::EINPROGRESS,
            116 => Self::ESTALE,
            117 => Self::EUCLEAN,
            118 => Self::ENOTNAM,
            119 => Self::ENAVAIL,
            120 => Self::EISNAM,
            121 => Self::EREMOTEIO,
            122 => Self::EDQUOT,
            123 => Self::ENOMEDIUM,
            124 => Self::EMEDIUMTYPE,
            125 => Self::ECANCELED,
            126 => Self::ENOKEY,
            127 => Self::EKEYEXPIRED,
            128 => Self::EKEYREVOKED,
            129 => Self::EKEYREJECTED,
            130 => Self::EOWNERDEAD,
            131 => Self::ENOTRECOVERABLE,
            132 => Self::ERFKILL,
            133 => Self::EHWPOISON,
            other => panic!("Invalid error code: {}", other),
        }
    }
}

impl Error {
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::EPERM => "Operation not permitted",
            Self::ENOENT => "No such file or directory",
            Self::ESRCH => "No such process",
            Self::EINTR => "Interrupted system call",
            Self::EIO => "I/O error",
            Self::ENXIO => "No such device or address",
            Self::E2BIG => "Arg list too long",
            Self::ENOEXEC => "Exec format error",
            Self::EBADF => "Bad file number",
            Self::ECHILD => "No child processes",
            Self::EAGAIN => "Try again",
            Self::ENOMEM => "Out of memory",
            Self::EACCES => "Permission denied",
            Self::EFAULT => "Bad address",
            Self::ENOTBLK => "Block device required",
            Self::EBUSY => "Device or resource busy",
            Self::EEXIST => "File exists",
            Self::EXDEV => "Cross-device link",
            Self::ENODEV => "No such device",
            Self::ENOTDIR => "Not a directory",
            Self::EISDIR => "Is a directory",
            Self::EINVAL => "Invalid argument",
            Self::ENFILE => "File table overflow",
            Self::EMFILE => "Too many open files",
            Self::ENOTTY => "Not a typewriter",
            Self::ETXTBSY => "Text file busy",
            Self::EFBIG => "File too large",
            Self::ENOSPC => "No space left on device",
            Self::ESPIPE => "Illegal seek",
            Self::EROFS => "Read-only file system",
            Self::EMLINK => "Too many links",
            Self::EPIPE => "Broken pipe",
            Self::EDOM => "Math argument out of domain of func",
            Self::ERANGE => "Math result not representable",
            Self::EDEADLK => "Resource deadlock would occur",
            Self::ENAMETOOLONG => "File name too long",
            Self::ENOLCK => "No record locks available",
            Self::ENOSYS => "Function not implemented",
            Self::ENOTEMPTY => "Directory not empty",
            Self::ELOOP => "Too many symbolic links encountered",
            Self::EWOULDBLOCK => "Operation would block",
            Self::ENOMSG => "No message of desired type",
            Self::EIDRM => "Identifier removed",
            Self::ECHRNG => "Channel number out of range",
            Self::EL2NSYNC => "Level 2 not synchronized",
            Self::EL3HLT => "Level 3 halted",
            Self::EL3RST => "Level 3 reset",
            Self::ELNRNG => "Link number out of range",
            Self::EUNATCH => "Protocol driver not attached",
            Self::ENOCSI => "No CSI structure available",
            Self::EL2HLT => "Level 2 halted",
            Self::EBADE => "Invalid exchange",
            Self::EBADR => "Invalid request descriptor",
            Self::EXFULL => "Exchange full",
            Self::ENOANO => "No anode",
            Self::EBADRQC => "Invalid request code",
            Self::EBADSLT => "Invalid slot",
            Self::EDEADLOCK => "File locking deadlock error",
            Self::EBFONT => "Bad font file format",
            Self::ENOSTR => "Device not a stream",
            Self::ENODATA => "No data available",
            Self::ETIME => "Timer expired",
            Self::ENOSR => "Out of streams resources",
            Self::ENONET => "Machine is not on the network",
            Self::ENOPKG => "Package not installed",
            Self::EREMOTE => "Object is remote",
            Self::ENOLINK => "Link has been severed",
            Self::EADV => "Advertise error",
            Self::ESRMNT => "Srmount error",
            Self::ECOMM => "Communication error on send",
            Self::EPROTO => "Protocol error",
            Self::EMULTIHOP => "Multihop attempted",
            Self::EDOTDOT => "RFS specific error",
            Self::EBADMSG => "Not a data message",
            Self::EOVERFLOW => "Value too large for defined data type",
            Self::ENOTUNIQ => "Name not unique on network",
            Self::EBADFD => "File descriptor in bad state",
            Self::EREMCHG => "Remote address changed",
            Self::ELIBACC => "Can not access a needed shared library",
            Self::ELIBBAD => "Accessing a corrupted shared library",
            Self::ELIBSCN => ".lib section in a.out corrupted",
            Self::ELIBMAX => "Attempting to link in too many shared libraries",
            Self::ELIBEXEC => "Cannot exec a shared library directly",
            Self::EILSEQ => "Illegal byte sequence",
            Self::ERESTART => "Interrupted system call should be restarted",
            Self::ESTRPIPE => "Streams pipe error",
            Self::EUSERS => "Too many users",
            Self::ENOTSOCK => "Socket operation on non-socket",
            Self::EDESTADDRREQ => "Destination address required",
            Self::EMSGSIZE => "Message too long",
            Self::EPROTOTYPE => "Protocol wrong type for socket",
            Self::ENOPROTOOPT => "Protocol not available",
            Self::EPROTONOSUPPORT => "Protocol not supported",
            Self::ESOCKTNOSUPPORT => "Socket type not supported",
            Self::EOPNOTSUPP => "Operation not supported on transport endpoint",
            Self::EPFNOSUPPORT => "Protocol family not supported",
            Self::EAFNOSUPPORT => "Address family not supported by protocol",
            Self::EADDRINUSE => "Address already in use",
            Self::EADDRNOTAVAIL => "Cannot assign requested address",
            Self::ENETDOWN => "Network is down",
            Self::ENETUNREACH => "Network is unreachable",
            Self::ENETRESET => "Network dropped connection because of reset",
            Self::ECONNABORTED => "Software caused connection abort",
            Self::ECONNRESET => "Connection reset by peer",
            Self::ENOBUFS => "No buffer space available",
            Self::EISCONN => "Transport endpoint is already connected",
            Self::ENOTCONN => "Transport endpoint is not connected",
            Self::ESHUTDOWN => "Cannot send after transport endpoint shutdown",
            Self::ETOOMANYREFS => "Too many references: cannot splice",
            Self::ETIMEDOUT => "Connection timed out",
            Self::ECONNREFUSED => "Connection refused",
            Self::EHOSTDOWN => "Host is down",
            Self::EHOSTUNREACH => "No route to host",
            Self::EALREADY => "Operation already in progress",
            Self::EINPROGRESS => "Operation now in progress",
            Self::ESTALE => "Stale NFS file handle",
            Self::EUCLEAN => "Structure needs cleaning",
            Self::ENOTNAM => "Not a XENIX named type file",
            Self::ENAVAIL => "No XENIX semaphores available",
            Self::EISNAM => "Is a named type file",
            Self::EREMOTEIO => "Remote I/O error",
            Self::EDQUOT => "Quota exceeded",
            Self::ENOMEDIUM => "No medium found",
            Self::EMEDIUMTYPE => "Wrong medium type",
            Self::ECANCELED => "Operation Canceled",
            Self::ENOKEY => "Required key not available",
            Self::EKEYEXPIRED => "Key has expired",
            Self::EKEYREVOKED => "Key has been revoked",
            Self::EKEYREJECTED => "Key was rejected by service",
            Self::EOWNERDEAD => "Owner died",
            Self::ENOTRECOVERABLE => "State not recoverable",
            Self::ERFKILL => "Operation not possible due to RF-kill",
            Self::EHWPOISON => "Memory page has hardware error",
        }
    }
}

}
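The From<isize> implementation above panics when it meets an unknown code. If that is too strict for a caller, a fallible conversion is one alternative. The following is only a minimal sketch on a cut-down, hypothetical Errno enum (three variants, not the full Error type above); the names Errno, from_code and code are assumptions made for this illustration.

```rust
/// Hypothetical, cut-down stand-in for the `Error` enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Errno {
    EPERM,
    ENOENT,
    EBADF,
}

impl Errno {
    /// Fallible counterpart of `From<isize>`: unknown codes yield `None`
    /// instead of panicking.
    pub fn from_code(number: isize) -> Option<Self> {
        match number {
            1 => Some(Self::EPERM),
            2 => Some(Self::ENOENT),
            9 => Some(Self::EBADF),
            _ => None,
        }
    }

    /// Maps the variant back to its number.
    pub fn code(self) -> isize {
        match self {
            Self::EPERM => 1,
            Self::ENOENT => 2,
            Self::EBADF => 9,
        }
    }

    /// Same messages as in `as_str` above.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::EPERM => "Operation not permitted",
            Self::ENOENT => "No such file or directory",
            Self::EBADF => "Bad file number",
        }
    }
}
```

Whether an unknown code should panic or be reported to the caller is a design choice; the implementation above opts for the panic.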

Once we have Result and Error we can reimplement our syscalls as follows:

#![allow(unused)]
fn main() {
use core::convert::TryInto;
use crate::error::{Error, Result};

#[no_mangle]
pub fn read(fd: u32, buf: &mut [u8]) -> Result<usize> {
    // The kernel writes into the buffer, so hand it a mutable pointer.
    let rc = unsafe { syscall!(SYS_READ, fd, buf.as_mut_ptr(), buf.len()) };

    // A negative return value is the negated error number.
    if rc < 0 {
        return Err(Error::from(-rc))
    }

    Ok(rc.try_into().unwrap())
}


#[no_mangle]
pub fn write(fd: u32, buf: &[u8]) -> Result<usize> {
    let rc = unsafe { syscall!(SYS_WRITE, fd, buf.as_ptr(), buf.len()) };

    if rc < 0 {
        return Err(Error::from(-rc))
    }

    Ok(rc.try_into().unwrap())
}
}

and the main function should look like:

#[no_mangle]
fn main() -> u8 { 
    linux::syscall::write(1, b"Hello world\n").unwrap();
    0
}

Once we recompile and run the code, we can see the text on stdout:

> ./cargo.sh run
Hello world
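Both wrappers repeat the same check on the raw return value. As a rough sketch, the convention can be isolated in one small function. The name decode is hypothetical, and the error is kept as a plain isize here (instead of the Error type) so that the block compiles on its own.

```rust
/// Splits a raw syscall return value into its success and error parts.
/// A negative value is the negated error number; anything else is the
/// result of the call (for read and write: the number of bytes).
pub fn decode(rc: isize) -> Result<usize, isize> {
    if rc < 0 {
        Err(-rc)
    } else {
        // Non-negative, so the conversion cannot lose information.
        Ok(rc as usize)
    }
}
```

Inside the library the Err arm would presumably become Error::from(-rc), which keeps every wrapper down to the syscall itself plus one call to this helper.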

But what happens if we specify a wrong file descriptor? Let’s use 3 as the file descriptor instead of 1. Since we never opened a file with descriptor 3, we should see an error now. Let’s recompile and run:

> ./cargo.sh run

Our program starts hammering on the CPU and never exits. Sounds familiar? The write syscall returns an error, we unwrap it, and as a result our code panics. But we implemented our panic handler in the first chapter like this:

#![allow(unused)]
fn main() {
#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
}

Let’s fix that by calling the exit syscall instead of looping forever:

#![allow(unused)]
fn main() {
#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    syscall::exit(255);
}
}

Now if we run the same code with file descriptor 3 the process should simply exit with error code 255.

> ./cargo.sh run; echo $?
255

Standard IO

We already implemented Display and Debug for our error type, so why don’t we simply print them on stderr? PanicInfo also implements these traits, so we should be able to write them out, but how should we create a string, or more precisely a byte array, from these types? There is a nice macro in the core library called write! which can be used to format the output. Let’s try that in the panic_handler function.

#![allow(unused)]
fn main() {
#[panic_handler]
fn panic_handler(info: &core::panic::PanicInfo) -> ! {
    write!(1u8, "{:?}", info);
    syscall::exit(255);
}
}

As you have probably expected, we get a compilation error:

./cargo.sh build
error[E0599]: cannot write into `u8`
  --> linux.rs:24:12
   |
24 |     write!(1u8, "{:?}", info);
   |     -------^^^--------------- method not found in `u8`
   |
note: must implement `io::Write`, `fmt::Write`, or have a `write_fmt` method
  --> linux.rs:24:12
   |
24 |     write!(1u8, "{:?}", info);
   |            ^^^
help: a writer is needed before this format string
  --> linux.rs:24:12
   |
24 |     write!(1u8, "{:?}", info);
   |            ^

We cannot write into u8… which kind of makes sense. The write! macro is part of the core library, which has no idea about the write syscall we just implemented. We should somehow invert the dependency, and the compiler message helps us to do that: the first argument of the write! macro needs to implement the io::Write or fmt::Write trait, or needs to have a write_fmt method. Let’s wrap an integer in a struct and implement the fmt::Write trait for it. (The io::Write trait is part of the std library, which we don’t have access to.)

Let’s create a new module called io. We need to include it in linux.rs with pub mod io; and create a new file called io.rs with the following content:

#![allow(unused)]
fn main() {
use core::fmt;
use crate::error::Result;

pub struct Stdio {
    fd: u32,
}

impl Stdio {
    pub fn read(&self, buf: &mut [u8]) -> Result<usize> {
        crate::syscall::read(self.fd, buf)
    }

    pub fn write(&self, buf: &[u8]) -> Result<usize> {
        crate::syscall::write(self.fd, buf)
    }
}

impl fmt::Write for Stdio {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        match self.write(s.as_bytes()) {
            Ok(_) => Ok(()),
            Err(_) => Err(fmt::Error),
        }
    }
}

pub fn stdin() -> Stdio {
    Stdio { fd: 0 }
}

pub fn stdout() -> Stdio {
    Stdio { fd: 1 }
}

pub fn stderr() -> Stdio {
    Stdio { fd: 2 }
}
}
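To see how write! drives write_str without involving a file descriptor at all, here is a small sketch that collects the formatted output in a fixed-size stack buffer. FixedBuf and render are hypothetical names invented for this illustration; they are not part of the io module above.

```rust
use core::fmt::{self, Write};

/// Hypothetical writer that stores formatted output in a stack buffer
/// instead of sending it to a file descriptor.
pub struct FixedBuf {
    data: [u8; 64],
    len: usize,
}

impl FixedBuf {
    pub fn new() -> Self {
        FixedBuf { data: [0; 64], len: 0 }
    }

    /// Returns what has been written so far.
    pub fn as_str(&self) -> &str {
        // Only whole `&str` chunks are copied in, so the bytes are valid UTF-8.
        core::str::from_utf8(&self.data[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    // `write!` calls this once for every literal piece and every
    // formatted argument.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let end = self.len + bytes.len();
        if end > self.data.len() {
            return Err(fmt::Error); // the buffer is full
        }
        self.data[self.len..end].copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }
}

/// Formats a value through `write!`, the same way a panic handler would.
pub fn render(code: u8) -> FixedBuf {
    let mut out = FixedBuf::new();
    let _ = write!(out, "exit code: {}", code);
    out
}
```

The only thing a writer has to provide is write_str; the formatting itself stays in core, which is exactly the inversion of dependencies described above.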

With the io module in place we can rewrite the panic handler like this:

#![allow(unused)]
fn main() {
use core::fmt::Write;

#[panic_handler]
fn panic_handler(info: &core::panic::PanicInfo) -> ! {
    let _ = write!(io::stderr(), "{}\n", info);
    syscall::exit(255);
}
}

But if we try to build the code, we get yet another linker error, this time about the missing memcpy function. No problem. We expected that already, we just didn’t know when it was going to come. So let’s put our memcpy implementation next to memset in the ffi.rs file:

#![allow(unused)]
fn main() {
#[no_mangle]
pub unsafe extern "C" fn memcpy(dst: *mut u8, src: *const u8, len: usize) -> *mut u8 {
    // Copy byte by byte; `add` takes a usize offset, so no conversion is needed.
    for idx in 0 .. len {
        unsafe {
            let byte = src.add(idx).read();
            dst.add(idx).write(byte);
        }
    }
    dst
}
}
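The copy loop is easy to get wrong by one byte, so it can be worth exercising it on a hosted target first. The sketch below uses the hypothetical names copy_bytes and copy_prefix so that nothing clashes with the real memcpy symbol; it is a test aid under those assumptions, not part of ffi.rs.

```rust
/// The same byte-by-byte loop under a hypothetical name.
///
/// # Safety
/// `src` must be readable and `dst` writable for `len` bytes, and the two
/// regions must not overlap.
pub unsafe fn copy_bytes(dst: *mut u8, src: *const u8, len: usize) -> *mut u8 {
    for idx in 0..len {
        unsafe { dst.add(idx).write(src.add(idx).read()) };
    }
    dst
}

/// Safe wrapper that copies as many bytes as fit and returns the count.
pub fn copy_prefix(dst: &mut [u8], src: &[u8]) -> usize {
    let len = dst.len().min(src.len());
    // SAFETY: both pointers are valid for `len` bytes, and the borrow rules
    // guarantee that the two slices do not overlap.
    unsafe { copy_bytes(dst.as_mut_ptr(), src.as_ptr(), len) };
    len
}
```

Copying b"hello" into a four-byte array through copy_prefix should leave b"hell" in the array and return 4.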

Exceptions in Rust: https://github.com/rust-lang/rfcs/blob/master/text/1236-stabilize-catch-panic.md

Last but not least, we get an undefined reference error to rust_eh_personality. TODO: what’s this?

#![allow(unused)]
#![feature(lang_items)]
#![allow(internal_features)]

fn main() {
#[lang = "eh_personality"]
fn rust_eh_personality() {}
}

The write macro is already a big improvement, but we can go further. Let’s define a few macros to print text to stdout and stderr. They can be defined in the io.rs file.

#![allow(unused)]
fn main() {
// Each macro takes a format literal followed by any number of
// comma-separated arguments.
#[macro_export]
macro_rules! print {
    ($fmt:literal $(, $args:expr)* $(,)?) => {{
        use core::fmt::Write;
        write!($crate::io::stdout(), $fmt $(, $args)*).unwrap();
    }}
}

#[macro_export]
macro_rules! println {
    ($fmt:literal $(, $args:expr)* $(,)?) => {{
        $crate::print!("{}\n", format_args!($fmt $(, $args)*))
    }}
}

#[macro_export]
macro_rules! eprint {
    ($fmt:literal $(, $args:expr)* $(,)?) => {{
        use core::fmt::Write;
        write!($crate::io::stderr(), $fmt $(, $args)*).unwrap();
    }}
}

#[macro_export]
macro_rules! eprintln {
    ($fmt:literal $(, $args:expr)* $(,)?) => {{
        $crate::eprint!("{}\n", format_args!($fmt $(, $args)*))
    }}
}
}

The bin.rs can then use them like this (note the new #[macro_use] attribute on the extern linux crate):

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

#[no_mangle]
fn main() -> u8 { 
    print!("Hello");
    eprintln!(" {}", "world");
    0
}
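The layering of the macros (a newline variant that delegates to the plain variant through format_args!) can be tried out on a hosted target as well. The following is a sketch only: sprint! and sprintln! are hypothetical stand-ins that append to a String rather than writing to stdout, so that the result can be inspected.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `print!` that appends to a `String`.
macro_rules! sprint {
    ($dst:expr, $fmt:literal $(, $args:expr)* $(,)?) => {{
        write!($dst, $fmt $(, $args)*).unwrap();
    }};
}

/// Same layering as `println!`: format the arguments once, then let the
/// plain variant add the newline.
macro_rules! sprintln {
    ($dst:expr, $fmt:literal $(, $args:expr)* $(,)?) => {{
        sprint!($dst, "{}\n", format_args!($fmt $(, $args)*))
    }};
}

/// Builds one line from two macro calls.
pub fn demo() -> String {
    let mut out = String::new();
    sprint!(out, "Hello");
    sprintln!(out, " {} {}", "world", 42);
    out
}
```

Passing format_args! as the single argument of the inner macro avoids building an intermediate string, which matters in a crate that has no allocator.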

File operations

  • open, close
  • stat, fstat, lstat, fstatat
  • fcntl, fsync, fdatasync
  • truncate, ftruncate, fallocate
  • lseek
  • seek, drop (close)
  • BufRead, BufWrite – prove with perf the many syscalls

Open and close

First of all, we need to implement two syscalls, open and close, to be able to work with files. If you look up the manual pages of open and close, they say that the function signatures look like this:

int open(const char *path, int flags);
int close(int fd);

This should be quite simple to implement in Rust. Let’s add the following functions to our syscall.rs:

#![allow(unused)]
fn main() {
const SYS_OPEN: isize = 2;
const SYS_CLOSE: isize = 3;

#[no_mangle]
pub fn open(path: &str, flags: u64, mode: u64) -> Result<u32> {
    let rc = unsafe { syscall!(SYS_OPEN, path.as_ptr(), flags, mode) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(u32::try_from(rc).unwrap())
}

#[no_mangle]
pub fn close(fd: u32) -> Result<()> {
    let rc = unsafe { syscall!(SYS_CLOSE, fd) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}
}

And call them from the main function like this:

#[no_mangle]
fn main() -> u8 { 
    let fd = linux::syscall::open("./bin.rs", 0, 0).unwrap();
    linux::syscall::close(fd).unwrap();
    0
}

If we try to run this code the following happens:

> ./cargo.sh run
panicked at ./bin.rs:10:50:
called `Result::unwrap()` on an `Err` value: ENAMETOOLONG

The error message is quite straightforward: the name of the file is too long. Huh? Eight characters are too long? We have most likely messed something up. So how does the kernel determine the length of our string? Like strlen, it scans for a terminating null byte, so it expects the string to be null terminated. Rust’s str, however, is not null terminated; it is a byte slice with an explicit length. As a result the kernel reads past the end of our str until it either finds a null byte or gives up after PATH_MAX bytes, which is where ENAMETOOLONG comes from. We have just violated the rules of Rust, caused undefined behaviour and made the whole library unsound. Nice… We can prove it by adding a null byte to our str and letting the code run:

#[no_mangle]
fn main() -> u8 { 
    let fd = linux::syscall::open("./bin.rs\0", 0, 0).unwrap();
    linux::syscall::close(fd).unwrap();
    0
}
> ./cargo.sh run

Now everything seems to be fine. But as the rules of unsafe say: an unsafe block is only sound if safe code cannot call it in a way that causes undefined behaviour. This means that we cannot expect the user to put a null byte at the end of a str every time a file needs to be opened. We have to convert the Rust str into a null-terminated string. There is a nice struct for that: CString. The only problem is that it is defined in the alloc crate, which we don’t want to depend on. Let’s avoid implementing our own allocation primitives for now and simply use a stack array to build our null-terminated string. So let’s rewrite our open function like this:

#![allow(unused)]
fn main() {
#[no_mangle]
pub fn open(path: &str, flags: u64, mode: u64) -> Result<u32> {
    let mut dst = [0u8;crate::limits::PATH_MAX];
    let src = path.as_bytes();

    if src.len() >= crate::limits::PATH_MAX {
        return Err(Error::ENAMETOOLONG);
    }

    for idx in 0 .. src.len() {
        dst[idx] = src[idx];
    }

    let rc = unsafe { syscall!(SYS_OPEN, dst.as_ptr(), flags, mode) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(u32::try_from(rc).unwrap())
}
}

There are a couple of limits defined in the Linux kernel, for example PATH_MAX. To conform to these limits, we add a module to linux.rs with pub mod limits; and also create a file called limits.rs with the following content:

#![allow(unused)]
fn main() {
pub const PATH_MAX: usize = 4096;
}

After that we can remove the \0 termination from our str and it should just work now:

> ./cargo.sh run
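The conversion from str to a null-terminated buffer can also be pulled out of open and checked in isolation. The sketch below makes a few assumptions: to_c_path and PathError are hypothetical names, and the extra check for a null byte inside the path is an addition that the open function above does not perform (without it the kernel would simply stop reading at the first null byte).

```rust
pub const PATH_MAX: usize = 4096;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PathError {
    TooLong,
    InteriorNul,
}

/// Copies `path` into `dst`, appends the terminating null byte and returns
/// the number of bytes used, terminator included.
pub fn to_c_path(path: &str, dst: &mut [u8; PATH_MAX]) -> Result<usize, PathError> {
    let src = path.as_bytes();

    // Leave room for the terminator.
    if src.len() >= PATH_MAX {
        return Err(PathError::TooLong);
    }
    // A null byte inside the path would silently cut the name short.
    if src.contains(&0) {
        return Err(PathError::InteriorNul);
    }

    dst[..src.len()].copy_from_slice(src);
    dst[src.len()] = 0;
    Ok(src.len() + 1)
}
```

The eight characters of "./bin.rs" plus the terminator should occupy nine bytes of the buffer. Note that copy_from_slice may compile down to a memcpy call, so in the freestanding library it relies on the memcpy symbol being available.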

Let’s define the options for the open syscall. You can find the options in fcntl.h and the file mode flags in stat.h. We can simply add these values to the syscall.rs file:

#![allow(unused)]
fn main() {
pub const O_ACCMODE:   u64 = 0o0000003;
pub const O_RDONLY:    u64 = 0o0000000;
pub const O_WRONLY:    u64 = 0o0000001;
pub const O_RDWR:      u64 = 0o0000002;
pub const O_CREAT:     u64 = 0o0000100;
pub const O_EXCL:      u64 = 0o0000200;
pub const O_NOCTTY:    u64 = 0o0000400;
pub const O_TRUNC:     u64 = 0o0001000;
pub const O_APPEND:    u64 = 0o0002000;
pub const O_NONBLOCK:  u64 = 0o0004000;
pub const O_DSYNC:     u64 = 0o0010000;
pub const O_DIRECT:    u64 = 0o0040000;
pub const O_LARGEFILE: u64 = 0o0100000;
pub const O_DIRECTORY: u64 = 0o0200000;
pub const O_NOFOLLOW:  u64 = 0o0400000;
pub const O_NOATIME:   u64 = 0o1000000;
pub const O_CLOEXEC:   u64 = 0o2000000;
pub const O_SYNC:      u64 = 0o4000000 | O_DSYNC;      // __O_SYNC | O_DSYNC
pub const O_PATH:      u64 = 0o10000000;
pub const O_TMPFILE:   u64 = 0o20000000 | O_DIRECTORY; // __O_TMPFILE | O_DIRECTORY
pub const O_NDELAY:    u64 = O_NONBLOCK;

pub const S_IRWXU: u64 = 0o700; // RWX mask for owner
pub const S_IRUSR: u64 = 0o400; // R for owner
pub const S_IWUSR: u64 = 0o200; // W for owner
pub const S_IXUSR: u64 = 0o100; // X for owner

pub const S_IRWXG: u64 = 0o070; // RWX for group
pub const S_IRGRP: u64 = 0o040; // R for group
pub const S_IWGRP: u64 = 0o020; // W for group
pub const S_IXGRP: u64 = 0o010; // X for group

pub const S_IRWXO: u64 = 0o007; // RWX for other
pub const S_IROTH: u64 = 0o004; // R for other
pub const S_IWOTH: u64 = 0o002; // W for other
pub const S_IXOTH: u64 = 0o001; // X for other
}

With these constants we have basic file handling functionality:

#[no_mangle]
fn main() -> u8 { 
    use linux::syscall::*;
    let fd = open("hello.txt", O_CREAT|O_RDWR|O_DSYNC, S_IRUSR|S_IWUSR).unwrap();
    write(fd, b"hello world\n").unwrap();
    close(fd).unwrap();
    0
}

And we can run it like this:

> ./cargo.sh run
> cat hello.txt
hello world
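The flag and mode constants are plain bit masks, with one catch: the access mode (O_RDONLY, O_WRONLY, O_RDWR) is a two-bit field rather than a set of independent bits, which is why O_ACCMODE exists. The following sketch illustrates that with two hypothetical helpers, is_writable and owner_bits; the constant values are copied from the list above.

```rust
// Values copied from the constants above.
pub const O_ACCMODE: u64 = 0o3;
pub const O_RDONLY: u64 = 0o0;
pub const O_WRONLY: u64 = 0o1;
pub const O_RDWR: u64 = 0o2;
pub const O_CREAT: u64 = 0o100;
pub const O_DSYNC: u64 = 0o10000;
pub const S_IRUSR: u64 = 0o400;
pub const S_IWUSR: u64 = 0o200;
pub const S_IXUSR: u64 = 0o100;

/// Hypothetical helper: true if the access mode allows writing.
pub fn is_writable(flags: u64) -> bool {
    // Mask out the two-bit access mode and compare it as a value;
    // testing a single bit would give the wrong answer for O_RDONLY (0).
    let access = flags & O_ACCMODE;
    access == O_WRONLY || access == O_RDWR
}

/// Hypothetical helper: owner permissions in `ls -l` style, e.g. b"rw-".
pub fn owner_bits(mode: u64) -> [u8; 3] {
    [
        if mode & S_IRUSR != 0 { b'r' } else { b'-' },
        if mode & S_IWUSR != 0 { b'w' } else { b'-' },
        if mode & S_IXUSR != 0 { b'x' } else { b'-' },
    ]
}
```

For the call in main above, O_CREAT|O_RDWR|O_DSYNC combines to 0o10102, and S_IRUSR|S_IWUSR corresponds to the familiar rw- for the owner.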

stat, fstat, lstat

The C wrappers of the stat and fstat syscalls look like this:

int stat(const char *pathname, struct stat *statbuf);
int fstat(int fd, struct stat *statbuf);

In C it’s quite common to create a struct on the stack and pass it into a function as a pointer. The function initializes the struct and after that we can use it. This makes a lot of sense because the return value stays free to signal errors: zero typically means that the function succeeded, while anything else typically means an error. In Rust, by contrast, we have Result types. So would it be better to create the stat struct on the stack of the syscall wrapper and give it back as Ok(stat) in case of success? To find out, let’s implement two versions of this function:

#![allow(unused)]
fn main() {
const SYS_FSTAT: isize = 5;

#[repr(C)]
#[derive(Debug, Default)]
pub struct stat64 {
    pub st_dev: u64,
    pub st_ino: u64,
    pub st_nlink: u64,
    pub st_mode: u32,
    pub st_uid: u32,
    pub st_gid: u32,
    __pad0: i32,
    pub st_rdev: u64,
    pub st_size: i64,
    pub st_blksize: i64,
    pub st_blocks: i64,
    pub st_atime: i64,
    pub st_atime_nsec: i64,
    pub st_mtime: i64,
    pub st_mtime_nsec: i64,
    pub st_ctime: i64,
    pub st_ctime_nsec: i64,
    __reserved: [i64; 3],
}

#[no_mangle]
pub fn fstat1(fd: u32, stat: &mut stat64) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FSTAT, fd, stat) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn fstat2(fd: u32) -> Result<stat64> {
    let mut stat = stat64::default();
    let rc = unsafe { syscall!(SYS_FSTAT, fd, &mut stat) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(stat)
}
}

If we build the code and dump the assembly it’s easy to see the difference between the two functions:

> ./cargo.sh build

> ./cargo.sh dump fstat1
0000000000401f70 <fstat1>:
  401f70:           55                          push   rbp
  401f71:           48 89 e5                    mov    rbp,rsp
  401f74:           b8 05 00 00 00              mov    eax,0x5
  401f79:           0f 05                       syscall
  401f7b:           48 85 c0                    test   rax,rax
  401f7e:       /-- 78 04                       js     401f84 <fstat1+0x14>
  401f80:       |   31 c0                       xor    eax,eax
  401f82:       |   5d                          pop    rbp
  401f83:       |   c3                          ret
  401f84:       \-> 48 f7 d8                    neg    rax
  401f87:           48 89 c7                    mov    rdi,rax
  401f8a:           5d                          pop    rbp
  401f8b:           ff 25 d7 4f 00 00           jmp    QWORD PTR [rip+0x4fd7]        # 406f68 <_GLOBAL_OFFSET_TABLE_+0x20>

> ./cargo.sh dump fstat2
0000000000401fa0 <fstat2>:
  401fa0:              55                       push   rbp
  401fa1:              48 89 e5                 mov    rbp,rsp
  401fa4:              53                       push   rbx
  401fa5:              48 81 ec 98 00 00 00     sub    rsp,0x98
  401fac:              89 f1                    mov    ecx,esi
  401fae:              48 89 fb                 mov    rbx,rdi
  401fb1:              0f 57 c0                 xorps  xmm0,xmm0
  401fb4:              0f 29 45 e0              movaps XMMWORD PTR [rbp-0x20],xmm0
  401fb8:              0f 29 45 d0              movaps XMMWORD PTR [rbp-0x30],xmm0
  401fbc:              0f 29 45 c0              movaps XMMWORD PTR [rbp-0x40],xmm0
  401fc0:              0f 29 45 b0              movaps XMMWORD PTR [rbp-0x50],xmm0
  401fc4:              0f 29 45 a0              movaps XMMWORD PTR [rbp-0x60],xmm0
  401fc8:              0f 29 45 90              movaps XMMWORD PTR [rbp-0x70],xmm0
  401fcc:              0f 29 45 80              movaps XMMWORD PTR [rbp-0x80],xmm0
  401fd0:              0f 29 85 70 ff ff ff     movaps XMMWORD PTR [rbp-0x90],xmm0
  401fd7:              0f 29 85 60 ff ff ff     movaps XMMWORD PTR [rbp-0xa0],xmm0
  401fde:              48 8d b5 60 ff ff ff     lea    rsi,[rbp-0xa0]
  401fe5:              b8 05 00 00 00           mov    eax,0x5
  401fea:              89 cf                    mov    edi,ecx
  401fec:              0f 05                    syscall
  401fee:              48 85 c0                 test   rax,rax
  401ff1:       /----- 78 13                    js     402006 <fstat2+0x66>
  401ff3:       |      48 8d 7b 08              lea    rdi,[rbx+0x8]
  401ff7:       |      ba 90 00 00 00           mov    edx,0x90
  401ffc:       |      ff 15 6e 4f 00 00        call   QWORD PTR [rip+0x4f6e]        # 406f70 <_GLOBAL_OFFSET_TABLE_+0x28>
  402002:       |      31 c0                    xor    eax,eax
  402004:       |  /-- eb 11                    jmp    402017 <fstat2+0x77>
  402006:       \--|-> 48 f7 d8                 neg    rax
  402009:          |   48 89 c7                 mov    rdi,rax
  40200c:          |   ff 15 56 4f 00 00        call   QWORD PTR [rip+0x4f56]        # 406f68 <_GLOBAL_OFFSET_TABLE_+0x20>
  402012:          |   88 43 01                 mov    BYTE PTR [rbx+0x1],al
  402015:          |   b0 01                    mov    al,0x1
  402017:          \-> 88 03                    mov    BYTE PTR [rbx],al
  402019:              48 89 d8                 mov    rax,rbx
  40201c:              48 81 c4 98 00 00 00     add    rsp,0x98
  402023:              5b                       pop    rbx
  402024:              5d                       pop    rbp
  402025:              c3                       ret

The second version of fstat is more than twice as long as the first. But is that enough to throw it away? To be able to answer the question, we have to go a bit deeper into the code of fstat2 and analyse what’s actually happening here.

After saving rbx (push rbx), which will hold a copy of rdi, we reserve 0x98 bytes on the stack, 0x90 of which are used for the stat64 struct. This space has to be zeroed out, and to make it fast the compiler zeros the xmm0 SIMD register and uses it to store zeros on the stack 16 bytes at a time.

  401fa0:              55                       push   rbp
  401fa1:              48 89 e5                 mov    rbp,rsp
  401fa4:              53                       push   rbx
  401fa5:              48 81 ec 98 00 00 00     sub    rsp,0x98
  401fac:              89 f1                    mov    ecx,esi
  401fae:              48 89 fb                 mov    rbx,rdi
  401fb1:              0f 57 c0                 xorps  xmm0,xmm0
  401fb4:              0f 29 45 e0              movaps XMMWORD PTR [rbp-0x20],xmm0
  401fb8:              0f 29 45 d0              movaps XMMWORD PTR [rbp-0x30],xmm0
  401fbc:              0f 29 45 c0              movaps XMMWORD PTR [rbp-0x40],xmm0
  401fc0:              0f 29 45 b0              movaps XMMWORD PTR [rbp-0x50],xmm0
  401fc4:              0f 29 45 a0              movaps XMMWORD PTR [rbp-0x60],xmm0
  401fc8:              0f 29 45 90              movaps XMMWORD PTR [rbp-0x70],xmm0
  401fcc:              0f 29 45 80              movaps XMMWORD PTR [rbp-0x80],xmm0
  401fd0:              0f 29 85 70 ff ff ff     movaps XMMWORD PTR [rbp-0x90],xmm0
  401fd7:              0f 29 85 60 ff ff ff     movaps XMMWORD PTR [rbp-0xa0],xmm0

Once we have initialized the struct we pass its address (rsi) together with the fd (edi) to the fstat syscall (eax = 0x5):

  401fde:              48 8d b5 60 ff ff ff     lea    rsi,[rbp-0xa0]
  401fe5:              b8 05 00 00 00           mov    eax,0x5
  401fea:              89 cf                    mov    edi,ecx
  401fec:              0f 05                    syscall

We check the return code of the syscall and if it’s negative (js jumps when the sign flag is set) we jump forward to the error handling (402006):

  401fee:              48 85 c0                 test   rax,rax
  401ff1:       /----- 78 13                    js     402006 <fstat2+0x66>

If the return code was not negative we call memcpy. The parameters are rdi (dst), which is calculated from rbx, rsi (src), which still points to the stat64 struct on the stack of the current function, and edx (len), which is the size of the stat64 struct. So the question is: where do we copy the initialized struct to? If you look at the first section of this code it says mov rbx,rdi, which is kind of interesting because rdi is used for the first parameter of a function call, which should be the file descriptor in this case. Let’s investigate that in gdb (see below).

  401ff3:       |      48 8d 7b 08              lea    rdi,[rbx+0x8]
  401ff7:       |      ba 90 00 00 00           mov    edx,0x90
  401ffc:       |      ff 15 6e 4f 00 00        call   QWORD PTR [rip+0x4f6e]        # 406f70 <_GLOBAL_OFFSET_TABLE_+0x28>
  402002:       |      31 c0                    xor    eax,eax
  402004:       |  /-- eb 11                    jmp    402017 <fstat2+0x77>

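Otherwise we do the error handling: the return code is negated, converted into an Error by the second call and stored in the result.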

  402006:       \--|-> 48 f7 d8                 neg    rax
  402009:          |   48 89 c7                 mov    rdi,rax
  40200c:          |   ff 15 56 4f 00 00        call   QWORD PTR [rip+0x4f56]        # 406f68 <_GLOBAL_OFFSET_TABLE_+0x20>
  402012:          |   88 43 01                 mov    BYTE PTR [rbx+0x1],al
  402015:          |   b0 01                    mov    al,0x1

Tear down the function and return the Result<stat64>: release the 0x90 bytes and the 8 extra alignment bytes from the stack and return to the caller.

  402017:          \-> 88 03                    mov    BYTE PTR [rbx],al
  402019:              48 89 d8                 mov    rax,rbx
  40201c:              48 81 c4 98 00 00 00     add    rsp,0x98
  402023:              5b                       pop    rbx
  402024:              5d                       pop    rbp
  402025:              c3                       ret
> gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) break fstat2
(gdb) run
Breakpoint 1, linux::syscall::{impl#1}::default () at syscall.rs:52
52      #[derive(Debug, Default)]
(gdb) disassemble
Dump of assembler code for function linux::syscall::fstat2:
   0x0000000000401fa0 <+0>:     push   rbp
   0x0000000000401fa1 <+1>:     mov    rbp,rsp
   0x0000000000401fa4 <+4>:     push   rbx
   0x0000000000401fa5 <+5>:     sub    rsp,0x98
   0x0000000000401fac <+12>:    mov    ecx,esi
   0x0000000000401fae <+14>:    mov    rbx,rdi
=> 0x0000000000401fb1 <+17>:    xorps  xmm0,xmm0
...
(gdb) info registers esi rdi
esi            0x3                 3
rdi            0x7fffffffe7f8      140737488349176

Something is definitely weird. The esi register (the lower half of rsi), which should contain the second parameter of the function, is set to the file descriptor (3) and rdi has some random looking address in it. But fstat2 doesn’t even have two parameters… So what’s happening here? If we look up the 3.2.3 Parameter Passing chapter of the System V ABI and scroll down to the “Returning of Values” section it has an interesting point:

If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in rdi as if it were the first argument to the function. In effect, this address becomes a hidden first argument. This storage must not overlap any data visible to the callee through other names than this argument. On return %rax will contain the address that has been passed in by the caller in %rdi

So we could summarize the call to the two fstat functions as follows:

pub fn fstat1(fd: u32, stat: &mut stat64) -> Result<()>;
  1. The caller reserves space for stat64
  2. The caller zeros out stat64
  3. fstat updates stat64
  4. fstat returns the result
pub fn fstat2(fd: u32) -> Result<stat64>;
  1. The caller reserves space for the first stat64
  2. fstat reserves space for the second stat64
  3. fstat zeros out the second stat64
  4. fstat updates the second stat64
  5. fstat overwrites the first stat64 with the second stat64
  6. fstat returns the result
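
The two call sequences above can be mimicked in a small program that runs on a normal Rust toolchain. This is only a sketch: Stat, fake_fstat, stat_into and stat_by_value are hypothetical stand-ins (not the book’s stat64 or syscall wrappers), and the Rust ABI itself is not specified, so the hidden return pointer is something we observed in the disassembly rather than a guarantee.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for stat64: 18 * 8 = 144 (0x90) bytes.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Stat {
    pub fields: [u64; 18],
}

/// Stand-in for the real syscall so the sketch runs anywhere:
/// it fails for fd >= 1024 and otherwise writes a fake size.
fn fake_fstat(fd: u32, out: &mut Stat) -> isize {
    if fd >= 1024 {
        return -9; // -EBADF
    }
    out.fields[6] = 9;
    0
}

/// fstat1 style: the caller owns the buffer, the callee only fills it in.
pub fn stat_into(fd: u32, out: &mut Stat) -> Result<(), isize> {
    let rc = fake_fstat(fd, out);
    if rc < 0 { Err(-rc) } else { Ok(()) }
}

/// fstat2 style: the callee builds a local value which then has to be
/// moved into the return slot provided by the caller.
pub fn stat_by_value(fd: u32) -> Result<Stat, isize> {
    let mut local = Stat::default();
    let rc = fake_fstat(fd, &mut local);
    if rc < 0 { Err(-rc) } else { Ok(local) }
}

fn main() {
    println!("Stat is {} bytes", size_of::<Stat>());

    // One buffer, zeroed once and reused for several descriptors.
    let mut buf = Stat::default();
    for fd in 0..3 {
        stat_into(fd, &mut buf).unwrap();
    }

    // Same data, but every call produces a fresh 144 byte value.
    let copy = stat_by_value(3).unwrap();
    assert_eq!(copy, buf);
}
```

A value of 144 bytes is far too large to be returned in registers, which is why the by-value variant needs the caller provided storage described in the ABI quote.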

Besides the fact that the fstat1 function is much more lightweight (no extra allocation and memcpy) we can also reuse the stat64 struct when checking multiple files. So we don’t have to reinitialize it over and over again, which takes at least 10 instructions. As a conclusion let’s drop the fstat2 function and rename fstat1 to fstat. Similarly we can also implement stat and lstat as follows:

#![allow(unused)]
fn main() {
const SYS_STAT: isize = 4;
const SYS_FSTAT: isize = 5;
const SYS_LSTAT: isize = 6;

#[no_mangle]
pub fn stat(path: &str, stat: &mut stat64) -> Result<()> {
    let mut dst = [0u8;crate::limits::PATH_MAX];
    cpath(path.as_bytes(), &mut dst)?;

    let rc = unsafe { syscall!(SYS_STAT, dst.as_ptr(), stat) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn fstat(fd: u32, stat: &mut stat64) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FSTAT, fd, stat) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn lstat(path: &str, stat: &mut stat64) -> Result<()> {
    let mut dst = [0u8;crate::limits::PATH_MAX];
    cpath(path.as_bytes(), &mut dst)?;

    let rc = unsafe { syscall!(SYS_LSTAT, dst.as_ptr(), stat) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}
}

And now we can use them in the main function like this:

#[no_mangle]
fn main() -> u8 { 
    let fd = linux::syscall::open("hello", 0, 0).unwrap();
    let mut stat = linux::syscall::stat64::default();
    linux::syscall::fstat(fd, &mut stat).unwrap();
    println!("{:#?}", stat);
    0
}

The result should look something like this:

> ./cargo.sh run
stat64 {
    st_dev: 64768,
    st_ino: 940171,
    st_nlink: 1,
    st_mode: 33188,
    st_uid: 1066219479,
    st_gid: 1068570817,
    __pad0: 0,
    st_rdev: 0,
    st_size: 9,
    st_blksize: 4096,
    st_blocks: 8,
    st_atime: 1719926584,
    st_atime_nsec: 93534043,
    st_mtime: 1719926583,
    st_mtime_nsec: 457537512,
    st_ctime: 1719926583,
    st_ctime_nsec: 457537512,
    __reserved: [
        0,
        0,
        0,
    ],
}
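
The st_mode value in the output above packs the file type and the permission bits into one number. A minimal sketch of how it could be decoded, assuming the usual Linux S_IF* constants (the helper names file_type and permissions are made up for this example and are not part of our library):

```rust
// File type mask and a few file type values as defined by Linux.
const S_IFMT: u32 = 0o170000;
const S_IFDIR: u32 = 0o040000;
const S_IFREG: u32 = 0o100000;
const S_IFLNK: u32 = 0o120000;

/// Returns a human readable name for the file type part of st_mode.
fn file_type(mode: u32) -> &'static str {
    match mode & S_IFMT {
        S_IFREG => "regular file",
        S_IFDIR => "directory",
        S_IFLNK => "symbolic link",
        _ => "other",
    }
}

/// Returns the permission bits (including setuid, setgid and sticky).
fn permissions(mode: u32) -> u32 {
    mode & 0o7777
}

fn main() {
    let mode = 33188; // st_mode from the output above
    println!("{} with permissions {:o}", file_type(mode), permissions(mode));
}
```

For 33188 (0o100644) this reports a regular file with the permissions 644.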

truncate, ftruncate, fallocate

To set the size of a file we can use the truncate and fallocate syscall families. Let’s implement these syscalls in syscall.rs:

#![allow(unused)]
fn main() {
const SYS_TRUNCATE: isize = 76;
const SYS_FTRUNCATE: isize = 77;
const SYS_FALLOCATE: isize = 285;

#[no_mangle]
pub fn truncate(path: &str, len: u64) -> Result<()> {
    let mut dst = [0u8;crate::limits::PATH_MAX];
    cpath(path.as_bytes(), &mut dst)?;

    let rc = unsafe { syscall!(SYS_TRUNCATE, dst.as_ptr(), len) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn ftruncate(fd: u32, len: u64) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FTRUNCATE, fd, len) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn fallocate(fd: u32, mode: u32, offset: u64, len: u64) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FALLOCATE, fd, mode, offset, len) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}
}

We can use them like this:

#[no_mangle]
fn main() -> u8 { 
    use linux::syscall::*;
    let mut stat = stat64::default();
    let fd = open("buffer", O_CREAT|O_APPEND|O_RDWR, S_IRWXU).unwrap();

    fallocate(fd, 0, 0, 1024).unwrap();
    fstat(fd, &mut stat).unwrap();
    println!("size: {}", stat.st_size);

    ftruncate(fd, 512).unwrap();
    fstat(fd, &mut stat).unwrap();
    println!("size: {}", stat.st_size);

    close(fd).unwrap();
    0
}

So the result is:

> ./cargo.sh run
size: 1024
size: 512

fsync, fdatasync

#![allow(unused)]
fn main() {
const SYS_FSYNC: isize = 74;
const SYS_FDATASYNC: isize = 75;

#[no_mangle]
pub fn fsync(fd: u32) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FSYNC, fd) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}

#[no_mangle]
pub fn fdatasync(fd: u32) -> Result<()> {
    let rc = unsafe { syscall!(SYS_FDATASYNC, fd) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(())
}
}
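
Every wrapper so far repeats the same check of the raw return code. One way to factor it out is a small helper like the one below. This is only a sketch: Errno is a hypothetical stand-in for our Error type and check is a name invented for this example.

```rust
/// Hypothetical stand-in for the Error type of our library.
#[derive(Debug, PartialEq)]
pub struct Errno(pub i32);

/// Converts a raw syscall return value into a Result.
/// Negative values carry the negated errno, everything else is a success.
pub fn check(rc: isize) -> Result<usize, Errno> {
    if rc < 0 {
        Err(Errno((-rc) as i32))
    } else {
        Ok(rc as usize)
    }
}

fn main() {
    // A successful call which returns a value, like lseek or read.
    println!("{:?}", check(512));
    // A failed call: -2 is how the kernel reports ENOENT.
    println!("{:?}", check(-2));
}
```

With such a helper a wrapper like fsync could end in something like check(rc).map(|_| ()) instead of the explicit if block.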

lseek

The lseek syscall can be used to move the cursor (the file offset) of an open file. The possible values of whence, like SEEK_SET, are documented in the lseek(2) man page.

#![allow(unused)]
fn main() {
const SYS_LSEEK: isize = 8;

#[no_mangle]
pub fn lseek(fd: u32, offset: u64, whence: i32) -> Result<u64> {
    let rc = unsafe { syscall!(SYS_LSEEK, fd, offset, whence) };

    if rc < 0 {
        return Err(Error::from(rc * -1))
    } 

    Ok(u64::try_from(rc).unwrap())
}
}

The main function should look like this:

#[no_mangle]
fn main() -> u8 { 
    use linux::syscall::*;
    let fd = open("buffer", O_CREAT|O_APPEND|O_RDWR, S_IRWXU).unwrap();
    fallocate(fd, 0, 0, 1024).unwrap();

    let pos = lseek(fd, 512, SEEK_SET).unwrap();
    println!("Cursor position: {}", pos);
    read(0, &mut [0u8]).unwrap();

    close(fd).unwrap();
    0
}

Let’s start our program and let it block on the read syscall:

> ./cargo.sh run
Cursor position: 512

We can check the status of the file in the proc filesystem like this:

> cat /proc/$(pidof bin)/fdinfo/3
pos:    512
flags:  0102002
mnt_id: 30
ino:    940180

Memory

The .text section

Let’s create a small binary without much bloat and check out its memory footprint:

section .text
global _start
_start:
    mov rax,0x22
    syscall

All it does is provide an entry point for the process and pause the execution by calling the pause system call (number 0x22 = 34). We can assemble, link and run it and display its memory mappings as follows:

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & cat /proc/$!/maps
00400000-00401000                  r--p  00000000  fd:00  940117  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940117  /a.out
7ffc92d30000-7ffc92d51000          rw-p  00000000  00:00  0       [stack]
7ffc92d51000-7ffc92d55000          r--p  00000000  00:00  0       [vvar]
7ffc92d55000-7ffc92d57000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

If you’re unfamiliar with this syntax: $! is a bash variable which contains the process id of the most recently started background process. With & our process goes into the background so we can use the same terminal to print its memory mappings, which are exposed by the kernel as a simple file at /proc/<pid>/maps.

The columns above have the following values:

  1. memory address range
  2. permissions (r=read, w=write, x=exec, p=private, s=shared)
  3. file offset (only if the mapping is file-backed)
  4. device id (major:minor)
  5. inode id
  6. either the file name or some human readable identifier of the memory range
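
The six columns listed above can be split apart with a few lines of code. The following is a minimal sketch (Mapping and parse_maps_line are names invented for this example). It assumes that the name column contains no spaces, which is a simplification.

```rust
/// One line of /proc/<pid>/maps, split into its columns.
#[derive(Debug, PartialEq)]
struct Mapping {
    start: u64,
    end: u64,
    perms: String,
    offset: u64,
    device: String,
    inode: u64,
    name: Option<String>,
}

impl Mapping {
    /// Size of the mapped range in bytes.
    fn size(&self) -> u64 {
        self.end - self.start
    }
}

/// Parses a single maps line, returns None if a column is missing or malformed.
fn parse_maps_line(line: &str) -> Option<Mapping> {
    let mut cols = line.split_whitespace();
    let (start, end) = cols.next()?.split_once('-')?;
    let perms = cols.next()?.to_string();
    let offset = u64::from_str_radix(cols.next()?, 16).ok()?;
    let device = cols.next()?.to_string();
    let inode = cols.next()?.parse::<u64>().ok()?;
    // Anonymous mappings have no name column at all.
    let name = cols.next().map(|s| s.to_string());
    Some(Mapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms,
        offset,
        device,
        inode,
        name,
    })
}

fn main() {
    let line = "00401000-00402000 r-xp 00001000 fd:00 940117 /a.out";
    let m = parse_maps_line(line).unwrap();
    println!("{:#x}..{:#x} ({} bytes) {} {:?}", m.start, m.end, m.size(), m.perms, m.name);
}
```

The size method makes it easy to see that each of the mappings of our binary is exactly one page (0x1000 bytes) long.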

The first two lines in the mapping show us how our binary was mapped:

  1. 00400000-00401000: the elf header can be found in this read-only region
  2. 00401000-00402000: this is the .text section of our binary which contains the code to be executed

We can see something similar if we look at the section headers in the file too:

> readelf -W -S ./a.out
Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 000007 00  AX  0   0 16
  [ 2] .shstrtab STRTAB   0000000000000000 001007 000011 00      0   0  1

The .data section

Let’s create another section in our binary, .data, by adding some initialized data to it:

section .data
    db "Hello world"
section .text
global _start
_start:
    mov rax,34
    syscall

If we now run our program we see an extra line for the data section:

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & cat /proc/$!/maps
00400000-00401000                  r--p  00000000  fd:00  940117  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940117  /a.out
00402000-00403000                  rw-p  00002000  fd:00  940117  /a.out
7ffd711d1000-7ffd711f2000          rw-p  00000000  00:00  0       [stack]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

As we can see the 00402000-00403000 region is readable and writable but it cannot be executed.

> readelf -W -S ./a.out
Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 000007 00  AX  0   0 16
  [ 2] .data     PROGBITS 0000000000402000 002000 00000b 00  WA  0   0  4
  [ 3] .shstrtab STRTAB   0000000000000000 00200b 000017 00      0   0  1

The .rodata section

Let’s create another section in our binary, .rodata, by adding some initialized read-only data to it:

section .rodata
    db "Hello world"
section .text
global _start
_start:
    mov rax,34
    syscall

If we now run our program we see an extra line about the rodata section which is mapped as r--p now.

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & cat /proc/$!/maps
00400000-00401000                  r--p  00000000  fd:00  940149  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940149  /a.out
00402000-00403000                  r--p  00002000  fd:00  940149  /a.out
7ffc213e4000-7ffc21405000          rw-p  00000000  00:00  0       [stack]
7ffc21558000-7ffc2155c000          r--p  00000000  00:00  0       [vvar]
7ffc2155c000-7ffc2155e000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

The elf file looks like this:

> readelf -W -S ./a.out
Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 000007 00  AX  0   0 16
  [ 2] .rodata   PROGBITS 0000000000402000 002000 00000b 00   A  0   0  4
  [ 3] .shstrtab STRTAB   0000000000000000 00200b 000019 00      0   0  1

The .bss section

To reserve some extra space for use during the execution of the process we can use the .bss section:

section .bss
    resq 1024

section .text
global _start
_start:
    mov rax,34
    syscall

This creates a buffer (resq 1024 reserves 1024 quadwords, so 0x2000 bytes) which will be initialized with zeros at the startup of the process but doesn’t take up space in the binary itself. We can see this section mapped as read-write too, just like the .data section. Since it’s only logically defined by the executable, the new line doesn’t refer to the elf file.

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & cat /proc/$!/maps
00400000-00401000                  r--p  00000000  fd:00  940150  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940150  /a.out
00403000-00405000                  rw-p  00000000  00:00  0       
7ffc17944000-7ffc17965000          rw-p  00000000  00:00  0       [stack]
7ffc179c2000-7ffc179c6000          r--p  00000000  00:00  0       [vvar]
7ffc179c6000-7ffc179c8000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

And the elf file:

> readelf -W -S ./a.out
Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 000007 00  AX  0   0 16
  [ 2] .bss      NOBITS   0000000000402000 002000 002000 00  WA  0   0  4
  [ 3] .shstrtab STRTAB   0000000000000000 001007 000016 00      0   0  1

The heap

Let’s reserve another type of memory. For heap allocation we need to ask the kernel to move the program break of the process a bit higher. There is a system call for that called brk(). If it is called with 0 as argument it returns the current program break of the process, and if it’s called with a valid address, that address will be set as the new program break. The assembly code looks like this:

section .text
global _start
_start:
    ; old =  brk(0);
    mov rdi,0x0
    mov rax,0xc
    syscall

    ; new = brk(old + 0x1000)
    add rax,0x1000
    mov rdi,rax
    mov rax,0xc
    syscall

    ; pause()
    mov rax,34
    syscall

If we execute it we see a new line again, called [heap]. Similarly to the .data and .bss sections it is also mapped into the low address region of the virtual address space, but unlike them its size can be changed. It grows towards the high memory addresses.

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & cat /proc/$!/maps
00400000-00401000                  r--p  00000000  fd:00  940155  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940155  /a.out
009ca000-009cb000                  rw-p  00000000  00:00  0       [heap]
7ffd483c0000-7ffd483e1000          rw-p  00000000  00:00  0       [stack]
7ffd483e4000-7ffd483e8000          r--p  00000000  00:00  0       [vvar]
7ffd483e8000-7ffd483ea000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

The elf file:

> readelf -W -S ./a.out
Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 000023 00  AX  0   0 16
  [ 2] .shstrtab STRTAB   0000000000000000 001023 000011 00      0   0  1

The stack

We can also change the size of the stack but I don’t know how….

The vdso, vvar and vsyscall

The v in the name of these regions means “virtual”. The vdso is a shared library mapped by the kernel into the address space of the process, and it allows us to call some system calls with a faster execution time. Since calling these functions doesn’t require a switch into kernel mode like a normal system call, it can provide a significant performance improvement to our program. Let’s dump its content to check the available symbols. We need our pause program again:

section .text
global _start
_start:
    mov rax,34
    syscall

Let’s check the location of the vdso in the usual way:

> nasm -f elf64 main.s && ld ./main.o && strip -s ./a.out
> ./a.out & pid=$!; cat /proc/$pid/maps
00400000-00401000                  r--p  00000000  fd:00  940149  /a.out
00401000-00402000                  r-xp  00001000  fd:00  940149  /a.out
00402000-00403000                  r--p  00002000  fd:00  940149  /a.out
7ffd2023d000-7ffd2025e000          rw-p  00000000  00:00  0       [stack]
7ffd203ed000-7ffd203f1000          r--p  00000000  00:00  0       [vvar]
7ffd203f1000-7ffd203f3000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

Once we know the start address (0x7ffd203f1000) and the length (0x7ffd203f3000 - 0x7ffd203f1000) of the vdso section we can use the dd command to dump the content of it. Note that we need root access to do this.

sudo dd if=/proc/$pid/mem of=vdso bs=1 skip=$((0x7ffd203f1000)) count=$((0x7ffd203f3000 - 0x7ffd203f1000))

After that we can analyse it just like any other shared object file:

> readelf -W -s ./vdso
Symbol table '.dynsym' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000c10     5 FUNC    WEAK   DEFAULT   11 clock_gettime@@LINUX_2.6
     2: 0000000000000bd0     5 FUNC    GLOBAL DEFAULT   11 __vdso_gettimeofday@@LINUX_2.6
     3: 0000000000000c20    99 FUNC    WEAK   DEFAULT   11 clock_getres@@LINUX_2.6
     4: 0000000000000c20    99 FUNC    GLOBAL DEFAULT   11 __vdso_clock_getres@@LINUX_2.6
     5: 0000000000000bd0     5 FUNC    WEAK   DEFAULT   11 gettimeofday@@LINUX_2.6
     6: 0000000000000be0    42 FUNC    GLOBAL DEFAULT   11 __vdso_time@@LINUX_2.6
     7: 0000000000000cc0   157 FUNC    GLOBAL DEFAULT   11 __vdso_sgx_enter_enclave@@LINUX_2.6
     8: 0000000000000be0    42 FUNC    WEAK   DEFAULT   11 time@@LINUX_2.6
     9: 0000000000000c10     5 FUNC    GLOBAL DEFAULT   11 __vdso_clock_gettime@@LINUX_2.6
    10: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
    11: 0000000000000c90    38 FUNC    GLOBAL DEFAULT   11 __vdso_getcpu@@LINUX_2.6
    12: 0000000000000c90    38 FUNC    WEAK   DEFAULT   11 getcpu@@LINUX_2.6

Stack

Stack overflow

A common source of stack overflows is a recursive function which never returns:
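
A minimal sketch of such a function follows. To keep the example runnable the recursion is bounded by a limit. The function name count_frames is made up for this illustration. Every nested call needs at least a return address on the stack, so without the base case the recursion would continue until the [stack] mapping is exhausted and the process is killed.

```rust
/// Recurses `limit` times and counts the calls on the way back up.
/// Each nested call keeps its return address (and usually a frame)
/// on the stack until the innermost call has returned.
fn count_frames(limit: u64) -> u64 {
    if limit == 0 {
        // Base case: removing this branch turns the function into
        // an endless recursion which overflows the stack.
        return 0;
    }
    1 + count_frames(limit - 1)
}

fn main() {
    println!("nested calls: {}", count_frames(1000));
}
```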

Alignment

Now that we have implemented a couple of handy helper methods we can go back to the question from chapter one: what is the options(nostack) in the _start assembly block used for? Let’s put a panic into the main function:

#[no_mangle]
fn main() -> u8 { 
    panic!();
}

and execute the program

> ./cargo.sh build
> ./target/bin
panicked at ./bin.rs:9:5:
explicit panic

It all looks fine, right? But what happens if you remove the nostack option from the assembly block of the _start function?

> ./cargo.sh build
> ./target/bin
Segmentation fault (core dumped)

The process crashes with a segfault. Let’s analyse that in gdb:

gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) run
Starting program: /home/taabodal/work/blog/src/chapter-02/target/bin

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402557 in rust_begin_unwind ()

(gdb) disassemble
Dump of assembler code for function rust_begin_unwind:
   ...
   0x0000000000402552 <+50>:    movups xmm0,XMMWORD PTR [rsp+0x78]
=> 0x0000000000402557 <+55>:    movaps XMMWORD PTR [rsp+0x60],xmm0
   0x000000000040255c <+60>:    movaps xmm0,XMMWORD PTR [rsp+0x60]
   0x0000000000402561 <+65>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   ...
End of assembler dump.

I removed some lines from the output above to make it easier to digest. The process crashes at the instruction movaps XMMWORD PTR [rsp+0x60],xmm0. The movups instruction on the line above seems to be quite similar but it doesn’t crash. Let’s look up what these instructions are doing:

  • movaps: Move Aligned Packed Single Precision Floating-Point Values
  • movups: Move Unaligned Packed Single Precision Floating-Point Values

The key difference between these two is the alignment of the memory address. While movups doesn’t expect any alignment of the memory address, movaps expects it to be 16, 32 or 64 byte aligned, depending on the encoding:

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version), 32-byte (VEX.256 encoded version) or 64-byte (EVEX.512 encoded version) boundary or a general-protection exception (#GP) will be generated.

The instruction which crashes the process uses [rsp+0x60] as memory address. 0x60 is a multiple of 16, but what is the value of the rsp register? Let’s go back to gdb and print the current value of the register:

(gdb) info registers rsp
rsp            0x7fffffffe828      0x7fffffffe828

It seems like we have found the reason: the value of rsp is not 16 byte aligned, so [rsp+0x60] won’t be 16 byte aligned either, which causes the processor to raise a general-protection exception.
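
The check the processor does here can be reproduced with a bit mask. A small sketch, where is_aligned is a helper name invented for this example and the rsp value is the one printed by gdb above:

```rust
/// Returns true if `addr` is a multiple of `align`.
/// `align` has to be a power of two for the mask trick to work.
fn is_aligned(addr: u64, align: u64) -> bool {
    debug_assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}

fn main() {
    let rsp: u64 = 0x7fffffffe828; // value printed by gdb at the crash
    let operand = rsp + 0x60; // address used by the crashing movaps
    println!("{:#x} is 16 byte aligned: {}", operand, is_aligned(operand, 16));
}
```

The low 4 bits of 0x7fffffffe888 are 0x8, so the address is 8 bytes off from the next 16 byte boundary.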

That’s all nice, but if the alignment of the memory address is so important then why doesn’t the compiler check that rsp is in a good state before using movaps? As always the System V ABI has the answer to this question. In section 3.2.2 The Stack Frame it says:

The end of the input argument area shall be aligned on a 16 byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 when control is transferred to the function entry point.

Since it’s documented in the ABI’s calling convention the compiler can assume that rsp is 16 byte aligned right before a function is called. So as long as the function doesn’t do stack operations which misalign the stack, it stays 16 byte aligned. Let’s go back to gdb and check the stack alignment throughout our process:

> gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) break _start
(gdb) break rust_begin_unwind
(gdb) run
Breakpoint 1, 0x0000000000402500 in _start ()

(gdb) info registers rsp
rsp            0x7fffffffe960      0x7fffffffe960

(gdb) continue
Breakpoint 2, 0x0000000000402520 in rust_begin_unwind ()

(gdb) info registers rsp
rsp            0x7fffffffe8b0      0x7fffffffe8b0

(gdb) disassemble
Dump of assembler code for function rust_begin_unwind:
=> 0x0000000000402520 <+0>:     sub    rsp,0x88
   0x0000000000402527 <+7>:     mov    QWORD PTR [rsp+0x10],rdi
   ...

As we can see the rsp register is 16 byte aligned at the beginning of both the _start and the rust_begin_unwind function. The problem seems to come after that: the first instruction of the rust_begin_unwind function subtracts 0x88 from the stack pointer, which becomes unaligned this way. But why does it do that if it knows that movaps needs 16 byte alignment?

The reason is that rsp has to be 16 byte aligned before the call instruction is executed. Since the call instruction pushes the current value of the instruction pointer (rip), which is 8 bytes long, onto the stack as the return address, the compiler needs to compensate for this at the beginning of every function. So the sub rsp,0x88 should actually make rsp 16 byte aligned again, which means that the stack was not in the state the ABI expects (rsp + 8 being a multiple of 16) when the rust_begin_unwind function was entered. To find out when it got misaligned we need to go up the stack frames and check the rsp register. Let’s see what the stack frames look like:

(gdb) backtrace
#0  0x0000000000402520 in rust_begin_unwind ()
#1  0x0000000000401033 in core::panicking::panic_fmt () at library/core/src/panicking.rs:72
#2  0x00000000004010dc in core::panicking::panic () at library/core/src/panicking.rs:146
#3  0x000000000040128d in main ()

(gdb) up
#1  0x0000000000401033 in core::panicking::panic_fmt () at library/core/src/panicking.rs:72
72      in library/core/src/panicking.rs

(gdb) info registers rsp
rsp            0x7fffffffe8b8      0x7fffffffe8b8

(gdb) up
#2  0x00000000004010dc in core::panicking::panic () at library/core/src/panicking.rs:146
146     in library/core/src/panicking.rs

(gdb) info registers rsp
rsp            0x7fffffffe8f8      0x7fffffffe8f8

(gdb) up
#3  0x000000000040128d in main ()

(gdb) info registers rsp
rsp            0x7fffffffe948      0x7fffffffe948

If we go up the stack frames we can check the value of the registers right before the call instruction. The bad news is that rsp seems to be misaligned already in the main function. This means that the misalignment affects the whole program. Since the main function is called from our _start function let’s investigate that one:

> gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) break _start
(gdb) run
(gdb) disassemble
Dump of assembler code for function _start:
=> 0x0000000000402500 <+0>:     push   rax
   0x0000000000402501 <+1>:     call   0x401270 <main>
   0x0000000000402506 <+6>:     mov    rdi,rax
   0x0000000000402509 <+9>:     mov    rax,0x3c
   0x0000000000402510 <+16>:    syscall
   0x0000000000402512 <+18>:    ud2

We seem to have the same construct here. The first instruction of the function, push rax, is meant to realign the stack before main is called. The only difference is that the _start function is the entry point of our code: it is never called, and as such it is the only function which starts with the stack already 16 byte aligned. As a result the first instruction, which was meant to compensate for the misalignment of the stack, becomes the reason for it.
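
The effect of the extra push can be replayed with plain arithmetic. The sketch below models only the stack pointer. The functions push, call and entry_ok are made up for this illustration, and the start value is the rsp which gdb showed at _start.

```rust
/// push <reg>: the stack pointer moves down by 8 bytes.
fn push(rsp: u64) -> u64 {
    rsp - 8
}

/// call <fn>: pushes the 8 byte return address.
fn call(rsp: u64) -> u64 {
    rsp - 8
}

/// ABI requirement at function entry: rsp + 8 is a multiple of 16.
fn entry_ok(rsp_at_entry: u64) -> bool {
    (rsp_at_entry + 8) % 16 == 0
}

fn main() {
    let rsp_at_start: u64 = 0x7fffffffe960; // aligned, set up by the kernel

    // Without nostack: push rax, then call main.
    let with_push = call(push(rsp_at_start));
    // With nostack: call main directly.
    let without_push = call(rsp_at_start);

    println!("with push:    {:#x} ok: {}", with_push, entry_ok(with_push));
    println!("without push: {:#x} ok: {}", without_push, entry_ok(without_push));
}
```

With the push, main is entered at 0x7fffffffe950, which violates the ABI rule. Without it, main is entered at 0x7fffffffe958, which satisfies it.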

So let’s get back to the options(nostack). The documentation says:

The asm! block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is not used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.

If we compile and dump the _start function with nostack option enabled then we get the working assembly code:

> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       e8 6b ed ff ff          call   401270 <main>
  402505:       48 89 c7                mov    rdi,rax
  402508:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  40250f:       0f 05                   syscall
  402511:       0f 0b                   ud2

and without this option we get the crashing assembly code:

> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       50                      push   rax
  402501:       e8 6a ed ff ff          call   401270 <main>
  402506:       48 89 c7                mov    rdi,rax
  402509:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  402510:       0f 05                   syscall
  402512:       0f 0b                   ud2

So what’s happening here, isn’t it exactly the opposite of what the documentation says? The answer is no. The compiler guarantees the right stack alignment for a function call, but it has no knowledge about the fact that the _start code never gets called. It thinks that it’s a function just like any other. We can prove this by moving the call of main outside of the assembly block. The compiler will generate the push rax even though the assembly block has the nostack option enabled.

#[no_mangle]
fn _start() -> ! {
    extern "C" { fn main() -> u8; } 
    unsafe { main(); }
    unsafe {
        core::arch::asm!(
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}
> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       50                      push   rax
  402501:       48 8d 05 68 ed ff ff    lea    rax,[rip+0xffffffffffffed68]        # 401270 <main>
  402508:       ff d0                   call   rax
  40250a:       48 89 c7                mov    rdi,rax
  40250d:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  402514:       0f 05                   syscall
  402516:       0f 0b                   ud2

Now that we agreed that stack alignment is important let’s make it permanent to avoid this bug in the future. The simplest way to clear the lowest 4 bits of rsp, which rounds it down to a multiple of 16, is and rsp,-0x10. Let’s add this to the beginning of the asm block:

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "and rsp,-0x10",
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

Now it should work even without the nostack option because the generated first instruction push rax simply has no effect on our code. It’s better to use the and instruction instead of sub or pop here because sub and pop would move rsp by 8 bytes in every case while and only modifies rsp if it wasn’t aligned. Last but not least the System V ABI also says that the user space code is responsible for cleaning up the rbp register:

The content of this register is unspecified at process initialization time, but the user code should mark the deepest stack frame by setting the frame pointer to zero.

So let’s do that by adding an extra assembly line xor rbp,rbp.

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "xor rbp,rbp",
            "and rsp,-0x10",
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}
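The effect of and rsp,-0x10 can be checked with ordinary integer arithmetic. The following is only a minimal sketch that runs as a normal std program; align_down and sub_eight are hypothetical helper names and not part of the linux crate. It shows that masking leaves an already aligned address untouched, while an unconditional subtraction always moves it.

```rust
/// Clear the lowest four bits, like `and rsp,-0x10` does with the stack pointer.
fn align_down(addr: u64) -> u64 {
    addr & !0xf
}

/// Unconditionally move the address down by 8 bytes, like `sub rsp,8`.
fn sub_eight(addr: u64) -> u64 {
    addr - 8
}

fn main() {
    let aligned = 0x7fff_ffff_e000u64;
    let misaligned = aligned + 8;
    // Masking: both inputs end up on the same 16 byte boundary.
    println!("{:#x} {:#x}", align_down(aligned), align_down(misaligned));
    // Subtracting: the aligned input becomes misaligned.
    println!("{:#x} {:#x}", sub_eight(aligned), sub_eight(misaligned));
}
```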

The main function

Let’s print our args like this:

cat -E -T /proc/self/cmdline | tr '\000' '\n'
cat
-E
-T
/proc/self/cmdline

In a C program we get the command line arguments directly from the main function like this

int main(int argc, char **argv); 

If we also need to access the environment variables we can extend the function signature like this

int main(int argc, char **argv, char **envp); 

Or further extend it to get the auxiliary information passed to the process like this:

int main(int argc, char **argv, char **envp, auxv_t *auxv); 

But where does this information come from? To answer this question we need to go back to the System V ABI and read section 3.4.1 Initial Stack and Register State. It says that the stack of the process is initialized as follows (listed from high to low addresses, so the last item is what rsp points to):

  • Unspecified block
  • Info block: the command line arguments and environment variables are copied here
  • Unspecified block
  • End of auxiliary vector (null entry)
  • Auxiliary vector entries (auxv_t *auxv)
  • End of environment pointer vector (null pointer)
  • Environment pointer vector entries (char **envp)
  • End of argument pointer vector (null pointer)
  • Argument pointer vector (char **argv)
  • Argument pointer vector length (int argc)

The argument and environment pointer vectors are just arrays of pointers pointing into the Info block of the stack. To check their values we can use gdb like this:

> gdb --args ./target/bin --arg1 --arg2
(gdb) break _start
(gdb) run
(gdb) x/8s *(char**)($rsp + 8)
0x7fffffffebd6: "/blog/src/chapter-03/target/bin"
0x7fffffffec09: "--arg1"
0x7fffffffec10: "--arg2"
0x7fffffffec17: "SHELL=/bin/bash"
0x7fffffffec27: "LESS=-RSF"
0x7fffffffec31: "TERM_PROGRAM_VERSION=3.2a"
0x7fffffffec4b: "TMUX=/tmp/tmux-1066129479/default,2230,8"
0x7fffffffec74: "EDITOR=vim"

The x command lets you examine a memory location of the program and /8s specifies that 8 strings should be displayed. $rsp + 8 is the location of char **argv; casting it to char** and dereferencing it gives the address of the first argument string. After the list of command line arguments we can see a list of environment variables. Feel free to play around with the x command of gdb if you’re unfamiliar with it. You can get its help with help x.
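The layout described above can also be walked in code. The sketch below is an illustration only: it works on a fake stack stored in a plain array instead of the real process stack, and StackLayout and walk_stack are hypothetical names that do not exist in the linux crate. Index 0 stands for the word rsp points to.

```rust
/// Offsets (in machine words) of the vectors found on a simulated initial stack.
#[derive(Debug, PartialEq)]
struct StackLayout {
    argc: usize,
    argv_start: usize, // index of argv[0]
    envp_start: usize, // index of envp[0]
    auxv_start: usize, // index of the first auxiliary vector entry
}

/// Walk a slice of words laid out like the initial process stack,
/// starting at the word `rsp` points to (argc).
fn walk_stack(words: &[usize]) -> StackLayout {
    let argc = words[0];
    let argv_start = 1;
    // argv holds argc pointers followed by one null pointer.
    let envp_start = argv_start + argc + 1;
    // envp has no length field, so scan for its terminating null pointer.
    let mut idx = envp_start;
    while words[idx] != 0 {
        idx += 1;
    }
    StackLayout { argc, argv_start, envp_start, auxv_start: idx + 1 }
}

fn main() {
    // Fake stack: argc = 3, three argv pointers, null,
    // two envp pointers, null, then an AT_NULL pair.
    let stack: [usize; 10] = [3, 0x1000, 0x1010, 0x1020, 0, 0x2000, 0x2010, 0, 0, 0];
    println!("{:?}", walk_stack(&stack));
}
```

Because envp carries no length, the only way to find the auxiliary vector is to scan past the null pointer that terminates envp.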

Command line arguments

Let’s try to implement C-like command line argument handling. As we have learned in chapter 2, the C ABI uses the rdi, rsi, rdx, rcx, r8 and r9 registers to pass arguments to a function, so to pass argc and argv to main we just need to fill these registers with the values we can find on the stack. Let’s rewrite our _start function like this:

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "and rsp,-16",
            "mov rdi,[rsp]",
            "lea rsi,[rsp+8]",
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

If you look at the assembly code there is an important difference between argc (rdi) and argv (rsi): argc is passed by value while argv is passed by reference. In the case of argc, the instruction mov rdi,[rsp] loads the value pointed to by rsp into rdi. In contrast, the lea instruction doesn’t load a value; it just calculates the memory address rsp+8 and puts this address into rsi. As a result argc can be interpreted as an integer value while argv can be interpreted as a pointer to an array of strings.

We can now rewrite the main function like this:

#[no_mangle]
fn main(argc: usize, argv: *const *const i8) -> u8 { 
    use core::convert::TryInto;
    for offset in 0 .. argc {
        unsafe {
            let ptr = *argv.offset(offset as isize);
            println!("{}", core::ffi::CStr::from_ptr(ptr).to_str().unwrap());
        }
    }
    0
}

And if we try to compile we can see an almost expected error message, a missing strlen symbol:

> ./cargo.sh run
error: linking with `cc` failed: exit status: 1
  = note: /usr/bin/ld: target/bin.bin.97e806d2324bed6f-cgu.0.rcgu.o: in function `core::ffi::c_str::CStr::from_ptr':
          bin.97e806d2324bed6f-cgu.0:(.text._ZN4core3ffi5c_str4CStr8from_ptr17hac38e50840c901dfE+0xc): undefined reference to `strlen'

So let’s add strlen to the ffi module:

#[no_mangle]
fn strlen(buf: *const u8) -> usize {
    let mut len = 0;
    // Count bytes until the terminating NUL byte is found.
    while unsafe { *buf.add(len) != 0 } {
        len += 1;
    }
    len
}

And now we have access to the command line arguments:

> ./cargo.sh build
> ./target/bin arg1 arg2
./target/bin
arg1
arg2

It works, but this way the main function needs an unsafe block to access the arguments. I think we can do better. The Rust standard library provides an args() function which returns an Args struct which implements the Iterator trait, so one can iterate over the arguments without the need for unsafe blocks. Let’s take it as an example and implement our env module. Let’s create a new file called env.rs and include it into linux.rs with pub mod env;.

To be able to do some initialization we won’t call the main function directly from _start but we will implement a __rust_main function (just like we have seen __libc_main in the first chapter) and do the process initialization there. Let’s do that by modifying the linux.rs file like this:

extern "C" { fn main() -> u8; }

#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "xor rbp,rbp",
            "and rsp,-16",
            "mov rdi,rsp",
            "call __rust_main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

#[no_mangle]
fn __rust_main(rsp: isize) -> u8 {
    unsafe { main() }
}

Once we start writing Rust code it’s really hard to get a pointer to the beginning of the stack where argc, argv, etc. are located so we pass this pointer directly from assembly to our __rust_main function as an argument. The rest of the pointer operations can be done via the Rust interface. The main function can be rewritten like this:

#[no_mangle]
fn main() -> u8 { 0 }

Let’s add the logic to store the pointer to argv, which can later be used to implement the env::args() function.

use core::sync::atomic::{AtomicPtr, Ordering};
pub(crate) static ARGV: AtomicPtr<*const i8> = AtomicPtr::new(core::ptr::null_mut());

#[no_mangle]
fn __rust_main(rsp: *const u8) -> u8 {
    let argv = unsafe { rsp.offset(8) as *mut *const i8 };
    ARGV.store(argv, Ordering::Relaxed);
    unsafe { main() }
}

The env.rs looks like this:

use core::ffi::CStr;
use core::sync::atomic::Ordering;

pub struct Pointers {
    next: isize,
    ptrs: *const *const i8,
}

impl core::iter::Iterator for Pointers {
    type Item = &'static str;

    fn next(&mut self) -> Option<Self::Item> {
        unsafe {
            let ptr = *self.ptrs.offset(self.next);
            // Stay on the terminating null pointer so that repeated calls
            // never read past the end of the vector.
            if ptr.is_null() {
                return None;
            }
            self.next += 1;
            // Note: an entry that is not valid UTF-8 also ends the iteration.
            CStr::from_ptr(ptr).to_str().ok()
        }
    }
}

pub fn args() -> Pointers {
    Pointers { 
        next: 0,
        ptrs: crate::ARGV.load(Ordering::Relaxed),
    }
}

And we can reimplement the main function as follows:

#[no_mangle]
fn main() -> u8 { 
    for arg in linux::env::args() {
        println!("{}", arg);
    }
    0
}

Run the program like this:

> ./cargo.sh build
> ./target/bin  a1 a2
./target/bin
a1
a2
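The same null-terminated pointer walk can be tried out in an ordinary std program, without touching the real process stack. This is a minimal sketch under that assumption; collect_strings and roundtrip are hypothetical names, and the fake argv is built from CString values purely for illustration.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Collect the strings of a null-terminated pointer array such as argv or envp.
///
/// Safety: `ptrs` must point to valid C strings followed by a null pointer.
unsafe fn collect_strings(ptrs: *const *const c_char) -> Vec<String> {
    let mut out = Vec::new();
    let mut idx = 0;
    loop {
        let ptr = unsafe { *ptrs.add(idx) };
        if ptr.is_null() {
            break; // terminating null pointer reached
        }
        let text = unsafe { CStr::from_ptr(ptr) };
        out.push(text.to_string_lossy().into_owned());
        idx += 1;
    }
    out
}

/// Build a fake argv from string slices and read it back.
fn roundtrip(args: &[&str]) -> Vec<String> {
    let owned: Vec<CString> = args.iter().map(|a| CString::new(*a).unwrap()).collect();
    let mut ptrs: Vec<*const c_char> = owned.iter().map(|c| c.as_ptr()).collect();
    ptrs.push(std::ptr::null());
    // `owned` is still alive here, so every pointer in `ptrs` is valid.
    unsafe { collect_strings(ptrs.as_ptr()) }
}

fn main() {
    println!("{:?}", roundtrip(&["./target/bin", "a1", "a2"]));
}
```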

Environment variables

Let’s print our env like this:

> cat /proc/self/environ | tr '\000' '\n'
SHELL=/bin/bash
LESS=-RSF
TERM_PROGRAM_VERSION=3.2a
EDITOR=vim
....
_=/usr/bin/cat

We already have almost everything to get access to the environment variables of our process. Let’s update the startup logic like this:

pub(crate) static ENVP: AtomicPtr<*const i8> = AtomicPtr::new(core::ptr::null_mut());

#[no_mangle]
fn __rust_main(rsp: *const u8) -> u8 {
    let argc = unsafe { *(rsp as *const isize) };
    let argv = unsafe { rsp.offset(8) as *mut *const i8 };
    let envp = unsafe { rsp.offset(8 + 8 + argc * 8) as *mut *const i8 };

    ARGV.store(argv, Ordering::Relaxed);
    ENVP.store(envp, Ordering::Relaxed);

    unsafe { main() }
}

The environment logic looks like this:

pub fn envp() -> Pointers {
    Pointers { 
        next: 0,
        ptrs: crate::ENVP.load(Ordering::Relaxed),
    }
}

The main function looks like this:

#[no_mangle]
fn main() -> u8 { 
    for arg in linux::env::envp() {
        println!("{}", arg);
    }
    0
}

So we can print the environment variables like this:

> ./cargo.sh build
> ./target/bin
SHELL=/bin/bash
LESS=-RSF
TERM_PROGRAM_VERSION=3.2a
TMUX=/tmp/tmux-1066129479/default,2230,8
EDITOR=vim
...

Apart from that the standard library provides two neat functions called vars and var. Let’s implement those too by adding the following to env.rs:

pub struct Variables {
    ptrs: Pointers,
}

impl core::iter::Iterator for Variables {
    type Item = (&'static str, &'static str);

    fn next(&mut self) -> Option<Self::Item> {
        // Split each `KEY=VALUE` entry at the first `=` sign.
        self.ptrs.next().and_then(|s| s.split_once('='))
    }
}

pub fn vars() -> Variables {
    Variables { ptrs: envp() }
}

pub fn var(key: &str) -> Option<&'static str> {
    vars().find(|(k, _)| *k == key).map(|(_, v)| v)
}

After that we can update the main function like this:

#[no_mangle]
fn main() -> u8 { 
    println!("MYVAR={:?}", linux::env::var("MYVAR"));
    0
}

But this time we get a missing symbol error on compilation:

> ./cargo.sh build
  = note: /usr/bin/ld: /home/taabodal/work/blog/src/chapter-03/target/liblinux.rlib(liblinux.linux.77104c24dad4cdd3-cgu.0.rcgu.o): in function `<[A] as core::slice::cmp::SlicePartialEq<B>>::equal':
          linux.77104c24dad4cdd3-cgu.0:(.text._ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h27d80543cacf2715E+0x38): undefined reference to `memcmp'

So let’s implement memcmp by putting the following code into the ffi.rs module:

#[no_mangle]
unsafe fn memcmp(s1: *const u8, s2: *const u8, len: usize) -> i32 {
    for idx in 0 .. len {
        unsafe { 
            let b1 = s1.add(idx).read(); 
            let b2 = s2.add(idx).read(); 
            if b1 != b2 {
                // Widen before subtracting: a u8 subtraction would overflow
                // whenever b1 < b2 and could never yield a negative result.
                return i32::from(b1) - i32::from(b2);
            }
        }
    }
    0
}

So we can run our program like this:

> ./cargo.sh build

> ./target/bin
MYVAR=None

> MYVAR="hello world" ./target/bin
MYVAR=Some("hello world")
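The lookup logic itself does not depend on raw pointers, so it can be exercised on plain string slices. The sketch below is an assumption-laden illustration rather than the env module: lookup is a hypothetical helper, and unlike an iterator that returns None for a malformed entry, it uses filter_map to skip entries without an `=` sign instead of stopping at them.

```rust
/// Return the value of the first `KEY=VALUE` entry whose key matches.
/// Entries without an `=` sign are skipped.
fn lookup<'a>(entries: &[&'a str], key: &str) -> Option<&'a str> {
    entries
        .iter()
        .copied()
        // split_once splits at the first `=`, so values may contain `=` too.
        .filter_map(|entry| entry.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

fn main() {
    let env = ["SHELL=/bin/bash", "LESS=-RSF", "MYVAR=hello world"];
    println!("MYVAR={:?}", lookup(&env, "MYVAR"));
    println!("OTHER={:?}", lookup(&env, "OTHER"));
}
```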

Auxiliary vector

LD Magic:

> LD_DEBUG=bindings python
> LD_SHOW_AUXV=1 cat /dev/null

https://cseweb.ucsd.edu/~gbournou/CSE131/the_inside_story_on_shared_libraries_and_dynamic_loading.pdf

Let’s check out the auxv passed by the kernel to the cat command:

> LD_SHOW_AUXV=1 cat /dev/null
AT_SYSINFO_EHDR:      0x7ffeeb1dd000
AT_MINSIGSTKSZ:       3632
AT_HWCAP:             f8bfbff
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x555a0920b040
AT_PHENT:             56
AT_PHNUM:             13
AT_BASE:              0x7f73bbebc000
AT_FLAGS:             0x0
AT_ENTRY:             0x555a0920e760
AT_UID:               1066129479
AT_EUID:              1066129479
AT_GID:               1065878017
AT_EGID:              1065878017
AT_SECURE:            0
AT_RANDOM:            0x7ffeeb0d43d9
AT_HWCAP2:            0x2
AT_EXECFN:            /usr/bin/cat
AT_PLATFORM:          x86_64

A slightly lower level way to do the same is:

> od -t x8 /proc/self/auxv
0000000 0000000000000021 00007fff77dbd000
0000020 0000000000000033 0000000000000e30
0000040 0000000000000010 000000000f8bfbff
0000060 0000000000000006 0000000000001000
0000100 0000000000000011 0000000000000064
0000120 0000000000000003 00005633999f3040
0000140 0000000000000004 0000000000000038
0000160 0000000000000005 000000000000000d
0000200 0000000000000007 00007f43dc1f2000
0000220 0000000000000008 0000000000000000
0000240 0000000000000009 00005633999f6be0
0000260 000000000000000b 000000003f8bd847
0000300 000000000000000c 000000003f8bd847
0000320 000000000000000d 000000003f880201
0000340 000000000000000e 000000003f880201
0000360 0000000000000017 0000000000000000
0000400 0000000000000019 00007fff77d0daf9
0000420 000000000000001a 0000000000000002
0000440 000000000000001f 00007fff77d0dfec
0000460 000000000000000f 00007fff77d0db09
0000500 0000000000000000 0000000000000000

pub struct AuxVector {
    next: isize,
    buf: *const auxv_t,
}

impl core::iter::Iterator for AuxVector {
    type Item = AT;

    fn next(&mut self) -> Option<Self::Item> {
        let aux = unsafe { *self.buf.offset(self.next) };
        self.next += 1;

        match AT::from(aux){
            AT::AT_NULL => None,
            other => Some(other),
        }
    }
}

pub fn auxv() -> AuxVector {
    AuxVector { 
        next: 0,
        buf: crate::AUXV.load(Ordering::Relaxed),
    }
}

#[no_mangle]
unsafe fn __rust_main(rsp: *const u8) -> u8 {
    parse_stack(rsp);
    //let ldso = ldso::Ldso::new();
    //ldso.relocate_ldso();
    //ldso.relocate_exe();
    main()
}
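The od dump above shows that the auxiliary vector is a sequence of 16 byte (type, value) pairs that ends with a pair whose type is 0. A minimal sketch of decoding such a byte stream follows; it runs on a hand-built sample instead of /proc/self/auxv, and parse_auxv is a hypothetical helper, not part of the linux crate. The constant values match the dump: type 0x6 carries the page size 0x1000.

```rust
use std::convert::TryInto;

/// Type of the entry that terminates the auxiliary vector.
const AT_NULL: u64 = 0;
/// Type of the entry that carries the system page size.
const AT_PAGESZ: u64 = 6;

/// Decode native-endian (type, value) pairs until the AT_NULL terminator.
fn parse_auxv(bytes: &[u8]) -> Vec<(u64, u64)> {
    let mut entries = Vec::new();
    for pair in bytes.chunks_exact(16) {
        let kind = u64::from_ne_bytes(pair[..8].try_into().unwrap());
        let value = u64::from_ne_bytes(pair[8..].try_into().unwrap());
        if kind == AT_NULL {
            break;
        }
        entries.push((kind, value));
    }
    entries
}

fn main() {
    // Hand-built sample: AT_PAGESZ = 4096 followed by the AT_NULL terminator.
    let mut sample = Vec::new();
    for word in [AT_PAGESZ, 4096, AT_NULL, 0] {
        sample.extend_from_slice(&word.to_ne_bytes());
    }
    println!("{:?}", parse_auxv(&sample));
}
```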

Memory management

  • mmap, mremap, munmap
  • brk
  • msync
  • mprotect, mincore

brk

Let’s write a code like this and investigate the memory footprint of our program:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use linux::syscall::*;

#[no_mangle]
fn main() -> u8 { 
    println!("pid: {}", getpid().unwrap());
    let _ = pause();
    0
}

This small program gets the process id of our program and pauses the execution so we can inspect its memory.

> ./cargo.sh run
pid: 1320734

Let’s check out the mappings in the proc file system by using the pid like this:

> cat /proc/1320734/maps
00400000-00401000                  r--p  00000000  fd:00  950935  /target/bin
00401000-00404000                  r-xp  00001000  fd:00  950935  /target/bin
00404000-00405000                  r--p  00004000  fd:00  950935  /target/bin
00406000-00407000                  rw-p  00005000  fd:00  950935  /target/bin
00407000-00408000                  rw-p  00000000  00:00  0       
7ffe1a300000-7ffe1a321000          rw-p  00000000  00:00  0       [stack]
7ffe1a3f2000-7ffe1a3f6000          r--p  00000000  00:00  0       [vvar]
7ffe1a3f6000-7ffe1a3f8000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

Let’s break down what this file tells us:

  • Our binary is mapped into the low address range with different permissions:
    • read-only
    • read-exec
    • read-only
    • read-write
  • Right after the binary there is an anonymous read-write mapping (it has no backing file)
  • In the high address range we have
    • stack
    • vvar
    • vdso
    • vsyscall
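Each line of the maps file has the same column layout: address range, permissions, offset, device, inode and an optional path or label. A minimal sketch of parsing one line is shown below; Mapping and parse_maps_line are hypothetical names, and as a simplification only the first whitespace-separated token of the path is kept.

```rust
/// One parsed line of /proc/<pid>/maps.
#[derive(Debug, PartialEq)]
struct Mapping {
    start: u64,
    end: u64,
    perms: String,
    path: Option<String>,
}

/// Parse a single maps line; returns None if the line is malformed.
fn parse_maps_line(line: &str) -> Option<Mapping> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let perms = fields.next()?.to_string();
    // Skip offset, device and inode; whatever follows is the optional path or label.
    let path = fields.nth(3).map(str::to_string);
    Some(Mapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms,
        path,
    })
}

fn main() {
    let line = "00401000-00404000  r-xp  00001000  fd:00  950935  /target/bin";
    let mapping = parse_maps_line(line).unwrap();
    // The size of a mapping is simply the difference of its two addresses.
    println!("{:?} spans {} bytes", mapping.path, mapping.end - mapping.start);
}
```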

Allocating memory

Let’s modify our main function like this and run our program:

#[no_mangle]
fn main() -> u8 { 
    println!("pid: {}", getpid().unwrap());
    brk(brk(0) + 4096);
    let _ = pause();
    0
}

The mappings have been changed like this:

> cat /proc/1321009/maps
00400000-00401000                  r--p  00000000  fd:00  950935  /target/bin
00401000-00404000                  r-xp  00001000  fd:00  950935  /target/bin
00404000-00405000                  r--p  00004000  fd:00  950935  /target/bin
00406000-00407000                  rw-p  00005000  fd:00  950935  /target/bin
00407000-00408000                  rw-p  00000000  00:00  0       
004cd000-004ce000                  rw-p  00000000  00:00  0       [heap]
7ffc131b5000-7ffc131d6000          rw-p  00000000  00:00  0       [stack]
7ffc131ed000-7ffc131f1000          r--p  00000000  00:00  0       [vvar]
7ffc131f1000-7ffc131f3000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

There is a new section called [heap] mapped as a private region with read and write permissions (rw-p). So let’s use that space as a buffer to read the mappings from the proc filesystem:

#![no_std]
#![no_main]

extern crate linux;

use linux::syscall::*;
use linux::constants::*;

#[no_mangle]
fn main() -> u8 { 
    let len = 4096;
    let old = brk(0);
    let _ = brk(old + len) as *mut u8;
    let mut buf = unsafe { 
        core::slice::from_raw_parts_mut(old as *mut u8, len as usize) 
    };

    let fd = open("/proc/self/maps", O_RDONLY, 0).unwrap();
    loop {
        let len = read(fd, &mut buf).unwrap();
        let _ = write(1, &buf[..len]).unwrap();
        if len < buf.len() {
            break;
        }
    }
    0
}

It works like this:

> ./cargo.sh run
00400000-00401000                  r--p  00000000  fd:00  950935  /target/bin
00401000-00404000                  r-xp  00001000  fd:00  950935  /target/bin
00404000-00406000                  r--p  00004000  fd:00  950935  /target/bin
00406000-00407000                  rw-p  00005000  fd:00  950935  /target/bin
00407000-00408000                  rw-p  00000000  00:00  0       
017c8000-017c9000                  rw-p  00000000  00:00  0       [heap]
7ffc3c050000-7ffc3c071000          rw-p  00000000  00:00  0       [stack]
7ffc3c079000-7ffc3c07d000          r--p  00000000  00:00  0       [vvar]
7ffc3c07d000-7ffc3c07f000          r-xp  00000000  00:00  0       [vdso]
ffffffffff600000-ffffffffff601000  --xp  00000000  00:00  0       [vsyscall]

Maps

mmap, munmap

Memory protection

Although brk is a nice little tool to allocate memory, there are quite a lot of other things we can do with memory. To write self-modifying code we can use mmap to allocate a memory region which has all the read, write and exec flags enabled.

Let’s create an executable which reads a byte stream from standard input and tries to execute it.

#![no_std]
#![no_main]

extern crate linux;
use linux::syscall::*;
use linux::constants::*;

#[no_mangle]
fn main() -> u8 { 
    let ptr = unsafe { 
        mmap(
            core::ptr::null_mut(), 1024, 
            PROT_READ|PROT_WRITE|PROT_EXEC, 
            MAP_PRIVATE|MAP_ANONYMOUS, 
            0, 0
        ).unwrap() 
    };

    let mut buf = unsafe { core::slice::from_raw_parts_mut(ptr, 1024) };

    if read(0, &mut buf).unwrap() > 0 {
        unsafe { core::arch::asm!("jmp {0}", in(reg) ptr) }
    }

    0
}

Let’s break our program down: First we need to allocate a buffer which we can fill with data

let ptr = unsafe { 
    mmap(
        core::ptr::null_mut(), 1024, 
        PROT_READ|PROT_WRITE|PROT_EXEC, 
        MAP_PRIVATE|MAP_ANONYMOUS, 
        0, 0
    ).unwrap() 
};

After that we create a slice to make sure that we avoid any memory safety issues…

let mut buf = unsafe { core::slice::from_raw_parts_mut(ptr, 1024) };

Once we’re done with that we can read data from stdin into this buffer and, if there was some data, try to execute it.

if read(0, &mut buf).unwrap() > 0 {
    unsafe { core::arch::asm!("jmp {0}", in(reg) ptr) }
}

If there is no data present, we simply exit the process with return code 0. Let’s test our program like this:

> ./cargo.sh build

> cat /dev/null | ./target/bin; echo $?
0

> echo "hello world" | ./target/bin; echo $?
Segmentation fault (core dumped)
139

It seems to be working, so let’s write some code which is able to rewrite itself:

global exploit
.text:
exploit:
    mov rdi,0x1
    inc byte [rel exploit + 0x1]
    cmp rdi,0xa
    jb exploit
    mov rax,0x3c
    syscall

This code initializes rdi with 0x1 and then increments the byte at exploit + 0x1, which is the immediate operand of the mov instruction itself. After that it checks whether rdi has reached 0xa and if not, it jumps back to exploit, but this time the patched instruction puts 0x2 into rdi. Once rdi reaches 0xa the code calls the exit system call, so the return code of our process will be 10.

Let’s build that code and see how it looks after the compilation:

> nasm -f elf64 -o obj asm.s
> objdump --disassemble=exploit -M intel ./obj
0000000000000000 <exploit>:
   0:   bf 01 00 00 00          mov    edi,0x1
   5:   fe 05 f6 ff ff ff       inc    BYTE PTR [rip+0xfffffffffffffff6]        # 1 <exploit+0x1>
   b:   48 83 ff 0a             cmp    rdi,0xa
   f:   72 ef                   jb     0 <exploit>
  11:   b8 3c 00 00 00          mov    eax,0x3c
  16:   0f 05                   syscall

We can dump our exploit function as a binary blob so we can use it against our rust program like this:

> objcopy -O binary --only-section=.text obj exploit
> cat ./exploit | ./target/bin; echo $?
10
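The exit code can be reproduced without executing any machine code by simulating the loop on a copy of the bytes from the objdump listing above. This is only a sketch of the control flow, not an emulator: simulate is a hypothetical function that hardcodes the meaning of the three instructions involved.

```rust
/// Machine code of the `exploit` routine up to the `jb`, copied from the objdump listing.
const CODE: [u8; 17] = [
    0xbf, 0x01, 0x00, 0x00, 0x00,       // mov edi,0x1 (immediate starts at offset 1)
    0xfe, 0x05, 0xf6, 0xff, 0xff, 0xff, // inc byte [rel exploit + 0x1]
    0x48, 0x83, 0xff, 0x0a,             // cmp rdi,0xa
    0x72, 0xef,                         // jb exploit
];

/// Run the loop on a private copy of the code.
/// Returns the final value of rdi and the number of loop iterations.
fn simulate(mut code: [u8; 17]) -> (u64, u32) {
    let mut iterations = 0;
    loop {
        iterations += 1;
        // mov edi,imm32: load the immediate stored at offsets 1..5.
        let rdi = u32::from_le_bytes([code[1], code[2], code[3], code[4]]) as u64;
        // inc byte [exploit+1]: patch the immediate of the mov above.
        code[1] = code[1].wrapping_add(1);
        // cmp rdi,0xa / jb exploit: repeat while rdi is below 10.
        if rdi >= 0xa {
            return (rdi, iterations);
        }
    }
}

fn main() {
    let (rdi, iterations) = simulate(CODE);
    println!("rdi = {} after {} iterations", rdi, iterations);
}
```

The value left in rdi is what the exit system call receives as its first argument, which is why the shell reports 10.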

File mappings

As we’ve seen in the brk section there are always some files mapped into the virtual address space of a process. At the very least there is the binary which is being executed. Often shared libraries are mapped here too. (Check out the mappings of the cat command with cat /proc/self/maps or of your shell with cat /proc/$$/maps)

We can also map a regular file to the address space and use it like a permanent buffer for our program.

#![no_std]
#![no_main]

extern crate linux;
use linux::syscall::*;
use linux::constants::*;

#[no_mangle]
fn main() -> u8 { 
    let fd = open("/tmp/data", O_CREAT|O_APPEND|O_RDWR, S_IRUSR|S_IWUSR).unwrap();
    fallocate(fd, 0, 0, 1024).unwrap();
    let ptr = unsafe { 
        mmap(
            core::ptr::null_mut(), 1024, 
            PROT_READ|PROT_WRITE,
            MAP_SHARED_VALIDATE,
            fd, 0
        ).unwrap() 
    };

    let mut buf = unsafe { core::slice::from_raw_parts_mut(ptr, 1024) };
    let _ = write(1, buf).unwrap();
    let _ = read(0, &mut buf).unwrap();
    0
}

This way we can use it:

> echo "Hello old world" | ./target/bin

> cat /tmp/data
Hello old world

> echo "Hello new world" | ./target/bin
Hello old world

> cat /tmp/data
Hello new world

Feel free to reimplement the exploit above by mapping it into the virtual address space instead of reading from stdin.

Shared memory

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;
use linux::syscall::*;
use linux::constants::*;


#[no_mangle]
fn main() -> u8 { 
    let fd = open("/tmp/data", O_CREAT|O_TRUNC|O_RDWR, S_IRUSR|S_IWUSR).unwrap();
    fallocate(fd, 0, 0, 1024).unwrap();

    let p1 = unsafe {
        mmap(
            core::ptr::null_mut(), 1024,
            PROT_READ|PROT_WRITE,
            MAP_SHARED_VALIDATE,
            fd, 0
        ).unwrap()
    };

    let p2 = unsafe {
        mmap(
            core::ptr::null_mut(), 1024,
            PROT_READ|PROT_WRITE,
            MAP_SHARED_VALIDATE,
            fd, 0
        ).unwrap()
    };

    let mut b1 = unsafe { core::slice::from_raw_parts_mut(p1, 1024) };
    let mut b2 = unsafe { core::slice::from_raw_parts_mut(p2, 1024) };

    b1[0] = 13;

    println!("b1[0] = {}", b1[0]);
    println!("b2[0] = {}", b2[0]);

    0
}
> ./cargo.sh run
b1[0] = 13
b2[0] = 13

Overmap section with different protection

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;
use linux::syscall::*;
use linux::constants::*;


#[no_mangle]
fn main() -> u8 { 
    let p1 = unsafe {
        mmap(
            core::ptr::null_mut(), 4096 * 3,
            PROT_READ,
            MAP_ANONYMOUS|MAP_PRIVATE,
            0, 0
        ).unwrap()
    };
    
    read(0, &mut [0u8]);

    let p2 = unsafe {
        mmap(
            p1.offset(4096), 4096,
            PROT_READ|PROT_WRITE,
            MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED,
            0, 0
        ).unwrap()
    };

    read(0, &mut [0u8]);
    unsafe { munmap(p1, 4096 * 3).unwrap() };

    pause();

    0
}

mprotect

msync

Vdso

Functions to load symbols: dlopen, dlclose, dlsym

LD_PRELOAD=./libmfilter.so python to overwrite functions

To print the aux vector

https://lwn.net/Articles/519085/ https://lwn.net/Articles/615809/

#![allow(unused)]
fn main() {
use core::ffi::CStr;
use core::mem::transmute;

use crate::types::*;
use crate::error::{Result, result};

#[repr(C)]
#[derive(Debug, Clone)]
pub struct Ehdr {
    pub e_ident: [u8;16],
    pub e_type: u16,
    pub e_machine: u16,
    pub e_version: u32,
    pub e_entry: u64,
    pub e_phoff: u64,
    pub e_shoff: u64,
    pub e_flags: u32,
    pub e_ehsize: u16,
    pub e_phentsize: u16,
    pub e_phnum: u16,
    pub e_shentsize: u16,
    pub e_shnum: u16,
    pub e_shstrndx: u16,
}

#[repr(C)]
#[derive(Debug, Clone)]
pub struct Phdr {
    pub p_type: u32,
    pub p_flags: u32,
    pub p_offset: u64,
    pub p_vaddr: u64,
    pub p_paddr: u64,
    pub p_filesz: u64,
    pub p_memsz: u64,
    pub p_align: u64,
}

#[repr(C)]
#[derive(Debug, Clone)]
pub struct Shdr {
    pub sh_name: u32,
    pub sh_type: u32,
    pub sh_flags: u64,
    pub sh_addr: u64,
    pub sh_offset: u64,
    pub sh_size: u64,
    pub sh_link: u32,
    pub sh_info: u32,
    pub sh_addralign: u64,
    pub sh_entsize: u64,
}

#[repr(C)]
#[derive(Debug, Clone)]
pub struct Sym {
    pub st_name: u32,
    pub st_info: u8,
    pub st_other: u8,
    pub st_shndx: u16,
    pub st_value: u64,
    pub st_size: u64,
}

pub struct Vdso {
    time: extern "C" fn(*mut time_t) -> time_t,
    getcpu: extern "C" fn(*mut u32, *mut u32) -> isize,
    gettimeofday: extern "C" fn(*mut timeval, *mut timezone) -> isize,
    clock_getres: extern "C" fn(clockid_t, *mut timespec) -> isize,
    clock_gettime: extern "C" fn(clockid_t, *mut timespec) -> isize,
}

impl Vdso {
    pub(crate) unsafe fn from_ptr(p: *const u8) -> Self {
        let header = &*(p as *const Ehdr);

        let section_headers = core::slice::from_raw_parts(
            p.offset(header.e_shoff as isize) as *const Shdr,
            header.e_shnum as usize
        );

        let dynstr = section_headers.iter().find(|e| e.sh_type == 3).map(|h| {
            p.offset(h.sh_offset as isize) as *const u8
        }).unwrap();

        let dynsym = section_headers.iter().find(|e| e.sh_type == 11).map(|h| {
            core::slice::from_raw_parts(
                p.offset(h.sh_offset as isize) as *const Sym,
                h.sh_size as usize / core::mem::size_of::<Sym>(),
            )
        }).unwrap();

        let mut time = None;
        let mut getcpu = None;
        let mut gettimeofday = None;
        let mut clock_getres = None;
        let mut clock_gettime = None;

        for symbole in dynsym {
            let s = dynstr.add(symbole.st_name as usize) as *const i8;
            match CStr::from_ptr(s).to_str() {
                Ok("time") => { time = transmute(p.add(symbole.st_value as usize)); }
                Ok("getcpu") => { getcpu = transmute(p.add(symbole.st_value as usize)); }
                Ok("gettimeofday") => { gettimeofday = transmute(p.add(symbole.st_value as usize)); }
                Ok("clock_getres") => { clock_getres = transmute(p.add(symbole.st_value as usize)); }
                Ok("clock_gettime") => { clock_gettime = transmute(p.add(symbole.st_value as usize)); }
                _ => { /* ignore */ }
            }
        }

        Self {
            time: time.unwrap(),
            getcpu: getcpu.unwrap(),
            gettimeofday: gettimeofday.unwrap(),
            clock_getres: clock_getres.unwrap(),
            clock_gettime: clock_gettime.unwrap(),
        }
    }

    #[inline(always)]
    pub fn time(&self, time: &mut time_t) -> time_t {
        (self.time)(time as *mut _)
    }

    /// The signature of this function is different from the one documented in the man pages.
    /// This is because there is only one way to make this system call fail, which is providing
    /// invalid pointers. Since returning Result<()> has the same effect as returning a tuple with
    /// two numbers, we simply make sure that it never fails by putting these variables on the stack.
    ///
    /// TODO: Is this really true about the Result<()>????
    #[inline(always)]
    pub fn getcpu(&self) -> (u32, u32) {
        let mut cpu = 0;
        let mut node = 0;
        (self.getcpu)(&mut cpu as *mut _, &mut node as *mut _);
        (cpu, node)
    }

    #[inline(always)]
    pub fn gettimeofday(&self, tv: &mut timeval, tz: &mut timezone) -> Result<()> {
        result((self.gettimeofday)(tv as *mut _, tz as *mut _)).map(|_| ())
    }

    #[inline(always)]
    pub fn clock_getres(&self, clock: clockid_t, spec: &mut timespec) -> Result<()> {
        result((self.clock_getres)(clock, spec as *mut _)).map(|_| ())
    }

    #[inline(always)]
    pub fn clock_gettime(&self, clock: clockid_t, spec: &mut timespec) -> Result<()> {
        result((self.clock_gettime)(clock, spec as *mut _)).map(|_| ())
    }
}
}
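The symbol loop in from_ptr relies on one property of the ELF format: st_name is a byte offset into the string table, and the name is the NUL-terminated string that starts there. A minimal sketch of that lookup on a hand-built string table follows; symbol_name is a hypothetical helper and, unlike the pointer-based code above, it is bounds checked.

```rust
use std::ffi::CStr;

/// Return the name stored at byte offset `st_name` of an ELF string table.
/// Yields None if the offset is out of bounds or the name is not valid UTF-8.
fn symbol_name(strtab: &[u8], st_name: u32) -> Option<&str> {
    let tail = strtab.get(st_name as usize..)?;
    CStr::from_bytes_until_nul(tail).ok()?.to_str().ok()
}

fn main() {
    // A string table always starts with a NUL byte, so offset 0 is the empty name.
    let strtab = b"\0time\0getcpu\0clock_gettime\0";
    for offset in [1u32, 6, 13] {
        println!("{} -> {:?}", offset, symbol_name(strtab, offset));
    }
}
```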

Performance

Casting

There are multiple ways to convert types in Rust. We’re going to investigate the benefits and drawbacks of using one over another.

u32 => u64

Let’s create a small program to check out the differences at the assembly level:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use core::convert::TryInto;
use core::convert::TryFrom;

#[no_mangle] #[inline(never)] fn cast_as(n: u32) -> u64 { n as u64 }
#[no_mangle] #[inline(never)] fn cast_into(n: u32) -> u64 { n.into() }
#[no_mangle] #[inline(never)] fn cast_from(n: u32) -> u64 { u64::from(n) }
#[no_mangle] #[inline(never)] fn cast_try_into(n: u32) -> u64 { n.try_into().unwrap() }
#[no_mangle] #[inline(never)] fn cast_try_from(n: u32) -> u64 { u64::try_from(n).unwrap() }

#[no_mangle]
fn main() -> u8 { 
    println!("{}", cast_as(1));
    println!("{}", cast_into(1));
    println!("{}", cast_from(1));
    println!("{}", cast_try_into(1));
    println!("{}", cast_try_from(1));
    0
}

Once we compile, we get the following code:

> ./cargo.sh build
> ./cargo.sh dump cast_as
0000000000401270 <cast_as>:
  401270:       89 f8                   mov    eax,edi
  401272:       c3                      ret

> ./cargo.sh dump cast_into
0000000000401280 <cast_into>:
  401280:       89 f8                   mov    eax,edi
  401282:       c3                      ret

> ./cargo.sh dump cast_from
0000000000401290 <cast_from>:
  401290:       89 f8                   mov    eax,edi
  401292:       c3                      ret

> ./cargo.sh dump cast_try_into
00000000004012a0 <cast_try_into>:
  4012a0:       89 f8                   mov    eax,edi
  4012a2:       c3                      ret

> ./cargo.sh dump cast_try_from
00000000004012b0 <cast_try_from>:
  4012b0:       89 f8                   mov    eax,edi
  4012b2:       c3                      ret

As you can see Rust really provides a zero-cost abstraction here and generates the same code for all of our functions. This is possible since a u32 can always be converted into a u64.

u64 => u32

But what happens if we switch the types and try to convert a u64 into a u32? In this case the Into and From traits are not implemented for the conversion, so we can only compare the following functions:

#![no_std]
#![no_main]

use core::convert::TryInto;
use core::convert::TryFrom;

#[macro_use]
extern crate linux;

#[no_mangle] #[inline(never)] fn cast_as(n: u64) -> u32 { n as u32 }
#[no_mangle] #[inline(never)] fn cast_try_into(n: u64) -> u32 { n.try_into().unwrap() }
#[no_mangle] #[inline(never)] fn cast_try_from(n: u64) -> u32 { u32::try_from(n).unwrap() }

#[no_mangle]
fn main() -> u8 { 
    println!("{}", cast_as(1));
    println!("{}", cast_try_into(1));
    println!("{}", cast_try_from(1));
    0
}

Interestingly, the code still looks the same. The compiler sees that the only value we ever pass to these functions is the constant 1, so it can be sure that it fits into a u32 and it optimizes out the range checks.

> ./cargo.sh dump cast_as
0000000000401270 <cast_as>:
  401270:       89 f8                   mov    eax,edi
  401272:       c3                      ret

> ./cargo.sh dump cast_try_into
00000000004012a0 <cast_try_into>:
  4012a0:       89 f8                   mov    eax,edi
  4012a2:       c3                      ret

> ./cargo.sh dump cast_try_from
00000000004012b0 <cast_try_from>:
  4012b0:       89 f8                   mov    eax,edi
  4012b2:       c3                      ret

Let’s make it a bit more complex by reading a byte from stdin, so the compiler has no chance to optimize our checks away:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use core::convert::TryInto;
use core::convert::TryFrom;

#[no_mangle] #[inline(never)] fn cast_as(n: u64) -> u32 { n as u32 }
#[no_mangle] #[inline(never)] fn cast_try_into(n: u64) -> u32 { n.try_into().unwrap() }
#[no_mangle] #[inline(never)] fn cast_try_from(n: u64) -> u32 { u32::try_from(n).unwrap() }

#[no_mangle]
fn main() -> u8 { 
    let mut buf = [0u8;1];
    linux::syscall::read(0, &mut buf).unwrap();
    let n = buf[0].try_into().unwrap();

    println!("{}", cast_as(n));
    println!("{}", cast_try_into(n));
    println!("{}", cast_try_from(n));
    0
}

> ./cargo.sh dump cast_as
00000000004019c0 <cast_as>:
  4019c0:       48 89 f8                mov    rax,rdi
  4019c3:       c3                      ret

> ./cargo.sh dump cast_try_from
0000000000401a10 <cast_try_from>:
  401a10:           48 89 f8                    mov    rax,rdi
  401a13:           48 c1 e8 20                 shr    rax,0x20
  401a17:       /-- 75 03                       jne    401a1c <cast_try_from+0xc>
  401a19:       |   89 f8                       mov    eax,edi
  401a1b:       |   c3                          ret
  401a1c:       \-> 50                          push   rax
  401a1d:           48 8d 3d f2 37 00 00        lea    rdi,[rip+0x37f2]
  401a24:           48 8d 0d 7d 60 00 00        lea    rcx,[rip+0x607d]
  401a2b:           4c 8d 05 fe 60 00 00        lea    r8,[rip+0x60fe]
  401a32:           48 8d 54 24 07              lea    rdx,[rsp+0x7]
  401a37:           be 2b 00 00 00              mov    esi,0x2b
  401a3c:           ff 15 96 65 00 00           call   QWORD PTR [rip+0x6596]

> ./cargo.sh dump cast_try_into
00000000004019d0 <cast_try_into>:
  4019d0:           48 89 f8                    mov    rax,rdi
  4019d3:           48 c1 e8 20                 shr    rax,0x20
  4019d7:       /-- 75 03                       jne    4019dc <cast_try_into+0xc>
  4019d9:       |   89 f8                       mov    eax,edi
  4019db:       |   c3                          ret
  4019dc:       \-> 50                          push   rax
  4019dd:           48 8d 3d 32 38 00 00        lea    rdi,[rip+0x3832]
  4019e4:           48 8d 0d bd 60 00 00        lea    rcx,[rip+0x60bd]
  4019eb:           4c 8d 05 26 61 00 00        lea    r8,[rip+0x6126]
  4019f2:           48 8d 54 24 07              lea    rdx,[rsp+0x7]
  4019f7:           be 2b 00 00 00              mov    esi,0x2b
  4019fc:           ff 15 d6 65 00 00           call   QWORD PTR [rip+0x65d6]

Alright, that looks a bit different now. As we can see, the documentation of TryFrom and TryInto was right: the two functions generate the same code, the only difference is how the Rust code looks. So from now on we ignore the try_from function as well and only compare the as keyword with the TryInto trait.

As you can see, the as keyword generates a single instruction (plus the ret) which moves the content of rdi into rax, so it is a pure register operation. In contrast, the TryInto trait generates 12 instructions. Five of them have a memory-style operand (the lines with […]). Strictly speaking, the four lea instructions only compute an address and never touch memory; only the indirect call loads its target from memory, and push writes to the stack. Still, let’s be pessimistic for a moment and treat all five as memory accesses which hit the L2 cache.

If we count one CPU cycle for moving the value of one register into another and about 10 cycles for finding a value in the L2 cache, we could say that the TryInto conversion takes about 60-70x longer. If you do it a lot, that quickly adds up and makes a huge difference in the performance of your code. But does this really mirror reality? Well, not quite…

If we look at the code a bit closer, it starts the same way as the as keyword: it copies rdi into rax. After that it shifts the value right by 32 bits, and if the result is not zero (which means at least one of the upper 32 bits was set and the value does not fit into a u32) it jumps to the failure-handling logic at 0x4019dc.

  4019d0:           48 89 f8                    mov    rax,rdi
  4019d3:           48 c1 e8 20                 shr    rax,0x20
  4019d7:       /-- 75 03                       jne    4019dc <cast_try_into+0xc>
  4019d9:       |   89 f8                       mov    eax,edi
  4019db:       |   c3                          ret

This means that in case of a valid cast we only execute 4 instructions (not counting the ret), which is only about 4 times slower than the as keyword, and in exchange we get the benefit of error handling. This obviously makes the .text segment of our code bigger, so it fits worse into the instruction cache, which makes our code slower overall, but the result will always be correct. In contrast, we can see the as keyword a bit like an unsafe keyword: it doesn’t always produce the expected value and it just goes on as if nothing had happened. Still, it has the benefit of smaller and faster code if we use it with care.
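The fast path of the disassembly can be written by hand in Rust. This is a minimal sketch using std for brevity (an assumption, the book’s examples are no_std); narrow_by_hand is a hypothetical name. It performs the same shift-and-test as the shr/jne pair and is compared against u32::try_from.

```rust
use std::convert::TryFrom;

// Hand-written equivalent of the check in the disassembly: shift the upper
// 32 bits down and fail if any of them was set.
fn narrow_by_hand(n: u64) -> Option<u32> {
    if n >> 32 != 0 {
        None // corresponds to the jne branch: the value does not fit
    } else {
        Some(n as u32) // corresponds to the mov eax,edi fast path
    }
}

fn main() {
    for n in [0u64, 1, u32::MAX as u64, u32::MAX as u64 + 1, u64::MAX] {
        // Both implementations agree on every boundary value.
        assert_eq!(narrow_by_hand(n), u32::try_from(n).ok());
    }
    println!("ok");
}
```

The difference from the generated code is only in the failure path: unwrap() panics there, while this sketch returns None.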

To see the difference between the code sizes we can use readelf like this:

> readelf -s ./target/bin | grep -E 'cast_|Name'
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    45: 00000000004019d0    50 FUNC    GLOBAL DEFAULT    2 cast_try_into
    95: 00000000004019c0     4 FUNC    GLOBAL DEFAULT    2 cast_as

This says that the cast_try_into function occupies 12.5x more space in our caches. And if we remove the function wrapper from the casts (remove the ret instruction), casting with as takes 3 bytes while casting with try_into takes 49 bytes. As a result, a single 64-byte cache line (the typical size on x86-64) has room for ~21 as casts but only ~1.3 try_into casts, which is quite a difference.
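The arithmetic behind this comparison can be spelled out. The sketch below assumes a 64-byte cache line, which is typical for x86-64 but is not stated in the measurements above; casts_per_line is a hypothetical helper.

```rust
// Assumption: a 64-byte cache line, which is typical for x86-64 CPUs.
const CACHE_LINE: usize = 64;

// How many complete back-to-back copies of a code sequence fit into one line.
fn casts_per_line(code_size: usize) -> usize {
    CACHE_LINE / code_size
}

fn main() {
    println!("as:       {}", casts_per_line(3));
    println!("try_into: {}", casts_per_line(49));
}
```

With the byte counts from readelf (3 and 49) this gives 21 complete as casts per line but only a single try_into cast.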

Let’s see what the as keyword does in different scenarios:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

#[no_mangle] #[inline(never)] fn u32_as_u16(n: u32) -> u16 { n as u16 }
#[no_mangle] #[inline(never)] fn u32_as_i16(n: u32) -> i16 { n as i16 }
#[no_mangle] #[inline(never)] fn u32_as_i32(n: u32) -> i32 { n as i32 }
#[no_mangle] #[inline(never)] fn u32_as_u64(n: u32) -> u64 { n as u64 }
#[no_mangle] #[inline(never)] fn u32_as_i64(n: u32) -> i64 { n as i64 }

#[no_mangle] #[inline(never)] fn i32_as_u16(n: i32) -> u16 { n as u16 }
#[no_mangle] #[inline(never)] fn i32_as_i16(n: i32) -> i16 { n as i16 }
#[no_mangle] #[inline(never)] fn i32_as_u32(n: i32) -> u32 { n as u32 }
#[no_mangle] #[inline(never)] fn i32_as_u64(n: i32) -> u64 { n as u64 }
#[no_mangle] #[inline(never)] fn i32_as_i64(n: i32) -> i64 { n as i64 }


#[no_mangle]
fn main() -> u8 { 
    println!("u32_as_u16(1)         {}", u32_as_u16(1));
    println!("u32_as_i16(1)         {}", u32_as_i16(1));
    println!("u32_as_i32(1)         {}", u32_as_i32(1));
    println!("u32_as_u64(1)         {}", u32_as_u64(1));
    println!("u32_as_i64(1)         {}", u32_as_i64(1));

    println!("-----------------------------------------------");
    println!("u32_as_u16(u32::MAX)  {}", u32_as_u16(u32::MAX));
    println!("u32_as_i16(u32::MAX)  {}", u32_as_i16(u32::MAX));
    println!("u32_as_i32(u32::MAX)  {}", u32_as_i32(u32::MAX));
    println!("u32_as_u64(u32::MAX)  {}", u32_as_u64(u32::MAX));
    println!("u32_as_i64(u32::MAX)  {}", u32_as_i64(u32::MAX));

    println!("-----------------------------------------------");
    println!("i32_as_u16(i32::MIN)  {}", i32_as_u16(i32::MIN));
    println!("i32_as_i16(i32::MIN)  {}", i32_as_i16(i32::MIN));
    println!("i32_as_u32(i32::MIN)  {}", i32_as_u32(i32::MIN));
    println!("i32_as_u64(i32::MIN)  {}", i32_as_u64(i32::MIN));
    println!("i32_as_i64(i32::MIN)  {}", i32_as_i64(i32::MIN));

    println!("-----------------------------------------------");
    println!("i32_as_u16(-1)        {}", i32_as_u16(-1));
    println!("i32_as_i16(-1)        {}", i32_as_i16(-1));
    println!("i32_as_u32(-1)        {}", i32_as_u32(-1));
    println!("i32_as_u64(-1)        {}", i32_as_u64(-1));
    println!("i32_as_i64(-1)        {}", i32_as_i64(-1));

    println!("-----------------------------------------------");
    println!("i32_as_u16(1)         {}", i32_as_u16(1));
    println!("i32_as_i16(1)         {}", i32_as_i16(1));
    println!("i32_as_u32(1)         {}", i32_as_u32(1));
    println!("i32_as_u64(1)         {}", i32_as_u64(1));
    println!("i32_as_i64(1)         {}", i32_as_i64(1));

    println!("-----------------------------------------------");
    println!("i32_as_u16(i32::MAX)  {}", i32_as_u16(i32::MAX));
    println!("i32_as_i16(i32::MAX)  {}", i32_as_i16(i32::MAX));
    println!("i32_as_u32(i32::MAX)  {}", i32_as_u32(i32::MAX));
    println!("i32_as_u64(i32::MAX)  {}", i32_as_u64(i32::MAX));
    println!("i32_as_i64(i32::MAX)  {}", i32_as_i64(i32::MAX));

    0
}

Interestingly, if we try to disassemble these functions, only three of them can be found: i32_as_i16, u32_as_i64 and i32_as_i64. We have to dig a bit deeper to find out why. Let’s check the symbol table of the binary with:

> objdump -t ./target/bin | grep _as_ | sort
0000000000401270 g     F .text  0000000000000003 u32_as_i64
0000000000401270 g     F .text  0000000000000003 u32_as_u64
0000000000401280 g     F .text  0000000000000003 i32_as_i16
0000000000401280 g     F .text  0000000000000003 i32_as_u16
0000000000401280 g     F .text  0000000000000003 u32_as_i16
0000000000401280 g     F .text  0000000000000003 u32_as_u16
0000000000401290 g     F .text  0000000000000004 i32_as_i64
0000000000401290 g     F .text  0000000000000004 i32_as_u64

In this output we can see the following columns:

  • memory address
  • flags to describe the type of the symbol (g=global, F=function)
  • section in which the symbol is located (.text = program code)
  • size of the symbol
  • name of the symbol

If you have a closer look at the memory addresses of the symbols, you can see that multiple symbols share the same address. This means that there are only 3 different pieces of code behind these 8 symbols. The disassemble function of objdump takes only the first symbol at each address as the real one, so it doesn’t find any of the others. The reason for merging these symbols is that they do exactly the same thing from the compiler’s perspective. Let’s see what that is:

> objdump --disassemble=u32_as_i64 -M intel ./target/bin
0000000000401270 <u32_as_i64>:
  401270:       89 f8                   mov    eax,edi
  401272:       c3                      ret

> objdump --disassemble=i32_as_i16 -M intel ./target/bin
0000000000401280 <i32_as_i16>:
  401280:       89 f8                   mov    eax,edi
  401282:       c3                      ret

> objdump --disassemble=i32_as_i64 -M intel ./target/bin
0000000000401290 <i32_as_i64>:
  401290:       48 63 c7                movsxd rax,edi
  401293:       c3                      ret

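I’m not sure why the first two functions weren’t merged even though their machine code is identical. The input type (u32/i32) is probably not the reason, since u32_as_u16 and i32_as_u16 share an address; more likely the merging happens before machine code is generated, where widening and truncating are still different operations. The third function is obviously different: it creates a bigger signed integer from a signed one, which means the value has to be sign-extended (movsxd). For example, in case of i8 => i16: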

-1: 0xff => 0xffff
+1: 0x01 => 0x0001
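The same extension rules can be checked directly in Rust. This is a minimal sketch using std (an assumption, the book’s examples are no_std); sign_extend and zero_extend are hypothetical names for plain as casts.

```rust
// Signed source: the sign bit is copied into the new upper bits.
fn sign_extend(n: i8) -> i16 {
    n as i16
}

// Unsigned source: the new upper bits are filled with zeros.
fn zero_extend(n: u8) -> u16 {
    n as u16
}

fn main() {
    assert_eq!(format!("{:#06x}", sign_extend(-1)), "0xffff");
    assert_eq!(format!("{:#06x}", sign_extend(1)), "0x0001");
    assert_eq!(format!("{:#06x}", zero_extend(0xff)), "0x00ff");

    // The signedness of the source type decides, not that of the target type.
    assert_eq!((-1i8) as u16, 0xffff);
    assert_eq!(0xffu8 as i16, 255);
    println!("ok");
}
```

This is why i32_as_i64 and i32_as_u64 share the movsxd code while u32_as_u64 and u32_as_i64 share the plain mov.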

Last but not least, let’s see the output of our program. We have the following blocks:

Unsigned normal:

u32_as_u16(1)         1                     # same
u32_as_i16(1)         1                     # same
u32_as_i32(1)         1                     # same
u32_as_u64(1)         1                     # same
u32_as_i64(1)         1                     # same

Unsigned overflow:

u32_as_u16(u32::MAX)  65535                 # diff (truncated)
u32_as_i16(u32::MAX)  -1                    # diff (2's complement)
u32_as_i32(u32::MAX)  -1                    # diff (2's complement)
u32_as_u64(u32::MAX)  4294967295            # same
u32_as_i64(u32::MAX)  4294967295            # same

Signed underflow:

i32_as_u16(i32::MIN)  0                     # diff
i32_as_i16(i32::MIN)  0                     # diff
i32_as_u32(i32::MIN)  2147483648            # diff (not 2's complement)
i32_as_u64(i32::MIN)  18446744071562067968  # diff (not 2's complement)
i32_as_i64(i32::MIN)  -2147483648           # same

i32_as_u16(-1)        65535                 # diff (not 2's complement)
i32_as_i16(-1)        -1                    # same
i32_as_u32(-1)        4294967295            # diff (not 2's complement)
i32_as_u64(-1)        18446744073709551615  # diff (not 2's complement)
i32_as_i64(-1)        -1                    # same

Signed normal:

i32_as_u16(1)         1                     # same
i32_as_i16(1)         1                     # same
i32_as_u32(1)         1                     # same
i32_as_u64(1)         1                     # same
i32_as_i64(1)         1                     # same

Signed overflow:

i32_as_u16(i32::MAX)  65535                 # diff (truncated)
i32_as_i16(i32::MAX)  -1                    # diff (2's complement)
i32_as_u32(i32::MAX)  2147483647            # same
i32_as_u64(i32::MAX)  2147483647            # same
i32_as_i64(i32::MAX)  2147483647            # same

As a result we can set up the following rules:

Cast        Safe if
uS as uB    always
uS as iB    always
uN as iN    uN <= iN::MAX
iN as uN    iN >= uN::MIN
uB as uS    uB <= uS::MAX
iB as iS    iB >= iS::MIN && iB <= iS::MAX
iB as uS    iB >= uS::MIN && iB <= uS::MAX
uB as iS    uB <= iS::MAX

Where the letters have the following meanings:

  • S: small
  • B: big
  • N: number (same size)
  • u: unsigned
  • i: signed
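Two of these rules can be cross-checked against the standard library, because an as cast preserves the value exactly when the checked conversion succeeds. A minimal sketch using std (an assumption); the helper names are hypothetical.

```rust
use std::convert::TryFrom;

// iB as uS: value-preserving only if uS::MIN <= n <= uS::MAX.
fn i32_as_u16_is_safe(n: i32) -> bool {
    u16::try_from(n).is_ok()
}

// uN as iN: value-preserving only if n <= iN::MAX.
fn u32_as_i32_is_safe(n: u32) -> bool {
    i32::try_from(n).is_ok()
}

fn main() {
    for n in [i32::MIN, -1, 0, 1, 65535, 65536, i32::MAX] {
        // Compare in i64 so that both sides are represented exactly.
        let preserved = (n as u16) as i64 == n as i64;
        assert_eq!(preserved, i32_as_u16_is_safe(n));
    }
    for n in [0u32, 1, i32::MAX as u32, i32::MAX as u32 + 1, u32::MAX] {
        let preserved = (n as i32) as i64 == n as i64;
        assert_eq!(preserved, u32_as_i32_is_safe(n));
    }
    println!("ok");
}
```

The comparison is done in i64 on purpose: a round trip through the same-sized type would always succeed, since as only reinterprets the bits in that case.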

In conclusion, we can say the following about casting integers:

  • Use the From and Into traits whenever the type system allows it. It’s the safest way and it doesn’t have any overhead.
  • Use the TryFrom and TryInto traits whenever the type system requires it and you cannot be sure about the input value. Even though they have a bit of overhead and decrease the cache locality of your code, they are always safe to use and they let the compiler warn you if you unintentionally modify the code in an incorrect way later on.
  • Use the as keyword instead of TryFrom and TryInto only if you can always be sure about the input number. Even though it doesn’t require an unsafe block, it’s easy to shoot yourself in the foot by refactoring the code without realizing that the input value is not known in advance anymore. In this case you will end up with bugs that are hard to track down.
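TryFrom does not force a panic; unwrap() is just the shortest way to consume the result. The following sketch shows two other options, using std for brevity (an assumption); both helper names are hypothetical.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

// Hypothetical helper: clamp to the maximum instead of panicking.
fn to_u32_saturating(n: u64) -> u32 {
    u32::try_from(n).unwrap_or(u32::MAX)
}

// Hypothetical helper: hand the conversion error over to the caller.
fn to_u32_checked(n: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(n)
}

fn main() {
    assert_eq!(to_u32_saturating(42), 42);
    assert_eq!(to_u32_saturating(u64::MAX), u32::MAX);
    assert!(to_u32_checked(42).is_ok());
    assert!(to_u32_checked(u64::MAX).is_err());
    println!("ok");
}
```

Which variant fits depends on whether an out-of-range value is a bug (panic), an expected input (error), or something that can be clamped.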

Result

Let’s start with some very simple code:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use core::arch::asm;

#[no_mangle]
#[inline(never)] 
fn ok() -> Result<(), ()> { 
    unsafe { asm!( "nop", options(nostack)) };
    Ok(()) 
}
#[no_mangle]
#[inline(never)]
fn err() -> Result<(), ()> { 
    unsafe { asm!( "nop", options(nostack)) };
    Err(())
}

#[no_mangle]
fn main() -> u8 { 
    println!("{:?}", ok());
    println!("{:?}", err());
    0
}

Rust does a good job of optimizing out code which is not necessary, and this feature makes it difficult to investigate the code of a function. A simple function with a constant return value won’t be put into the binary, so we cannot dump its assembly. To trick the compiler into leaving our code in the binary we can use a single line of inline assembly which doesn’t do anything. Since the compiler doesn’t look inside it, it has to assume that it’s important and leaves it there. After the compilation our code looks like this:

> ./cargo.sh dump ok
00000000004012f0 <ok>:
  4012f0:       90                      nop
  4012f1:       31 c0                   xor    eax,eax
  4012f3:       c3                      ret

> ./cargo.sh dump err
0000000000401300 <err>:
  401300:       90                      nop
  401301:       b0 01                   mov    al,0x1
  401303:       c3                      ret

As you can see, the code of the two functions is quite similar: the rax register (or a part of it) is set and the function returns. In case of Ok(()) rax is set to zero, and for Err(()) it is set to one. Since the content of the Result is always the zero-sized (), it doesn’t even need a register to be passed back to the caller.

Let’s modify our code to return some real value, for example a u32 number:

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use core::arch::asm;

#[no_mangle]
#[inline(never)] 
fn ok() -> Result<u32, ()> { 
    unsafe { asm!( "nop", options(nostack)) };
    Ok(3) 
}
#[no_mangle]
#[inline(never)]
fn err() -> Result<(), u32> { 
    unsafe { asm!( "nop", options(nostack)) };
    Err(3)
}

#[no_mangle]
fn main() -> u8 { 
    println!("{:?}", ok());
    println!("{:?}", err());
    0
}

This already changes the output of the dump:

> ./cargo.sh dump ok
0000000000401360 <ok>:
  401360:       90                      nop
  401361:       31 c0                   xor    eax,eax
  401363:       ba 03 00 00 00          mov    edx,0x3
  401368:       c3                      ret

> ./cargo.sh dump err
0000000000401370 <err>:
  401370:       90                      nop
  401371:       b8 01 00 00 00          mov    eax,0x1
  401376:       ba 03 00 00 00          mov    edx,0x3
  40137b:       c3                      ret

As you can see, the value inside the Result is passed back to the caller in another register, rdx. This aligns with the System V ABI.

But what happens if we use some realistic error type, like an Error enum?

#![no_std]
#![no_main]

#[macro_use]
extern crate linux;

use core::arch::asm;

#[derive(Debug, Clone)]
pub enum Error { A }

#[no_mangle]
#[inline(never)]
fn ok() -> Result<(), Error> { 
    unsafe { asm!( "nop", options(nostack)) };
    Ok(())
}

#[no_mangle]
#[inline(never)]
fn err() -> Result<(), Error> { 
    unsafe { asm!( "nop", options(nostack)) };
    Err(Error::A)
}

#[no_mangle]
fn main() -> u8 { 
    println!("{:?}", ok());
    println!("{:?}", err());
    0
}

It looks quite similar, right?

> ./cargo.sh dump ok
0000000000401310 <ok>:
  401310:       90                      nop
  401311:       31 c0                   xor    eax,eax
  401313:       c3                      ret

> ./cargo.sh dump err
0000000000401320 <err>:
  401320:       90                      nop
  401321:       b0 01                   mov    al,0x1
  401323:       c3                      ret

Then add another error variant to the Error enum and try it again

#[derive(Debug, Clone)]
pub enum Error { A, B }

> ./cargo.sh dump ok
0000000000401320 <ok>:
  401320:       90                      nop
  401321:       b0 02                   mov    al,0x2
  401323:       c3                      ret

> ./cargo.sh dump err
0000000000401330 <err>:
  401330:       90                      nop
  401331:       31 c0                   xor    eax,eax
  401333:       c3                      ret

The return values have changed. Now zero means Err(Error::A) and two means Ok(()). It seems like the compiler realizes that the Ok(()) value can only have one state, so it can be represented just like another variant of the Error enum. It kind of creates another enum under the hood, like:

enum SpecialError {
    Err_Error_A = 0,
    Err_Error_B = 1,
    Ok = 2,
}

This way it’s enough to use only one register instead of two. Pretty nice, right? Let’s return a real value in Ok() to avoid this optimisation, for example a number like this:

#[no_mangle]
#[inline(never)]
fn ok() -> Result<u8, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Ok(3)
}

#[no_mangle]
#[inline(never)]
fn err() -> Result<u8, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Err(Error::B)
}

With u8 it looks quite good: rax=0 means Ok, rax=1 means Err, and rdx holds the value.

> ./cargo.sh dump ok
0000000000401320 <ok>:
  401320:       90                      nop
  401321:       31 c0                   xor    eax,eax
  401323:       b2 03                   mov    dl,0x3
  401325:       c3                      ret

> ./cargo.sh dump err
0000000000401330 <err>:
  401330:       90                      nop
  401331:       b0 01                   mov    al,0x1
  401333:       b2 01                   mov    dl,0x1
  401335:       c3                      ret

But what happens if we try to return with a bigger number like i32?

#[no_mangle]
#[inline(never)]
fn ok() -> Result<i32, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Ok(3)
}

#[no_mangle]
#[inline(never)]
fn err() -> Result<i32, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Err(Error::B)
}

It already gets a bit scary:

> ./cargo.sh dump ok
0000000000401330 <ok>:
  401330:       90                      nop
  401331:       48 b8 00 00 00 00 03    movabs rax,0x300000000
  401338:       00 00 00
  40133b:       c3                      ret

> ./cargo.sh dump err
0000000000401340 <err>:
  401340:       90                      nop
  401341:       b8 01 01 00 00          mov    eax,0x101
  401346:       c3                      ret

I assume that since Result is an enum too, whose size must be at least the tag size plus the size of the biggest inner value, the compiler tries to encode the tag and the i32 inner value into one 64-bit register. The lower half of the register holds the tag (zero = Ok) and the upper half the i32 value (3). The Err variant is still encoded as a single integer (0x101: the tag in the lowest byte and Error::B in the next one), but since the Error enum fits into a single byte the compiler doesn’t have to load a full 64-bit register.

And it’s just the beginning. Replace i32 with i64 and you’ll get this:

> ./cargo.sh dump ok
0000000000401330 <ok>:
  401330:       48 89 f8                mov    rax,rdi
  401333:       90                      nop
  401334:       48 c7 47 08 03 00 00    mov    QWORD PTR [rdi+0x8],0x3
  40133b:       00
  40133c:       c6 07 00                mov    BYTE PTR [rdi],0x0
  40133f:       c3                      ret

> ./cargo.sh dump err
0000000000401340 <err>:
  401340:       48 89 f8                mov    rax,rdi
  401343:       90                      nop
  401344:       66 c7 07 01 01          mov    WORD PTR [rdi],0x101
  401349:       c3                      ret

This is even more hairy… The System V ABI says that if you need to return two integer values you can use the rax and rdx registers, just like we did above. In contrast, if the return value is classified as MEMORY (e.g. a big struct), then the caller needs to reserve space for the return value and the called function writes the value there. In this case the pointer to this space is passed as a hidden first argument in the rdi register, and the rax register should hold the same pointer when the function returns. Hence the mov rax,rdi at the beginning of both functions. After that the memory location pointed to by rdi is filled either with 0 and 0x3 for Ok(3) or with 0x101 for Err(Error::B).

And this is kind of sad, because we’re hitting the cache (L1 at best, possibly L2) just to return a simple number, instead of passing it back in two registers basically for free.
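The switch from registers to memory follows the size of the Result. The sketch below prints the sizes of the variants discussed in this section, using std for brevity (an assumption, the book’s examples are no_std); sizes is a hypothetical helper. The numbers are what rustc produces on x86-64 for these types; the layout of such enums is not guaranteed by the language.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Error { A, B }

// Sizes in bytes of Result<T, Error> for T = (), u8, i32 and i64.
fn sizes() -> [usize; 4] {
    [
        size_of::<Result<(), Error>>(),  // Ok(()) is stored as a spare Error value
        size_of::<Result<u8, Error>>(),  // 1 byte tag + 1 byte payload
        size_of::<Result<i32, Error>>(), // tag padded to 4 bytes + 4 byte payload
        size_of::<Result<i64, Error>>(), // tag padded to 8 bytes + 8 byte payload
    ]
}

fn main() {
    println!("{:?}", sizes());
}
```

Only the 16-byte variant was returned through memory in the dumps above; the smaller ones travelled in one or two registers.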

It can be corrected by forcing the compiler to use the C ABI with the extern "C" declaration:

#[no_mangle]
#[inline(never)]
extern "C" fn ok() -> Result<usize, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Ok(3)
}

#[no_mangle]
#[inline(never)]
extern "C" fn err() -> Result<usize, Error> {
    unsafe { asm!( "nop", options(nostack)) };
    Err(Error::B)
}

We’ll see a warning that the Result enum is not FFI-safe, but we don’t want to use it as an FFI function anyway, just as a function which doesn’t do unnecessary work:

> ./cargo.sh dump ok
0000000000401330 <ok>:
  401330:       90                      nop
  401331:       ba 03 00 00 00          mov    edx,0x3
  401336:       31 c0                   xor    eax,eax
  401338:       c3                      ret

> ./cargo.sh dump err
0000000000401340 <err>:
  401340:       90                      nop
  401341:       b8 01 01 00 00          mov    eax,0x101
  401346:       c3                      ret

So as long as we don’t use an Ok or Err type bigger than 64 bits we should be good now. Even though it’s a bit complicated to use, there are some benefits of using Result:

  • The value must be checked before it can be used (unlike the return value of malloc). It doesn’t let you access its content until you have established whether it holds an Ok or an Err value. This is a huge benefit.
  • The question mark. You can forward an error to the caller by simply appending ? to an expression that returns a Result. But how does it work under the hood?

The ? operator
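As a starting point, the ? operator on a Result behaves roughly like a match with an early return, where the error is passed through From::from. The sketch below uses std for brevity (an assumption); parse, double_q and double_match are hypothetical names, and the expansion shown is the classic approximation, not the exact code the compiler generates.

```rust
#[derive(Debug, PartialEq)]
enum Error { Parse }

// Hypothetical fallible function used by both variants below.
fn parse(s: &str) -> Result<u32, Error> {
    s.parse::<u32>().map_err(|_| Error::Parse)
}

// Using the ? operator.
fn double_q(s: &str) -> Result<u32, Error> {
    let n = parse(s)?;
    Ok(n * 2)
}

// Roughly what ? expands to: match, early return and a From conversion.
fn double_match(s: &str) -> Result<u32, Error> {
    let n = match parse(s) {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(n * 2)
}

fn main() {
    assert_eq!(double_q("21"), double_match("21"));
    assert_eq!(double_q("x"), double_match("x"));
    println!("ok");
}
```

The From::from call is what allows a function to return a different error type than the one produced by the inner call, as long as a conversion exists.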

Jump

  • Why is it so much faster to use unconditional jumps instead of if-else? – branch predictor
  • make the common case faster
  • got-plt example

ABI

Tools

readelf

#![allow(unused)]
fn main() {
// ==============================================================================
// Elf file
// ==============================================================================
#[derive(Debug, Clone)]
pub struct File<'a>{
    pub ehdr: &'a Ehdr,
    pub phdrs: &'a [Phdr],
    pub shdrs: &'a [Shdr],
}

impl<'a> File<'a> {
    pub unsafe fn from_slice(buf: &'a [u8]) -> Self {
        Self::from_ptr(buf.as_ptr())
    }

    pub unsafe fn from_ptr(p: *const u8) -> Self {
        let ehdr = &*(p as *const Ehdr);
        let phdrs = core::slice::from_raw_parts(
            p.offset(ehdr.e_phoff as isize) as *const Phdr,
            ehdr.e_phnum as usize
        );
        let shdrs = core::slice::from_raw_parts(
            p.offset(ehdr.e_shoff as isize) as *const Shdr,
            ehdr.e_shnum as usize
        );
        Self { ehdr, phdrs, shdrs }
    }

    pub unsafe fn shstr(&self, sh_name: u32) -> &str {
        let h = &self.shdrs[self.ehdr.e_shstrndx as usize];
        let p = (self.ehdr as *const _ as *const i8)
            .add(h.sh_offset as usize)
            .add(sh_name as usize);
        CStr::from_ptr(p).to_str().unwrap()
    }

    pub unsafe fn strtab(&self, st_name: u32) -> &str {
        for h in self.shdrs.into_iter() {
            if h.sh_type == SHT::SHT_STRTAB as u32 {
                let p = (self.ehdr as *const _ as *const i8)
                    .add(h.sh_offset as usize)
                    .add(st_name as usize);
                return CStr::from_ptr(p).to_str().unwrap();
            }
        }
        panic!("Missing strtab");
    }

    pub fn symtab(&self) -> &[Sym] {
        for h in self.shdrs.into_iter() {
            if h.sh_type == SHT::SHT_SYMTAB as u32 {
                return unsafe {
                    let p = (self.ehdr as *const _ as *const i8)
                        .add(h.sh_offset as usize) as *const Sym;
                    core::slice::from_raw_parts(p, (h.sh_size / h.sh_entsize) as usize)
                };
            }
        }

        &[]
    }

    pub fn dynsym(&self) -> &[Sym] {
        for h in self.shdrs.into_iter() {
            if h.sh_type == SHT::SHT_DYNSYM as u32 {
                return unsafe {
                    let p = (self.ehdr as *const _ as *const i8)
                        .add(h.sh_offset as usize) as *const Sym;
                    core::slice::from_raw_parts(p, (h.sh_size / h.sh_entsize) as usize)
                };
            }
        }

        &[]
    }

    pub fn dynamic(&self) -> &[Dyn] {
        for h in self.shdrs.into_iter() {
            if h.sh_type == SHT::SHT_DYNAMIC as u32 {
                return unsafe {
                    let p = (self.ehdr as *const _ as *const i8)
                        .add(h.sh_offset as usize) as *const Dyn;
                    core::slice::from_raw_parts(p, (h.sh_size / h.sh_entsize) as usize)
                };
            }
        }

        &[]
    }

    pub fn dump_phdrs(&self) {
        crate::println!("Program headers:");
        //crate::println!("{:?}", ""); // NOTE: Without this it will segfault in the for loop...
        crate::println!("  {:<3} {:<12} {:<10} {:<18} {:<18} {:<10} {:<10} {:<3} {:<10}", 
            "Idx",
            "Type", 
            "Offset",
            "VirtAddr",
            "PhysAddr",
            "FileSize",
            "MemSize",
            "Flg",
            "Align"
        );

        //crate::println!("{:?}", self.phdrs);
        for (idx, h) in self.phdrs.into_iter().enumerate() {
            crate::println!("  {:<3?} {:<12} 0x{:0>8x?} 0x{:0>16x?} 0x{:0>16x?} 0x{:0>8x?} 0x{:0>8x?} {:<3} 0x{:0>8x?}",
                idx,
                h.p_type().as_str(),
                h.p_offset,
                h.p_vaddr,
                h.p_paddr,
                h.p_filesz,
                h.p_memsz,
                h.flags(),
                h.p_align
            );
        }

        crate::println!("");
    }

    pub fn dump_shdrs(&self) {
        //crate::println!("{:?}", ""); // NOTE: Without this it will segfault in the for loop...
        crate::println!("Section headers:");
        crate::println!("  {:<3} {:<13} {:<18} {:<10} {:<10} {:<3} {:<3} {:<3} {:<3} {:<3} {}", 
            "Idx",
            "Type", 
            "Address",
            "Offset",
            "Size",
            "ENS",
            "FLG",
            "LNK",
            "INF",
            "ALI",
            "Name"
        );

        //crate::println!("{:?}", self.phdrs);
        for (idx, h) in self.shdrs.into_iter().enumerate() {
            crate::println!("  {:<3} {:<13} 0x{:0>16x} 0x{:0>8x} 0x{:0>8x} {:<3} {:<3} {:<3} {:<3} {:<3} {}", 
                idx,
                h.sh_type().as_str(),
                h.sh_addr,
                h.sh_offset,
                h.sh_size,
                h.sh_entsize,
                h.sh_flags,
                h.sh_link,
                h.sh_info,
                h.sh_addralign,
                unsafe { self.shstr(h.sh_name) }
            );
        }

        crate::println!("");
    }

    fn dump_symbols(&self, symbols: &[Sym]) {
        crate::println!("  {:<3} {:<8} {:<18} {:<10} {:<6} {:<10} {:<5} {}", 
            "Idx",
            "Type",
            "Address", 
            "Size",
            "Bind",
            "Visibility",
            "Shndx",
            "Name"
        );

        //crate::println!("{:?}", self.phdrs);
        for (idx, s) in symbols.into_iter().enumerate() {
            crate::println!("  {:<3} {:<8} 0x{:0>16x} 0x{:0>8x} {:<6} {:<10} {:<5} {}", 
                idx,
                s.st_type().as_str(),
                s.st_value,
                s.st_size,
                s.st_bind().as_str(),
                s.st_visibility().as_str(),
                s.st_shndx,
                unsafe { self.strtab(s.st_name) }
            );
        }
    }

    pub fn dump_symtab(&self) {
        //crate::println!("{:?}", ""); // NOTE: Without this it will segfault in the for loop...
        let symtab = self.symtab();

        crate::println!("Static symbols: {:?}", symtab.len());
        if symtab.len() > 0 {
            self.dump_symbols(symtab);
        }

        crate::println!("");
    }

    pub fn dump_dynsym(&self) {
        //crate::println!("{:?}", ""); // NOTE: Without this it will segfault in the for loop...
        let dynsym = self.dynsym();
        crate::println!("Dynamic symbols: {:?}", dynsym.len());
        if dynsym.len() > 0 {
            self.dump_symbols(dynsym);
        }

        crate::println!("");
    }

    pub fn dump_dynamic(&self) {
        let dynamic = self.dynamic();
        crate::println!("Dynamic relocation: {:?}", dynamic.len());

        if dynamic.len() == 0 {
            return;
        }

        crate::println!("  {:<3} {:<15} {}", "Idx", "Type", "Value");

        //crate::println!("{:?}", self.phdrs);
        for (idx, s) in dynamic.into_iter().enumerate() {
            match s.d_tag() {
                DT::DT_SONAME | DT::DT_NEEDED => {
                    crate::println!("  {:<3} {:<15} {}",
                        idx,
                        s.d_tag().as_str(),
                        unsafe { self.strtab(s.d_val as u32) }
                    );
                }
                _ => {
                    crate::println!("  {:<3} {:<15} 0x{:0>16x}",
                        idx,
                        s.d_tag().as_str(),
                        s.d_val
                    );
                }
            }

            if let DT::DT_NULL = s.d_tag() {
                break
            }
        }

        crate::println!("");
    }
}
}

The dynamic linker

In this series we’re going to implement a basic version of a dynamic linker that loads shared libraries and relocates symbols at runtime. Let’s take a step back and see what dynamic linking is all about. As always, we’re going to use the simplest possible example programs, and for that we have to write some assembly. Let’s get some definitions out of the way first to avoid confusion:

Object file:

An object file is the output of a single compilation unit, in which all the information the linker needs is collected. Once the source files are compiled into object files, these can be linked together to form a library or an executable. This job is done by the linker.

Let’s create a simplistic object file which only holds a global variable. We could use this variable to specify the exit code of the program, so let’s call the file rc.s and the variable RC:

global RC:data
section .data
RC: db 1

You can assemble it and print the most important information about the object file like this (I removed some unimportant lines):

> nasm -f elf64 rc.s
> readelf -Wa rc.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          64 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         5
  Section header string table index: 2

Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .data     PROGBITS 0000000000000000 000180 000001 00  WA  0   0  4
  [ 2] .shstrtab STRTAB   0000000000000000 000190 000021 00      0   0  1
  [ 3] .symtab   SYMTAB   0000000000000000 0001c0 000060 18      4   3  8
  [ 4] .strtab   STRTAB   0000000000000000 000220 00000b 00      0   0  1

Symbol table '.symtab' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ./rc.s
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .data
     3: 0000000000000000     0 OBJECT  GLOBAL DEFAULT    1 RC

This tells us the following:

  • The type of the file is REL (Relocatable file)
  • It has a .data section (section .data in asm)
  • It has a global symbol called RC (last line) which is located in the first section (Ndx=1), the .data section

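As you can see, the value of every symbol is zero. The value should be the memory address the symbol points to, so how can it be zero? In a relocatable file the value is only an offset into the section which holds the symbol, and RC is the very first byte of .data. The linker replaces it with a real address once this object file is merged into an executable or a shared library.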
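To make the columns of the symbol table a bit more tangible, here is a minimal sketch of how one 24 byte entry (ES = 0x18 in the section headers) could be decoded. The SymEntry type, the le_bytes helper and the hand-built example bytes are my own assumptions for illustration; they are not the Sym type used by the dump code in this book, and the bytes are not read from a real rc.o.

```rust
/// Hypothetical mirror of one ELF64 symbol table entry (24 bytes, little endian).
#[derive(Debug)]
struct SymEntry {
    st_name: u32,  // offset of the name in .strtab
    st_info: u8,   // binding in the high 4 bits, type in the low 4 bits
    st_other: u8,  // visibility
    st_shndx: u16, // index of the section the symbol belongs to
    st_value: u64, // section offset in a REL file, address after linking
    st_size: u64,
}

/// Copies N bytes starting at `at` into a fixed-size array.
fn le_bytes<const N: usize>(raw: &[u8], at: usize) -> [u8; N] {
    let mut buf = [0u8; N];
    buf.copy_from_slice(&raw[at..at + N]);
    buf
}

impl SymEntry {
    /// Decodes one entry, or returns None if the slice is too short.
    fn parse(raw: &[u8]) -> Option<SymEntry> {
        if raw.len() < 24 {
            return None;
        }
        Some(SymEntry {
            st_name: u32::from_le_bytes(le_bytes(raw, 0)),
            st_info: raw[4],
            st_other: raw[5],
            st_shndx: u16::from_le_bytes(le_bytes(raw, 6)),
            st_value: u64::from_le_bytes(le_bytes(raw, 8)),
            st_size: u64::from_le_bytes(le_bytes(raw, 16)),
        })
    }

    fn bind(&self) -> &'static str {
        match self.st_info >> 4 {
            0 => "LOCAL",
            1 => "GLOBAL",
            2 => "WEAK",
            _ => "OTHER",
        }
    }

    fn kind(&self) -> &'static str {
        match self.st_info & 0xf {
            0 => "NOTYPE",
            1 => "OBJECT",
            2 => "FUNC",
            3 => "SECTION",
            4 => "FILE",
            _ => "OTHER",
        }
    }
}

/// Hand-built bytes resembling the RC entry above (illustration only).
fn example_rc_entry() -> [u8; 24] {
    let mut raw = [0u8; 24];
    raw[0..4].copy_from_slice(&8u32.to_le_bytes()); // assumed name offset
    raw[4] = (1 << 4) | 1; // GLOBAL binding, OBJECT type
    raw[6..8].copy_from_slice(&1u16.to_le_bytes()); // Ndx = 1 (.data)
    raw // st_value and st_size stay zero
}

fn main() {
    let sym = SymEntry::parse(&example_rc_entry()).expect("entry has 24 bytes");
    println!("{} {} ndx={} value={:#x}", sym.kind(), sym.bind(), sym.st_shndx, sym.st_value);
    println!("{:?}", sym);
}
```

The binding and the type share a single byte, which is why readelf can print both the Type and the Bind column from st_info alone.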

Static linking:

The simplest way to create an elf binary is merging all of its parts into a single file. This allows it to be fully independent from any other userspace code. As a result you can put it into a docker container / chroot environment and it will just run.

Let’s reimplement the /bin/false command in such a way. The only thing the false command does is exit with 1 as its return code. To make it a bit more interesting, let’s use the rc.o file as a static library which we link into our binary, and use the value of RC defined there as the exit code. The source of our false.s looks like this:

global _start
extern RC:data

section .text
_start:
    mov rdi,[RC]
    mov rax,0x3c
    syscall

With extern RC:data we tell the assembler that a symbol RC of type data exists somewhere in another object file which we will link against. With mov rdi,[RC] we tell the assembler to go to the address marked by RC, read the value stored there and move it into the rdi register. This register holds the first argument of the exit system call (number 0x3c in rax), which is the exit code of the process. We can assemble, link and run it like this:

> nasm -f elf64 ./false.s
> ld -static ./false.o ./rc.o -o ./false
> ./false; echo $?
1

Let’s have a closer look with readelf

> readelf -Wa ./false
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8480 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         6
  Section header string table index: 5

Section Headers:
  [Nr] Name      Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text     PROGBITS 0000000000401000 001000 00000f 00  AX  0   0 16
  [ 2] .data     PROGBITS 0000000000402000 002000 000001 00  WA  0   0  4
  [ 3] .symtab   SYMTAB   0000000000000000 002008 0000c0 18      4   3  8
  [ 4] .strtab   STRTAB   0000000000000000 0020c8 00002d 00      0   0  1
  [ 5] .shstrtab STRTAB   0000000000000000 0020f5 000027 00      0   0  1

Program Headers:
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0000e8 0x0000e8 R   0x1000
  LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x00000f 0x00000f R E 0x1000
  LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000001 0x000001 RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00
   01     .text
   02     .data

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ./false.s
     2: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ./rc.s
     3: 0000000000401000     0 NOTYPE  GLOBAL DEFAULT    1 _start
     4: 0000000000402001     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
     5: 0000000000402001     0 NOTYPE  GLOBAL DEFAULT    2 _edata
     6: 0000000000402008     0 NOTYPE  GLOBAL DEFAULT    2 _end
     7: 0000000000402000     0 OBJECT  GLOBAL DEFAULT    2 RC

As we can see, the Type of the file is now EXEC (Executable file), and as a result there is a new part in this dump compared to the output of rc.o: the Program Headers. The three lines under Program Headers describe how this executable needs to be loaded into memory when it is run. The first line has R in the Flg column, meaning that it can only be read. The second line has R E, meaning it can be read and executed. It holds the .text section which we defined in the assembly code with section .text. The third line shows the .data section, which can be read and written (Flg = RW).
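The Flg column is just a bit mask stored in the p_flags field of each program header. As a small sketch, this is how it could be turned back into the string readelf prints. The bit values are the ones defined by the ELF specification, while the function name flags_to_string is made up for this example.

```rust
const PF_X: u32 = 1; // segment may be executed
const PF_W: u32 = 2; // segment may be written
const PF_R: u32 = 4; // segment may be read

/// Renders p_flags as three characters, similar to readelf's Flg column.
fn flags_to_string(p_flags: u32) -> String {
    let mut out = String::with_capacity(3);
    out.push(if p_flags & PF_R != 0 { 'R' } else { ' ' });
    out.push(if p_flags & PF_W != 0 { 'W' } else { ' ' });
    out.push(if p_flags & PF_X != 0 { 'E' } else { ' ' });
    out
}

fn main() {
    // The three LOAD segments of the static false binary: R, R E and RW.
    for &flags in &[PF_R, PF_R | PF_X, PF_R | PF_W] {
        println!("{:#x} -> {:?}", flags, flags_to_string(flags));
    }
}
```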

We can also see that the RC symbol was merged into the .symtab of this file and that it points to the location 0x0000000000402000 (last line). As in the rc.o file, the RC symbol is located in the .data section (Ndx = 2, meaning index two in the Section Headers above). One can also take the Value of the symbol (0x0000000000402000) and find the same address in the Address column of the Section Headers. This means that the RC symbol points exactly to the first byte of the .data section.

Since we have an executable part in our file we can dump its content with objdump like this:

> objdump -M intel -d ./false
0000000000401000 <_start>:
  401000:       48 8b 3c 25 00 20 40    mov    rdi,QWORD PTR ds:0x402000
  401007:       00
  401008:       b8 3c 00 00 00          mov    eax,0x3c
  40100d:       0f 05                   syscall

As you can see, mov rdi,[RC] was replaced with mov rdi,QWORD PTR ds:0x402000. The address in this instruction is the same as the Value of the RC symbol, so it points to the same byte of the .data section and the program will use the value located there, which in our case is 1.
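The address is literally part of the instruction bytes. The following sketch pulls it out of the eight bytes objdump printed for the first instruction. It only understands this one encoding, and the function name absolute_operand is an assumption made for the example.

```rust
/// Extracts the 32-bit absolute address from the 8-byte encoding of
/// `mov rdi, [abs32]` as shown in the objdump output above.
fn absolute_operand(insn: &[u8; 8]) -> Option<u32> {
    // 48 = REX.W prefix, 8b = MOV r64, r/m64,
    // 3c = ModRM (reg = rdi, r/m = SIB follows),
    // 25 = SIB (no base, no index, 32-bit displacement only)
    if insn[..4] != [0x48, 0x8b, 0x3c, 0x25] {
        return None;
    }
    let mut disp = [0u8; 4];
    disp.copy_from_slice(&insn[4..8]);
    // The displacement is stored in little endian byte order.
    Some(u32::from_le_bytes(disp))
}

fn main() {
    let insn = [0x48, 0x8b, 0x3c, 0x25, 0x00, 0x20, 0x40, 0x00];
    match absolute_operand(&insn) {
        Some(addr) => println!("operand address: {:#x}", addr),
        None => println!("not the expected mov encoding"),
    }
}
```

Because the address is baked into the code like this, the binary only works if .data really ends up at 0x402000.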

Let’s check out the memory mappings of our executable in gdb:

> gdb ./false
(gdb) break _start
(gdb) run
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
            0x400000           0x401000     0x1000        0x0  r--p   /false
            0x401000           0x402000     0x1000     0x1000  r-xp   /false
            0x402000           0x403000     0x1000     0x2000  rw-p   /false
      0x7ffff7ff9000     0x7ffff7ffd000     0x4000        0x0  r--p   [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0  r-xp   [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

As you can see, there are three mappings pointing to our executable, with the same permissions we discussed in the Program Headers part. These mappings are created by the kernel as it initializes our process. The start addresses are the same as the VirtAddr values in the Program Headers, and each end address is rounded up to the next page boundary (0x1000).
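The relation between a LOAD program header and the mapping we see in gdb can be sketched with a bit of arithmetic. This is a simplification of what the kernel really does, it assumes a page size of 0x1000, and the name mapped_range is made up for the example.

```rust
const PAGE_SIZE: u64 = 0x1000; // assumed page size (4096 bytes)

/// Returns the page-aligned [start, end) range that a LOAD segment
/// with the given VirtAddr and MemSiz occupies.
fn mapped_range(vaddr: u64, memsz: u64) -> (u64, u64) {
    let mask = !(PAGE_SIZE - 1);
    let start = vaddr & mask; // round down to a page boundary
    let end = (vaddr + memsz + PAGE_SIZE - 1) & mask; // round up
    (start, end)
}

fn main() {
    // VirtAddr and MemSiz of the three LOAD segments of the static false binary.
    let segments = [(0x400000u64, 0xe8u64), (0x401000, 0xf), (0x402000, 0x1)];
    for &(vaddr, memsz) in &segments {
        let (start, end) = mapped_range(vaddr, memsz);
        println!("{:#x} + {:#x} -> {:#x}..{:#x}", vaddr, memsz, start, end);
    }
}
```

Even the one byte .data segment occupies a full page, which matches the Size column of 0x1000 in the gdb output.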

Dynamic linking

As you could see, with static linking all the code is merged into a single executable. This makes everything really simple, but it also means that if two executables use the same library, the code of the library will be in memory twice. It also takes twice as much space on disk. Since the code is not shared, the copies cannot share cache entries either. So even though the library code is exactly the same, each process brings its own copy into the CPU cache and evicts the other one, which results in additional cache misses.

Luckily there is a solution for that called shared libraries. But as always, flexibility brings complexity. Let’s create a shared library from rc.o and link our false.o dynamically against it. Since the code of rc.s is dead simple, it doesn’t need to be reassembled with nasm. Bigger code bases, however, need to be written differently if they are meant to become a shared library. More about that later on. To create the library we have to link it as shared:

> ld -shared rc.o -o rc.so

Since this generates much more information, we won’t print everything with readelf -Wa but only the important parts, selected with some command line flags. All of them can be found with readelf --help.

> readelf -Wh rc.so | grep Type
Type: DYN (Shared object file)

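As we can see in the ELF header, the type of this file is DYN (Shared object file). In the Program Headers we can see that there is no executable segment (Flg = R E) anymore, but there are some other types like DYNAMIC and GNU_RELRO. DYNAMIC points to the .dynamic section, which holds the information the dynamic linker needs at runtime. GNU_RELRO marks a region which the dynamic linker makes read-only once the relocations are done.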

> readelf -Wl ./rc.so
Program Headers:
  Type      Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD      0x000000 0x0000000000000000 0x0000000000000000 0x001000 0x001000 R   0x1000
  LOAD      0x001f40 0x0000000000001f40 0x0000000000001f40 0x0000c1 0x0000c1 RW  0x1000
  DYNAMIC   0x001f40 0x0000000000001f40 0x0000000000001f40 0x0000c0 0x0000c0 RW  0x8
  GNU_RELRO 0x001f40 0x0000000000001f40 0x0000000000001f40 0x0000c0 0x0000c0 R   0x1

Let’s check out the symbols in this file:

> readelf -Ws rc.so
Symbol table '.dynsym' contains 2 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000002000     0 OBJECT  GLOBAL DEFAULT    7 RC

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ./rc.s
     2: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
     3: 0000000000001f40     0 OBJECT  LOCAL  DEFAULT    6 _DYNAMIC
     4: 0000000000002000     0 OBJECT  GLOBAL DEFAULT    7 RC

The RC symbol looks a bit different from the one we saw in our statically linked binary or in our object file. It now has the value 0x0000000000002000, which is much smaller than the one we saw in the statically linked binary (0x0000000000402000). This happens because it’s still an intermediate address: it only shows where the symbol is located relative to the start of the shared object. In the statically linked binary, on the other hand, it was a real memory address where the symbol will be located once the code is loaded into memory and the process runs.

With static linking we have the luxury of expecting that the code will always be mapped to the same location in memory (0x0000000000400000), so we can calculate the absolute addresses of the symbols at link time already. Dynamically loaded libraries, on the other hand, must expect to be loaded at an arbitrary location of the address space. Otherwise we would need a global registry of the memory addresses where the different libraries are going to be loaded (a bit like the way public IP addresses get assigned to companies).

As a result, all the symbol addresses of a shared library need to be updated once it has been loaded into memory. That’s the job of the dynamic loader, which we are going to implement in this series.
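The core of that update is a single addition: the address where the library got loaded plus the value stored in the symbol table. Here is a tiny sketch of it. The function name runtime_address is hypothetical, and the base address is simply the one rc.so happens to get in the gdb sessions later in this chapter.

```rust
/// Runtime address of a symbol from a shared object mapped at `load_base`.
/// Returns None if the addition would overflow the address space.
fn runtime_address(load_base: u64, st_value: u64) -> Option<u64> {
    load_base.checked_add(st_value)
}

fn main() {
    let load_base = 0x7ffff7fb8000; // example base address of rc.so
    let rc_value = 0x2000; // Value of RC in the .dynsym of rc.so
    match runtime_address(load_base, rc_value) {
        Some(addr) => println!("RC lives at {:#x}", addr),
        None => println!("address overflow"),
    }
}
```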

But first let’s create our executable by dynamically linking against our rc.so library. This time we need to modify our source code. Since the executable can only know the exact location of the library once it has been loaded, we have to write our code in a way which respects this:

global _start
extern RC:data

section .text
_start:
    mov rax,[rel RC wrt ..got]
    mov rdi,[rax]
    mov rax,0x3c
    syscall

Let’s reassemble, link and run our command. To do that we need to find the dynamic loader of the system, which can be done like this:

> ls /lib64/ld*
/lib64/ld-linux-x86-64.so.2

Now we can pass it to our linker

> nasm -f elf64 false.s
> ld --dynamic-linker /lib64/ld-linux-x86-64.so.2 -o false false.o -L. -l:./rc.so
> ./false; echo $?
1

Let’s check out the memory in gdb:

> gdb ./false
(gdb) break _start
(gdb) run
Starting program: /home/taabodal/work/blog/code/target/false
Breakpoint 1, 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
            0x400000           0x401000     0x1000        0x0  r--p   /false
            0x401000           0x402000     0x1000     0x1000  r-xp   /false
            0x402000           0x404000     0x2000     0x2000  rw-p   /false
      0x7ffff7fbd000     0x7ffff7fc1000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc1000     0x7ffff7fc3000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fc3000     0x7ffff7fc5000     0x2000        0x0  r--p   /lib/ld.so
      0x7ffff7fc5000     0x7ffff7fef000    0x2a000     0x2000  r-xp   /lib/ld.so
      0x7ffff7fef000     0x7ffff7ffa000     0xb000    0x2c000  r--p   /lib/ld.so
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x37000  rw-p   /lib/ld.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

There are multiple things to see here. Even though we set the breakpoint at our own _start function, the program stops at another _start function. This one belongs to the dynamic linker (ld.so). (Note that I shortened the path of ld.so in the output because it doesn’t matter and only makes the listing harder to read.) The other thing to see is that, compared to our static binary, the dynamic loader is also mapped into our address space. And if you hit continue in the debugger, let it stop at our _start function and check the mappings again, you’ll see that rc.so is mapped too. Loading such shared libraries at the startup of the program is one of the jobs of the dynamic loader.

(gdb) continue
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
            0x400000           0x401000     0x1000        0x0  r--p   /false
            0x401000           0x402000     0x1000     0x1000  r-xp   /false
            0x402000           0x403000     0x1000     0x2000  r--p   /false
            0x403000           0x404000     0x1000     0x3000  rw-p   /false
      0x7ffff7fb6000     0x7ffff7fb8000     0x2000        0x0  rw-p
      0x7ffff7fb8000     0x7ffff7fb9000     0x1000        0x0  r--p   /rc.so
      0x7ffff7fb9000     0x7ffff7fba000     0x1000     0x1000  r--p   /rc.so
      0x7ffff7fba000     0x7ffff7fbb000     0x1000     0x2000  rw-p   /rc.so
      0x7ffff7fbb000     0x7ffff7fbd000     0x2000        0x0  rw-p
      0x7ffff7fbd000     0x7ffff7fc1000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc1000     0x7ffff7fc3000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fc3000     0x7ffff7fc5000     0x2000        0x0  r--p   /lib/ld.so
      0x7ffff7fc5000     0x7ffff7fef000    0x2a000     0x2000  r-xp   /lib/ld.so
      0x7ffff7fef000     0x7ffff7ffa000     0xb000    0x2c000  r--p   /lib/ld.so
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x37000  rw-p   /lib/ld.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

Position independent executable (PIE)

As we discussed above, all shared libraries need to be position independent, since they can be loaded anywhere in memory. To achieve that we have to write position independent code (PIC) or instruct the compiler to generate PIC for us (gcc -fpic). But can we do the same for executables? Yes, we can. In principle it is the same process: we need to write code that expects to be loaded anywhere in memory and link it with the -pie flag. Since the source code of our executable is basically empty, we can already link it as PIE. A position independent executable can be statically as well as dynamically linked. There is a restriction though: all the components which we link against need to be written in a PIC way. Dynamic libraries are like that by default, but in the case of static libraries we need to rewrite or regenerate our code as PIC.

Static PIE

Our false.s should look like this now:

global _start
extern RC:data

section .text
_start:
    mov rax,[rel RC wrt ..got]
    lea rdi,[rax]
    mov rax,0x3c
    syscall

And we can assemble and link it like this:

> nasm -f elf64 false.s
> ld -static -pie --no-dynamic-linker -o false false.o rc.o

This changes the type in the ELF header:

> readelf -Wh ./false | grep Type
Type: DYN (Position-Independent Executable file)

It also creates the DYNAMIC and GNU_RELRO program headers:

> readelf -Wl ./false
Program Headers:
  Type      Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD      0x000000 0x0000000000000000 0x0000000000000000 0x0001d9 0x0001d9 R   0x1000
  LOAD      0x001000 0x0000000000001000 0x0000000000001000 0x000015 0x000015 R E 0x1000
  LOAD      0x002000 0x0000000000002000 0x0000000000002000 0x000000 0x000000 R   0x1000
  LOAD      0x002f20 0x0000000000002f20 0x0000000000002f20 0x0000e1 0x0000e1 RW  0x1000
  DYNAMIC   0x002f20 0x0000000000002f20 0x0000000000002f20 0x0000e0 0x0000e0 RW  0x8
  GNU_RELRO 0x002f20 0x0000000000002f20 0x0000000000002f20 0x0000e0 0x0000e0 R   0x1

And if we have a look at the mappings of the running process, we can see that our executable is no longer mapped at 0x400000 but at 0x7ffff7ffb000:

> gdb ./false
(gdb) break _start
(gdb) run
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x7ffff7ff5000     0x7ffff7ff9000     0x4000        0x0  r--p   [vvar]
      0x7ffff7ff9000     0x7ffff7ffb000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7ffb000     0x7ffff7ffc000     0x1000        0x0  r--p   /false
      0x7ffff7ffc000     0x7ffff7ffd000     0x1000     0x1000  r-xp   /false
      0x7ffff7ffd000     0x7ffff7fff000     0x2000     0x2000  rw-p   /false
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

Dynamic PIE

We can use the same false.s as we did in the dynamic linking section and link it with:

> ld -pie --dynamic-linker /lib64/ld-linux-x86-64.so.2 -o false false.o -L. -l:./rc.so

In gdb we can see that it, too, was mapped into a high address range:

> gdb ./false
(gdb) break _start
(gdb) run
(gdb) continue
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x555555554000     0x555555555000     0x1000        0x0  r--p   /false
      0x555555555000     0x555555556000     0x1000     0x1000  r-xp   /false
      0x555555556000     0x555555557000     0x1000     0x2000  r--p   /false
      0x555555557000     0x555555558000     0x1000     0x3000  rw-p   /false
      0x7ffff7fb6000     0x7ffff7fb8000     0x2000        0x0  rw-p
      0x7ffff7fb8000     0x7ffff7fb9000     0x1000        0x0  r--p   /rc.so
      0x7ffff7fb9000     0x7ffff7fba000     0x1000     0x1000  r--p   /rc.so
      0x7ffff7fba000     0x7ffff7fbb000     0x1000     0x2000  rw-p   /rc.so
      0x7ffff7fbb000     0x7ffff7fbd000     0x2000        0x0  rw-p
      0x7ffff7fbd000     0x7ffff7fc1000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc1000     0x7ffff7fc3000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fc3000     0x7ffff7fc5000     0x2000        0x0  r--p   /lib/ld.so
      0x7ffff7fc5000     0x7ffff7fef000    0x2a000     0x2000  r-xp   /lib/ld.so
      0x7ffff7fef000     0x7ffff7ffa000     0xb000    0x2c000  r--p   /lib/ld.so
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x37000  rw-p   /lib/ld.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

There is a new section in the output of readelf which we haven’t seen before, the relocations:

> readelf -Wr false
Relocation section '.rela.dyn' at offset 0x298 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000002ff8  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 RC + 0

As we’re referencing a variable which is located in a shared library, we cannot know its address before the library is mapped. So the linker adds an indirection for us. Instead of referencing the variable directly, we reference a memory location which belongs to our binary and which serves as a pointer to the real address of the variable. Hence the assembly code mov rax,[rel RC wrt ..got], which can be interpreted like this:

  1. Calculate a relative location of RC With Reference To GOT
  2. Put the value which can be found in this location into rax

So what is the GOT? It stands for Global Offset Table. The GOT is a location in our program which we can use to delay the resolution of an address. It works a bit like a phone book: you know the name of the person you want to call, so you look up their number in the book and then use that number to reach the person. It’s also a bit different from a phone book, because it’s empty at the beginning of our program:

> xxd -c8 false | grep 002ff8
00002ff8: 0000 0000 0000 0000  ........

If you take the offset of the relocation (0x0000000000002ff8) and look it up in the Section Headers, you’ll see that it points to the first element of the GOT:

> readelf -WS false | grep -E '0000000000002ff8|Name'
  [Nr] Name Type     Address          Off    Size   ES Flg Lk Inf Al
  [10] .got PROGBITS 0000000000002ff8 002ff8 000008 08  WA  0   0  8

Every time a PIE program is started, the dynamic linker checks whether there are any relocations which need to be applied, and if there are, it fixes up the addresses in the executable. The same is true for every dynamic library.
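To get a feeling for what such a fix-up involves, here is a minimal sketch which applies a single R_X86_64_GLOB_DAT relocation to an in-memory copy of an image. It is not the loader we’re going to write: the Rela struct and apply_glob_dat are hypothetical names, the image is just a zeroed byte vector, and the symbol address is passed in instead of being looked up.

```rust
const R_X86_64_GLOB_DAT: u32 = 6;

/// One Elf64_Rela entry, as listed by `readelf -Wr`.
struct Rela {
    r_offset: u64, // where to write, relative to the load base
    r_info: u64,   // symbol index (high 32 bits) and type (low 32 bits)
    r_addend: i64, // not used by GLOB_DAT
}

impl Rela {
    fn sym_index(&self) -> u32 {
        (self.r_info >> 32) as u32
    }

    fn kind(&self) -> u32 {
        (self.r_info & 0xffff_ffff) as u32
    }
}

/// Stores `symbol_addr` in the GOT slot the relocation points to.
/// `image` stands for the bytes of the loaded file, starting at its load base.
fn apply_glob_dat(image: &mut [u8], rela: &Rela, symbol_addr: u64) -> Result<(), &'static str> {
    if rela.kind() != R_X86_64_GLOB_DAT {
        return Err("unsupported relocation type");
    }
    let start = rela.r_offset as usize;
    let end = start.checked_add(8).ok_or("offset overflow")?;
    let slot = image.get_mut(start..end).ok_or("offset outside of image")?;
    slot.copy_from_slice(&symbol_addr.to_le_bytes());
    Ok(())
}

fn main() {
    // Values taken from the readelf output above; the image itself is fake.
    let mut image = vec![0u8; 0x3000];
    let rela = Rela { r_offset: 0x2ff8, r_info: 0x0000_0001_0000_0006, r_addend: 0 };
    println!("symbol #{} type {} addend {}", rela.sym_index(), rela.kind(), rela.r_addend);

    // Example address of RC once rc.so is mapped.
    apply_glob_dat(&mut image, &rela, 0x7fff_f7fb_a000).expect("relocation applies");
    println!("GOT slot: {:02x?}", &image[0x2ff8..0x3000]);
}
```

The Info column of readelf packs two things into one number: the upper half says which symbol is meant (1 is RC in .dynsym) and the lower half says what kind of fix-up is needed.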

Let’s prove this with gdb

> gdb ./false
(gdb) break _start
(gdb) run
(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
      0x555555554000     0x555555555000     0x1000        0x0  r--p   /false
      0x555555555000     0x555555556000     0x1000     0x1000  r-xp   /false
      0x555555556000     0x555555558000     0x2000     0x2000  rw-p   /false
      0x7ffff7fbd000     0x7ffff7fc1000     0x4000        0x0  r--p   [vvar]
      0x7ffff7fc1000     0x7ffff7fc3000     0x2000        0x0  r-xp   [vdso]
      0x7ffff7fc3000     0x7ffff7fc5000     0x2000        0x0  r--p   /lib/ld.so
      0x7ffff7fc5000     0x7ffff7fef000    0x2a000     0x2000  r-xp   /lib/ld.so
      0x7ffff7fef000     0x7ffff7ffa000     0xb000    0x2c000  r--p   /lib/ld.so
      0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x37000  rw-p   /lib/ld.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

As you can see, our program was mapped at the address 0x555555554000. If we add the offset of the relocation to this address, we can inspect the value stored in the GOT entry. At this point it is zero, because the dynamic linker has just started and hasn’t applied any fix-ups yet. Once we let the program continue and stop at our _start function, the dynamic linker has already finished its first job and the value pointed to by the relocation has changed.

(gdb) x/1gx 0x555555554000 + 0x002ff8
0x555555556ff8: 0x0000000000000000

(gdb) continue

(gdb) x/1gx 0x555555554000 + 0x002ff8
0x555555556ff8: 0x00007ffff7fba000

At this point our program is ready to use this indirection to access the memory location of RC.

(gdb) x/1bx 0x00007ffff7fba000
0x7ffff7fba000: 0x01

Conclusion

To summarize the above:

  • PIC: Position independent code is code which only uses relative addressing. It is a must for dynamically linked libraries and an option for executables.
  • PIE: A position independent executable is an executable which consists of PIC only, so it can be loaded anywhere in the memory address space.
  • Object file: a compilation unit which will be relocated during linking. Multiple object files can be merged into an archive (static library), a shared object (dynamic library) or an executable.
  • Shared object file: a dynamically linked library which can be loaded anywhere in memory because it’s written as PIC.
  • Static linking: a way to combine multiple object files into a single executable.
  • Dynamic linking: a way to tell the linker that some of the dependencies will only be available at runtime.

Interpreter

Let’s create a basic dynamic loader (aka interpreter) which can hand over control to our executable. To be able to do that, we need to know where the entry point of the main executable is.

When the kernel loads the program, it has to find out a couple of things about it. The dynamic loader could do the same, but since this information has already been parsed, the kernel simply puts it onto the stack of the process and lets the dynamic loader find it there. This is the information in the auxiliary vector, which we already implemented earlier.

For a quick recap let’s print out the values. Let’s use the ls command to check what kind of data is passed when it gets started.

> gdb /bin/ls
(gdb) break _start
(gdb) run
(gdb) info auxv
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7ffff7fc1000
51   AT_MINSIGSTKSZ       Minimum stack size for signal delivery 0xe30
16   AT_HWCAP             Machine-dependent CPU capability hints 0xf8bfbff
6    AT_PAGESZ            System page size               4096
17   AT_CLKTCK            Frequency of times()           100
3    AT_PHDR              Program headers for program    0x555555554040
4    AT_PHENT             Size of program header entry   56
5    AT_PHNUM             Number of program headers      13
7    AT_BASE              Base address of interpreter    0x7ffff7fc3000
8    AT_FLAGS             Flags                          0x0
9    AT_ENTRY             Entry point of program         0x55555555aaa0
11   AT_UID               Real user ID                   1066129479
12   AT_EUID              Effective user ID              1066129479
13   AT_GID               Real group ID                  1065878017
14   AT_EGID              Effective group ID             1065878017
23   AT_SECURE            Boolean, was exec setuid-like? 0
25   AT_RANDOM            Address of 16 random bytes     0x7fffffffec19
26   AT_HWCAP2            Extension of AT_HWCAP          0x2
31   AT_EXECFN            File name of executable        0x7fffffffefec "/usr/bin/ls"
15   AT_PLATFORM          String identifying platform    0x7fffffffec29 "x86_64"
0    AT_NULL              End of vector                  0x0

As we can see the entry point is marked by AT_ENTRY so let’s find that value. Our main function could simply look like this:

#[no_mangle]
fn main() -> u8 { 
    for aux in linux::env::auxv() {
        if let AT::AT_ENTRY(entry) = aux {
            unsafe { 
                core::arch::asm!(
                    "jmp {}", 
                    in(reg) entry,
                    options(nostack, noreturn),
                );
            }
        }
    }

    unreachable!()
}
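The loop above relies on linux::env::auxv() from our own library. To show the lookup in isolation, here is a self-contained sketch which searches a plain slice of (type, value) pairs instead of the real stack. The find_aux function and the slice layout are assumptions made for this example; the constants and sample values are the ones gdb printed above.

```rust
const AT_NULL: u64 = 0; // marks the end of the auxiliary vector
const AT_PHDR: u64 = 3;
const AT_ENTRY: u64 = 9;

/// Scans (type, value) pairs up to AT_NULL and returns the value of `wanted`.
fn find_aux(auxv: &[(u64, u64)], wanted: u64) -> Option<u64> {
    auxv.iter()
        .copied()
        .take_while(|&(kind, _)| kind != AT_NULL)
        .find(|&(kind, _)| kind == wanted)
        .map(|(_, value)| value)
}

fn main() {
    // A shortened copy of the vector gdb showed for /bin/ls.
    let auxv = [
        (AT_PHDR, 0x555555554040),
        (AT_ENTRY, 0x55555555aaa0),
        (AT_NULL, 0),
    ];
    match find_aux(&auxv, AT_ENTRY) {
        Some(entry) => println!("entry point: {:#x}", entry),
        None => println!("no AT_ENTRY found"),
    }
}
```

Returning an Option makes the missing entry case explicit, whereas the main function above would run into unreachable!() if the kernel didn’t pass AT_ENTRY.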

We also need to rebuild our false binary. And while we’re at it, we should also get rid of the complexity of using a library. Let’s create a simple executable which exits with one.

global _start

section .text
_start:
    mov rdi,0x1
    mov rax,0x3c
    syscall

When we relink it with our Rust binary as the dynamic linker, we can verify the result with:

> nasm -f elf64 false.s
> ld -pie --dynamic-linker ld.so -o false false.o

> readelf -Wl false | grep interpreter
[Requesting program interpreter: ld.so]

> ./false; echo $?
1

As you can see, we started the false executable and still our Rust binary got run first. So far so good, but what happens if we need to run a non-PIE executable? Let’s rebuild false with the -no-pie option.

> ld -no-pie --dynamic-linker ld.so -o false false.o
> file false
false: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Now it is statically linked. It looks like we need to link against a shared library to convince the linker to create a dynamically linked executable. So let's link against our rc.so, even though we no longer use the variable defined there.

> ld -no-pie --dynamic-linker ld.so -o false false.o -L. -l:rc.so
> file false
false: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter ld.so, not stripped

It looks better now, but if we try to run it we see a problem:

> ./false
panicked at Segmentation fault

Let's find out with gdb what the cause is:

> gdb ./false
(gdb) b _start
(gdb) r
Starting program: /home/taabodal/work/blog/code/target/false
Cannot access memory at address 0x66c1d40f66eec178
Cannot access memory at address 0x66c1d40f66eec170
Cannot access memory at address 0x66c1d40f66eec178
Cannot access memory at address 0x66c1d40f66eec178
Cannot access memory at address 0x66c1d40f66eec170

(gdb) info proc mappings
          Start Addr           End Addr       Size     Offset  Perms  objfile
            0x400000           0x401000     0x1000        0x0  r--p   /ld.so
            0x401000           0x404000     0x3000     0x1000  r-xp   /ld.so
            0x404000           0x405000     0x1000     0x4000  r--p   /ld.so
            0x406000           0x407000     0x1000     0x5000  rw-p   /ld.so
            0x407000           0x408000     0x1000        0x0  rw-p
      0x7ffff7ff9000     0x7ffff7ffd000     0x4000        0x0  r--p   [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0  r-xp   [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0  --xp   [vsyscall]

As you can see, gdb reports a bunch of inaccessible memory addresses at process startup, and if we list the mappings the executable is not there at all. Only ld.so is mapped into the address space. So what's the problem? We built a binary which depends on where it gets loaded, but at the standard position (0x400000) we have already mapped our ld.so, so it collides with the binary it should load.

We should really build our ld.so as PIE so that it can live together with both PIE and non-PIE executables in the same address space. We can do that by specifying the link arguments of our Rust binary in cargo.sh like this: -nostartfiles -pie -Wl,--no-dynamic-linker. Once we've done that, it should be mapped at a random location and let the main executable do its job.
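The collision can be expressed as a simple range check. The following is only a sketch of the idea, not something the kernel or our ld.so runs: the `Mapping` type and `first_collision` function are hypothetical names, and the one-page size of the executable's segment is an assumption for illustration.

```rust
/// A half-open address range [start, end), as printed by `info proc mappings`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Mapping {
    start: u64,
    end: u64,
}

impl Mapping {
    /// Two half-open ranges overlap when each one starts before the other ends.
    fn overlaps(&self, other: &Mapping) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// Returns the first existing mapping that a fixed-address segment would
/// collide with, or None if the wanted range is free.
fn first_collision(existing: &[Mapping], wanted: Mapping) -> Option<Mapping> {
    existing.iter().copied().find(|m| m.overlaps(&wanted))
}
```

Fed with the ld.so ranges from the gdb listing above and a wanted range starting at 0x400000, `first_collision` reports the very first ld.so mapping. With ld.so mapped somewhere high in the address space instead, the check finds nothing. Whether that is enough for our binary is easy to test by running it again: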

> ./false
Segmentation fault

Or not… But what’s the problem?

> gdb ./false
(gdb) r
Program received signal SIGSEGV, Segmentation fault.
0x0000000000001140 in ?? ()

(gdb) backtrace
#0  0x0000000000001140 in ?? ()
#1  0x00007ffff7ff8629 in linux::__rust_main (rsp=<optimized out>) at lib.rs:63
#2  0x00007ffff7ff85c4 in linux::_start () at lib.rs:42

(gdb) up
(gdb) disassemble
Dump of assembler code for function linux::__rust_main:
   0x00007ffff7ff85e0 <+0>:  push   rbp
   0x00007ffff7ff85e1 <+1>:  mov    rbp,rsp
   0x00007ffff7ff85e4 <+4>:  mov    rax,QWORD PTR [rdi]
   0x00007ffff7ff85e7 <+7>:  lea    rax,[rdi+rax*8]
   0x00007ffff7ff85eb <+11>: add    rax,0x10
   0x00007ffff7ff85ef <+15>: mov    rcx,rax
   0x00007ffff7ff85f2 <+18>: data16 data16 data16 data16 cs nop WORD PTR [rax+rax*1+0x0]
   0x00007ffff7ff8600 <+32>: cmp    QWORD PTR [rcx],0x0
   0x00007ffff7ff8604 <+36>: lea    rcx,[rcx+0x8]
   0x00007ffff7ff8608 <+40>: jne    0x7ffff7ff8600 <linux::__rust_main+32>
   0x00007ffff7ff860a <+42>: add    rdi,0x8
   0x00007ffff7ff860e <+46>: mov    QWORD PTR [rip+0x59eb],rdi    # 0x7ffff7ffe000
   0x00007ffff7ff8615 <+53>: mov    QWORD PTR [rip+0x59ec],rax    # 0x7ffff7ffe008
   0x00007ffff7ff861c <+60>: mov    QWORD PTR [rip+0x59ed],rcx    # 0x7ffff7ffe010
   0x00007ffff7ff8623 <+67>: call   QWORD PTR [rip+0x598f]        # 0x7ffff7ffdfb8
=> 0x00007ffff7ff8629 <+73>: pop    rbp
   0x00007ffff7ff862a <+74>: ret
End of assembler dump.

(gdb) x/1gx 0x7ffff7ffdfb8
0x7ffff7ffdfb8: 0x0000000000001140

It seems like we're doing some relative addressing there and trying to jump to the location 0x0000000000001140. There is definitely nothing to look for at this address. So what is this address? Where does it come from? It seems to be a relative relocation to somewhere. But where?

> readelf -Wr bin | grep 1140
0000000000006fb8  0000000000000008 R_X86_64_RELATIVE                         1140

> nm --demangle=rust bin | grep 1140
0000000000001140 T main

That’s our main function. The ld.so tries to call its main function, but to be able to do that it needs to be relocated first. And who will do this relocation if there is no ld.so running? Well, there is one. We’re building it right now…
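The numbers from the gdb session fit together nicely. The call reads its target from rip+0x598f, and with rip pointing at the next instruction that is 0x7ffff7ff8629 + 0x598f = 0x7ffff7ffdfb8. The relocation entry sits at offset 0x6fb8, so the load base must be 0x7ffff7ffdfb8 - 0x6fb8 = 0x7ffff7ff7000, and the slot should contain 0x7ffff7ff7000 + 0x1140 = 0x7ffff7ff8140 instead of the bare 0x1140.

Here is a minimal sketch of what applying such a relocation means. It works on a byte copy of the image rather than on live memory, and the names `Rela` and `apply_relative` are hypothetical. A real ld.so has to patch itself in place, which is a topic for later.

```rust
/// Relocation type "load base + addend" on x86-64 (the 8 in readelf's Info column).
const R_X86_64_RELATIVE: u64 = 8;

/// One entry of a RELA relocation table, mirroring the ELF64 layout.
#[derive(Debug, Clone, Copy)]
struct Rela {
    offset: u64, // where to patch, relative to the load base
    info: u64,   // the low 32 bits hold the relocation type
    addend: i64, // value to add to the load base
}

/// Applies all R_X86_64_RELATIVE entries to `image`, a byte copy of the
/// loaded object. `base` is the address the object was mapped at.
/// Hypothetical helper, only for illustration.
fn apply_relative(image: &mut [u8], base: u64, relocs: &[Rela]) {
    for r in relocs {
        if r.info & 0xffff_ffff != R_X86_64_RELATIVE {
            continue; // other relocation kinds are not handled in this sketch
        }
        let value = base.wrapping_add(r.addend as u64);
        let at = r.offset as usize;
        image[at..at + 8].copy_from_slice(&value.to_le_bytes());
    }
}
```

Applied with the base 0x7ffff7ff7000 and the single entry shown by readelf, the eight bytes at offset 0x6fb8 change from 0x1140 to 0x7ffff7ff8140, which is where main really lives in the process.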

To be continued…

Self relocation

GOT relocation

PLT relocation