Alignment

Now that we have implemented a couple of handy helper methods we can go back to the question from chapter one: What is the Options(nostack) in the _start assembly block used for. Let’s put a panic into the main function:

#[no_mangle]
fn main() -> u8 { 
    painc!();
}

and execute the program

> ./cargo.sh build
> ./target/bin
panicked at ./bin.rs:9:5:
explicit panic

It all looks fine, right? But what happens if you remove the nostack option from the assembly block of the _start function?

> ./cargo.sh build
> ./target/bin
Segmentation fault (core dumped)

The process crashes with segfault. Let’s analyse that in gdb:

gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) run
Starting program: /home/taabodal/work/blog/src/chapter-02/target/bin

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402557 in rust_begin_unwind ()

(gdb) disassemble
Dump of assembler code for function rust_begin_unwind:
   ...
   0x0000000000402552 <+50>:    movups xmm0,XMMWORD PTR [rsp+0x78]
=> 0x0000000000402557 <+55>:    movaps XMMWORD PTR [rsp+0x60],xmm0
   0x000000000040255c <+60>:    movaps xmm0,XMMWORD PTR [rsp+0x60]
   0x0000000000402561 <+65>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   ...
End of assembler dump.

From the output above I removed some lines to make it easier to digest. The process crashes at the instruction movaps XMMWORD PTR [rsp+0x60],xmm0. The line above moveups seems to be quiet similar but it doesn’t crashes. Let’s lookup what these instructions are doing:

movaps: Move Aligned Packed Single Precision Floating-Point Values
movups: Move Unaligned Packed Single Precision Floating-Point Values

The key difference between these two is alignment of the memory address. While movups doesn’t expect any alignment of the memory address the movaps expects that it is 16/32/64 byte aligned:

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version), 32-byte (VEX.256 encoded version) or 64-byte (EVEX.512 encoded version) boundary or a general-protection exception (#GP) will be generated.

The instruction which crashes process uses [rsp+0x60] as memory address. 0x60 is 16 byte aligned but what is the value of the rsp register? Let’s go back to gdb and print the current value of the register with

(gdb) info registers rsp
rsp            0x7fffffffe828      0x7fffffffe828

It seems like we have found the reason: The value of rsp is not 16 byte aligned so [rsp+0x60] wont be 16 byte aligned either which causes the processor to throw a general-protection exception.

That’s all nice but if the aligment of the memory address is so important then why does the compiler not check if rsp is in good state before calling movaps? As always the System V ABI has the answer for this question. In the section 3.2.2 The Stack Frame it says:

The end of the input argument area shall be aligned on a 16 byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 when control is transferred to the function entry point.

Since it’s documented in the ABIs calling convention the compiler can asume that before a function is called the rsp is 16 byte aligned. So if it doesn’t do stack operation which misaligns the stack it should remain 16 byte aligned. Let’s go back gdb and check the stack alignment throught of our process:

> gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) break _start
(gdb) break rust_begin_unwind
(gdb) run
Breakpoint 1, 0x0000000000402500 in _start ()

(gdb) info registers rsp
rsp            0x7fffffffe960      0x7fffffffe960

(gdb) continue
Breakpoint 2, 0x0000000000402520 in rust_begin_unwind ()

(gdb) info registers rsp
rsp            0x7fffffffe8b0      0x7fffffffe8b0

(gdb) disassemble
Dump of assembler code for function rust_begin_unwind:
=> 0x0000000000402520 <+0>:     sub    rsp,0x88
   0x0000000000402527 <+7>:     mov    QWORD PTR [rsp+0x10],rdi
   ...

As wee can see the rsp register is 16 byte aligned at the beginning of both, the _start and rust_begin_unwind functions. The problem seems to be comming after that: the first instruction of the rust_begin_unwind function substract 0x88 from the stack pointer which becomes unaligned this way. But why does it do that if it knows that movaps needs 16 byte alignment?

The reason for that is that the rsp has to be 16 byte aligned before the call instruction is executed. Since call instruction pushes the return current value of the instruction pointer (rip) onto the stack which is 8 byte long the compiler needs to compensate this as the first step of every function call. So the sub rsp,0x88 should actually make the rsp 16 byte aligned again which means that it wasn’t aligned at all wenn the rust_begin_unwind function was started. To find out when did it get misaligned we need to go up on the stack frames and check the rsp registers. Let’s see how does the stackframes look like:

(gdb) backtrace
#0  0x0000000000402520 in rust_begin_unwind ()
#1  0x0000000000401033 in core::panicking::panic_fmt () at library/core/src/panicking.rs:72
#2  0x00000000004010dc in core::panicking::panic () at library/core/src/panicking.rs:146
#3  0x000000000040128d in main ()

(gdb) up
#1  0x0000000000401033 in core::panicking::panic_fmt () at library/core/src/panicking.rs:72
72      in library/core/src/panicking.rs

(gdb) info registers rsp
rsp            0x7fffffffe8b8      0x7fffffffe8b8

(gdb) up
#2  0x00000000004010dc in core::panicking::panic () at library/core/src/panicking.rs:146
146     in library/core/src/panicking.rs

(gdb) info registers rsp
rsp            0x7fffffffe8f8      0x7fffffffe8f8

(gdb) up
#3  0x000000000040128d in main ()

(gdb) info registers rsp
rsp            0x7fffffffe948      0x7fffffffe948

If if go up on the stackframes we can checkout the value of registers right before call instruction. The bad news is that the rsp seems to be misaligned already in the main function. This means that the whole code is corrupted. Since main function is called from our _start function let’s invesigate that one:

> gdb ./target/bin
(gdb) set disassembly-flavor intel
(gdb) break _start
(gdb) run
(gdb) disassemble
Dump of assembler code for function _start:
=> 0x0000000000402500 <+0>:     push   rax
   0x0000000000402501 <+1>:     call   0x401270 <main>
   0x0000000000402506 <+6>:     mov    rdi,rax
   0x0000000000402509 <+9>:     mov    rax,0x3c
   0x0000000000402510 <+16>:    syscall
   0x0000000000402512 <+18>:    ud2

We seems to have the same construct here. The first instruction of the function push rax realigns the stack after that the main will be called. The only difference is since the _start function is the entry point of our code it has never been called and as such this is the only function which is started with the stack 16 byte aligned. As a result the first instruction which was meant to compensate the misalignment of the stack will be the reason of the misalignment of it.

So let’s get back to the options(nostack). The documentation says:

The asm! block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is not used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.

If we compile and dump the _start function with nostack option enabled then we get the working assembly code:

> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       e8 6b ed ff ff          call   401270 <main>
  402505:       48 89 c7                mov    rdi,rax
  402508:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  40250f:       0f 05                   syscall
  402511:       0f 0b                   ud2

and without this option we get the crashing assembly code:

> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       50                      push   rax
  402501:       e8 6a ed ff ff          call   401270 <main>
  402506:       48 89 c7                mov    rdi,rax
  402509:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  402510:       0f 05                   syscall
  402512:       0f 0b                   ud2

So what’s here happening, isn’t it exactly the opposite of what the documentation says? And the answer is no. The compiler guaranties the stack alignment the right way in case of a function call. But no knowledge about that the _start code section never gets called. It thinks that it’s a function just like any other. We can prove this by moving the call main outside of the assembly block. It will generate the push rax even if the assembly block has the nostack option enabled.

#[no_mangle]
fn _start() -> ! {
    extern "C" { fn main() -> u8; } 
    unsafe { main(); }
    unsafe {
        core::arch::asm!(
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}

> ./cargo.sh build
> ./cargo.sh dump _start
0000000000402500 <_start>:
  402500:       50                      push   rax
  402501:       48 8d 05 68 ed ff ff    lea    rax,[rip+0xffffffffffffed68]        # 401270 <main>
  402508:       ff d0                   call   rax
  40250a:       48 89 c7                mov    rdi,rax
  40250d:       48 c7 c0 3c 00 00 00    mov    rax,0x3c
  402514:       0f 05                   syscall
  402516:       0f 0b                   ud2

Now that we agreed that stack alignment is important let’s make it permanent to avoid this bug in the future. The simplest way to clean up the last 16 byte of the number is and rsp,-0x10. Let’s add this to the beginning of the asm block:

#![allow(unused)]
fn main() {
#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "add rsp,-0x10",
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}
}

Now it should work even without the nostack option because we the generated first instruction push rax will have simply no effect on our code. It’s a better to use the and instead of the sub or pop instruction here because sub and pop would remove 8 bytes in every case while the and instruction only modifies the rsp if it wasn’t aligned. Last but not least the System V ABI also says that the user space code is responsible for cleaning up the rbp register:

The content of this register is unspecified at process initialization time, but the user code should mark the deepest stack frame by setting the frame pointer to zero.

So let’s do that by adding an extra assembly line xor rbp,rbp.

#![allow(unused)]
fn main() {
#[no_mangle]
fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "xor rbp,rbp",
            "add rsp,-0x10",
            "call main",
            "mov rdi,rax",
            "mov rax,0x3c",
            "syscall",
            options(nostack, noreturn),
        )
    }
}
}

Keyboard shortcuts

Living without standards

Alignment