Pointless Hacks - A Webserver in Assembly

A Webserver in Assembly

A rather pointless goal of mine was to build a webserver "from scratch": no libraries, no dependencies, not even libc, relying only on system calls to provide the service. This meant restricting myself to a single operating system on a single CPU architecture, namely Linux on x86_64.

So how does one go about this task? First it's good to understand how a "pure" assembly program is compiled and linked so I know what I'm working with. Here's about the simplest assembly program one can write:


.global _start
_start:
    mov $60, %rax
    mov $0, %rdi
    syscall

This program declares a (globally visible) symbol _start, which is the default entry point that the linker ld looks for. It then loads the number of the exit system call (60) into %rax and the exit status 0 into %rdi, and the syscall instruction hands the request over to the operating system. This program behaves the same as true: it always exits with success. I build this short program with the following commands:


as true.s -o true.o
ld true.o -o true

This assembles the code, and then links the resultant object (to nothing but itself) to produce a binary.

Command-line Arguments

I now have a program that can return a number, but can't do much else. In order to read arguments from the command line (I want my server to read bind address/port and files to serve from the command line) I need to know how to access these variables.

The answer is in the x86_64 System V ABI. Even an assembly program with no library support starts with its arguments laid out on the stack: argc sits at (%rsp), followed by the argv pointers, a null pointer, the envp pointers and another null pointer. These are the same values a C program would receive as int argc, char *argv[], char *envp[], which means I can access the arguments just by reading at offsets from the stack pointer. Here's what I came up with:


.global _start
_start:
    cmpq $1, (%rsp)     # argc; explicit size suffix, no register operand
    jle print_usage
    lea 8(%rsp), %rdi
    # push a 0 onto the stack to mark
    # end of loaded files
    pushq $0
    # loop through each argument
1:
    add $8, %rdi
    mov (%rdi), %rsi

    # start the server at end of arguments
    test %rsi, %rsi
    jz start_server

    # parse an option if arg begins with -
    cmpb $'-', (%rsi)
    je 2f

    # load file and push details to the stack
    call load_file
    push %rax
    push %rdx
    push (%rdi)
    jmp 1b
2:
    call parse_option
    jmp 1b

First I check argc, which is stored at (%rsp). If it's less than or equal to 1, I jump to print_usage. Otherwise, I load the address of argv into %rdi and step through it, skipping argv[0], until (%rdi) is null (the last element of argv is guaranteed to be a null pointer). If the argument starts with a '-', I parse an option, otherwise I load the file with the given name into memory. The push instructions store each file's data pointer, length and filename on the stack, 24 bytes per file, so that the server knows which files to serve and how.
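The print_usage label is not shown in the listing above. As a minimal sketch of what such a routine could look like, here is a complete program that performs the same argc check and prints a message to stderr. The message text, the usage_str and usage_strlen names, and the exit status of 1 are all placeholders of mine, not taken from the real server.

```asm
# usage.s: a sketch, not the server's actual print_usage
.section .rodata
usage_str:                              # hypothetical message text
    .ascii "usage: server [options] file...\n"
.set usage_strlen, . - usage_str

.text
.global _start
_start:
    cmpq $1, (%rsp)                     # argc is the first thing on the stack
    jle print_usage
    mov $60, %eax                       # exit(0): at least one argument given
    xor %edi, %edi
    syscall

print_usage:
    mov $1, %eax                        # write(2, usage_str, usage_strlen)
    mov $2, %edi                        # fd 2 is stderr
    lea usage_str(%rip), %rsi
    mov $usage_strlen, %edx
    syscall
    mov $60, %eax                       # exit(1)
    mov $1, %edi
    syscall
```

Built with as and ld in the same way as true.s, this prints the usage line and exits with status 1 when run without arguments, and exits with status 0 otherwise.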

Loading and `mmap`-ing a File

I want to load files from the command line directly into memory so that writing them back out to a socket is as quick as possible. Thankfully Linux makes this easy with mmap. Consulting man 2 mmap gives me all the details I need to write some assembly to memory-map a file.


# mmap the file with filename in %rsi
# return pointer to file data in %rax
# return file length in %rdx
load_file:
    mov %rsi, %r12

    # open
    mov $NR_open, %rax
    mov %rdi, %rbx
    mov %rsi, %rdi
    mov $O_RDONLY, %rsi
    syscall
    test %rax, %rax
    js load_failed

The first thing to note here is that I've used symbols such as NR_open instead of hard-coded constants. These are not preprocessor #defines but ordinary symbols whose values get filled in when the program is assembled and linked. I'm not using a preprocessor for this project, so this is the cleanest way to name constants. I also save %rdi in %rbx before the call, since the argument loop is still using it as its position in argv. Otherwise, this code is pretty simple: open the file with the open system call, and jump to load_failed if it returns a negative value.
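The definitions of these symbols are not shown in this post. One way to provide them, sketched here with the assembler's .set directive, is a block of constants at the top of the source file. The numeric values are the standard Linux x86_64 system call numbers; the choice of .set (rather than, say, defining the symbols at link time) is my assumption about the mechanism.

```asm
# Linux x86_64 system call numbers, named as in the listings
.set NR_read,    0
.set NR_write,   1
.set NR_open,    2
.set NR_close,   3
.set NR_fstat,   5
.set NR_mmap,    9
.set NR_socket, 41
.set NR_accept, 43
.set NR_bind,   49
.set NR_listen, 50
.set NR_exit,   60

# open(2) access mode
.set O_RDONLY,   0
```

With these in place, an instruction like mov $NR_open, %rax assembles exactly as if the literal 2 had been written.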


    # stat
    mov %rax, %rdi
    mov $NR_fstat, %rax
    sub $SIZEOF_STAT, %rsp
    mov %rsp, %rsi
    syscall
    test %rax, %rax
    js stat_failed

    # mmap
    mov %rdi, %r8
    mov ST_SIZE(%rsp), %rsi
    add $SIZEOF_STAT, %rsp
    mov $NR_mmap, %rax
    xor %edi, %edi
    mov $PROT_READ, %rdx
    mov $MAP_PRIVATE, %r10
    xor %r9, %r9
    syscall
    test %rax, %rax
    js mmap_failed

mmap needs the length of the file so that it can map the whole thing into memory at once. After opening the file I call fstat, which fills in a struct stat that I've made room for on the stack, and I read the file size from it at offset ST_SIZE to pass to mmap. I map the file read-only (PROT_READ) and as a private mapping (MAP_PRIVATE), which means nothing done through the mapping can ever be written back to the underlying file.
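For reference, here is a sketch of the constants this part of load_file relies on, again written with .set as an assumed mechanism. The sizes and offsets are those of struct stat as the kernel fills it in on x86_64 Linux, and the comment block records which register carries which mmap argument.

```asm
# struct stat layout on x86_64 Linux
.set SIZEOF_STAT, 144       # sizeof(struct stat)
.set ST_SIZE,      48       # offsetof(struct stat, st_size)

# mmap(2) protection and flags
.set PROT_READ,     1
.set MAP_PRIVATE,   2

# mmap(addr, length, prot, flags, fd, offset) as a raw system call:
#   %rdi = addr    (0 lets the kernel pick an address)
#   %rsi = length  (the st_size read from the stat buffer)
#   %rdx = prot
#   %r10 = flags   (system calls use %r10 where function calls use %rcx)
#   %r8  = fd
#   %r9  = offset
```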


    # close original fd
    mov %rax, %rdx
    mov $NR_close, %rax
    mov %r8, %rdi
    syscall
    test %rax, %rax
    js close_failed

    # restore rdi and return
    mov %rbx, %rdi
    mov %rdx, %rax
    mov %rsi, %rdx
    ret

Once the file is mmapped, a pointer to its data is returned in %rax. I stash this temporarily in %rdx while I close the file descriptor. Closing it is safe because the mapping holds its own reference to the file, which stays valid until munmap is called. Then I move the values into their return registers so they can be pushed onto the stack in the loop above.

Starting the Server

To start the server I need to:

1. Open a socket
2. Bind the socket to a local address
3. Start listening for incoming connections
4. Accept each connection
5. Read the request from the connection
6. Write a response header (200 OK or 404 NOT FOUND)
7. Write the response data
8. Close the socket
9. Go back to step #4

Here's what steps #1-#5 look like in assembly:


# create a socket, listen and accept connections
start_server:
    # socket
    mov $NR_socket, %rax
    mov $AF_INET, %rdi
    mov $SOCK_STREAM, %rsi
    xor %edx, %edx
    syscall
    test %rax, %rax
    js socket_failed

    # bind
    mov %rax, %rdi
    mov $NR_bind, %rax
    mov $sockaddr, %rsi
    mov $sockaddr_len, %rdx
    syscall
    test %rax, %rax
    js bind_failed

    # listen
    mov $NR_listen, %rax
    mov $128, %rsi
    syscall
    test %rax, %rax
    js listen_failed

1:
    # accept
    mov $NR_accept, %rax
    xor %esi, %esi
    xor %edx, %edx
    xor %r10, %r10
    syscall
    test %rax, %rax
    js accept_failed

    # read
    sub $0x400, %rsp
    mov %rdi, %rbx
    mov %rax, %rdi
    mov $NR_read, %rax
    mov %rsp, %rsi
    mov $0x400, %rdx
    syscall
    cmp $5, %rax
    jle 3f

I open a socket, bind it, start listening, and then accept a connection, make space on the stack for the request, and read the request in. Every system call signals an error with a negative return value. For most of them this is a hard error from which I don't try to recover, but a failed read (or a request of 5 bytes or fewer, too short to contain anything beyond "GET /") shouldn't take the server down, so in that case I just close the connection and go back to accepting the next one.
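The bind call above takes its address from a sockaddr symbol that isn't shown here. As a sketch, a struct sockaddr_in for IPv4 could be laid out in the data section like this. The default of 0.0.0.0 port 8080 is purely my assumption for illustration; the real server takes its address and port from the command line.

```asm
.set AF_INET,     2
.set SOCK_STREAM, 1

.data
sockaddr:                       # struct sockaddr_in, 16 bytes in total
    .short AF_INET              # sin_family, host byte order
    .byte 0x1f, 0x90            # sin_port: 8080 = 0x1f90, network byte order
    .byte 0, 0, 0, 0            # sin_addr: 0.0.0.0, i.e. any local address
    .quad 0                     # sin_zero padding
.set sockaddr_len, . - sockaddr
```

Keeping the structure in .data rather than .rodata leaves it writable, so option parsing could overwrite the port and address bytes before the socket is bound.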

Responding to a Request

To respond to a request I take a big shortcut. To keep things simple, I assume I'm getting an HTTP GET request with a single string after it representing the page address. I don't parse any options or even the HTTP version.


    # store end of request in r11
    lea (%rsp, %rax, 1), %r11

    # check for HTTP GET request
    cmpl $0x20544547, (%rsp) # "GET "
    jne 3f
    lea 0x3e8(%rsp), %rbp
    mov %rdi, %r9

    # compare each loaded filename with the request
2:
    add $24, %rbp
    mov (%rbp), %rdi
    test %rdi, %rdi
    jz 404f

    mov $-1, %rcx

    # get length of loaded filename in %rcx
    xor %eax, %eax
    repnz scasb (%rdi), %al
    not %rcx
    dec %rcx

    # compare request with filename
    mov (%rbp), %rdi # get filename in %rdi
    lea 5(%rsp), %rsi # get request in %rsi; skip the "GET /"
    mov $index_html, %r10
    cmpb $' ', (%rsi) # check for root request
    cmove %r10, %rsi # use "index.html" if root request
    repe cmpsb (%rdi), (%rsi) # do the comparison
    jne 2b  # strings do not match
    cmpb $' ', (%rsi) # request terminated by space
    jne 2b # strings not same length
    cmp %r11, %rsi
    ja 2b # read past the end of the request

Here's where things get tricky. First I need to know where the request ends so I don't accidentally run off the end and match against leftovers from the previous request (which used the same buffer). Then I check the first 4 bytes against "GET ", encoded as a little-endian 32-bit value. Now I point %rbp at the array of filenames, lengths and data pointers that I pushed onto the stack earlier, and loop through the entries, 24 bytes at a time.

If I've reached the end of the loaded filenames (the zero I pushed before loading any files), I return a 404 error. Otherwise, I get the length of the loaded filename using the repnz scasb instruction. This decrements %rcx for every byte it scans and stops just after a byte equal to %al (which is a null byte). The not and dec instructions fix up %rcx to equal the length of the filename. Now I compare the filename with the request (with a special rule that replaces an empty path, i.e. a request for '/', with 'index.html'), using the repe cmpsb instruction. I check that there is a space just after the matched filename in the request, and that I haven't read past the end of the request. If all of these checks pass, I'm ready to serve the file whose data is pointed to by 16(%rbp).
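The %rcx arithmetic is easier to follow in isolation. The sketch below pulls the length computation out into a standalone routine and traces it for the string "a.txt". The strlen_sketch and sample names are mine, and the program assumes the direction flag is clear, as it is at process start.

```asm
# strlen.s: the repnz scasb length trick on its own
.section .rodata
sample:
    .asciz "a.txt"              # 5 characters plus a terminating NUL

.text
# length of the NUL-terminated string at %rdi, returned in %rax
strlen_sketch:
    mov $-1, %rcx               # count starts at -1
    xor %eax, %eax              # %al = 0, the byte to search for
    repnz scasb                 # 6 bytes scanned: %rcx = -1 - 6 = -7
    not %rcx                    # ~(-7) = 6, the bytes scanned including NUL
    dec %rcx                    # 5, the length without the NUL
    mov %rcx, %rax
    ret

.global _start
_start:
    lea sample(%rip), %rdi
    call strlen_sketch
    mov %rax, %rdi              # use the length as the exit status
    mov $60, %eax               # exit
    syscall
```

Assembled and linked like the earlier examples, this program exits with status 5.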


    # write 200 OK message
    mov %r9, %rdi
    mov $http_200_str, %rsi
    mov $http_200_strlen, %rdx
    call write_all

    # write file data
    mov 16(%rbp), %rsi
    mov 8(%rbp), %rdx
    call write_all

Here I write an HTTP 200 OK message header, followed by the mmapped file. The header contains the 'Connection: close' header, since I intend to close the connection immediately after sending the data. Maintaining open connections is tricky for this low-level approach. The write_all function, below, does the work of handling partial writes to the socket and retrying until all bytes have been written.
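The strings referenced in this part of the code are not shown in the post. A sketch of how they might be declared follows. The exact header text and the HTTP version are my guesses. The trailing space after index.html is an inference from the matching code, which expects a space to follow the compared name.

```asm
.section .rodata
http_200_str:                           # hypothetical header text
    .ascii "HTTP/1.1 200 OK\r\n"
    .ascii "Connection: close\r\n"
    .ascii "\r\n"                       # blank line ends the header
.set http_200_strlen, . - http_200_str

index_html:
    .ascii "index.html "                # trailing space stands in for the
                                        # space that follows a path in a request
```

Because the connection is closed after the body is sent, a header like this can get away without a Content-Length: the end of the connection marks the end of the response.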


# write %rdx bytes starting at %rsi to the file descriptor in %rdi,
# retrying until all have been written or a failure occurs
write_all:
1:
    mov $NR_write, %rax
    syscall
    test %rax, %rax
    js 2f
    add %rax, %rsi
    sub %rax, %rdx
    test %rdx, %rdx
    jnz 1b
2:
    ret

The only things left to do are steps #8 and #9, which are just 6 lines of assembly, including fixing up the stack.


    # close
    mov %r9, %rdi
    mov $NR_close, %rax
    syscall
    add $0x400, %rsp
    mov %rbx, %rdi
    jmp 1b

Wrapping Up

There's a bunch of stuff I didn't cover here. Ignoring SIGPIPE and setting TCP_DEFER_ACCEPT on the listening socket are two improvements I've made to the code above.
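Neither improvement is shown in this post, so the following is only a sketch of how they could be done with raw system calls, not the code the server actually uses. The routine names, the data labels and the 5 second timeout are all made up for illustration. The numeric constants are the standard Linux x86_64 values.

```asm
.set NR_rt_sigaction,  13
.set NR_setsockopt,    54
.set SIGPIPE,          13
.set SIG_IGN,           1
.set IPPROTO_TCP,       6
.set TCP_DEFER_ACCEPT,  9

.section .rodata
sigact_ignore:                  # the kernel's struct sigaction on x86_64
    .quad SIG_IGN               # sa_handler
    .quad 0                     # sa_flags
    .quad 0                     # sa_restorer
    .quad 0                     # sa_mask
defer_secs:
    .long 5                     # timeout in seconds (arbitrary choice)

.text
# ignore SIGPIPE, so that writing to a connection the client has closed
# makes write fail with EPIPE instead of killing the process
ignore_sigpipe:
    mov $NR_rt_sigaction, %eax
    mov $SIGPIPE, %edi
    lea sigact_ignore(%rip), %rsi   # new action
    xor %edx, %edx                  # old action is not wanted
    mov $8, %r10d                   # size of the kernel signal mask in bytes
    syscall
    ret

# set TCP_DEFER_ACCEPT on the listening socket in %rdi, so that accept
# only returns once the client has actually sent some data
set_defer_accept:
    mov $NR_setsockopt, %eax
    mov $IPPROTO_TCP, %esi          # option level
    mov $TCP_DEFER_ACCEPT, %edx     # option name
    lea defer_secs(%rip), %r10      # pointer to the option value
    mov $4, %r8d                    # length of the option value
    syscall
    ret
```

In a server shaped like the one above, ignore_sigpipe could be called once at startup and set_defer_accept right after the socket is created, while the socket's file descriptor is still in %rdi.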

I also didn't get into specifics of the linker script used to reduce the file size, but with everything together, the resulting binary weighs in at a statically-linked 2584 bytes!

Hopefully this pointless hack was interesting! You can find the full code here. It's the code that's running this site right now!