Pointless Hacks

A Webserver in Assembly

A rather pointless goal of mine was to build a webserver "from scratch" - no libraries, no dependencies, not even libc - relying only on system calls to provide the service. This means I restricted my goal to just a single operating system on a single CPU architecture, namely x86_64 Linux.

So how does one go about this task? First it's good to understand how a "pure" assembly program is compiled and linked so I know what I'm working with. Here's about the simplest assembly program one can write:

    .global _start
    _start:
        mov $60, %rax
        mov $0, %rdi
        syscall

This program declares a globally visible symbol _start, which is the default entry point that the linker ld looks for. It then loads the number of the exit system call (60) into %rax and the exit status 0 into %rdi, and the syscall instruction hands the request to the operating system. This program behaves like true - it always exits with success. I build it with the following commands:

    as true.s -o true.o
    ld true.o -o true

This assembles the code, and then links the resultant object (to nothing but itself) to produce a binary.
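System calls that take more than one argument follow the same pattern: on x86_64 Linux the call number goes in %rax and the arguments go in %rdi, %rsi, %rdx, %r10, %r8 and %r9, in that order. The result comes back in %rax, and the kernel clobbers %rcx and %r11. As a minimal sketch (the file name hello.s and the labels msg and msg_len are made up for this example), here is a program that calls write before exit:

```asm
# hello.s: write a message to stdout, then exit with status 0
.global _start

.data
msg:
    .ascii "hello\n"
    .set msg_len, . - msg       # length computed by the assembler

.text
_start:
    mov $1, %rax                # system call 1 is write
    mov $1, %rdi                # first argument: file descriptor 1 (stdout)
    lea msg(%rip), %rsi         # second argument: address of the bytes
    mov $msg_len, %rdx          # third argument: number of bytes
    syscall
    mov $60, %rax               # system call 60 is exit
    xor %edi, %edi              # exit status 0
    syscall
```

Built the same way as true (as hello.s -o hello.o, then ld hello.o -o hello), it should print hello and exit with status 0.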

Command-line Arguments

I now have a program that can return a number, but can't do much else. In order to read arguments from the command line (I want my server to read bind address/port and files to serve from the command line) I need to know how to access these variables.

The answer is in the x86_64 ABI. Even assembly programs with no library support start with their arguments on the stack, laid out much like the parameters of a C int main(int argc, char *argv[], char *envp[]): argc sits at the top of the stack, followed by the argv pointers, a null pointer, and then the environment pointers. This means that I can access the arguments just by stepping upwards from the stack pointer. Here's what I came up with:

    .global _start
    _start:
        cmpq $1, (%rsp)
        jle print_usage
        lea 8(%rsp), %rdi
        # push a 0 onto the stack to mark
        # end of loaded files
        pushq $0
        # loop through each argument
    1:
        add $8, %rdi
        mov (%rdi), %rsi
        # start the server at end of arguments
        test %rsi, %rsi
        jz start_server
        # parse an option if arg begins with -
        cmpb $'-', (%rsi)
        je 2f
        # load file and push details to the stack
        call load_file
        push %rax
        push %rdx
        push (%rdi)
        jmp 1b
    2:
        call parse_option
        jmp 1b

First I check argc, which is stored at (%rsp). If it's less than or equal to 1, I jump to print_usage. Otherwise, I point %rdi at argv and step through it until (%rdi) is null (the last element of argv is guaranteed to be a null pointer). If the argument starts with a '-', I parse an option, otherwise I load the file with the given name into memory. The push instructions store the loaded file data, its length and the filename so that the server knows which files to serve and how.
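The same stack walk can be tried in isolation. The following is a sketch rather than code from the server (the file name args.s and the newline label are made up): it prints every argument after argv[0] on its own line, counting each string's length with a plain byte loop.

```asm
# args.s: print each command-line argument on its own line
.global _start

.data
newline:
    .ascii "\n"

.text
_start:
    lea 8(%rsp), %rbx           # %rbx = &argv[0]; argc itself is at (%rsp)
1:
    add $8, %rbx                # step to the next entry (skips argv[0] first)
    mov (%rbx), %rsi            # %rsi = pointer to the argument string
    test %rsi, %rsi
    jz 4f                       # argv ends with a null pointer
    xor %edx, %edx              # %rdx = length counted so far
2:
    cmpb $0, (%rsi, %rdx)       # look for the terminating null byte
    je 3f
    inc %rdx
    jmp 2b
3:
    mov $1, %rax                # write(1, argument, length)
    mov $1, %rdi
    syscall
    mov $1, %rax                # write(1, "\n", 1)
    mov $1, %rdi
    lea newline(%rip), %rsi
    mov $1, %rdx
    syscall
    jmp 1b
4:
    mov $60, %rax               # exit(0)
    xor %edi, %edi
    syscall
```

The loop counter lives in %rbx because system calls preserve every register except %rax, %rcx and %r11.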

Loading and mmap-ing a File

I want to load files from the command line directly into memory so that writing them back out to a socket is as quick as possible. Thankfully linux makes this easy with mmap. Consulting man 2 mmap gives me all the details I need to write some assembly to start memory-mapping a file.

    # mmap the file with filename in %rsi
    # return pointer to file data in %rax
    # return file length in %rdx
    load_file:
        mov %rsi, %r12
        # open
        mov $NR_open, %rax
        mov %rdi, %rbx
        mov %rsi, %rdi
        mov $O_RDONLY, %rsi
        syscall
        test %rax, %rax
        js load_failed

The first thing to note here is that I've used named symbols such as NR_open instead of hard-coded constants. These are not preprocessor #defines - they are ordinary symbols whose values are filled in when the program is assembled and linked. I'm not using a preprocessor for this project, so this is the cleanest way to use constants. Otherwise, this code is pretty simple: open the file with the open system call, and jump to load_failed if it returns a negative value.
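The post doesn't show where these symbols get their values. One way that needs no preprocessor is the assembler's .set directive. That is an assumption for illustration and not necessarily what the server's source does. In this sketch NR_exit uses the x86_64 Linux value, while EXIT_FAILURE and the file name consts.s are made up:

```asm
# consts.s: name magic numbers with assembler symbols instead of #defines
.set NR_exit, 60                # x86_64 Linux system call number for exit
.set EXIT_FAILURE, 1            # hypothetical name for a failing exit status

.global _start

.text
_start:
    mov $NR_exit, %rax          # the symbol is used like any immediate value
    mov $EXIT_FAILURE, %rdi
    syscall
```

This is the counterpart of the true program from earlier: it always exits with status 1.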

        # stat
        mov %rax, %rdi
        mov $NR_fstat, %rax
        sub $SIZEOF_STAT, %rsp
        mov %rsp, %rsi
        syscall
        test %rax, %rax
        js stat_failed
        # mmap
        mov %rdi, %r8
        mov ST_SIZE(%rsp), %rsi
        add $SIZEOF_STAT, %rsp
        mov $NR_mmap, %rax
        xor %edi, %edi
        mov $PROT_READ, %rdx
        mov $MAP_PRIVATE, %r10
        xor %r9, %r9
        syscall
        test %rax, %rax
        js mmap_failed

mmap needs the length of the file in order to map the whole thing into memory at once, so after opening the file I call fstat and read the file size from the resulting structure at offset ST_SIZE, then pass it to mmap. I map the file read-only (PROT_READ) and privately (MAP_PRIVATE), so nothing done through the mapping can change the underlying file.

        # close original fd
        mov %rax, %rdx
        mov $NR_close, %rax
        mov %r8, %rdi
        syscall
        test %rax, %rax
        js close_failed
        # restore rdi and return
        mov %rbx, %rdi
        mov %rdx, %rax
        mov %rsi, %rdx
        ret

Once the file is mapped, mmap returns a pointer to its data in %rax. I stash this in %rdx while I close the file descriptor. Closing it is safe because the mapping keeps its own reference to the file and stays valid until munmap is called. Then I move the values into their return registers so they can be pushed onto the stack in the loop above.
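To see the open, fstat and mmap sequence work outside the server, here is a sketch of a standalone program that maps the file named by its first argument and writes it to stdout. It is not the server's code. The constant values are the x86_64 Linux ones (SIZEOF_STAT 144 and ST_SIZE 48 come from that platform's struct stat layout), and the file name mapcat.s and the single fail label, which collapses every error into exit status 1, are made up for the example.

```asm
# mapcat.s: mmap the file named by argv[1] and write it to stdout
.set NR_write, 1
.set NR_open, 2
.set NR_close, 3
.set NR_fstat, 5
.set NR_mmap, 9
.set NR_exit, 60
.set O_RDONLY, 0
.set PROT_READ, 1
.set MAP_PRIVATE, 2
.set SIZEOF_STAT, 144           # sizeof(struct stat) on x86_64 Linux
.set ST_SIZE, 48                # offset of st_size within struct stat

.global _start

.text
_start:
    cmpq $2, (%rsp)             # expect exactly one argument after argv[0]
    jne fail
    mov $NR_open, %rax          # open(argv[1], O_RDONLY)
    mov 16(%rsp), %rdi
    mov $O_RDONLY, %rsi
    syscall
    test %rax, %rax
    js fail
    mov %rax, %r8               # keep the fd in %r8, where mmap expects it
    mov $NR_fstat, %rax         # fstat(fd, buffer on the stack)
    mov %r8, %rdi
    sub $SIZEOF_STAT, %rsp
    mov %rsp, %rsi
    syscall
    test %rax, %rax
    js fail
    mov ST_SIZE(%rsp), %rsi     # length = st_size
    add $SIZEOF_STAT, %rsp
    mov $NR_mmap, %rax          # mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0)
    xor %edi, %edi
    mov $PROT_READ, %rdx
    mov $MAP_PRIVATE, %r10
    xor %r9d, %r9d
    syscall
    test %rax, %rax
    js fail
    mov %rax, %r12              # %r12 = next byte to write
    mov %rsi, %r13              # %r13 = bytes still to write
    mov $NR_close, %rax         # the mapping outlives the descriptor
    mov %r8, %rdi
    syscall
1:
    mov $NR_write, %rax         # write(1, data, remaining); may be partial
    mov $1, %rdi
    mov %r12, %rsi
    mov %r13, %rdx
    syscall
    test %rax, %rax
    jle fail                    # stop on an error or a zero-byte write
    add %rax, %r12
    sub %rax, %r13
    jnz 1b                      # loop until every byte has been written
    mov $NR_exit, %rax          # exit(0)
    xor %edi, %edi
    syscall
fail:
    mov $NR_exit, %rax          # exit(1)
    mov $1, %rdi
    syscall
```

One thing this makes visible: an empty file makes the program exit with status 1, because mmap rejects a zero-length mapping.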

Starting the Server

To start the server I need to:

  1. Open a socket
  2. Bind the socket to a local address
  3. Start listening for incoming connections
  4. Accept each connection
  5. Read the request from the connection
  6. Write a response header (200 OK or 404 NOT FOUND)
  7. Write the response data
  8. Close the socket
  9. Go back to step #4

Here's what steps #1-#5 look like in assembly:

    # create a socket, listen and accept connections
    start_server:
        # socket
        mov $NR_socket, %rax
        mov $AF_INET, %rdi
        mov $SOCK_STREAM, %rsi
        xor %edx, %edx
        syscall
        test %rax, %rax
        js socket_failed
        # bind
        mov %rax, %rdi
        mov $NR_bind, %rax
        mov $sockaddr, %rsi
        mov $sockaddr_len, %rdx
        syscall
        test %rax, %rax
        js bind_failed
        # listen
        mov $NR_listen, %rax
        mov $128, %rsi
        syscall
        test %rax, %rax
        js listen_failed
    1:
        # accept
        mov $NR_accept, %rax
        xor %esi, %esi
        xor %edx, %edx
        xor %r10, %r10
        syscall
        test %rax, %rax
        js accept_failed
        # read
        sub $0x400, %rsp
        mov %rdi, %rbx
        mov %rax, %rdi
        mov $NR_read, %rax
        mov %rsp, %rsi
        mov $0x400, %rdx
        syscall
        cmp $5, %rax
        jle 3f

I open a socket, bind it, start listening, accept a connection, make space on the stack for the request, and read the request in. Each system call signals an error with a negative return value. For most, this is a hard error from which I do not recover, but if the read fails (or returns 5 bytes or fewer, too short to be a useful request), I just want to close the socket and accept the next connection, so the server recovers smoothly.
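The code above refers to sockaddr and sockaddr_len without showing them. Below is a sketch of what such a structure can look like, wrapped in a one-shot server that accepts a single connection, reads the request, sends a fixed reply and exits. It is not the server's code. The address 127.0.0.1:8080, the reply text, the file name oneshot.s and the fail label are all assumptions made for the example; the real server takes its bind address and port from the command line.

```asm
# oneshot.s: accept one connection on 127.0.0.1:8080 and send a fixed reply
.set NR_read, 0
.set NR_write, 1
.set NR_close, 3
.set NR_socket, 41
.set NR_accept, 43
.set NR_bind, 49
.set NR_listen, 50
.set NR_exit, 60
.set AF_INET, 2
.set SOCK_STREAM, 1

.global _start

.data
sockaddr:                       # struct sockaddr_in, 16 bytes
    .short AF_INET              # sin_family, host byte order
    .byte 0x1f, 0x90            # sin_port 8080 (0x1f90), network byte order
    .byte 127, 0, 0, 1          # sin_addr 127.0.0.1, network byte order
    .quad 0                     # sin_zero padding
    .set sockaddr_len, . - sockaddr
reply:
    .ascii "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 3\r\n\r\nhi\n"
    .set reply_len, . - reply

.text
_start:
    mov $NR_socket, %rax        # socket(AF_INET, SOCK_STREAM, 0)
    mov $AF_INET, %rdi
    mov $SOCK_STREAM, %rsi
    xor %edx, %edx
    syscall
    test %rax, %rax
    js fail
    mov %rax, %rbx              # %rbx = listening socket
    mov $NR_bind, %rax          # bind(sock, &sockaddr, sockaddr_len)
    mov %rbx, %rdi
    lea sockaddr(%rip), %rsi
    mov $sockaddr_len, %rdx
    syscall
    test %rax, %rax
    js fail
    mov $NR_listen, %rax        # listen(sock, 128)
    mov %rbx, %rdi
    mov $128, %rsi
    syscall
    test %rax, %rax
    js fail
    mov $NR_accept, %rax        # accept(sock, NULL, NULL)
    mov %rbx, %rdi
    xor %esi, %esi
    xor %edx, %edx
    syscall
    test %rax, %rax
    js fail
    mov %rax, %r12              # %r12 = connected socket
    sub $0x400, %rsp            # 1 KiB request buffer on the stack
    mov $NR_read, %rax          # read(conn, buffer, 0x400); contents ignored
    mov %r12, %rdi
    mov %rsp, %rsi
    mov $0x400, %rdx
    syscall
    mov $NR_write, %rax         # one write; a real server loops on partial writes
    mov %r12, %rdi
    lea reply(%rip), %rsi
    mov $reply_len, %rdx
    syscall
    mov $NR_close, %rax         # close(conn)
    mov %r12, %rdi
    syscall
    mov $NR_exit, %rax          # exit(0)
    xor %edi, %edi
    syscall
fail:
    mov $NR_exit, %rax          # exit(1)
    mov $1, %rdi
    syscall
```

The multi-byte fields of the address are spelled out byte by byte because the port and IP address must be in network (big-endian) byte order, while .short and .long would emit little-endian values on x86_64. The request is read before replying so that the connection is not closed with unread data pending.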

Responding to a Request

To respond to a request I take a big shortcut. To keep things simple, I assume I'm getting an HTTP GET request with a single string after it representing the page address. I don't parse any options or even the HTTP version.

        # store end of request in r11
        lea (%rsp, %rax, 1), %r11
        # check for HTTP GET request
        cmpl $0x20544547, (%rsp)    # "GET "
        jne 3f
        lea 0x3e8(%rsp), %rbp
        mov %rdi, %r9
        # compare each loaded filename with the request
    2:
        add $24, %rbp
        mov (%rbp), %rdi
        test %rdi, %rdi
        jz 404f
        mov $-1, %rcx               # get length of loaded filename in %rcx
        xor %eax, %eax
        repnz scasb (%rdi), %al
        not %rcx
        dec %rcx
        # compare request with filename
        mov (%rbp), %rdi            # get filename in %rdi
        lea 5(%rsp), %rsi           # get request in %rsi; skip the "GET /"
        mov $index_html, %r10
        cmpb $' ', (%rsi)           # check for root request
        cmove %r10, %rsi            # use "index.html" if root request
        repe cmpsb (%rdi), (%rsi)   # do the comparison
        jne 2b                      # strings do not match
        cmpb $' ', (%rsi)           # request terminated by space
        jne 2b                      # strings not same length
        cmp %r11, %rsi
        ja 2b                       # read past the end of the request

Here's where things get tricky. First I need to know where the request ends so I don't accidentally run off the end into data left over from a previous request (which used the same buffer). Then I check the first 4 bytes against "GET ", encoded as a little-endian 32-bit integer. Now I load into %rbp a pointer to the array of records pushed while parsing the arguments (filename pointer, file length, file data pointer, 24 bytes per file), and loop through them.

If I've reached the end of the loaded filenames, I return a 404 error. Otherwise, I get the length of the loaded filename using the repnz scasb instruction. This decrements %rcx once for every byte it scans, stopping after the first byte equal to %al (which holds a null byte). The not and dec instructions fix up %rcx to equal the length of the filename. Now I compare the filename with the request (with a special rule that treats an empty path, i.e. a request for "/", as 'index.html'), using the repe cmpsb instruction. I check that I haven't read past the end of the request, and that there is a space just after the matched filename in the request. If all of these checks pass, I'm ready to serve the file pointed to by 16(%rbp).
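The string instructions are easier to follow on fixed data. This sketch (not taken from the server; the file name match.s, the labels and the canned request are made up) runs the same three checks on one request and one filename, and reports the result through its exit status.

```asm
# match.s: exit 0 if a canned request asks for a canned filename, else 1
.set NR_exit, 60

.global _start

.data
request:
    .ascii "GET /index.html HTTP/1.1\r\n"
filename:
    .asciz "index.html"

.text
_start:
    lea request(%rip), %rbx
    cmpl $0x20544547, (%rbx)    # bytes 'G','E','T',' ' read as a little-endian int
    jne no_match
    lea filename(%rip), %rdi    # strlen(filename) using repnz scasb
    mov $-1, %rcx
    xor %eax, %eax              # %al = 0, the byte to search for
    repnz scasb                 # stops after the null; %rcx = -(length + 2)
    not %rcx                    # %rcx = length + 1
    dec %rcx                    # %rcx = length
    lea filename(%rip), %rdi    # compare filename with the path after "GET /"
    lea 5(%rbx), %rsi
    repe cmpsb                  # compares up to %rcx bytes, stops on a mismatch
    jne no_match
    cmpb $0x20, (%rsi)          # the path must be followed by a space
    jne no_match
    mov $NR_exit, %rax          # exit(0): the request matches
    xor %edi, %edi
    syscall
no_match:
    mov $NR_exit, %rax          # exit(1): no match
    mov $1, %rdi
    syscall
```

With the data above all three checks pass, so the program exits with status 0. Changing the filename to "index.htm" makes the final space check fail, which is the "strings not same length" case in the server code.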

        # write 200 OK message
        mov %r9, %rdi
        mov $http_200_str, %rsi
        mov $http_200_strlen, %rdx
        call write_all
        # write file data
        mov 16(%rbp), %rsi
        mov 8(%rbp), %rdx
        call write_all

Here I write an HTTP 200 OK message header, followed by the mmapped file. The header contains the 'Connection: close' header, since I intend to close the connection immediately after sending the data. Maintaining open connections is tricky for this low-level approach. The write_all function, below, does the work of handling partial writes to the socket and retrying until all bytes have been written.

    # write bytes to file in %rdi until all have been written
    # or a failure occurs
    write_all:
    1:
        mov $NR_write, %rax
        syscall
        test %rax, %rax
        js 2f
        add %rax, %rsi
        sub %rax, %rdx
        test %rdx, %rdx
        jnz 1b
    2:
        ret

The only things left to do are steps #8 and #9, which are just 6 lines of assembly, including fixing up the stack.

        # close
        mov %r9, %rdi
        mov $NR_close, %rax
        syscall
        add $0x400, %rsp
        mov %rbx, %rdi
        jmp 1b

Wrapping Up

There was a bunch of stuff I didn't cover here: ignoring SIGPIPE and setting TCP_DEFER_ACCEPT on the listening socket are two improvements I've made to the above.

I also didn't get into specifics of the linker script used to reduce the file size, but with everything together, the resulting binary weighs in at a statically-linked 2584 bytes!

Hopefully this pointless hack was interesting! You can find the full code here. It's the code that's running this site right now!