A Webserver in Assembly
A rather pointless goal of mine was to build a webserver "from scratch": no libraries, no dependencies, not even libc, relying only on system calls to provide the service. This means I restricted my goal to a single operating system on a single CPU architecture, namely x86_64 Linux.
So how does one go about this task? First it's good to understand how a "pure" assembly program is compiled and linked so I know what I'm working with. Here's about the simplest assembly program one can write:
.global _start
_start:
mov $60, %rax
mov $0, %rdi
syscall
This program declares a (globally visible) symbol _start, which is the default entry point for a program as declared in the linker ld. It then loads the number for the system call exit into %rax, the value 0 into %rdi, and then syscall makes the operating system handle the request. This program is the same as true - it always exits with success.
I compile this short program with the following commands:
as true.s -o true.o
ld true.o -o true
This assembles the code, and then links the resultant object (to nothing but itself) to produce a binary.
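The same pattern extends to any system call: the call number goes in %rax, the arguments go in %rdi, %rsi, %rdx, %r10, %r8 and %r9 in that order, and the kernel overwrites %rcx and %r11. As a minimal sketch (the file name hello.s and the labels msg and msg_len are made up for illustration), here is a program that writes a string to standard output before exiting:

```asm
# hello.s - a sketch, not part of the server
.section .rodata
msg:
    .ascii "hello\n"
.set msg_len, . - msg       # length computed by the assembler

.text
.global _start
_start:
    mov $1, %rax            # write is system call 1 on x86_64 Linux
    mov $1, %rdi            # file descriptor 1 is standard output
    mov $msg, %rsi          # address of the bytes to write
    mov $msg_len, %rdx      # number of bytes
    syscall

    mov $60, %rax           # exit
    xor %edi, %edi          # status 0
    syscall
```

It builds the same way as true.s: as hello.s -o hello.o, then ld hello.o -o hello.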
Command-line Arguments
I now have a program that can return a number, but can't do much else. In order to read arguments from the command line (I want my server to read bind address/port and files to serve from the command line) I need to know how to access these variables.
The answer is in the x86_64 ABI. Even assembly programs with no library support start with arguments on the stack as though they were passed to a libc-style function int start(int argc, char *argv[], char *envp[], ...) - which means that I can access the arguments just by reading at increasing offsets from the stack pointer.
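To make the layout concrete: on entry, (%rsp) holds argc, 8(%rsp) holds argv[0], 16(%rsp) holds argv[1], and so on up to a null pointer, after which the environment pointers begin. A small sketch that shows this (the file name argc.s is hypothetical) is a program that exits with argc as its status code:

```asm
# argc.s - a sketch: report argc through the exit status
.global _start
_start:
    mov (%rsp), %rdi        # argc sits at the top of the stack on entry
    mov $60, %rax           # exit
    syscall
```

Running ./argc a b and then echo $? in a shell prints 3, since the program name counts as the first argument.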
Here's what I came up with:
.global _start
_start:
cmp $1, (%rsp)
jle print_usage
lea 8(%rsp), %rdi
# push a 0 onto the stack to mark
# end of loaded files
pushq $0
# loop through each argument
1:
add $8, %rdi
mov (%rdi), %rsi
# start the server at end of arguments
test %rsi, %rsi
jz start_server
# parse an option if arg begins with -
cmpb $'-', (%rsi)
je 2f
# load file and push details to the stack
call load_file
push %rax
push %rdx
push (%rdi)
jmp 1b
2:
call parse_option
jmp 1b
First I check argc, which is stored at (%rsp). If it's less than or equal to 1, I jump to print_usage. Otherwise, I load the address of argv into %rdi and start looping through until (%rdi) is null (the last element of argv is guaranteed to be a null pointer). If the argument starts with a '-', I parse an option, otherwise I load the file with the given name into memory.
The push instructions store the filenames and loaded file data so that the server knows which files to serve and how.
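Each loaded file therefore leaves a 24-byte record on the stack, and the 0 pushed before the loop acts as the end-of-list marker. Because the stack grows downwards, the value pushed last ends up at the lowest address. The offsets can be written down as symbols; the names below are hypothetical and only meant to document the layout:

```asm
# Hypothetical names for the record pushed once per file.
.set FILE_NAME,       0     # pointer to the filename (pushed last)
.set FILE_LEN,        8     # file length in bytes
.set FILE_DATA,       16    # pointer to the mmap-ed contents (pushed first)
.set FILE_ENTRY_SIZE, 24

# With %rbp pointing at a record, a walk over the list would look like:
#   mov FILE_NAME(%rbp), %rdi       # a null pointer here means end of list
#   add $FILE_ENTRY_SIZE, %rbp      # step to the previously pushed record
```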
Loading and mmap-ing a File
I want to load files from the command line directly into memory so that writing them back out to a socket is as quick as possible. Thankfully Linux makes this easy with mmap. Consulting man 2 mmap gives me all the details I need to write some assembly to start memory-mapping a file.
# mmap the file with filename in %rsi
# return pointer to file data in %rax
# return file length in %rdx
load_file:
mov %rsi, %r12
# open
mov $NR_open, %rax
mov %rdi, %rbx
mov %rsi, %rdi
mov $O_RDONLY, %rsi
syscall
test %rax, %rax
js load_failed
The first thing to note here is that I've used some defined symbols such as NR_open instead of hard-coded constants. These are not preprocessor #defines - they are symbols which will be filled in. I'm not using a preprocessor for this project, so this is the cleanest way to use constants.
Otherwise, this code is pretty simple: load the file with the open system call, and jump to load_failed if it returns a negative value.
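One way to provide such symbols is the assembler's .set directive. The sketch below is an assumption about how they could be defined, not necessarily how the full source does it; the values are the x86_64 Linux ones from the kernel headers:

```asm
# Assumed definitions for the symbols used in load_file.
.set NR_open,     2
.set NR_close,    3
.set NR_fstat,    5
.set NR_mmap,     9
.set O_RDONLY,    0
.set PROT_READ,   1
.set MAP_PRIVATE, 2
.set SIZEOF_STAT, 144       # sizeof(struct stat) on x86_64
.set ST_SIZE,     48        # offset of st_size within struct stat
```

With these in place, mov $NR_open, %rax assembles to the same bytes as mov $2, %rax.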
# stat
mov %rax, %rdi
mov $NR_fstat, %rax
sub $SIZEOF_STAT, %rsp
mov %rsp, %rsi
syscall
test %rax, %rax
js stat_failed
# mmap
mov %rdi, %r8
mov ST_SIZE(%rsp), %rsi
add $SIZEOF_STAT, %rsp
mov $NR_mmap, %rax
xor %edi, %edi
mov $PROT_READ, %rdx
mov $MAP_PRIVATE, %r10
xor %r9, %r9
syscall
test %rax, %rax
js mmap_failed
mmap requires me to know the length of the file so I can map the whole thing into memory at once, so after opening the file I call fstat and read the size out of the resulting structure at offset ST_SIZE, then pass it to mmap. I map the file read-only (PROT_READ) and as a private mapping (MAP_PRIVATE), so nothing done through the mapping can change the underlying file.
# close original fd
mov %rax, %rdx
mov $NR_close, %rax
mov %r8, %rdi
syscall
test %rax, %rax
js close_failed
# restore rdi and return
mov %rbx, %rdi
mov %rdx, %rax
mov %rsi, %rdx
ret
Once the file is mmap-ed, a pointer to its data is stored in %rax. I store this temporarily in %rdx while I close the open file handle. The file handle can be closed since the mapping keeps its own reference to the file until munmap is called. Then I move the variables into their return locations so they can be pushed onto the stack in the loop above.
Starting the Server
To start the server I need to:
1. Open a socket
2. Bind the socket to a local address
3. Start listening for incoming connections
4. Accept each connection
5. Read the request from the connection
6. Write a response header (200 OK or 404 NOT FOUND)
7. Write the response data
8. Close the socket
9. Go back to step #4
Here's what steps #1-#5 look like in assembly:
# create a socket, listen and accept connections
start_server:
# socket
mov $NR_socket, %rax
mov $AF_INET, %rdi
mov $SOCK_STREAM, %rsi
xor %edx, %edx
syscall
test %rax, %rax
js socket_failed
# bind
mov %rax, %rdi
mov $NR_bind, %rax
mov $sockaddr, %rsi
mov $sockaddr_len, %rdx
syscall
test %rax, %rax
js bind_failed
# listen
mov $NR_listen, %rax
mov $128, %rsi
syscall
test %rax, %rax
js listen_failed
1:
# accept
mov $NR_accept, %rax
xor %esi, %esi
xor %edx, %edx
xor %r10, %r10
syscall
test %rax, %rax
js accept_failed
# read
sub $0x400, %rsp
mov %rdi, %rbx
mov %rax, %rdi
mov $NR_read, %rax
mov %rsp, %rsi
mov $0x400, %rdx
syscall
cmp $5, %rax
jle 3f
I open a socket, bind it, start listening, accept a connection, make space on the stack for the request, and read the request in. All errors are indicated by a negative return value from each syscall. For most, this is a hard error from which I do not recover, but for the read system call (including a request too short to hold "GET /" plus a path), I want to close the socket and accept the next connection, so this situation can recover smoothly.
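The bind call above takes a sockaddr structure and its length. As a sketch of what that data could look like for an IPv4 socket, here is a hard-coded address of 0.0.0.0 port 8080. Both the port and the fixed values are assumptions for illustration, since the server takes its bind address and port from the command line:

```asm
# Assumed layout of a struct sockaddr_in for 0.0.0.0:8080.
.set AF_INET, 2

.data
sockaddr:
    .short AF_INET          # sin_family
    .byte  0x1f, 0x90       # sin_port: 8080 (0x1f90) in network byte order
    .long  0                # sin_addr: 0.0.0.0, i.e. any local address
    .quad  0                # sin_zero: padding up to 16 bytes
.set sockaddr_len, . - sockaddr
```

The port is written as two explicit bytes because the structure expects big-endian order, while .short would store the value little-endian.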
Responding to a Request
To respond to a request I take a big shortcut. To keep things simple, I assume I'm getting an HTTP GET request with a single string after it representing the page address. I don't parse any options or even the HTTP version.
# store end of request in r11
lea (%rsp, %rax, 1), %r11
# check for HTTP GET request
cmpl $0x20544547, (%rsp) # "GET "
jne 3f
lea 0x3e8(%rsp), %rbp
mov %rdi, %r9
# compare each loaded filename with the request
2:
add $24, %rbp
mov (%rbp), %rdi
test %rdi, %rdi
jz 404f
mov $-1, %rcx
# get length of loaded filename in %rcx
xor %eax, %eax
repnz scasb (%rdi), %al
not %rcx
dec %rcx
# compare request with filename
mov (%rbp), %rdi # get filename in %rdi
lea 5(%rsp), %rsi # get request in %rsi; skip the "GET /"
mov $index_html, %r10
cmpb $' ', (%rsi) # check for root request
cmove %r10, %rsi # use "index.html" if root request
repe cmpsb (%rdi), (%rsi) # do the comparison
jne 2b # strings do not match
cmpb $' ', (%rsi) # request terminated by space
jne 2b # strings not same length
cmp %r11, %rsi
ja 2b # read past the end of the request
Here's where things get tricky. First I need to know where the request ends so I don't accidentally run off the end, exposing the previous request (which uses the same buffer). Then I check the first 4 characters against "GET ", encoded as a little-endian 32-bit value. Now I load a pointer to the array of loaded filenames, files and their lengths into %rbp, and loop through them.
If I've reached the end of the loaded filenames, I return a 404 error. Otherwise, I get the length of the loaded filename using the repnz scasb instruction. This decrements %rcx until %rdi has stepped past a byte containing the value of %al (which is a null byte). The not and dec instructions fix up %rcx to equal the length of the filename. Now I compare the filename with the request (with a special rule that replaces an empty path with 'index.html'), using the repe cmpsb instruction. I check that I haven't read past the end of the request, and that there is a space just after the matched filename in the request. If all of these checks pass, I'm ready to serve the file pointed to by 16(%rbp).
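The length calculation is easier to follow in isolation. Here is the same idiom as a standalone routine; the label strlen_scasb is made up for this sketch, and it assumes the direction flag is clear, as it is at process start:

```asm
# Sketch: length of the null-terminated string at %rdi, returned in %rcx.
# Clobbers %rax and %rdi.
strlen_scasb:
    mov $-1, %rcx           # start with the largest possible count
    xor %eax, %eax          # %al = 0, the byte to search for
    repnz scasb             # scan bytes, decrementing %rcx, until one equals %al
    not %rcx                # %rcx = number of bytes scanned, terminator included
    dec %rcx                # drop the terminator from the count
    ret
```

For the string "abc", four bytes are scanned, so %rcx goes from -1 to -5; not turns that into 4 and dec leaves 3.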
# write 200 OK message
mov %r9, %rdi
mov $http_200_str, %rsi
mov $http_200_strlen, %rdx
call write_all
# write file data
mov 16(%rbp), %rsi
mov 8(%rbp), %rdx
call write_all
Here I write an HTTP 200 OK message header, followed by the mmap-ped file. The header contains the 'Connection: close' header, since I intend to close the connection immediately after sending the data. Maintaining open connections is tricky for this low-level approach. The write_all function, below, does the work of handling partial writes to the socket and retrying until all bytes have been written.
# write bytes to file in %rdi until all have been written
# or a failure occurs
write_all:
1:
mov $NR_write, %rax
syscall
test %rax, %rax
js 2f
add %rax, %rsi
sub %rax, %rdx
test %rdx, %rdx
jnz 1b
2:
ret
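The http_200_str and http_200_strlen symbols passed to write_all earlier can be defined as plain data. The exact header text below is an assumption; only the 'Connection: close' line is stated in this article:

```asm
# Assumed response header; the assembler computes the length.
.section .rodata
http_200_str:
    .ascii "HTTP/1.1 200 OK\r\n"
    .ascii "Connection: close\r\n"
    .ascii "\r\n"               # blank line ends the header
.set http_200_strlen, . - http_200_str
```

Using .ascii rather than .asciz keeps the trailing null byte out of the response, and having the assembler compute the length avoids a run-time strlen.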
The only things left to do are steps #8 and #9, which are just 6 lines of assembly, including fixing up the stack.
# close
mov %r9, %rdi
mov $NR_close, %rax
syscall
add $0x400, %rsp
mov %rbx, %rdi
jmp 1b
Wrapping Up
There was a bunch of stuff I didn't cover here: ignoring SIGPIPE and setting TCP_DEFER_ACCEPT on the listening socket are two improvements I've made to the above.
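As a rough idea of what ignoring SIGPIPE involves without libc, the sketch below calls rt_sigaction directly. The label ignore_sigpipe and the constant names are hypothetical, and this is one possible approach rather than the code from the full source:

```asm
# Sketch: ignore SIGPIPE so a write to a closed socket returns an error
# instead of killing the process.
.set NR_rt_sigaction, 13
.set SIGPIPE,         13
.set SIG_IGN,         1

ignore_sigpipe:
    sub $32, %rsp               # room for the kernel's struct sigaction
    movq $SIG_IGN, (%rsp)       # sa_handler
    movq $0, 8(%rsp)            # sa_flags
    movq $0, 16(%rsp)           # sa_restorer (not needed for SIG_IGN)
    movq $0, 24(%rsp)           # sa_mask
    mov $NR_rt_sigaction, %rax
    mov $SIGPIPE, %rdi          # signal to change
    mov %rsp, %rsi              # new action
    xor %edx, %edx              # old action not wanted
    mov $8, %r10                # size of the kernel's signal mask in bytes
    syscall
    add $32, %rsp
    ret
```

With the signal ignored, a failed write shows up as a negative return value, which write_all already treats as a reason to stop.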
I also didn't get into specifics of the linker script used to reduce the file size, but with everything together, the resulting binary weighs in at a statically-linked 2584 bytes!
Hopefully this pointless hack was interesting! You can find the full code here. It's the code that's running this site right now!