Coding a Web Server in 25 Lines - Computerphile

Captions
I thought we'd look at something a bit different today: how to build a web server, or, formally, an HTTP server. That is the thing that answers when I go to a website. Let's have a look. I'll choose this very interesting website (only your mother could love that face). How does the data get to us from this remote server, which in this case is one I'm running? How do we ask that server for that content, and how does it come over to us?

People may have heard of web servers like Apache or nginx. I use a slightly different one on my site, but they're all fundamentally similar. They're all very, very big, and if you look at the code for them you just cannot see the forest for the trees. But it turns out that we can simplify heavily (some would say cheat) and cut it down to a very small thing, and I think you can really see the essence of it. Not only does that tell us how a server that gives us a normal website works, but people use these HTTP servers for all sorts of other things, like microservices and so on, so it's really useful to know how they work. I have implemented this twice in software I've written in the last couple of months, so there are reasons why we might want to do it ourselves as well.

What I'm going to do is write one, and I'm going to do it in Rust. It doesn't really matter which language I use; I could just as easily have done this in Python. I bet you I'm going to make some mistakes as we go along; you can work out which ones are deliberate and which ones aren't. We'll try to build it up bit by bit. There'll be some gobbledygook at some points, which can't really be avoided, but I hope each of the stages will make sense in and of itself: we can work out what it is, do another stage, see what works and what doesn't, and add a little bit more. I'm wearing long sleeves today, but I'm going to show I haven't got anything up my sleeves. I'm going to start
right from the beginning, so I'm going to make a fresh Rust project. You can literally see it's hello world. We'll just check that I'm not doing anything silly: it prints hello world. So we're starting at the very basics.

So what is a web (HTTP) server? It's a program that listens for connections. Imagine this is running some website you know about, whether it's my site tratt.net or something else: it's sitting there listening for web browsers to make connections to it. I'm going to assume that we know broadly how the internet works (TCP, IP addresses, those sorts of things), and if you don't know the mechanics of those it's not super important. What we've got to do is listen for incoming connections. That's the first thing we have to do, so let's do that, and then we'll see how we augment it, because it turns out not everything works quite straight off the bat.

First of all we're going to make a listener. Let's see: TcpListener, I think. Yeah. Oops. And we have to say where we want to listen. 127.0.0.1, also known as localhost, is my local machine, so I'm going to make sure this is a server that only I, on my local machine, can connect to. One of the corners I'm going to cut later is about security, and I don't want any bad guys getting in, so I'm going to say only listen on my local computer. I also need to choose a port number. HTTP normally runs on port 80, but that's only available to the superuser and I don't want to do that, so I'm going to listen on port 9999, and we'll see what that does in a little bit. Oh, and because it's Rust: whenever you see .unwrap(), that means an error can happen, and if it does, just make the program crash. We'll deal with those crashes as and when they happen. So this is the first little step. We'll run this and see what it does. Not very much: it listened, no one connected, and it gave up straight away. That's what we want.
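A minimal sketch of this first step, assuming only Rust's standard library. The helper name `bind_local` is made up for illustration and is not taken from the video; the address 127.0.0.1 and port 9999 are the ones mentioned above.

```rust
use std::net::TcpListener;

/// Bind a listener on the loopback address only, so that nothing outside
/// this machine can connect. (`bind_local` is a hypothetical helper name.)
fn bind_local(port: u16) -> std::io::Result<TcpListener> {
    TcpListener::bind(("127.0.0.1", port))
}

fn main() {
    // Port 9999 as in the walkthrough; port 80 usually needs superuser rights.
    // unwrap() makes the program crash if binding fails, for example because
    // the port is already in use.
    let listener = bind_local(9999).unwrap();
    println!("listening on {}", listener.local_addr().unwrap());
    // There is no accept loop yet, so the program exits straight away.
}
```

Run as it stands, this binds the port, prints the address and exits, which matches the "listened, no one connected, gave up" behaviour described at this stage.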
So what we have to do is say multiple people might come in. They could be queueing, these people; we're only going to deal with one person at a time. We're not going to be very clever. So we're going to say .incoming(): this says, as someone comes in, we want to potentially listen to what they have to say and give them something back. And then, for reasons that you don't want me to go into, I'm going to say .flatten(). That's not very interesting.

So now, inside this loop, someone's connected to us. This is the first time we have to think about what's going to happen next: they're going to send us a request, and we're going to give them a response. What does the request look like? Well, there is a formal document, and we'll have a quick mention of that in a minute, but let's actually see what the browser will do. What I'm going to have my program do is just print out whatever it's been asked, even though we don't yet know what that is. I have to do a little bit of weird stuff here, but it doesn't matter too much: BufReader. This is one of the weird things you always find with network programming: you have to do some of this boilerplate stuff. It's just, eh, you get used to it. So I'm going to say read into that line. If I've got that right (and I bet you I've made mistakes) we're going to read all of the lines in the request we're given and just see what they look like. Let's see what happens. Oh, this is one of the fun things with Rust. I'll need to... okay, cool, I missed off an .unwrap(). Right.

Notice now that my program has not immediately exited. That's a very good start. So let's now go to our browser, and I'm going to go to 127.0.0.1:9999, which is the port number from before. Nothing seems to be happening; my browser is kind of stuck. But if I go back to my terminal, it's now printed out stuff. Here is a request: this is what the browser has sent to my program. And it turns out that the only thing we're actually going to care about is
really this line here. This whole thing that I'm going to highlight now is formally a request in HTTP/1.1, which is specified by RFC 2616. You'll be glad to know I'm not going to go through that document in any detail. But the browser sent us this request, and we've just gone "thank you" and not done anything else.

You did it perfectly, though.

Well, yes. I mean, we've got a slightly sociopathic program here, or you could call it a very good listener, depending on which way around you want to think about it. It turns out that when the browser sends you this request, it's expecting you to give a response or to do something else, but we've done neither. You can't easily tell from here, but its request will finish with a blank line. So the easiest thing we can now do is say: when I see a blank line, it's sent me all the data it's going to send me. I'm simplifying things here, but that will do for what we need. We're just going to say if, and I'm going to use .trim(), which just gets rid of any space or weird character at the beginning and end: if we hit a blank line, we're done. So now we'll run that again and reload our page. Did I get that bit right? I probably got something wrong. Oh yeah, I now need to send it back some data, otherwise it's still waiting for me to do something. And this is the fun bit that's really surprisingly easy: I'm going to send back a response. Now, this is specified by the RFC blah blah blah that I mentioned earlier.

And browsers have to be quite tolerant, don't they?

They have to be, so I am abusing this monstrously. I am an incredibly bad citizen here: I'm cutting corners knowing that the browser will deal with this. If you were doing the full job, you would do some other things rather more carefully. For example, where I'm just reading in one line at a time, you technically have to merge lines, but the browsers won't make use of that here. So I'm happy to make use of their tolerance. So I'm now
writing the response. I'm just going to send them back a hello-world sort of thing. Everyone's heard of 404, right? This is a meme known even by family members who otherwise don't know anything about computers. It's one of the error messages, and it just means there's a missing file. 200 is a response code that says success; 404 says something's missing. So we're going to send back a 200, saying something good has happened, and we're going to say it's an OK. Now, for weird reasons, because this is an old standard, you'd think we'd just have to put a blank line here, but we have to put the old-fashioned carriage return, newline thing. It's weird. And we have to create a blank line, so we're going to do that twice, and then we'll just say hello. And I'll have to put an unwrap here. Right, so now let's run that. Ah yes, I will now have to do another fun little bit of weird incantation, but hopefully the last one. Okay, go back to my browser. So I've done something wrong. What have I done wrong? I can look at the code and I can see what I've done: this line here, number seven, should be inside the loop, otherwise it'll always have some content in. So let's run that again, and when I go here and press reload, okay, my thing prints out hello. And just to check that I'm really not lying, we'll say "Hello Computerphile". So now we can really see the browser's made a request and we've given it a response. Cool. We have now made a fully functioning HTTP server. But I am not sufficiently impressed, and I can tell, Sean, that you are not sufficiently impressed yet either.

I'm not seeing cat pictures just yet.

I don't have any cat pictures.

Are you allowed to have websites without cat pictures?

It's crept in, hasn't it? Certainly crept in. So if you're expecting cat pictures, I'm really sorry; I'm afraid all you're going to see is one of my ugly mugs. So what we're going to do now is I'm going to move my website over (no cat pictures) and we're going to see if we can get that working, so we
can actually send my website via this thing. I've just got a literal copy of my entire website, and I'm going to put it there. So I've now put all of my website locally. It looks like it's going to be quite hard to do this, but we can see something if we go and change our little request. Let's run our program again. One thing that's not obvious is in the request line: what does that forward slash in the middle mean? Well, and there's a reason I'm typing it like this: it still says "Hello Computerphile", but now we can see it says GET /index.html. So you can see a link between the thing in the browser's URL and the middle thing. That middle thing, in between GET and HTTP/1.1, is called the resource we're requesting. We want to break that bit out, and we're going to ignore everything else the browser sent us. Absolutely everything else is completely boring from our perspective today.

We know that that first line is special, so let's do that. I'm going to do a little bit of gobbledygook here: take that first line and split it by the space characters. This is the weird bit: I'm going to put it in a Vec of things (it doesn't matter why), and then I'm going to take a slice of it. What I'm going to do is say: if someone gives us a GET request (because it turns out there are some things that are not GET; you may have heard of POST queries and things, but we're not going to worry about those), there's the resource, that's the bit we're asked for, and we're going to check that people really are sending us only HTTP/1.1, because we don't want to deal with anything else. We've got to fill in the dot dot dot: if anyone gives us anything else, we're just going to treat it as an error and give up. Nothing clever. Now we're going to move all this into here. So our web server is getting slightly more sophisticated, but not by very much. Let's not print that out. So what we're going to do now is say we want to find the
file someone asked for, and we want to send them the file. I know that I put all of that website in a directory called htdocs. Oops, that's going to need to be mutable. And then I'm just going to say, and anyone who's security conscious, close your eyes and pretend I haven't done the next bit (I'll explain why I'm doing it, sort of): just load up the file that they've asked for. No checking, so they can ask for naughty files here and we'll just happily send them. We can no longer send "Hello Computerphile", but what we can say instead... oops, what did I forget there? Right, see if we've got something. Of course, I've missed an unwrap. You really start to like unwraps when you're writing Rust in this style. Okay, so let's save and run it, and we'll press F5. Hm, our browser is now displaying nothing, and our server has thrown an error. It said "I can't do it". The reason for that, if I print out the file that we're asking for, is that the resource starts with a forward slash, and actually I'm now trying to load a file that I'm forbidden from accessing on my computer. So, and this is again not great security but it'll work for us, we're just going to get rid of the first forward slash. When someone asks for /index.html, we're just going to get rid of the first slash. That's all we're going to do. Do that: boom, it works. It's fast, mostly because it's very simple. So I'm now very happy; I feel like I've achieved something. It's not a cat photo, but it'll do. I can go and click around on this stuff and it works. But if I do this: oh no, it's crashed again. In this case I clicked on the home button, and if you look at the URL, it now ends in a forward slash. Anyone who's done any HTML programming knows the convention (it's not mandated by HTTP, but it's the convention on most web servers): if you reference a directory name, so it ends in a slash, you'll look for
a page called something like index.html. So I've got to handle that case carefully, and as you'll know, I'm not a careful person, so I'm just going to handle it uncarefully. I'll say: if the resource that you asked for ends with a forward slash, then I'm just going to assume you asked for index.html. Save that, run it. Oh no, I didn't... boom, now that works. So notice now I'm picking up the right things there, and the other things work. We're done.

So let's just go back to this. There are some more things I could do: I could make it more secure, and I haven't done a 404 page for those who wanted one (you can work out where, with a couple of lines, I might put that in). But that is 26 lines of code. I'm not going to count the blank lines, so really it's 26 lines with a blank line up there, 25 if you want to be more precise. It's a complete, very simple, corner-cutting HTTP server, and I hope by building it up bit by bit you can see there's not much magic there. When you look at it, it's easy to get distracted by the gobbledygook, but by building it up bit by bit I hope you can see how we've ended up with what we have and why it works. You've seen me make some very stupid mistakes, and hopefully also some of the real, normal mistakes you'd make in building up to that full thing.

I can see how that kind of works, and corner cutting we might come to in a moment, but why, if you can do it with so few lines of code, are some of these servers so big and complicated?

They're normally trying to handle many more cases, and they're trying to do things much faster. One of the corners that I've cut here is this: imagine two people ask for data at virtually the same time. Whoever gets there first will be handled, however long it takes to do so. So if they're on a really slow connection and it takes a couple of seconds to send them the data, the second person just has to wait until the first person's completely served. So doing things in parallel, that turns out
to be a great deal of fun: very difficult to do, but absolutely necessary for performance. People also then want a ton of features. One of the things I did here is I assumed that just by connecting to my server I knew what site you wanted to go to, but actually one HTTP server can serve multiple different sites with different names. That introduces complexity: now you're going to have configuration files, you're going to have all of this sort of stuff. Those are the things that people expect from a real industrial-grade one, as it were.

And if you were going to use this simple (I'm not going to call it simplistic, but simple) version, is it possible to make that secure, or is security part of the complexity?

You could make this secure, and probably with only another three or four lines of code. I'll leave this as an exercise for the viewer: someone who is clever and evil (you've got to be both, I think; being evil is not enough, being clever is not enough) could work out how to get this to give them files that are outside of my website directory. You can add a check in a couple of lines that says if someone asks for a file outside of where I want them to be, abort the thing and say "no, no, I won't do that". And actually that's pretty much going to be enough on its own to get you all of the security this very simple one needs. Now, if you get more complex and you're a server serving multiple sites, you'll want to do extra security things, like making sure people from one site can't access things from the other, or if you've got connections at all. But for a simple system like this there is no such problem, because we don't support it. So it's just that thing, really, about making sure we don't give people files that they shouldn't have access to.

And it now runs in, well, that's a tenth of a second. It has run two orders of magnitude faster, and in fact I think we'll see if no other station will begin
or continue its own transmission. And this is really the core, the heart
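The request-handling steps from the walkthrough (read lines until the blank line, pull the resource out of the request line, map it onto a file under htdocs, and write a "200 OK" response) can be sketched as below. This is an illustration under stated assumptions, not the code typed in the video: all four function names are hypothetical, the functions take generic readers and writers so they can be tried without a network connection, and the check that rejects paths outside the website directory is one possible answer to the "exercise for the viewer" rather than the speaker's own. Unlike the video, it strips every leading slash, not just the first.

```rust
use std::io::{BufRead, Write};
use std::path::{Component, Path, PathBuf};

/// Read header lines until the blank line that ends the request and return
/// the first one (the request line), e.g. "GET /index.html HTTP/1.1".
fn read_request_line<R: BufRead>(reader: R) -> Option<String> {
    let mut first = None;
    for line in reader.lines().map_while(Result::ok) {
        if line.trim().is_empty() {
            break; // blank line: the browser has finished its request
        }
        if first.is_none() {
            first = Some(line);
        }
    }
    first
}

/// Split the request line on spaces and accept only `GET <resource> HTTP/1.1`.
fn parse_resource(request_line: &str) -> Option<&str> {
    let parts: Vec<&str> = request_line.split(' ').collect();
    match parts.as_slice() {
        ["GET", resource, "HTTP/1.1"] => Some(*resource),
        _ => None,
    }
}

/// Map a resource such as "/" or "/blog/" onto a file below `root`.
/// Returns None for anything that would step outside `root`.
fn resource_to_path(root: &Path, resource: &str) -> Option<PathBuf> {
    // Drop leading slashes so the path is relative to `root`.
    let mut relative = resource.trim_start_matches('/').to_string();
    // A directory-style resource is assumed to mean its index.html.
    if relative.is_empty() || relative.ends_with('/') {
        relative.push_str("index.html");
    }
    let rel_path = Path::new(&relative);
    // Assumed safety check: allow only plain path segments, so "..",
    // absolute paths and similar tricks are rejected.
    if rel_path.components().all(|c| matches!(c, Component::Normal(_))) {
        Some(root.join(rel_path))
    } else {
        None
    }
}

/// Write the minimal response: status line, blank line (CRLF twice), body.
fn write_response<W: Write>(out: &mut W, body: &[u8]) -> std::io::Result<()> {
    out.write_all(b"HTTP/1.1 200 OK\r\n\r\n")?;
    out.write_all(body)
}

fn main() {
    // An in-memory request stands in for what a browser would send.
    let request = b"GET /blog/ HTTP/1.1\r\nHost: 127.0.0.1:9999\r\n\r\n";
    let line = read_request_line(&request[..]).unwrap();
    let resource = parse_resource(&line).unwrap();
    println!("{:?}", resource_to_path(Path::new("htdocs"), resource));
    println!("{:?}", resource_to_path(Path::new("htdocs"), "/../secret.txt"));

    let mut out = Vec::new();
    write_response(&mut out, b"hello").unwrap();
    println!("{}", String::from_utf8_lossy(&out));
}
```

In a real server these pieces would be called inside the `incoming()` loop, with the connection wrapped in a `BufReader` for reading and used directly for writing. The sketch still cuts the same corners as the walkthrough: no 404 response, no query strings, no headers in the reply, and one connection at a time.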
Info
Channel: Computerphile
Views: 320,128
Keywords: computers, computerphile, computer, science
Id: 7GBlCinu9yg
Length: 17min 49sec (1069 seconds)
Published: Thu Feb 22 2024