How to write a multithreaded server in C (threads, sockets)

Video Statistics and Information

Captions
Fifty-three days in Africa and finally we get rain. It's sprinkling — well, it's not really rain, but there is water falling from the sky, which is crazy. I haven't seen more than one or two clouds in the sky this whole time, so this is actually kind of cool. It's not a rainstorm per se, but it's still pretty cool.

Hey everybody. While we wait to see if it's actually going to rain, I thought we'd do a quick tutorial on two topics that I've touched on in the past: threads and sockets. A while ago I did a couple of videos on sockets — a simple client video and a simple server video — and I also did some videos about threads. Today I want to bring those together and show you how you can put threads and sockets together to make a multithreaded server. This is an intermediate video: I'm assuming you understand the basics of C and that you've seen my sockets and threads videos. If you haven't, this might be a good time to pause — I'll put links in the description — so go watch those and then come back for the rest of this.

For today's video I want to start with a really quick example. This is a lot like the previous server example I showed you; it follows the fairly standard flow that you see with most TCP socket servers, but let me take you through it quickly. We create a socket, we bind that socket to the port we want, and we call listen. This is fairly typical for servers. We're using stream sockets, which means we're using TCP, so our connection is going to look like a stream of bytes coming in one after another — instead of separate packets of data, it just looks like a stream. We're listening on the server port, which happens to be set to 8989, but it could be any available port that your operating system lets you access. The check function is an error-handling function that I wrote. It takes advantage of the fact that a lot of these functions fail in exactly the same way — they return -1 if there's an error — so I made a single function to handle them all in a standard way rather than cluttering up my code with a bunch of if statements. Once we're listening, we go into an infinite loop that accepts new connections and passes each connection off to the handle_connection function.

The handle_connection function is fairly straightforward for this example. My server allows clients to read files from the server: the client sends a file name, and the server reads that file and sends its contents back to the client. This is actually really similar to how a web server works, but without the extra HTTP parsing. It does have one big security problem: in its current form, this code allows the client to read any file from your hard drive, so do be careful. It's an effective example to show you how things work, but it's not meant to be used out in the wild on the big bad Internet — running this code on a publicly available server could get you into trouble.

So handle_connection reads whatever the client sends, up to BUFSIZE bytes or until we get a newline character. We check for errors and we null-terminate the buffer. Then I print out the request just so we can see what's going on. I check that the path is a valid file path using realpath — yes, I also have a video about realpath if you want more information; I'll put a link in the description.
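(The server's source code isn't included on this page, so here is a minimal sketch of the setup just described: socket, bind, listen, and an accept loop that hands each connection to handle_connection. The names check, handle_connection, SERVERPORT, and SERVER_BACKLOG follow the ones mentioned in the video, but the details are a reconstruction, not the video's exact code.)

```c
/* Minimal sketch of the server flow described above (not the video's
 * exact source). check() centralizes error handling for calls that
 * report failure by returning -1. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define SERVERPORT 8989
#define SERVER_BACKLOG 1      /* raised to 100 later in the video */
#define SOCKETERROR (-1)

void handle_connection(int client_socket);   /* sketched further down */

/* Many socket calls fail the same way: they return -1.
 * check() handles that in one place instead of scattered if statements. */
int check(int exp, const char *msg) {
    if (exp == SOCKETERROR) { perror(msg); exit(1); }
    return exp;
}

int main(void) {
    int server_socket, client_socket;
    struct sockaddr_in server_addr, client_addr;
    socklen_t addr_size;

    /* TCP (stream) socket: the connection looks like a stream of bytes. */
    check(server_socket = socket(AF_INET, SOCK_STREAM, 0),
          "Failed to create socket");

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(SERVERPORT);

    check(bind(server_socket, (struct sockaddr *)&server_addr,
               sizeof(server_addr)), "Bind failed");
    check(listen(server_socket, SERVER_BACKLOG), "Listen failed");

    while (1) {   /* accept and handle connections forever */
        printf("Waiting for connection...\n");
        addr_size = sizeof(client_addr);
        check(client_socket = accept(server_socket,
                                     (struct sockaddr *)&client_addr,
                                     &addr_size), "Accept failed");
        handle_connection(client_socket);   /* sequential, for now */
    }
}
```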
Then we open the requested file, read its contents, and send them back to the client. We close the socket, we close the file, and that connection is finished. Then we go back and wait for another, and we do this forever: this server just keeps accepting and handling connections over and over again — it accepts a new connection, handles it, then accepts another — until we use Ctrl-C to kill it.

To show you that it works, I made a quick Ruby script to be my client. The script connects to port 8989 — the port my server is listening on — and requests a file by name; we'll have it request a particular file from my computer. I've also made a Makefile that compiles my server. So we compile the server, run it, and see it print out that it's waiting. Then, in another terminal window — notice that I've set up some test files in /tmp; they're all just copies of the server source code, but they could be anything — our client script requests one of those files. We run it, it grabs the file and prints it out, and we can see that this works.

But what I really want to talk about today is performance. If I were writing a real server that needed to handle many, many connections, it would matter a lot how my server behaves when a lot of traffic comes in all at once. So let's look at how my little server behaves with multiple connections coming in at the same time.

First, let's just time my client script. I'm going to pipe the output to /dev/null, because printing to the terminal takes time and I don't want to time that — I just want to time the interactions with the server. Also note that I'm connecting to the server on the same machine, because that's easy to test right now; if we were connecting over the network, I'd expect longer delays, and your results may differ from mine depending on your network connection and setup.

Now I want to look at what happens with a bunch of connections all at once. To do this, I've made a shell script that runs my little client script 50 times in separate processes, so they're all hitting the server at roughly the same time — not exactly the same time, but close enough. They don't wait for the previous one to complete; they all try to connect more or less simultaneously, which is the case we're interested in. It's basically simulating high traffic.

If I run it, we see some interesting things. First, you'll notice that several of my connections failed. To understand this, we have to look back at the code. Remember that when I called listen, I specified a backlog — the number of connections that the system will queue up before it starts rejecting connections. I started with a backlog of one, and that backlog was the problem: it meant that when all these connections came in, the system couldn't queue them all up. Let's change it to 100. With a backlog of a hundred we can allow a hundred waiting connections, and if we run it again, you can see that it works now. Not surprisingly, 50 connections take quite a bit longer than a single connection.
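(For reference, here is a sketch of the handle_connection flow walked through above — read a newline-terminated file name, null-terminate the buffer, validate the path with realpath, then open the file and stream it back — along with the backlog constant raised to 100. It continues the sketch above, so it reuses check() and those includes. Again, this is an approximation of what was described, not the video's exact code, and it keeps the same security problem: it will serve any file the server process can read.)

```c
/* Sketch of handle_connection as described above (an approximation).
 * WARNING: like the original, it lets a client read any accessible file. */
#include <limits.h>           /* PATH_MAX */

#define SERVER_BACKLOG 100    /* was 1; a backlog of 1 rejected connections */
#define BUFSIZE 4096

void handle_connection(int client_socket) {
    char buffer[BUFSIZE];
    char actualpath[PATH_MAX + 1];
    int bytes_read = 0, msgsize = 0;

    /* Read the request (a file name) until a newline or a full buffer. */
    while ((bytes_read = read(client_socket, buffer + msgsize,
                              sizeof(buffer) - msgsize - 1)) > 0) {
        msgsize += bytes_read;
        if (msgsize > BUFSIZE - 1 || buffer[msgsize - 1] == '\n') break;
    }
    check(bytes_read, "recv error");
    buffer[msgsize - 1] = 0;   /* null-terminate, dropping the newline */

    printf("REQUEST: %s\n", buffer);

    /* Validate the path; bail out if it doesn't resolve to a real file. */
    if (realpath(buffer, actualpath) == NULL) {
        printf("ERROR(bad path): %s\n", buffer);
        close(client_socket);
        return;
    }

    FILE *fp = fopen(actualpath, "r");
    if (fp == NULL) {
        printf("ERROR(open): %s\n", buffer);
        close(client_socket);
        return;
    }

    /* Read the file and send its contents back to the client. */
    while ((bytes_read = fread(buffer, 1, BUFSIZE, fp)) > 0) {
        write(client_socket, buffer, bytes_read);
    }
    close(client_socket);
    fclose(fp);
    printf("closing connection\n");
}
```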
But can we make things faster? The answer is maybe — it depends a lot on how the server is spending its time. If the network connection is slow and the server is spending all its time getting data to and from the client, then we probably can't, at least not without fixing the connection, because we're not really waiting on the server; we're waiting on the network. But let's say we are waiting on the server, and the server is spending a lot of time handling connections. This is a reasonable assumption, because disk access tends to be slow compared to processing. If the server is spending a lot of time on disk accesses, or on tasks that could run on different processor cores, then maybe we can speed things up by handling connections in separate threads.

So let's turn our server into a multithreaded server. There are different ways we could use threads here; I'm going to start with the simplest and follow up with some more complicated options in a future video. For today, the place we're going to use threads is the code that handles connections. I'm going to change it so that each connection is handled in its own thread, and then, if connection handling can actually be done in parallel, we might get some speedup.

First, I need to include pthread.h; this gives me access to all those nice pthreads functions we're going to use to create threads. Then we come down to where we call handle_connection and create a new pthread_t variable to keep track of our thread. We call pthread_create, passing in the address of that variable, and we need to provide a thread function and an argument, just like we always do when creating a new pthread. For the thread function, let's just pass in handle_connection, since that's what we want the new thread to do. For the argument, I'd like to pass in the client socket, but it needs to be a pointer — that's how pthreads works — and preferably a pointer that isn't going to be messed with by any other thread. So let's allocate space on the heap for an int and store the value of client_socket there. We can then pass that pointer as the argument to the thread, so now we're effectively passing the socket to the thread.

Then we need to change our handle_connection function, because thread functions need to return a void pointer and accept a pointer. Let's rename the argument to remind us that it's a pointer now, not an int, and copy it to a local variable once the function starts. I'm going to free the pointer right there, because we don't need it anymore and I really don't want to forget to do it later. This isn't the only way to do it, but it's what I'm going to do right now. Then let's return NULL everywhere we called return before, to appease the compiler, since our function is now supposed to return a pointer. And we need to come back up top and fix the function prototype so that it matches.

Now let's run make — and that warning is going to bug me. I know it's just a warning, but let's fix it. What it's saying is that my thread function accepts an int pointer, while pthread_create expects a thread function that takes a void pointer. Really, a pointer is a pointer, so from pthread_create's perspective this shouldn't matter much, but I'm going to fix it anyway. So let's change handle_connection to accept a void pointer instead.
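(Here is a sketch of those accept-loop changes — include pthread.h, allocate an int on the heap for the client socket, and hand it to pthread_create with handle_connection as the thread function. The variable names t and pclient are assumed; the structure is what the video describes. It replaces the direct handle_connection call inside main's while(1) loop from the earlier sketch.)

```c
/* Threaded version of the accept loop (sketch of the changes described
 * above). Each connection gets handled in its own thread. */
#include <pthread.h>

void *handle_connection(void *p_client_socket);   /* new signature, below */

    /* ... inside main's while(1) loop, replacing the direct call ... */
    while (1) {
        addr_size = sizeof(client_addr);
        check(client_socket = accept(server_socket,
                                     (struct sockaddr *)&client_addr,
                                     &addr_size), "Accept failed");

        /* Put the socket on the heap so no other thread (or the next loop
         * iteration) touches it while the new thread is using it. */
        int *pclient = malloc(sizeof(int));
        *pclient = client_socket;

        pthread_t t;
        pthread_create(&t, NULL, handle_connection, pclient);
        /* The thread isn't joined here; reusing threads (a pool) is left
         * for the follow-up video mentioned at the end. */
    }
```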
Inside the function we'll just cast the argument back to an int pointer, because that's what it really is, and now we should be fine. We have to change the prototype up top as well. We run make, it compiles just fine, and it runs — so far so good. Now we can make a few requests, and notice that the request times are pretty similar to what we were seeing before. If we run our 50 clients at once, it takes about the same amount of time as before, which is kind of a bummer. So the question is: did adding threads make any difference at all?

Let's look into it. To do this, I'm going to make some changes to my client: instead of printing out the file, I'll have it time how long it took to get the file and print that out, because we know it's delivering the content successfully — now we're really just worried about performance and time. I'm also going to change my high-traffic script to request each of the different test files round-robin style; I'll show you why in a minute. This will let us see how our server looks to each of our clients.

Let's compile and run our server. I said before that all of my test files were just copies of the server source code, and that wasn't quite true — there's one exception. Just to make things interesting, test file number six is quite a bit larger than the others, so some requests are going to take a while longer. That's why I changed my test script: now one out of every six requests asks for the big file, so we can see how that influences my server's behavior.

Now we can test a few single accesses. The short accesses are quick, and requesting number six takes longer — not surprising. When we run all 50 at once, we see this again: some connections are very fast and some take a lot longer, especially down at the end, where some requests take more than a second to complete.

So what happens if we don't use threads? We can just comment out pthread_create and call the function directly, so that we're handling connections sequentially again. We compile and run it, and single connections take about the same amount of time as before, but when we run fifty at once we see something different: all of the connections now take about the same amount of time, and most of them take a lot longer than they did before. What's happening is that slow connections cause delays, and the other pending connections have to wait their turn — they can't be processed until the one before them finishes. With threads, the super-fast requests finished quickly and got out of the way. Handling multiple requests concurrently slowed the big downloads down a bit, but a lot of our clients got better service.

Just for clarity, I want to tweak one little thing and have my client print out which file it's accessing, so we can see in the output which files take how much time. Now when we run it, you can see that with threads the slow connections are mostly the big number-six requests, but without threads, the long number-six requests stall everybody that comes afterward. Imagine if a site like YouTube worked this way: your video request would have to wait until everybody in line before you finished downloading their entire video. The server might still be able to serve up the same amount of video, but you, the client, could be waiting for days before your video starts to play.
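(And here is the corresponding change to handle_connection itself, as described: it now takes a void pointer, casts it back to an int pointer, copies the socket into a local variable, frees the heap allocation right away, and returns NULL instead of a plain return. A sketch of the described change, not the exact code.)

```c
/* handle_connection adapted into a pthread thread function
 * (sketch of the change described above). */
void *handle_connection(void *p_client_socket) {
    /* The argument arrives as void*; cast it back and copy the value. */
    int client_socket = *((int *)p_client_socket);
    free(p_client_socket);   /* free immediately so it isn't forgotten later */

    /* ... same body as before: read the request, realpath, open the file,
     * send its contents, close the socket and the file ... */

    return NULL;             /* thread functions return a void pointer */
}
```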
The reality is that clients requesting large downloads are probably willing to tolerate a small slowdown. They may not like it, but they're downloading the special extended edition of the Lord of the Rings trilogy and they know it's going to take a while — what's an extra five minutes of download time, especially if they can start watching before the whole thing is done? But an extra ten-second delay on your website's home page could mean the difference between people using your site and people leaving and never coming back.

I'm almost out of time, but let's look at one last thing. Our server here is pretty fast: it's accessing a small set of files from a solid-state drive, so these requests are quick and fairly uniform. But what if my server had to access a slower magnetic disk, or had to contact another server to handle the request — basically, what if it has to do anything slow that doesn't tie up the processor? For illustration purposes, I'm going to add a one-second sleep to our handle_connection function as my slow-but-not-CPU-intensive task. This is really where we see major overall differences from using threads. Without threads, when we run our 50 concurrent connections, they all stack up and things are super slow — it takes over 50 seconds to run them all through, as each one waits for the ones that arrived before it to finish. But if we go back to using threads, all of that waiting happens concurrently, we save a ton of time, and all those accesses finish in just a few seconds.

This is where I'm going to stop for the day, but I want to point out a few downsides to this approach. Yes, it's simple: when we have more work to do, we just spin up a new thread to handle it. But what happens if I have 10,000 connections at once? What about a million? Each of these threads requires memory and CPU time, and at some point more threads won't help — more threads will just start killing my performance, using up all my memory, and things will get really, really slow. The other thing is that we're creating a new thread for every connection, and creating threads takes a little time. We can probably save some of that time by reusing the threads we create for future connections, and we'll talk about how to do that in a future video. Be sure to subscribe to the channel if you want to make sure you don't miss that video, and I'll see you soon. Bye.
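(The one-second sleep used to simulate slow, non-CPU-bound work is just a single call added inside handle_connection — something like the sketch below, building on the threaded version above.)

```c
#include <unistd.h>   /* sleep() */

void *handle_connection(void *p_client_socket) {
    int client_socket = *((int *)p_client_socket);
    free(p_client_socket);

    sleep(1);   /* simulate slow I/O (slow disk, remote service, etc.)
                 * that waits without tying up the CPU */

    /* ... handle the request as before ... */
    return NULL;
}
```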
Info
Channel: Jacob Sorber
Views: 56,621
Rating: 4.9465332 out of 5
Keywords: multithreaded server, multi-threaded server, server programming, multithreaded programming, multithreaded, threads, custom server, server in c, multiple threads, server with multiple threads, c programming, socket programming, multiple clients, tcp client serve programming in c, multiple client server program in c, client server programming in c, client server in c
Id: Pg_4Jz8ZIH4
Length: 14min 30sec (870 seconds)
Published: Fri Sep 20 2019