How one thread listens to many sockets with select in C.

Captions
Hey everybody, it's really time that we talk about select. Welcome back to another video where I try to teach you something that maybe your other professors haven't taught you, and to help you thrive as a learning developer. The topic for today is select. It's a function that's been on my list to cover for a while. I did a series a while back about socket programming and making network servers, and we also talked about threads and multi-threaded web servers. A lot of you have asked me to look into other styles of programming, specifically asynchronous I/O and event-driven software design, so today is the beginning of that journey. It's one that's going to take a while; I'll probably talk about this for ten videos into the future, but we've got to start somewhere, and select seems to me like a good place to start. Note that all the source code for this video is available through Patreon. A big thanks to all of you who help support this channel in different ways. And of course, if you like what you're seeing here, please like and subscribe so you don't miss the next video, and if you don't, well, don't.

So let's start today with some simple web server code. This is basically the code that we wrote together a while back when I taught you how to make a simple socket server in C. I've modified it slightly to make it easier to follow, but it's pretty simple. It first sets up the server, telling it to listen on a specific port, and then it waits for connections. When it gets a connection, it handles that connection with the handle_connection function, and once that's done, we come back and wait for the next connection. We basically rinse and repeat; we just do this cycle forever.

Here's a quick glance at these functions (my previous video goes into more detail). setup_server does just that: it basically calls bind and listen to set up our server socket, which is then returned for us to use elsewhere. accept_new_connection basically just calls
accept and does a little error checking. The check function down here is what's doing the error checking; it does some common error checking, and it's really just there to end the program if anything goes awry. Then handle_connection is where most of the server's heavy lifting happens, though it's really not that complicated. It just reads the request from a socket, which is the path to a file, then reads the contents of that file and sends them over the network back to the client. As I mentioned in an earlier video, this code is illustrative. It's helpful for educational purposes, but it's not very secure, so please don't go posting this on a public web server, or if you do, please don't blame me for the consequences.

Now, where was I? As I've mentioned in other videos, the upside of this server design is that it's simple. It's seriously just 128 lines of code with comments, and this is C, so about 40 of those lines are probably includes up at the top. The downside is that it's fairly slow. It only handles one connection at a time, and if one of those connections stalls, either because of a slow connection to the internet or because a malicious user out there is being intentionally slow to mess things up, it can make things bad for everyone else.

I've already talked about threads as one way to handle this (links to those videos are in the description), but each new thread you create takes up a significant amount of memory, a slow connection can still tie up one of my threads, and a hundred intentionally slow connections can still clog up a pool of 100 threads. So let's look at an alternative, and that is select. Like just about everything else in computing, select is not perfect (more about that later), but it does let us get some degree of concurrency without creating new threads.

So here's the basic idea. Most standard I/O functions, that's functions that
do input and output, most of them use a blocking approach. What that means is that we call the function, say a read from a file, and that function blocks, or pauses, the current thread until the operation is complete. Once the data from the file comes back from the disk, the function returns and the computation resumes. So that's what blocking means. We also sometimes call these blocking calls synchronous calls.

The alternative is to use non-blocking or asynchronous calls. These calls make our request and then rely on an event, a signal, a callback function, or an interrupt, something like that, to let you know when the request completes. This allows you to get work done while you're waiting for that operation to finish.

select, which is the function we're going to look at today, is sort of asynchronous. It still blocks, but here's the thing: select takes a group of file descriptors, which can be open files, open network sockets, or really anything file-like (and pretty much everything is like a file in a Unix system), and it tells you when there's something to read on any of them. So say I have 10 current connections and I'm waiting for data to come through on each. I can say, "select, watch these 10 connections and let me know when any of them is ready for reading." It also works for writing, but we're going to focus on reading for this example.

In this example, let's start by declaring two fd_sets, which is short for file descriptor sets. Right now you can think of an fd_set as a set, a collection of file descriptors. It's really a bit field, which I'm planning to talk about in the near future. select also gives us a few macros for working with these fd_sets, like FD_ZERO, which I'm going to use here to zero out, or initialize, my set of current sockets, and FD_SET, which lets me add one socket, my server socket, to the current set. OK, so this code just initializes the current set; we're going to add to it later. You probably noticed that I declared two fd_sets, but so
far I've only touched one of them. That's because select is destructive: it's going to change the set we pass in, so I need a temporary copy, and that's basically what the other one is there for. Each time through my loop, I copy the current set of sockets to the ready socket set; this is just my temporary copy.

Then I call select. With select, I tell it the range of file descriptors to check. This is not the number of descriptors in my set, which right now would just be one; it's one more than the highest file descriptor I want checked, and for now we'll just use FD_SETSIZE and come back to that later. After that, select takes four more arguments, three of which are fd_sets: the first is the set of file descriptors that I want to check for reading, the second is for writing, and the third is for exceptional or error conditions on a socket or file. For now I'm just interested in reading, so I pass in my ready socket set and leave the others as NULL, so they'll just be ignored. The last argument is an optional timeout value. Say you wanted select to only wait a certain amount of time for changes; we could pass in a timeout value there to make that happen. For now, I'm just going to leave it as NULL, so select is going to wait forever, or until one of my file descriptors has something for me to read. And for now, if I get an error, we'll just print it out and exit. You should of course handle errors in whatever way makes the most sense for your application.

Now, when select returns, we know that one of our file descriptors has work for us, but which one? select is a bit strange. As I mentioned before, it's destructive, meaning that it changes our fd_set. We passed in the set of file descriptors to tell select which ones to keep an eye on, and when it returns, that same fd_set contains just the file descriptors that are ready for reading. That's why we made a copy: I didn't want to lose the list of descriptors that I'm watching. OK, so how do we know which ones are ready? We basically
have to go through and check. We start at zero, and for now we're just going to go until we get to FD_SETSIZE, which, as I mentioned before, is one more than the largest numbered file descriptor that we can store in an fd_set. That's a little annoying, and we'll try to improve on it later. So we go through the range of possible file descriptor values from 0 up to FD_SETSIZE, and for each one we use the FD_ISSET macro to check whether it's set. If it is, then we know that i is a file descriptor with data that we can read right now.

Once this happens, there are really two cases we're interested in. One is that the file descriptor i is our server socket. In that case, it's telling us that there's a new connection we can accept, so I call accept_new_connection to actually get that connection, and we know that it's going to return immediately because select told us there was something there to read. Once we get that new connection, we use FD_SET to add the newly accepted socket, the new client connection, to the set of sockets that we're watching. So that's case one.

The other case is when the socket that's ready to read from is one of those client sockets. In that case we just want to read its data and handle the connection, and of course, once we're done handling the connection, we use FD_CLR to remove the socket from the set of file descriptors that we're watching. We really just keep doing that forever. I have this return statement down here; it's fine, but it's never really going to run, since this server will just run in its loop forever until I hit Ctrl+C and kill it.

Let's make sure it compiles and runs, and it does. It looks like I didn't set up any test files for this server, but that's okay; the point of this example is just to show you how select
works, and in this case it's working just fine.

So let's talk about the good and the bad here. We have a single thread that avoids dead waiting by using select. select tells us when connections actually have stuff for us to read, which lets that single thread use its time more efficiently, and that's great. One limitation of my program here is that it still handles each entire connection in one shot. If a client connects, sends a little bit of data, and then stalls, that thread still has to wait. I could fix that by handling only a single read call each time select returns, but that would require more serious restructuring of my example, so I'm going to leave that for another day, since this already shows the essence of how select works.

And of course I do have some gripes about select in general. The main one is that I have to go through the whole range of possibilities, as I mentioned before. This is annoying; who even knows how big FD_SETSIZE is? If we print it out on my machine, you can see that it's 1024. So let's say I only have two sockets, my server socket and one connected client: each time select returns, I'm going to have to check 1024 different possibilities, every time through the loop. That's not cool. I can improve this a bit by keeping track of the largest socket that I've seen so far. That way, if my server socket is 3 and I have one client that's 4, I only have to go up to 4 each time. That definitely improves things, but it isn't a complete fix. Let's say 500 simultaneous connections come through and this maximum gets big: from that point on, my loop has to check a larger range of socket numbers until I restart my server. So if that ever happens, my server gets slower with age, and let's face it, that's not very satisfying. But it's a start. Also note that because
FD_SETSIZE is 1024, select can only watch descriptors numbered 0 through 1023, so I can't have more than about 1024 active connections, at least on my machine; on your machine it may be different. But before we get too down on select, it has one great thing going for it, and that is that it's portable. select is available basically everywhere, while some of its more modern replacements are not. And sadly, folks, that's where I'm going to have to pause for today. I'm going to be improving on this example in future videos, and I'll be talking more about asynchronous I/O and event-driven programming. I'm also planning to talk about bit fields and bit masking in a future video, so you'll be able to see what's actually going on under the hood with those fd_sets. So stay tuned for that, and be sure to subscribe if you don't want to miss it. Thanks for watching. Please consider giving this video a like if you liked it, tell your friends, classmates, and co-workers, and please consider supporting this channel through Patreon, where you can get access to the source code and to my virtual office hours. Keep up the good work, happy coding, and I'll see you in the next video. [Music]
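The video only describes its setup_server, check, and accept_new_connection helpers in passing. Below is a minimal sketch of what helpers like these might look like; it is not the Patreon source. The SOCKETERROR constant, the port and backlog parameters, and the choice to bind IPv4 on all interfaces are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define SOCKETERROR (-1)   /* assumed: socket calls report failure as -1 */

/* Exit the program if a socket call failed; otherwise pass the result through. */
int check(int result, const char *what)
{
    if (result == SOCKETERROR) {
        perror(what);
        exit(EXIT_FAILURE);
    }
    return result;
}

/* Create a TCP socket, bind it to `port` on all IPv4 interfaces, and listen. */
int setup_server(unsigned short port, int backlog)
{
    int server_socket = check(socket(AF_INET, SOCK_STREAM, 0), "socket");

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    check(bind(server_socket, (struct sockaddr *)&addr, sizeof addr), "bind");
    check(listen(server_socket, backlog), "listen");
    return server_socket;
}

/* Accept one pending client; the caller owns the result and must close() it. */
int accept_new_connection(int server_socket)
{
    return check(accept(server_socket, NULL, NULL), "accept");
}
```

Passing 0 as the port lets the OS pick a free one, which is handy for quick experiments; getsockname() reports the port it chose.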
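To make the blocking versus non-blocking distinction from the video concrete, here is a small sketch. Note that it uses plain non-blocking mode (fcntl with O_NONBLOCK), which is a different mechanism from the event or callback style the video describes: it only shows the "return immediately instead of pausing the thread" part of the idea. The helper names make_nonblocking and try_read, and the -2 "no data yet" return code, are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read without waiting: returns bytes read, 0 at end of file,
   -2 if no data is available yet (read() returned instead of blocking),
   or -1 on any other error. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

A blocking read() on an empty pipe would pause the calling thread; after make_nonblocking(), try_read() on the same empty pipe comes straight back with -2.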
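The select loop described in the video can be sketched as a single function that performs one pass: copy the watch set, call select, scan 0 to FD_SETSIZE with FD_ISSET, accept on the server socket, and handle then FD_CLR a ready client. This is a minimal sketch, not the video's code; the accept_fn and handle_fn callback types and the name select_once are hypothetical stand-ins for helpers like accept_new_connection and handle_connection, and the handler is assumed to close the client socket itself.

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>   /* close(), which a typical handler calls */

/* Hypothetical callback types standing in for the video's helpers. */
typedef int (*accept_fn)(int server_socket);
typedef int (*handle_fn)(int client_socket);

/* One pass of the select() loop. current_sockets is the persistent watch
   list. Returns the number of ready descriptors serviced, or -1 on error. */
int select_once(int server_socket, fd_set *current_sockets,
                accept_fn on_accept, handle_fn on_client)
{
    /* select() overwrites the set it is given, so work on a copy. */
    fd_set ready_sockets = *current_sockets;

    if (select(FD_SETSIZE, &ready_sockets, NULL, NULL, NULL) < 0) {
        perror("select");
        return -1;
    }

    int serviced = 0;
    for (int i = 0; i < FD_SETSIZE; i++) {
        if (!FD_ISSET(i, &ready_sockets))
            continue;

        if (i == server_socket) {
            /* Case 1: a pending connection, so accept() should not block. */
            int client_socket = on_accept(server_socket);
            if (client_socket >= 0)
                FD_SET(client_socket, current_sockets);
        } else {
            /* Case 2: a client sent data; handle it, then stop watching it. */
            on_client(i);
            FD_CLR(i, current_sockets);
        }
        serviced++;
    }
    return serviced;
}
```

A server would call FD_ZERO and FD_SET(server_socket, ...) once on its watch set and then call select_once in an endless loop.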
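The video passes NULL as select's last argument so it waits forever, and mentions that a timeout can be supplied instead. A minimal sketch of that option for a single descriptor follows; the helper name wait_readable and the millisecond interface are assumptions for illustration.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
   Returns 1 if it is readable, 0 if the timeout expired, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    struct timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is one more than the highest descriptor being watched. */
    int ready = select(fd + 1, &read_set, NULL, NULL, &timeout);
    if (ready == -1)
        return -1;
    if (ready == 0)
        return 0;   /* timed out: nothing to read yet */
    return FD_ISSET(fd, &read_set) ? 1 : 0;
}
```

Because some systems (Linux among them) modify the timeval, a loop should rebuild both the fd_set and the timeout before every select() call, just as the video re-copies its fd_set each time.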
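The video's partial fix is to remember the largest socket seen so far, so that neither select nor the scan loop has to cover all of FD_SETSIZE. Here is one way to package that as a sketch; the struct watch_set type and its functions are hypothetical. Unlike the approach described in the video, watch_remove also lowers the maximum when the top descriptor goes away, though that still cannot help while a high-numbered descriptor stays open.

```c
#include <sys/select.h>

/* Hypothetical helper: an fd_set that also remembers its largest member. */
struct watch_set {
    fd_set fds;
    int max_fd;            /* -1 while the set is empty */
};

void watch_init(struct watch_set *w)
{
    FD_ZERO(&w->fds);
    w->max_fd = -1;
}

void watch_add(struct watch_set *w, int fd)
{
    FD_SET(fd, &w->fds);
    if (fd > w->max_fd)
        w->max_fd = fd;
}

/* Remove fd; if it was the largest, walk max_fd down to the next member. */
void watch_remove(struct watch_set *w, int fd)
{
    FD_CLR(fd, &w->fds);
    while (w->max_fd >= 0 && !FD_ISSET(w->max_fd, &w->fds))
        w->max_fd--;
}

/* The nfds argument to select(): one past the largest watched descriptor. */
int watch_nfds(const struct watch_set *w)
{
    return w->max_fd + 1;
}
```

In the loop, copy w.fds to a ready set, call select(watch_nfds(&w), &ready, NULL, NULL, NULL), and scan i from 0 through w.max_fd instead of up to FD_SETSIZE.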
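The FD_SETSIZE limit the video mentions is really a limit on descriptor numbers: POSIX leaves FD_SET undefined for a descriptor below 0 or at or above FD_SETSIZE. A minimal guard is sketched below; the name checked_fd_set and the choice to print a message and return -1 are assumptions. A server could call it right after accept() and close any client it cannot watch.

```c
#include <stdio.h>
#include <sys/select.h>

/* Add fd to set only if it fits in an fd_set, instead of writing past the
   end of the set. Returns 0 on success, -1 if fd is out of range. */
int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d does not fit in an fd_set (FD_SETSIZE is %d)\n",
                fd, (int)FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

Printing (int)FD_SETSIZE with printf shows the limit on a given system; the video reports 1024 on the presenter's machine.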
Info
Channel: Jacob Sorber
Views: 34,073
Rating: 4.9711308 out of 5
Keywords: select in c, programming, C programming, c/c++, sockets, client, server, select, file descriptor, select tutorial, tutorial, multiple sockets, select multiple files, c select, c sockets, socket in c, select server, socket programming, multiple clients, handling select system call, select function, c programming tutorial, programming socket server
Id: Y6pFtgRdUts
Length: 12min 1sec (721 seconds)
Published: Tue Feb 18 2020