FreeBSD Kernel Internals, Dr. Marshall Kirk McKusick

Video Statistics and Information

Captions
Hello, my name is Marshall Kirk McKusick, and I've been around as long as dinosaurs and mainframes have ruled the world, which is to say the sixties and seventies. By the 1970s, however, a new breed of mammals had begun to show up on the scene, known as minicomputers. Although they were just toys in the 1970s, they would soon grow and take over most of the computing market. In 1970, at AT&T Bell Laboratories, two researchers, Ken Thompson and Dennis Ritchie, began developing the UNIX operating system. Ken Thompson, who was an alumnus of Berkeley, came back on a sabbatical in 1975, bringing UNIX with him. In the year that he was there he managed to get a number of graduate students interested in UNIX, and by the time he left in 1976 Bill Joy had taken over running the UNIX system and, in fact, continuing to develop software for it. Bill began packaging up the software that had been developed under Berkeley UNIX and distributing it as the Berkeley Software Distributions, whose name was quickly shortened to simply BSD. BSD continued to be distributed in yearly distributions for almost fifteen years, initially under Bill Joy and later under others, including yours truly. By the late 1980s, interest had begun to grow in freely redistributable software, so a number of us at Berkeley began separating out the AT&T proprietary bits of BSD from those parts that were freely redistributable. By the time of the final distribution of BSD in 1992, the entire distribution was freely redistributable. I've given a capsule history here, but if you're interested in the entire story, I have a three-and-a-half-hour epic, available from my website www.mckusick.com, that gives the entire history of BSD at Berkeley.

Following the final distribution from Berkeley, two groups sprang up to continue supporting BSD. The first of these was NetBSD, whose primary goal was to support as many different architectures as possible, everything from your microwave oven all the way up to your Cray X-MP; in fact, today NetBSD supports nearly sixty architectures. The other group that sprang up was FreeBSD. Their goal was to bring up BSD and support as wide a set of devices as possible on the PC architecture. They also had a goal of making the system as easy to install as possible, to attract a wide group of developers. I chose to work primarily with the FreeBSD group, both doing software and, together with George Neville-Neil, writing the book "The Design and Implementation of the FreeBSD Operating System". Together with this book I developed a course that runs for twelve weeks and about thirty hours. The purpose of this video is to give you a taste of that course. What follows are excerpts from the first lecture of the course, which of course you can also get from my website www.mckusick.com. Enjoy.

This class is nominally about FreeBSD, because that's what I know best and that's what the textbook is organized around, but the fact of the matter is that it's really a class about UNIX, and that covers a broad range of things: in the open-source arena it's FreeBSD and Linux, which of course gets used a lot, and it also covers the commercial systems, Solaris, HP-UX, AIX, and so on. I'm going to tend more toward the open-source side of things, so it's really going to be more FreeBSD and Linux than it's going to be Solaris and HP-UX and so on.
For the most part, at the level of this course, we're dealing with the interfaces to the system, and the fact of the matter is that those interfaces are highly standardized at this point. Whether it's FreeBSD or Linux or Solaris or whatever, the socket system call has to do the same thing: it has to take the same arguments and it has to have the same effect. So until you get down to the really nitty-gritty details of how they actually go about implementing it, the differences are relatively minor. I would say that sixty to seventy percent of the material I'm covering is just as true for FreeBSD as it would be for Linux or for Solaris. AIX is a little bit off in the weeds, as is HP-UX, but luckily we don't have to worry too much about that.

The other thing is that I'm going to assume that all of you have used the system. I get really worried when people raise their hands and ask "Hey, what's a shell?", or, the one time I put a piece of code up, someone asked "Why are there two pipe symbols in the middle of that if statement?" No, we're not programming the shell, we're programming in C, so hopefully you can tell the difference between shell scripts and C code. But I am going to assume you haven't really looked inside the system, so I'm going to start everything at a very high level. The problem, as I've already discovered, is that you come from a lot of different backgrounds and levels of knowledge, and the way that I find works best to be useful to everybody is a three-pass algorithm. I will start the first pass with a very broad-brush, high-level description of what's going on; then I will go back and go through the same material again, but at a lower level of detail; and then I'll finally go back and go through it at a very niggly, low level of detail. The fact is, if you are learning new stuff as I'm doing the high-level pass, you're going to be utterly washed out by the time I get to the low-level niggly details. But since I'm going to do it topic by topic, when I get to the end of one of those low-level passes I'll give you a clue: I'll say "Brain reset, I'm starting a new topic," so even if you're completely lost you can start listening again, because I'm going to give the broad brush again. And for those of you that know a lot of this stuff already, you'll probably find the broad brush rather boring, but by the time we get down to the low-level details I think you'll pick up some things that you will find useful and interesting. In this way, hopefully, everybody will get some useful percentage of the material out of the course.

I'm going to start out by just walking through the outline of what we're going to try to do here. As I said, we're going to do roughly two and a half hours of lecture, about two hours forty minutes, per week, and we will start off this week with an introduction.
As I said, we're going to start from the top and just work our way down, so the general thing I'm going to do is talk about the interface, which is something you are presumably fairly familiar with since you've worked with the system, and then lay out terminology. Although we use normal English words, they sometimes have rather bizarre meanings compared to their common usage, so I will lay out the terminology and the way we talk about how the system is structured. This week we will also talk about the basic services: what is it that the kernel is providing for us? And then of course we'll proceed to dive down in and see how that is done.

In week two we're going to look at the system from the perspective of something that manages processes. One way of looking at the kernel is that it's really just a resource manager, and the resources it's managing are things having to do with processes. So we'll look at a process, what the structure of it is, and talk about the different ways that they can be structured: a process, for example, is an address space, and it can have one thread running in it or multiple threads running in it. So we'll talk about the different ways that we think of a process. We will look at the management of those processes; we've got to lay out the bits and pieces that need to be managed and then talk about how we do that. We'll talk about jails; this is something that you currently find only in FreeBSD. It hasn't made it into Linux yet, although the concept is being actively worked on, so my guess is that you'll see it fairly soon. We'll also talk about scheduling, which is in essence how we decide what gets to run, when it gets to run, how long it gets to run, and so on.

The week after that we will go into virtual memory. Signals aren't really part of virtual memory, but they didn't fit into the next week's material, so I've just dropped them in at the beginning; the bulk of week three is going to be the management of virtual memory. We've got a bunch of physical memory and a bunch of processes that are trying to use their address spaces, and we will talk about essentially how you make all of that work. It's called virtual memory because it's sort of a cheat: we promise you the world and then we deliver as small a number of pages as we think we can get away with. So the first three weeks essentially get us through looking at the world as if it were all about processes.

Then in week four we change gears. We say, okay, the kernel isn't just all about processes; you can look at it orthogonally and say it's really just a giant I/O switch, a traffic cop that's just managing these I/O streams, so let's look at it from that perspective. We'll start with special files. Again, this is the interface, normally the /dev interface when you talk about UNIX systems, that gets you access to the various I/O streams that are available, and we'll look at how that's organized and the structure of it, which used to be fairly simple but in the last decade has gotten incredibly complicated. We will also talk about pseudo-terminals and job control. This is about as interesting as watching the grass grow, but unfortunately it's a major component of the system, and people that deal with system administration especially have to know far more about this than they probably ever thought they wanted to.
We will then continue in week five with the kernel I/O structure. We will start with multiplexing of I/O; the kernel of course has always done this, but we're really talking more about how we export I/O multiplexing to user applications. We will then move into autoconfiguration strategy. Autoconfiguration is what happens, typically or historically I guess you could say, as the system boots: all that stuff that comes out about what hardware is on the machine and how it's all interconnected is tied up in autoconfiguration. That used to happen just once at boot, but in modern systems it's an ongoing process. It happens at boot, but it also happens any time you plug in a new I/O device, a PCMCIA card, or you remove a disk or you put in a new disk; any sort of activity that changes the I/O structure of the machine means autoconfiguration has to get fired back up, figure out what's disappeared and clean up, and figure out what new has arrived and configure it in.

Then we'll talk a little bit about the configuration of device drivers. This gets into an area where, well, let me just give a bit of advice to the class, especially those of you who work in system administration. You really want to be careful that you don't learn too much about device drivers, because there are really three things that it's not good to learn about, and if you do learn about them it's really good to keep it to yourself, because if you become an expert, or are viewed as an expert, in any of these areas, you will become the designated stuckee for that at your site, and you'll never get to do anything but that. The three things that I highly recommend not learning very much about are device drivers, sendmail configuration files, and anything having to do with LDAP or anything in that general domain, because, as I say, that will become your life's work, and there are other things that you might find more interesting. "Do you have a question?" So one of my students sympathizes with my point; I believe you said you worked on that mail system, so you might know something about sendmail configuration files, but you don't have to answer that. So we're going to talk about what a device driver does, really just the entry points to it, but we're not going to talk about how you write such a thing, how you debug such a thing, or much of anything else about it. I actually used to teach an entire class, believe it or not, about device drivers, but then I realized the error of my ways, and I have since made a point of forgetting every slide in that talk.

Then we will move on to filesystems, and as always we'll start at the high level: talk about the interface, what it is that is exported out of the system, and then start diving down in to see how we go about implementing it. We'll start with the so-called block I/O system. It has historically been called the buffer cache, and you still hear it called that periodically, but the fact of the matter is that there isn't really a buffer cache anymore; there is just one big cache, the VM cache. The filesystem has a view into it and the processes have a view into it, but at the end of the day you really don't want the same information on two different pages of memory, because that just leads to trouble.
But filesystems think they have buffers, and so there's a maneuver where we make things that look like what historically were buffers but that really just map into the VM system; they're still managed the way they have been managed historically. We will then get down into the filesystem implementation, the local filesystem if you will, and also into soft updates and snapshots. Soft updates, for the time being, is something that you see only in FreeBSD; the alternative to soft updates is journaling, which is more commonly used, for example by ext3. I'll go through soft updates, and a lot of the issues in soft updates are the same issues that you have to deal with in journaling (what is it that we're protecting, and how do we go about doing that); the difference is in the details. There is actually a paper in the back of your notes, if this is something that interests you: a comparison of journaling versus soft updates that was done about five or eight years ago. Not to spoil the punch line, but the answer is that they both work about the same. Snapshots, again, are something that, if you've worked with things like the Network Appliance box, you're probably quite aware of: what snapshots are and how they do or don't work for you. This is the same functionality in the filesystem, implemented in a somewhat different way. So week six is really going to be the local filesystem, the disk connected to the machine that we are dealing with.

In week seven we get into multiple-filesystem support: how do we abstract out that filesystem layer and support multiple filesystems at the same time? For example, in FreeBSD you can of course run with the traditional fast filesystem, but if you happen to like the Linux filesystem better, or you have to share a disk with a Linux machine, you can run ext2 or ext3 and it will perfectly happily do that. So we will look at how we provide an interface so that we can plug in all these different filesystems that we want to support. Another area in which there has been a great deal of growth, at least in code complexity, is so-called volume management. In the good old days a filesystem lived on a disk, or a piece of a disk, and that was that; but in this day and age that won't do any more, so we aggregate disks together by striping them, or putting them in RAID arrays, or various other things, and we need a whole layer in the system just to manage those disks. Then, as an example of an alternative filesystem, we're going to talk about the Network File System, NFS. That's not because it is the world's best remote filesystem, or the cleanest design, or any of the properties you might hope a class like this one would pick for, but it is ubiquitous and very widely used, and so we're going to talk about that one.

We'll then once again switch gears in week eight and turn our attention to networking and interprocess communication, and again we'll start from the very top. We'll go through the concepts and the terminology that gets used, such as the difference between domain-based addressing and an address domain. We'll go through what the basic IPC services are, essentially all the system calls that have anything to do with networking, and describe what each of them does, and I'm going to go through a somewhat contrived example that makes use of every one of those interfaces, just to show how they all connect together.
For those of you that work in networking, or have done any kind of network programming, if you're looking for a week to miss, week eight is the one to miss, because that is the most basic lecture that I'm going to give. If you're not sure whether or not you need to go through it, one of the papers in the back is an introduction to interprocess communication; read that paper, and if you say "yeah, yeah, yeah," you're done with week eight. On the other hand, if you don't come to week eight, and then in week nine I call on you and say, all right, what is it that the listen system call does, and you can't tell me, you're going to get a demerit.

Then in week nine we will get into the actual networking implementation itself. We go through the layers, as we did in all the other areas, and we will spend a significant portion of that class talking about routing. Routing, for those of you that haven't had the pleasure of dealing with it, is a black art, or at least a dark science. We'll talk about it first from the perspective of what we do locally within the machine, and then what some of the bigger strategies are that we can use for doing routing: enterprise-wide routing, or area-wide routing, something like throughout the state of California or throughout the US. This, again, like device drivers, is really just a nickel tour through what the choices are and what the basic strategies are that get used. If you're thinking you're going to walk out of here knowing how to set up routing, well, sorry, we are not going to get that far, but you should at least have a pretty good idea of what the issues are and what the general solutions are.

Then in week ten, well, not finally, but over the next few weeks, we will go through the Internet protocols, primarily TCP/IP: what the algorithms are that get used. I'm putting particular emphasis, for this particular class, on changes that have been made in the protocols to deal with a lot of the attacks that we've been seeing, the SYN attacks and that sort of thing, rather than just a straight recitation of what the actual protocols are. I'll talk primarily about IPv4, but I will also try to talk a bit about IPv6 as well.

So the first ten weeks are the kernel course proper; then we tack two weeks on at the end to talk about the bigger picture: system tuning, crash-dump analysis, that level of thing. The idea is to consolidate what we figured out, or talked about, in the first ten weeks, and how that applies to the tools that we have available to look at what the system is doing, analyze what the system is doing, and hopefully improve the performance of what the system is doing. For the most part, the kind of tuning that I'm talking about is not going in and hack-hack-hacking your kernel, because the fact of the matter is that most of the time you can't do that anyway. It's more looking at it from the perspective of: is this system running badly because it doesn't have enough memory on it? Or is it running badly because there isn't enough I/O capacity? Or is it running badly because it's got enough I/O capacity but certain drives are being overloaded? Or is it being overrun because we're simply trying to do too much on this machine? And so on.
That's the level of thing we're looking at, but tied into a lot of the concepts that we talked about before, so we can talk about active virtual memory and what that means, and essentially measure what it is, and hopefully you will then understand, in the context of what we talked about in the VM section, what that really means. Crash-dump analysis is one of those topics that you are going to love or hate: if you actually have to deal with crash dumps, people find it invaluable, and if you don't have to deal with crash dumps, it's an incredible mass of boring detail. The only good part of it is that the whole session is only about an hour long; if it interests you, listen closely, and if it bores you, well, it's only an hour long. Lastly, we'll talk a little bit about security issues. Again, this is really more about the tools that are available for dealing with security than a complete tutorial on how to implement security. For those of you that deal with security, this is just going to be security 101; for those of you that will have to deal with it but haven't really thought about it, it'll probably scare you to death, and you'll wonder how to keep your machines from being hijacked every day. Okay, so that's in essence what we're going to try to do here. Anybody have any comments, questions, thoughts? No? All right, well, let's get started. We will begin on page fifteen with an overview of the kernel. Hopefully nobody's lost yet. What's a kernel?

So, starting at the very top, with the big broad brush, what we have is a UNIX virtual machine. Virtual machines are something that has been around as a concept since the sixties; the difference is really just the level of the interface that people have dealt with when they talk about virtual machines. In the 1960s, computers were these enormous things. Your computer room would be something three times the size of this conference room, if you had a computer. The computer itself was as tall as a refrigerator-freezer; imagine five or eight or ten of those units side by side. That made up the computer: one would be the core processor, one would be the floating-point unit, and several of them would be the memory, the core memory, literally core memory. Then there would be rows of disk drives, which were about the size of a washing machine, and behind that, since you couldn't store everything on disks, you had rows of tape drives. And then you had this little set of munchkins that would run around and tend to the machine: they'd mount tapes and take off tapes, mount disk packs and remove disk packs, because the drives themselves were very expensive, so you didn't, as today, have one spindle dedicated to one set of platters; you could take out a set of platters and put in another hundred-megabyte set of platters, platters that are this big around, six or eight of them, with giant head assemblies that come rumbling in and out. Anyway, one of these giant machines that cost many millions of dollars would run at about ten million instructions per second, 10 MIPS, and 10 MIPS was more computing power than anybody could possibly imagine using in a single application. Just by contrast, this four-year-old laptop here is probably on the order of one or two hundred MIPS. But anyway, people couldn't really envision what they would do with a lot of computing power.
The other thing was that you didn't have the notion of an operating system that had applications running on it, because everybody wanted to write straight to the raw hardware. So what IBM, which was a big manufacturer of machines in those days, did was come up with this thing called VM. You'd hardly call it an operating system, really, but what it did was clone independent copies of the machine that worked just like the original machine, so you could boot something that you thought was an operating system on top of VM. You'd take one of these ten-MIPS machines and it would clone, say, six identical one-MIPS copies, and then you could boot whatever you wanted on each one of those machines. If you were doing database stuff, you would boot your database, because the database ran on the raw hardware; or if you were doing payroll, you would boot up the payroll program; or if you actually tried to service users, you could boot a time-sharing batch thing that would read card images and print stuff out, or they even had TSO, the Time Sharing Option, where you could interactively sit and type and send stuff in and get answers back, and you could boot TSO too. Whatever set of things you needed, you could boot them, and they ran independently as if each were running on its own machine; all the VM did was give you an exact raw copy of the hardware.

So when UNIX came along, they liked the notion of providing independent things that you could operate in, but they wanted it at a higher level: instead of doing it at the raw hardware level, they wanted to do it at the process level. The idea was that the interface you would program to would be what we think of as the system-call interface today, and that you would be given a process, or a set of processes, and those were independent: your process couldn't affect the address space of another process.
You couldn't reach over and mess around with their addresses, you couldn't mess around with their I/O channels; you could slow them down by being a pig, but that was about the only way you could affect other processes. So what were the interfaces that had these characteristics? You had a paged virtual address space, so you didn't have to know, as in the old days, how much physical memory was on the machine and make your application fit into that amount of memory; you just had what looked like a large, uniform address space. Even if the underlying hardware had segments or some other hardware brain damage, it looked to you like you just had a big uniform address space, and the size of your address space was independent of the amount of memory on your machine. Your address space could even be bigger than the amount of physical memory, because we move pages around underneath whatever part of the address space is actually active. There are obviously limits to this: if you are trying to run a one-gigabyte application on top of ten megabytes of memory, it's probably going to bring new meaning to "same-day service," but if you're willing to wait long enough, it will eventually move the pages around and you will make progress through getting your application run.

Another thing was dealing with software interrupts. In the old days you had to understand how the hardware worked in order to deal with exceptional conditions. For example, if you did a divide by zero, the hardware would jump through some vector location or something, and you had to know how that worked and make sure your program, usually with some little bit of assembly language, was set up to deal with it. UNIX said, let's get away from the hardware here, and so they did this thing called signals. They defined a set of signals, so that if you do a divide by zero, you simply register a routine that you want to have called; you don't have to know how the hardware figured it out, you just know that that routine is going to get called, and you can deal with it at that point.
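As a minimal sketch of that signal mechanism, assuming a POSIX environment and hardware where an integer divide by zero traps (as it does on x86), a program can register a routine for SIGFPE and never needs to know how the hardware vectors the fault:

    #include <signal.h>
    #include <unistd.h>

    /* Called by the kernel when the process divides by zero (SIGFPE). */
    static void
    fpe_handler(int sig)
    {
            (void)sig;
            write(2, "caught SIGFPE\n", 14);
            _exit(1);               /* returning from a divide-fault handler would just retrap */
    }

    int
    main(void)
    {
            struct sigaction sa;

            sa.sa_handler = fpe_handler;    /* the routine we register to be called */
            sigemptyset(&sa.sa_mask);
            sa.sa_flags = 0;
            sigaction(SIGFPE, &sa, NULL);

            volatile int zero = 0;
            return 1 / zero;        /* the trap arrives as a signal, not a raw hardware vector */
    }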
We also got a set of timers and counters to keep track of what we're doing; this is really more for accounting than anything else, but applications may want access to it. And we have a set of identifiers that we're going to use for things like accounting, protection, scheduling, and so on. One of the early philosophies of UNIX was to try to keep it simple. Operating systems had gotten very baroque; in particular, the thing that predated UNIX was a thing called Multics. Multics was a joint project between Honeywell, a big computer manufacturer of the time, AT&T Bell Laboratories, the big industrial laboratory of that time, and MIT, a big university then and still today. Those three organizations got together to try to build this time-sharing operating system, and it just got bigger and more grandiose and more complex and was never finished, because as soon as they could see "oh, we know how to do that," they'd say "but we could do this other thing too," and then they would tear it apart, and they never really got to something that could be put into production. So AT&T Bell Laboratories decided to pull out of that project, and two of the people who had been working on it, Ken Thompson and Dennis Ritchie, were sort of bummed, because they were now back to typing cards and putting them through card readers, and they had gotten used to the idea that you could actually sit at an ASR-33 teletype and interact with your computer. So they found an old PDP-7 sitting off in a corner that had been abandoned and started working on this little tiny operating system, which they called UNIX, and which eventually moved to the PDP-11 and became what we have today.

Because they were coming, first of all, from Multics, where everything had been done in great grandiose detail, and because there were fundamentally two of them working on it and they wanted to get something done within a year or so, one of their philosophies was: let's find the one way of doing things. Let's not have eight ways from Sunday; let's just get the one way, and that's what we will provide. So what is the core set of things that we need? Well, the first thing is, when it comes to identifiers, let's not have eighty thousand different identifiers. They came up with a process identifier, a user identifier, and, at that time, a single group identifier (later expanded), and they used those same identifiers for everything: for accounting, for making protection decisions, for scheduling decisions. Again, it was the simplicity of the thing that was driving their decisions.

But there are really two key ideas that they had that made the difference, that set them apart from what everybody else had done before them, and which, in retrospect, have been pervasive more or less ever since. The first of these was the notion of a uniform descriptor space: given a descriptor, it can reference any I/O device, or really any kind of I/O channel. You can have a descriptor for a terminal, or a descriptor for a file, or a descriptor for a disk, or a descriptor for a pipe, or a descriptor for a socket, and you don't need to know what it references in order to be able to read and write that thing. If I hand you a descriptor, you can read from that descriptor or you can write to that descriptor, and the correct thing will happen. You'd say, well, that's so obvious, how else could you possibly think of doing it? Well, predating UNIX, everything was done with a little subsystem that would open a file, read a file, write a file, close a file; and there was another set of system calls that would open a terminal, read a terminal, write a terminal, close a terminal; and yet another that would create a pipe, read a pipe, write a pipe; and so on.
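As a minimal sketch of what that uniform descriptor space buys you (an illustration, not code from the lecture): the same copy loop works no matter whether descriptors 0 and 1 happen to refer to a file, a terminal, a pipe, or a socket.

    #include <unistd.h>

    /*
     * Copy standard input to standard output. The program never asks what
     * kind of object the descriptors refer to; read() and write() do the
     * right thing for files, terminals, pipes, and sockets alike.
     */
    int
    main(void)
    {
            char buf[8192];
            ssize_t n;

            while ((n = read(0, buf, sizeof(buf))) > 0)
                    if (write(1, buf, n) != n)
                            return 1;       /* short write or error */
            return n < 0;                   /* 0 on end of file, 1 on read error */
    }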
So if you are just a drop-dead-stupid program like, say, cat, you would have to have code in there that asked: was my input a terminal, in which case I need to use read-terminal, or is it a file, in which case I need to use read-file, or is it a pipe, in which case I need to use read-pipe? The program itself had to have all this code in it, whereas when they went to the uniform descriptor space, cat doesn't know and doesn't need to know; it just says read my input, write my output, and it works. We add a new type of descriptor and cat just continues to work as it always did. This proved to be a very powerful construct, and pretty much every operating system after UNIX did it. There's one exception, a large company in the Pacific Northwest that still does not have quite a uniform descriptor space, but that's part of their legacy that they're working on; Longhorn will be here. Anyway, this set of facilities makes up the UNIX virtual machine, and in some sense we still see virtual machines being used today. In fact we're seeing a reversion back to some of the IBM stuff in things like VMware, which essentially lets you go back to booting native operating systems again, so it's interesting to watch the pendulum going back and forth on what the correct layer is for doing virtual machines. Okay? So far so good?

All right, so I said that there were two key ideas that UNIX had, the first of these being the uniform descriptor space. The second one, which was really critical, was this notion of processes as a commodity item. Here on page 17 I've tried to lay out the components that make up a process, and what I really mean when I say "a process as a commodity item." Leading up to UNIX, in the systems that predated it, processes were very large, heavyweight, expensive things. If you look at MVS, which was the operating system that ran on IBM machines for doing multiprocessing, the system administrator would decide at boot time what degree of multiprocessing they wished to support. They'd say, well, we'll let up to six things happen at once, and so as part of booting up they would create six processes. Now you, as a user, if you wanted to do something, let's say compile and run a program, you would be given a process, and it was up to you to figure out how to stage what you needed done, and this was often fairly complex. You would have to write out all the steps that you wanted in this wonderful thing called JCL, Job Control Language. Job Control Language was the sendmail configuration file of the sixties: there were people whose sole job at the company was knowing how to put this stuff together, because all you had to do was get one extra space or a missing comma in there somewhere and the whole thing would just blow up. It would spit the card deck back at you and say: somewhere in there is a mistake that is in the general area of this card, and I can't deal with it. Fix it.
And of course in those days it wasn't just a matter of hitting carriage return; you had to get your deck, pull out the card, type the new one, put it back in, and resubmit it. And heaven forbid you touch the card reader yourself; that had to be done by an operator. So the card deck would be read through, it would disappear, and, if you were lucky, a few minutes later, or if you were not lucky, a few hours later, you would get a printout of what had happened, and then you could look at it: I put a comma in the wrong place, I guess I get to do it all again. So the thing you would need to do there, for compiling and running a program, was to break it into steps. I need to run the preprocessor, so clean out whatever gunk was left over on that process from the previous user, put the preprocessor in there, and have it read from this file here; I've got to put the output somewhere, so create a scratch file over on this disk, and it was excruciating detail: how many cylinders and how many tracks and this and that many blocks, and don't forget any of those parameters, because it'll spit it back at you if you do. Then it would run the first step, and if it was successful you'd have, sitting in that scratch file you had created, the output of the preprocessor. Then you'd load the first pass of the compiler and say: now read from that scratch file and create this other scratch file over here, and when that's successful delete the first one; then load the second pass, put its output into another scratch file; then run the assembler, and the optimizer, then the loader, this and that, and finally run the program, and if all goes well, at step sixteen out comes the answer: forty-two.

So UNIX said: look, this is silly. A lot of this is just bookkeeping, and computers do bookkeeping really well. And you'd say, yeah, but it's going to take all these cycles; well, computers are supposed to be labor-saving devices, right? So they came up with the notion that they would create processes on the fly, as needed. You had a preprocessor and two passes of the compiler and then an optimizer and then a loader: we just create, boom, seven processes, and we connect them together with pipes, so we take the input, run it through the pipes, and out the end you get the executable. We simply create each of these processes, so you as a user just type "cc", the C compiler, and it forks these things, pipes them together, gets the result, and then, once it's done with the processes, it just throws them away.
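A minimal sketch of that fork-pipe-exec pattern, assuming the standard POSIX calls (this particular two-stage pipeline is my illustration, not the compiler example from the lecture): a shell or a compiler driver does essentially this for each stage, then throws the processes away.

    #include <sys/wait.h>
    #include <unistd.h>

    /* Run the equivalent of "ls | wc -l": two throwaway processes joined by a pipe. */
    int
    main(void)
    {
            int fd[2];

            pipe(fd);                       /* fd[0] is the read end, fd[1] the write end */

            if (fork() == 0) {              /* first child: ls writes into the pipe */
                    dup2(fd[1], 1);
                    close(fd[0]);
                    close(fd[1]);
                    execlp("ls", "ls", (char *)NULL);
                    _exit(127);
            }
            if (fork() == 0) {              /* second child: wc reads from the pipe */
                    dup2(fd[0], 0);
                    close(fd[0]);
                    close(fd[1]);
                    execlp("wc", "wc", "-l", (char *)NULL);
                    _exit(127);
            }
            close(fd[0]);                   /* parent keeps no pipe ends open */
            close(fd[1]);
            while (wait(NULL) > 0)          /* reap both children, then throw it all away */
                    ;
            return 0;
    }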
Any time you create a new process, it comes to you pristine and clean. And the fact of the matter is that in the early days those computers didn't really have enough memory to support all of that running at once, so behind the scenes those pipes were actually implemented as files; but at least you didn't have to remember to create them and delete them and deal with them. As far as you were concerned, it just looked like stuff flowing through pipes, and of course today it really does flow through pipes in memory. So this notion that we're just going to create processes on the fly as needed, and connect them together as needed, was a novel concept. And it wasn't that they had somehow mysteriously figured out how to create processes cheaply; they hadn't, processes were still really expensive to create, but that extra effort was worth it because it was saving a lot of programming time. My favorite example is running ls: we have to create a process, load the ls binary into it, it prints a line or two on your screen, and then we tear the entire thing down and return all its resources back to the system. More than ninety percent of the cost of running ls is creating and destroying the process; a tiny fraction of it is actually running ls. But it goes so fast, who cares, right? The point is that the concept of just creating things as needed was, again, very powerful and is one that is pervasive today.

So what is a process actually made up of? It gets some amount of CPU time, or at least we dearly hope that it gets some amount of CPU time; the lack of CPU time is what makes a computer so sluggish. That of course boils down to scheduling, and we're going to talk about scheduling, probably more than you care to hear, in a couple of weeks' time. We have the asynchronous events: these are the external events coming in, so they may be things coming in from the outside world, like start, stop, and quit, or out-of-band data arrival notification, that kind of thing; or they may in fact be things that the program is bringing down upon itself, such as a segmentation fault, a divide by zero, or some other thing that would normally be viewed as an incorrect operation. We'll talk about those when we talk about signals. Every program gets some amount of memory: it gets an initial amount when it starts up and generally allocates more as it goes along. This we will deal with very extensively; we'll spend an entire week on it when we talk about how virtual memory is implemented. And then we get I/O descriptors. I used to say that every program had to have at least one I/O descriptor, since if it had absolutely no input and absolutely no output then it was sort of pointless. Of course, I had one of my students come up and point out to me that there is a class of programs which don't need I/O descriptors, and that is these things called benchmarks.
They just compute something; all we really care about is how long it takes them to compute it. We don't actually care what the answer is, in theory. I personally like my benchmarks to stop with something I can see, so I know they're computing the right thing, but in theory that wouldn't be necessary. Outside of that class of programs, everything needs some sort of descriptors, and of course we'll talk about descriptors quite extensively as we go through the I/O subsystem. So the executive summary is that processes are the fundamental service provided by UNIX, and what we're going to spend essentially the next two and a half weeks working on is what makes up processes (we'll go into much more detail about each of these four points) and then how we actually go about providing that bit of service.

The next thing I'm going to do is go through and lay out some of the terminology that we use when we're talking about processes. This is the big picture; we're on page eighteen, and you can see we have three pieces that make up the system: the currently running user process, and then what we call the top half of the kernel and the bottom half of the kernel. Now, this is the picture for a uniprocessor, one CPU; if we had a multiprocessor, we would have one instance of the kernel but multiple instances of the user process. But any given CPU, even on a multiprocessor, is running exactly one process. You may think we're running four or five processes all at once, but the fact of the matter is that at any instant in time there is only one process actually running on a CPU, and that is the one we have loaded. We give the illusion that we're running lots of things because we switch between them rather quickly, so it looks like things are happening in all your windows at once, but in reality that's not what's happening.

There is a set of properties I want to look at that have to do with each one of these parts, but just to look at it from the big-picture perspective: there is a boundary between the user process and the top half of the kernel, and crossing it is really just like a glorified subroutine call. It's a lot like calling into a library routine, calling strcat or strcpy or something like that; when you do a system call, we take the same kind of set of parameters. But there is a brick wall here, if you will, protecting the top half of the kernel from the application. I'll go into more detail later about how that actually gets implemented, but in essence you can think of it as a sort of Wailing Wall with little chinks in it: you can push a request through, and somebody on the other side pulls it out, looks at it, and decides whether they're going to deign to provide service to you, and if they do, they push the result back. That's unlike a library, where you could just reach in and walk around if you wanted to; good programming practice says you don't do that, but you could.

So the top half of the kernel really looks a lot like a big library; it just happens to be a library of routines that deal with things where processes need to interact with each other. In fact, many people don't understand what the difference is between the C library and the top half of the kernel. If it's something you're doing that no other process needs to know about, then it can be in the C library: if you call strcat to concatenate two strings together, nobody else needs to know you're doing that.
You don't need to coordinate with anybody else to do it; it just happens, so that goes in the C library. On the other hand, if you're reading or writing a file, there may be other processes that are also reading and writing that file, and therefore that has to be done by the kernel, because the kernel can coordinate all the different processes that are trying to access that file. So the top half of the kernel is pretty straightforward code; it looks a lot like any other library you would write. If you look at top-half kernel code, you see a read come in with these parameters, we muck around, we get some data, we put it in the buffer, and we return. In fact, writing code for the top half of the kernel is not all that difficult to do; you have many of the same properties that you have when you're writing user-level application code.

The bottom half of the kernel is where things start to get nasty, because the bottom half of the kernel is the part of the system that deals with all of the asynchronous events in the system: things like device drivers and timers, the level of thing that is driven by hardware events. For example, a packet arrives on the network; that causes an interrupt, and that interrupt is handled by the bottom half of the kernel. Historically, when an interrupt came in, it preempted whatever else was going on, ran until it finished, and then returned, and it could not go to sleep to wait for resources or anything else. In current systems you actually can go to sleep in an interrupt handler, waiting for some other activity to complete; it is, however, not a good idea to do that, because the usual case for most device drivers is that they can finish whatever they're doing in an interrupt without ever blocking. So when an interrupt comes in, we assume you're not going to sleep, and if you actually do go to sleep, oh man, you didn't tell us you were going to do this, and we have to go off and do a whole lot of work we hadn't originally planned on doing. If you go to sleep in a device driver you are taking a very serious performance hit, so it's highly recommended that you don't do that, but if you have to, you can.

It's because of this historical behavior of not being able to sleep in the bottom half of the kernel that certain properties have taken over in device drivers, and one is that a device driver should be handed all the resources it needs to get its job done. You don't tell a disk device driver "go read this and put it somewhere"; you have to say "go read this particular block, here is a chunk of memory that I want that data put into, and notify me when it's done," because things like allocating memory are classic places where you end up having to go to sleep waiting for stuff to happen, and historically you couldn't do that, and even currently you don't want to have to do that. So device drivers generally have all their resources preallocated, and then they can just go.

The one place where this doesn't work is the network, in particular because you don't know when somebody is going to send packets to you. You say, well, you're listening on open connections; but if you're doing something like IP forwarding, there is no top-half state. The packets are just coming in on one interface and being sent out on another interface; they never pass through any part of the top half of the kernel. So network device drivers need to allocate memory, and if memory gets into short supply and they try to allocate memory and it's not available, historically they couldn't wait for memory to become available, and even in practice today they don't wait for memory to become available: they simply drop the packet on the floor. Well, I didn't have any place to put it, sorry, oops. Now, that doesn't cause incorrect behavior, because the higher-level protocols will retransmit, but it does cause great performance problems, because retransmission means that connections stall: they have to back up, they have to resend data, and so on. So you really want to avoid dropping packets if you can possibly help it, and consequently we tend to preallocate a certain amount of memory for the network drivers and try very hard to make sure we're not going to run out. But if packets come fast enough, and we can't deal with them as quickly as they are arriving, then over a short period of time we get to the point where we simply have to start dropping packets.
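Here is a deliberately simplified, hypothetical sketch of that bottom-half discipline. None of these names are real FreeBSD driver interfaces; it is a user-level simulation of the idea only: the "interrupt handler" touches only preallocated buffers, never allocates and never sleeps, and drops the packet when nothing is free, leaving recovery to the higher-level protocols.

    #include <stdio.h>
    #include <string.h>

    #define NBUF    4               /* preallocated receive buffers */
    #define BUFSZ   1514

    static char pool[NBUF][BUFSZ];
    static int in_use[NBUF];
    static unsigned long dropped;

    /* Grab a buffer from the preallocated pool, or NULL if none is free. */
    static char *
    buf_get(void)
    {
            for (int i = 0; i < NBUF; i++)
                    if (!in_use[i]) {
                            in_use[i] = 1;
                            return pool[i];
                    }
            return NULL;
    }

    /* Stand-in for a receive interrupt: no allocation, no sleeping, drop on shortage. */
    static void
    rx_interrupt(const char *pkt, size_t len)
    {
            char *b = buf_get();

            if (b == NULL || len > BUFSZ) {
                    dropped++;      /* no place to put it; the sender will retransmit */
                    return;
            }
            memcpy(b, pkt, len);    /* hand b off to protocol code later (never freed in this toy) */
    }

    int
    main(void)
    {
            for (int i = 0; i < 6; i++)     /* six packets arrive, only four buffers */
                    rx_interrupt("hello", 5);
            printf("dropped %lu packets\n", dropped);
            return 0;
    }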
This is a part of the kernel that you do not wish to write code for, because it is extremely difficult to debug. You get these bugs where the only time it happens is on the third Tuesday when there's a full moon and we have a disk interrupt followed by a terminal character coming in and a network packet arriving of size fifteen twenty-two, and when all those things happen, the system panics. And of course it panics because you're following some bad pointer, something that should have been there but was freed some time in the distant past, we're not sure when, and trying to debug things like that is extremely difficult. You think, well, I think I found the problem, but it's not reproducible; you have to wait for the next third Tuesday with a full moon and so on to happen, so you sort of statistically guess that you fixed it: I was getting this bug once every three days and now it's gone two weeks without happening. Did you fix it?
Or have you just been lucky? And it's that, coupled with the fact that you're dealing with hardware, and hardware rarely works the way it's documented to work: you do everything it says you're supposed to do and it still doesn't work, because you didn't set the fiddle bit over in that other place that's not documented anywhere, but if it's not set, it doesn't work, occasionally. So this is another reason that you really want to avoid dealing with this part of the system if you can possibly help it.

But let's go through and look at some of the properties here, starting up at the user process. We're running with preemptive scheduling. Now, there are several caveats here: preemptive scheduling is the default, the so-called shared scheduler, which is what you normally use; there are other schedulers, like the real-time scheduler, where what I'm saying isn't true, and we'll talk about some of those schedulers later. But the usual scheduler that you're running under UNIX is the shared scheduler, and under the shared scheduler user applications run with preemptive scheduling. Preemptive scheduling means that you run at the whim of the system: if it wants you to run, you run, and once you start running, you have no guarantee of how long you're going to run. It might let you run for three instructions and then decide it doesn't like you anymore and wants to run something else, or you might get to run for several seconds in a row with nothing intervening; you just don't know. Really, all you know is that they claim they're using statistics, and that the statistics are fair, so on average you're going to get a reasonable amount of time, but that's up to the system; you don't control it. The real point here is that you don't have any way of creating a critical section: you can't say, okay, I don't want to be interrupted during this particular sequence of things. You have to program assuming that you may be interrupted at any point.

The next thing is that when you're running in a user process, you are running with the processor in what's called unprivileged mode. One of the requirements for running any kind of UNIX system is that you have a processor that supports privileged and unprivileged modes, two different modes of operation. In privileged mode, which is what the kernel runs in, the entire repertoire of the hardware is available: you can set all the registers, you can fiddle with the memory-management unit, you can initiate I/O, you can access any memory anywhere, and so on. When you're running in unprivileged mode, which is what user processes run in, there is a large subset of the instructions that you cannot execute: you cannot initiate I/O on devices, you cannot change the memory mapping, you cannot access memory that's not part of your address space, you cannot execute certain instructions like halt. In general you are prevented from manipulating anything outside of your address space. This of course is desirable, because when you're running in this unprivileged mode, you're protected from other processes manipulating you, and they're protected from you manipulating them. For those of you that have had the misfortune to have to use early versions of Windows, up to about 98, they always ran with the processor in privileged mode, even in applications, and so, either maliciously or accidentally, you could stomp on other people's address space or you could stomp on the kernel, and a lot of the blue screens of death were people just following wild pointers and trashing different parts of the system, taking everything down.
It also makes it far easier to implement things like viruses and worms, because a user application can rewrite the boot block on the disk; it can just write it down there and manipulate the registers that let it do whatever it wants, whereas when you're running in unprivileged mode you can't write those kinds of things. Modern versions of Windows, anything from about 2000 on, now run with privileged and unprivileged modes, but UNIX has always required that. So when you're running in a user process you cannot block; I mean, you cannot execute the instructions that cause a context switch to occur. You can't pick what's going to run next, you can't make a particular thing run next; all you can do is go to the operating system and say, hey, I've got nothing to do, pick somebody else to run, and the operating system is the thing that can then execute the instructions that cause a different process to be loaded and run.

Finally, while you're in a user application, you're running on a user stack that's part of the user's address space. Part of creating a process gives you a runtime stack as part of your virtual address space, and so it can be, more or less up to the limits of the hardware, as big as you want it to be: if you are running on a thirty-two-bit processor, your stack can get to two gigabytes. What this means is that any time you allocate local variables, you don't have to worry about "oh, is that going to overrun my stack?" If you need a hundred thousand double-precision floating-point numbers, you can just, as a local variable, allocate an array of size one hundred thousand of type double; it simply decrements your stack pointer by the size of that array and away you go. It's just virtual address space. As you'll see when we get into the kernel, that ceases to be the case.
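As a minimal illustration of that last point (my example, not from the lecture): a large automatic array on the user stack is just a stack-pointer adjustment against a big virtual address space, something that stops being true once you are inside the kernel, where stacks are tiny and fixed in size.

    #include <stdio.h>

    /*
     * 100,000 doubles (roughly 800 KB) as a local variable: on a user stack
     * backed by a large virtual address space this is just a stack-pointer
     * adjustment. The same declaration inside the kernel would be a bug.
     */
    static double
    sum_squares(void)
    {
            double a[100000];
            double s = 0.0;

            for (int i = 0; i < 100000; i++)
                    a[i] = (double)i;
            for (int i = 0; i < 100000; i++)
                    s += a[i] * a[i];
            return s;
    }

    int
    main(void)
    {
            printf("%.0f\n", sum_squares());
            return 0;
    }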
Info
Channel: bsdconferences
Views: 87,858
Keywords: freebsd, kernel
Id: nwbqBdghh6E
Length: 59min 56sec (3596 seconds)
Published: Tue Jan 13 2009