Containers From Scratch • Liz Rice • GOTO 2018

Reddit Comments

I love this talk, but she has given it in at least 20 different places

👍︎︎ 9 👤︎︎ u/bechampion 📅︎︎ Aug 09 2018 🗫︎ replies

In case this post doesn't get much traction, this gets posted every once in a while, and got upvotes last month

👍︎︎ 7 👤︎︎ u/pzl 📅︎︎ Aug 09 2018 🗫︎ replies

Lol, saw this talk just yesterday. The talk was great even though I am not a huge fan of live coding.

👍︎︎ 1 👤︎︎ u/DoomFrog666 📅︎︎ Aug 09 2018 🗫︎ replies

Captions

[Music]

Can you all hear me okay? Right, yes, okay. A few technical issues. Okay, so how many of you in this room are currently using containers? Put your hands up. And leave your hand up if you're totally confident you really know what a container is. Yeah, yeah.

So I've been working with containers for a few years now, and I know there was a period of time where I kind of heard [unclear], and that didn't leave me with a very satisfying feeling that I really understood what was going on. And then I saw a presentation by a chap called Julian Friedman from IBM, and he did something kind of similar to what I'm gonna do today, and it really helped me understand what we're talking about when we're talking about containers. So hopefully you're going to leave here with a much greater understanding of what we really mean when we talk about containers.

Okay, so I'm gonna be building a container this afternoon, and I'm gonna write it in a few lines of Go. Have we got any Go programmers in the room? A few hands going up. You're going to be my peer reviewers, who are gonna let me know when I go wrong. Hopefully for the rest of you it doesn't really matter what the language is. And we're going to build a container out of three concepts in Linux: namespaces, changing the root, and control groups. I'm going to talk about what those different things are as we go along.

But before we start building, let's have a look at what happens when we run a container under Docker in the kind of normal way. Is that big enough for the back? Wave your hands if it's not big enough. I'm gonna take it that that's okay. So we can do something like docker run, and I want it to be interactive, and I'm going to remove it when it's done, and maybe I run based on the image Ubuntu, and I run a shell. So I can start with a container image and I can run an arbitrary command sort of inside the container, where that container is
based on the image, in this case Ubuntu. And we've got a hostname inside the container that is actually the identity of the container, so it's some kind of random identity that's been allocated. If I look at the processes running inside the container, I only see the processes inside the container, and they're numbered starting from one, whereas if I'm on the host, this is the same machine, but from the host's perspective I've got much higher process numbers going on.

Okay, so I'm going to try and recreate something a bit like this docker run. Let me quit out of this container. The first thing I'm going to do is talk about namespaces. A namespace is where we limit what a process can see. So we just saw a container running, and it could only see a few of the processes on the host, and that's because it's got a namespace for process IDs. It could only see its own hostname, and that's also namespacing. We set up these namespaces using syscalls. There are half a dozen of them, depending on your particular version of the Linux kernel, and this is a big part of what makes a container into a container: it's restricting the view that the process has of the things that are going on on the host machine.

So I'm going to start with namespaces, but before I do that, let's remember we're trying to do docker run, and I would normally do an image name and then some command and some parameters. In my Go program, I'm gonna save this as main.go, and then I can do go run main.go, and that compiles and runs my executable, so that's kind of the equivalent of docker. I'm gonna have a command called run; it's going to become apparent why I need that. I'm not gonna pass in an image, because, you'll see, I'm gonna sort of give it an image later. And then I want my containerized process to run some arbitrary command, and there might be some parameters.

So I'm gonna have a main function, and the first thing I'm going to do is look at the command. Where I've got run as argument one, I want to check that I've got run, and if I have, that's the happy case. And if we get anything else, for now we are going to crash in a big heap and be sad, because we don't like it. Okay, so now I need a run function, and for the moment all I'm going to do is print out what it is that we're supposed to be running: we've been given the command and parameters, and they are in arguments two and onwards. So let's just check that we can run this. I need run as my command, and I might say, let's say, hello Amsterdam. So all it's done at the moment is log out the fact that I wanted to say hello Amsterdam, but it's not even doing that yet.

So I need to write a bit of code to actually run my command, and in Go we do that with a Command in the exec package. So I set up a structure for my command that I want to run, and I don't actually run it until I hit this Run method. What I want it to run is whatever the command is that was passed in in argument two, and maybe there are some other parameters as well, or maybe not. I also have to wire up stdin, stdout and stderr. If I don't do this we can't see anything going on, so that would be really dull. I get to do multi-cursors, which is very exciting: stdout, stderr.

Okay, so now I should have the ability to run an arbitrary command. Let's check that out. I can pass in: I want you to echo hello Amsterdam. And it does echo hello Amsterdam. And I could do something like run a shell. It's told us it's running a shell. You can't really tell what's going on, but there are some processes here, and this is our bash shell that I just started.

So now I want to containerize this command that I started, and we're going to do that by creating some namespaces. I can do that by specifying here in this command structure that I want namespaces, which I do in this SysProcAttr structure, and I'm passing it what's called Cloneflags, because cloning is what creates the new process that we're going to run our arbitrary command in. We're going to start with the flag for the Unix Timesharing System, which sounds incredibly fancy and powerful, and, you know, like it's going to be really significant, but actually all there is in the Unix Timesharing System namespace is the hostname. But this is going to let us have our own hostname inside the container, so it can see its own and it can't see what's happening on the host.

Let's give that a try. So I run my shell again, and if I do hostname, it's inherited the hostname from my host machine. We'll just check they're the same. But I can change it here. Check that that stuck: yes, I've changed it inside my container without affecting what's happening on the host machine. So I've started to containerize my process.

So this is a start, but it would be really nice if I could set the hostname up before I start my shell, so that I could see it in the prompt, because at the moment it's really hard to tell from the prompt whether I'm in my container or not. And you might start by looking at the
Go syscall method for setting the hostname, and you'd find Sethostname, and we have to send it in as a slice of bytes. So we might think we could do something like that, but we have a problem. If I call that function here, after this Run method, that doesn't complete until after I've exited the command that I want to run, so I can't do it after. And I can't really do it before, anywhere up here, because although I specify that I want my namespace here, it's not until inside this Run method that we actually clone this new process and actually get the new namespace. So I can't do it before and I can't do it after.

So what I'm gonna do is have this process clone a new process with the new namespace, and then I'm going to create another process in which I'm going to run my command. So I'm going to duplicate this and have one version called run and one version called child. The first one is going to create the namespace, but rather than saying I want you to run the arbitrary command, I want this to run this program again, to run itself, and we can do that by running /proc/self/exe as the command we want to run. We'll have a look at that in a moment. Oops. Right, and I want it to call itself, but instead of having run as that first command, I'm gonna pass in child. So when I come in here, if I see child as that command, I'm gonna call this other version of the function down here. I have to do a bit of Go syntax: I have to make this into a list of strings, and then I've got arguments two onwards.

Okay, so I'm gonna call run; run is gonna re-invoke this process, but inside its new namespace. Second time in, it's going to be calling child, and we don't need to create a namespace this time, but we do want to set the hostname, because this time around it should already be in the new namespace. So let's give that a whirl. Straight away we can see a couple of things. We see the log happening twice, because
we've called it once inside run and once inside child. But more interestingly, the bash prompt has picked up that hostname, so that's done what I wanted it to do. This is pretty good, and we can more easily tell whether we're in the container or not, which is pretty beneficial.

Okay, so I now want to have ps return me just the processes inside my container. At the moment I haven't namespaced process IDs, and I'm still seeing the same kind of higher-numbered processes that I can see on the host. If I look at all of them, we'd find all of those processes up here. So 7621, for example, there, is my bash. Okay, so inside my container I want to see these starting from one. Let's quit out of that. There's another one of these namespace flags, called NEWPID, for process ID, and I'm also gonna print out the process ID that we're running as, with Getpid. We'll do that inside run, and we'll also do that inside child.

That looks pretty promising, right? The first time, we were running as some high-numbered process, and when we're running child, which should be inside this new process ID namespace, we've got the ID one. That's exactly what we wanted. So hands up if you think I can do ps and it's gonna work. Yeah, a few hands. Okay, I'm going to disappoint you. It's still finding those highly numbered process IDs, and the reason for that is that ps doesn't kind of magically get the process information directly; it gets it from the /proc directory. Remember, a minute ago we looked at /proc/self/exe. So /proc has information about all of the running processes. We can see there's a directory for each of the numbered processes. I'm just going to go back to the host for a minute. If I do something like ls -l /proc/self/exe, you can see that that is a link to the ls binary, because what's running is ls. So it makes sense that if I do it again, well, you know, that's always what it's gonna be. But if I go to /proc/self, we
can see that every time I get a new process ID, and /proc/self changes as we start a new process. There's all sorts of interesting information inside /proc for each of the processes; we'll see a little bit more of that in a moment. ps is looking in the /proc directory, so inside my container I need my own version of /proc. At the moment, from inside the container, it's seeing the same /proc, and this is where chroot comes in. I'm going to change the root of what the container can see.

At the moment, if we look at the root directory, I have marked it with this "root for host" file. I also happen to have a copy of an Ubuntu filesystem, and this has got "root for container" in it. So I'm going to change the root for the container so that it sees this as its root directory. In here we can do chroot, and I want that vagrant Ubuntu fs directory to be my root. I'm also going to change directory to root, because if I don't do this, it's actually undefined where you end up after you've done a chroot. I run my container again, and now if I look in the root directory, "root for container" is in my root directory. I can't see anything higher up than root; that really is my root inside my container. I've limited its view of the filesystem.

But I could do something interesting. If I do sleep for a little while inside the container, from the host we can find that sleep process, 7840, and if we look in /proc for 7840, this is all the information about that sleep process, and there's a root directory. Let's have a look at what that tells us. That shows us that for that process, the root is actually the filesystem that we just chrooted into. Now that's pretty much the equivalent of a container image: when you specify the image, it takes a copy of the filesystem that's packed up in that image, unpacks it into somewhere on your host machine, and chroots the container to see that new filesystem. So we've kind of
done the equivalent. Okay, so back to the /proc directory and back to ps, because, if you remember, we're trying to get ps to show us just the processes that are running inside the container. I did that chroot so that I would have a /proc directory inside the container, and there's nothing in it, and if I do ps it's gonna tell us what it is we need to do. It turns out that /proc is a pseudo filesystem: it's a mechanism for the kernel and user space to share information. At the moment, /proc inside my container, in the chrooted filesystem, has nothing in it, and I need to mount that directory as a proc pseudo filesystem so that the kernel knows it's going to populate that with all the information about these running processes. So in here I can mount that. Very important that I get these parameters in the right order, as you can see. All right, and I'm gonna tidy up after myself by unmounting it when we finish.

Okay, so I need to run that again, and this time, what do you reckon, will it work? Hands up if you think it will work. Mixed confidence here. It works! Look, we've got process ID number one. Thank you.

Right, one other thing we did there was we mounted proc, and we can see that mount inside the container. That's kind of what we would expect. There are quite a lot of things mounted on my host, so I'm just going to grep the things that relate to proc from my host. I can see this one at the bottom, which is the same one that we've got mounted inside the container; we can tell that because it's inside that filesystem that I mounted it in. Now, there is a namespace for mounts. It is called NEWNS, as in new namespace. Apparently this was the first of the namespaces to be invented and added to the kernel, and I guess at the time they didn't really think there would ever be a need for any other namespaces, so they called it namespace, but it's really for mounts. I also have to do something else here. Now, by default under systemd, mounts get
kind of this recursively shared property, and at the moment my root directory on my host recursively shares any mounts between all namespaces, and I have to deliberately turn that off with an unshare. I'm going to say Unshareflags for the new mount namespace, and by doing that I can say: I've got this new mount namespace in my container, and I don't want you to share it with the host, because by default it would have shared it back to the host. So if I do this again, we can run mount inside the container and we can still see proc, but on the host I no longer see that proc. And that's how we can avoid having, like, hundreds of containers cluttering up our mount command with all sorts of information about mounts inside containers that we don't really care about from the host. I could find out about them by looking in the /proc directory, though. So I can do the same thing with sleep and find that process from the host's perspective, and if I look, I think I need to look at 8029 this time, maybe. Yeah, so I can still see the mounts that this process is aware of from the host's perspective, but they don't go cluttering up my mount command.

Okay, so we have looked at the namespaces for the Unix Timesharing System, which is the hostname, for process IDs and for mounts, and you can imagine isolating things like your network interfaces, so that your container only sees a specific set of network interfaces, and user IDs and inter-process communications all working in the same sort of way. We also saw how chroot works and how that limits the container so it can only see a subset of the filesystem that the host can see.

And there's one last property of containers, and that's control groups. If namespaces restrict what we can see from inside the container, a control group limits the resources that we can use inside it. It's configured using another one of these pseudo filesystem interfaces, so it's another set of what look like directories and files, but we can
manipulate them to set properties that we want the kernel to understand, and the kernel will write information into the filesystem so that we can read it back out again. We could be talking about things like how much memory the container is allowed to use, how much CPU, how much I/O bandwidth it's allowed, and also how many processes we're allowed, which we're gonna use as an example. Oops, wrong direction. Right.

Before we do that, let's have a look at how that filesystem looks. It's typically in /sys/fs/cgroup. I'm gonna change into that directory, and we have a directory for each of the different types of control group that we can set up. Let's use memory as an example. So we go in here, and there's actually a large number of parameters that you can set related to memory. I could look at, let's say, memory.limit_in_bytes, and that is a very, very, very large number, which is basically saying that by default processes can use all the memory in the system. Also, we've got this docker directory, which Docker presumably set up. I have a look in there, and at the moment it's just another set of all the same parameters. Some of these are also sort of statistics being reported back into user space.

Let's see what happens when we run a container. So if I do exactly the same as I did before with Ubuntu and I run a shell [Music], we've got a container that starts with the ID 070c, and now we've also got a directory inside the control group structure starting with 070c. So Docker has created a control group, basically, for this container, but we didn't ask it for any particular restrictions, and if we were to look at what's inside that memory.limit_in_bytes file, it's not restricted. It's still at the max; I don't even want to count the digits of that. Let's see what happens when we do constrain the memory. So I can say memory is, let's say, 10 megabytes, and this time f497 is the identity. We've got a control group f497, and if I look in the
limit this time, that's basically 10 meg. So Docker wrote that number into that file, and that's how it tells the kernel to limit that particular container to that amount of memory. So we're gonna do the same kind of thing, and we're going to do it for the number of processes. Actually, let's just have a quick look. Inside pids we can see things like the maximum number. Okay, well, I know how we can look at it: look at it inside docker, and there should be a pids.max. Yeah, so by default, when you create a Docker container, there is no limit to the number of processes that can be spawned inside that container.

But I'm going to create a control group that does limit the number of processes, and I've got a little bit of code that I prepared earlier to save a little bit of time. Just copy that, and we're going to call that from here. Right, so that is going to the cgroup directory, inside that to pids, and inside that to the control group that I'm creating, called liz, and I'm gonna write a file: I'm gonna write a limit of 20. So I'm saying inside my control group there can only be 20 processes. And the other important thing I'm doing here is this last line, where I'm getting the current process ID with Getpid and I'm writing it into a file inside my control group called cgroup.procs, and that's saying this process is now a member of this control group and is subject to its limits.

Okay, let's run that. From my host, well, I've got a liz directory, actually, and let's make sure we've got the maximum that we expect. Yeah, so we should be limited to 20 processes inside this cgroup. And let's just check again using that trick with sleep. I'm gonna find the ID of that process, 8563, and if we look in that cgroup.procs file, 8563 automatically got added in there, because it's a child process of, presumably, 8543, which was the kind of first of these containerized processes. Inside my container I should only be able
to create twenty processes and no more, so let's put that to the test with one of these. Who knows what that is? It is a fork bomb, right. So how that works: we define a function called colon, and in that function we call colon and pipe the results into colon, which we run in the background. That's the definition of our function, and then we invoke it. So it should keep calling itself, just forking processes for ever and ever, but hopefully it will be limited by my control group. Just double-check that I'm running inside the container. Okay.

Okay, so it's having a bit of a go at trying to fork a lot of processes, and it's clearly not able to fork as many processes as it would like. From the host's perspective, well, for a start, I can type perfectly okay; nothing bad seems to be happening in terms of the responsiveness of my host. And here's my go run main.go, and here are some defunct processes. We will never see more than 20 processes in there. We can look at pids.current, which should tell us the current number of processes, and there should be 20 of them. If we look in cgroup.procs we'll actually see the parent processes, not all the threads, but if we look in tasks we should see them. Probably. Maybe we didn't get the dead ones. Anyway, we can tell from pids.current that there are no more than 20 being created, and yet we can continue to use the computer, rather than it having been completely destroyed by that fork bomb. Don't run a fork bomb on a machine unless you know it is constrained by a cgroup; that's my advice to you.

Okay, so we have seen how a container is created from namespaces and limited by control groups. We've seen how the filesystem works, or how images work, by pointing you at a subset of the filesystem on the host. We've seen how to bring a machine down by using a fork bomb. Don't do that. If you want to check out the code, you'll find it in the lizrice/containers-from-scratch repo. I'd
definitely encourage doing that and playing about, building your own container, if you want to really understand how these things work.

Now, those of you who had your hands up earlier, who are running containers, do that again: if you're using containers, put your hand up, and leave your hand up if you are doing vulnerability scanning on your containers. Yeah. Well, if you're not, it's a very good idea to scan your container images for known exploits. MicroScanner is a free-to-use tool for scanning your images, so I just wanted to give a quick shout-out to that. With that, I really hope that you leave full of confidence that you understand what containers are. Thank you very much. [Applause]

And I think we have time for a few questions, if there are any. Yes, please. Perhaps start with the first one: what do you think about Dockerizing a database like Postgres? What do I think about Dockerizing a database like Postgres? Why not? I think, as you can see there, what you're doing when you run something in a container, it's still a Linux process. The thing that is different is where the storage is. So for anything that's stateful, if you want that storage to persist, you have to make sure that the executable stuff is looking at a storage system that's persistent. If you were just talking about one host, you could just mount the, you know, Postgres or whatever into the container, and that would be exactly the same as running Postgres on the host looking at a local file on disk. The difference is that whenever we start talking about containers, we're really talking about distributed systems. We're talking about that storage being somewhere else, probably in the cluster, and we're saying I want to be able to create my container anywhere in that cluster and I want it to point to that storage. And, you know, mechanisms exist to do that. So yeah, there's nothing wrong with running Postgres inside a container; it's the storage.

They're asking: is it a good idea to run Docker in Docker? Do
I think it's a good idea to run Docker in Docker? There are very few occasions where you really need to run Docker in Docker. I'm struggling to remember what they are, but one of the reasons why you might not want to do it is that as soon as you do that, you're running with the privileged flag, which means it's not just root, it's root with full capabilities across the whole machine. Run Docker in Docker if you need to do something particular for your application; don't run Docker in Docker just for the hell of it.

Any other questions? Oh, it doesn't... I think it just came on as you left.

So thank you, it's actually great that nothing's been broken [unclear]; I think it's a very big achievement. Question: if you have tools, some of them like Java [unclear], which are not container-aware, where they're getting the information about the memory or CPU which has been granted to the container itself, so for example in Java the amount of memory basically shows that the whole machine is available to the process, why was it originally not just sorted at the Unix level, to restrict it to the process itself?

Yeah, I don't know enough about how Java works, but I know there is the Java Virtual Machine. I'm not a Java person, so I'm definitely moving into the world of speculation here. Your Linux kernel is using that mechanism for doing things like limiting memory. My assumption would be that the Java Virtual Machine would be grabbing a bunch of memory up front and then dividing that up in different ways that the kernel is no longer aware of, but I'm pretty much speculating at that point. Any Java experts want to tell us?

Should I still use VMs, or run Docker on bare metal? It sort of depends on the application, and it sort of depends on what you're, you
know, if you're running your own data center, you might use VMs for the ability to create and destroy the VMs really quickly and so repurpose your machines pretty quickly. You might run containers in VMs for isolation purposes. We've just seen that containers are, you know, Linux processes. If I had two of those, they're still sharing the same kernel; all the containers within a virtual machine are sharing the same kernel. So if there were a kernel exploit that allows you to escape a container, well, it's a shared kernel, so there is potentially a risk that if I can escape a container I can see the other containers on that VM. It's much harder to escape a VM. So for pretty specific security reasons, you might say I want to run my containers in separate VMs. A really good application of that is if you're doing kind of multi-tenancy, so you have groups of people who, you know, you're running code on their behalf, but they don't trust each other, so putting each of them into their own VM makes it more secure for each of those tenants.

Any more questions? Thank you, Liz. Thank you very much. [Applause]
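
The chroot and /proc steps from the talk can be sketched in the same way. This is an illustrative, standalone sketch: the function name `enterRoot` is an assumption, and the root filesystem directory is taken from the command line rather than hard-coded. In the talk these calls happen inside the child process, which already sits in its own PID and mount namespaces; run on its own like this, the proc mount would be visible from the host until it is unmounted.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// enterRoot changes the root directory to rootfs, moves to the new "/"
// (the working directory is otherwise undefined after chroot) and mounts
// the proc pseudo filesystem so that tools like ps can read process info.
// It returns a cleanup function that unmounts proc again.
// (Hypothetical helper name, introduced for this sketch.)
func enterRoot(rootfs string) (func() error, error) {
	if err := syscall.Chroot(rootfs); err != nil {
		return nil, fmt.Errorf("chroot %s: %w", rootfs, err)
	}
	if err := os.Chdir("/"); err != nil {
		return nil, fmt.Errorf("chdir /: %w", err)
	}
	// Argument order matters: source, target, filesystem type, flags, data.
	if err := syscall.Mount("proc", "proc", "proc", 0, ""); err != nil {
		return nil, fmt.Errorf("mount proc: %w", err)
	}
	return func() error { return syscall.Unmount("proc", 0) }, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: <program> <rootfs-dir> <command> [args...]")
		return
	}
	cleanup, err := enterRoot(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// The command is looked up inside the new root from here on.
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	runErr := cmd.Run()
	if err := cleanup(); err != nil {
		fmt.Fprintln(os.Stderr, "unmount:", err)
	}
	if runErr != nil {
		fmt.Fprintln(os.Stderr, "error:", runErr)
		os.Exit(1)
	}
}
```

The directory passed as `<rootfs-dir>` needs to contain a usable filesystem (the talk uses an unpacked Ubuntu image) with an empty `proc` directory at its top level.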

Info

Channel: GOTO Conferences

Views: 138,199

Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, GOTOams, GOTO Amsterdam, Liz Rice, Aqua Security, Containers, security, DevOps

Id: 8fi7uSYlOdc

Length: 42min 53sec (2573 seconds)

Published: Fri Jun 29 2018