Beginner's Guide to Container Technology and How It Actually Works

Video Statistics and Information

Captions
Okay, I think we'll start. Thank you everybody for coming. My name is James Bottomley, I'm CTO of server virtualization at a company that was called Parallels but has now been rebranded to Odin. The marketing department always gets very disappointed if I ask everybody in the audience whether they've heard of Odin and nobody sticks their hands up, so instead of subjecting myself to that embarrassment I'm just going to tell you who we are. Parallels was originally the company that did Desktop for Mac, and Linux containers. It actually began life as SWsoft, which was the Linux container company, so I do have a history here: Parallels is in fact the oldest container company in the world, since SWsoft released Virtuozzo containers in 1999. In 2005 we released an open-source version of that, which we call OpenVZ. In 2006, after the publication of the OpenVZ source code, Linux process containers were actually put into the kernel. They weren't related directly to the bean counters in OpenVZ, but a group of people headed by Paul Menage looked at what OpenVZ did, decided it was very useful functionality, and so it went into the kernel, first as process containers and then later as cgroups. On top of cgroups, in 2008, LXC version 0.1 was released. LXC is a container manipulation program; you know it today as a fully fledged, functional container manipulation system that Canonical is now basing LXD on in Ubuntu, but back in 2008, and actually all the way up to about 2011 and 2012, it wasn't terribly functional: every time you brought up an LXC container there were easy ways of breaking out of it. One of the things that we at Virtuozzo had done a long time ago was to make containers that were secure, because our bread-and-butter business when we released Virtuozzo in 1999 was service providers, the people you buy infrastructure as a service from for, say, ten dollars a month, and they give you root access. A lot of the service providers in the world today that give you root access are actually giving you root access to a container, and that wasn't possible with LXC for an enormously long time. 2011 is actually when we began working on the kernel container API, and I'll get onto that in a little while.

So, just for the marketing hype, I'll remind you that Parallels is now Odin. I work at Parallels as a container evangelist. I'm also an open source advocate, so I've been working for a long time on the business of converting businesses to open source, and I'm a kernel developer as well; I still maintain the SCSI subsystem. The reason I wasn't present at the first three days of OpenStack is that someone very helpfully put the Kernel Summit directly over the OpenStack summit for Monday, Tuesday and Wednesday, so I had the immense pleasure of flying out of Korea very late last night and landing here at about 1:00 in the morning, when it was raining, by the way, so thank you, whoever was looking after the weather.

So what you're here to learn about is container basics. Now, when I wrote the abstract I didn't realize I'd only have 35 minutes, plus five minutes for questions, to give you all of this, so I'm going to skip some of the history bits that I assume a lot of people who do container stuff have already heard (in fact you may have been to a few of my other talks where I've already given it) and we're going to skip straight to what containers do today: what are they, and what do they actually do?
So, to throw you all in the deep end. Oh, by the way, one of the things that I'm going to explain to you is the incredible painfulness of the Linux container API. The things you actually have to do to make this work on Linux from the command line are awful, and I've had to deal with this pain for a long time, working in a container company, so part of this talk is my pleasure at sharing that pain with you. If any of you are marketing people, now would probably be a good time to leave before your heads explode; the head-exploding bit won't come until the demo, which is much later in this talk.

So I'm going to give you an overview, but before I introduce you to the wreckage I'm going to show you what it looks like. The main difference between containers and hypervisors is that hypervisors are based on emulating hardware: you take a physical machine, you bring up a machine monitor that emulates virtual hardware, and you bring up another operating system on top of that virtual hardware. For those of you who work in the enterprise, which I believe is most people at OpenStack, you probably think this is the only way of doing virtualization, and have done since VMware came on the scene many, many years ago. Those of you who are older than most of the people in the room may possibly remember mainframes before that, and it turns out that a lot of mainframe technology, specifically IBM's, had containerization features, simply because hardware emulation was too difficult to do in those days; mainframes were phenomenally complicated things. But the view I'm giving you is the one that I think appeals to most of the age demographic in the room, which is that you'd never heard of containers up until they suddenly got incredibly popular last year, or just about the year before.

So what containers are about is virtualizing the subsystems of the operating system itself. Instead of working out how to do a hardware description to bring up a completely new operating system, we work out how to take each of the services the operating system provides (networking, the file system and so on) and provide them in a way that's fully virtualized, so in effect I can bring up different copies of exactly the same operating system, but based on the same kernel. The true difference between all of these is that containers only have a single kernel running underneath them; hypervisors always have multiple kernels, because there's always a kernel running inside the hypervisor itself. Even though VMware would tell you this is not true, it is: there's always some sort of operating system running in the host, and then you boot up an entire new operating system, including a new kernel, in the guest to provide all of the features you want from the virtualized operating systems you're bringing up.

This difference between single and multiple kernels is one of the reasons why container technology was embraced in the service provider space but not in the enterprise space. If you think about the problems the enterprise was thinking of back in the very early days, it was things like dev/test, but it was also heterogeneous environments. If I'm sharing a single kernel, I cannot bring up two instances of an operating system that do not share the same kernel, and back in the early days when VMware came around this meant Windows and a tiny bit of Linux, and it is impossible for Windows and Linux to share the same kernel.
So with containers you could never bring up Windows and Linux on the same box, and this is why the enterprise really didn't like it. Back in 1998 this was a huge problem; it's what killed container technology for the enterprise. Service providers embraced it because their problem wasn't really bringing up Windows and Linux; it was "we have a large homogeneous group of machines and they all run the same operating system", so to them it didn't really matter. We actually provided Windows containers as well, and they were perfectly happy with one set of machines installed with Windows to run Windows containers and one set of machines installed with Linux to run Linux containers. So this inability to bring up different operating systems was seen as an Achilles heel of containers back in 1999. And when I say different operating systems: if your operating system is good enough, like Linux, I can still bring up different things that you think of as operating systems. I can still bring up RHEL and CentOS and SLES, all on the same kernel, because it's the same Linux kernel underneath; all that matters in this operating system distinction is that the kernel I'm sharing will actually support the operating system.

To be honest, in the early days this was an Achilles heel for Windows as well. The reason it works well for Linux is that the kernel ABI is such a fixed and strong thing that any modern kernel can run almost any older operating system released before it, so on Linux we have almost no compatibility problems bringing something like, I don't know, RHEL 5 up on a 3.10 kernel; it can easily be done, just because the ABI still supports it. On Windows the situation is very different, because there's a lot of interplay between the user space and the kernel space of Windows, a lot of swapping of functionality back and forth, and it means it's impossible even across a single Windows generation: say going from Windows 2000 to Windows 2003, on a Windows 2000 system you cannot bring up a Windows 2003 container, usually because something in the kernel doesn't match and it breaks.

So the main difference: containers use a single kernel, with virtualization sitting in that kernel to support multiple operating system instances being brought up on it; hypervisors use multiple kernels. And obviously one of the immediate advantages to you in the enterprise of running a single kernel is that it solves a lot of the patching problem for virtual machines, because the virtual machine image of a container system does not actually contain a kernel, which means you don't have to patch any of the vulnerabilities in that kernel or do anything else with it. It also means that if I use something like kpatch, or any of the other live patching technologies like ksplice, I can apply a patch once to that single kernel and immediately all of the guests benefit, because they're sharing it. That's a fairly significant advantage to service providers, and it means that some of the enterprise problems of image drift and image patching don't exist in containerized systems.

But that's not the main thing containers give you. One of the really big things they give you is elasticity, and this was also why they were more important to the service provider space than the enterprise space. In the early days of the enterprise, hardware budgets were coming out of your ears and then some; the enterprise did not have a problem with buying more hardware to do more stuff.
In fact, in the early days of the enterprise, the reason virtualization came along was that CIOs were struggling to find stuff to do with the hardware, so adding virtual machines was just something to do with it, and then you could provide extra services on top of that. With container technology, the advantage for service providers is that when you squeeze these containers down into very small, constrained systems, the performance under that squeezing is far better with containers than it is with hypervisors. It's partly to do with size, and I'll show you a diagram for that later on, but it's also to do with the fact that there is only one kernel. When you put Linux, or any hypervisor system, under extreme resource pressure, the host starts trying to steal pages of memory from the guest. That's a standard thing to do, but with a hypervisor it turns out to be an unstable system, because both the kernel in the guest and the kernel on the host are trying to do reclaim to solve the resource starvation problem, and the way they do it tends to make them fight each other over it. The result is that they don't quite deadlock the system, but they make it bog down and go much more slowly than it should. With containers, because it's a single kernel, and one kernel is used to being put under resource pressure, when you actually put it under resource pressure it just does all the things a kernel naturally does and resolves that resource pressure or starvation to the best of its ability in that single kernel. This is what makes containers lean and elastic: under this resource pressure they can still deliver the performance that service providers require.

This manifests itself to the service providers as this wonderful thing called density. In the early days, effectively because of this behaviour under resource constraint, we could pack onto a single physical system three times as many operating system containers as we could hypervisor guests, and for a service provider, who is only selling you this box for ten dollars a month per login, the number of logins they can pack onto one box is the difference between profit and loss. This is why container technology was essential in the service provider space.

So if we look at this diagram (I can't really use the pointer, but I might be able to use the mouse), it's showing roughly what I said: there's a hypervisor kernel here, if you can see it, and there's another kernel in the guest here. If you compare that to containers, there is only a single shared kernel in the system. This is the actual operating system container, coming up from init all the way on up, and then obviously the new use case is application containers, so there's the application container sitting on top of that. One of the advantages of application containers is that not only can they share the same kernel, they can also share the same versions of all the operating system subsystems, init and libraries as well; this is the Docker and rkt (Rocket) use case. And obviously, just in terms of which stack looks better, which stack is less fat, it's this one, the container stack: there are just fewer boxes in it. So purely in terms of how much I have to put in to get this to work, containers are lean; a hypervisor image is typically gigabytes.
A container image, on the other hand, especially if it's an application container image, can be tiny, on the order of megabytes. This is why container technology was very attractive in the first instance: the lightness just made them denser and far more elastic. But it was far more than that. Having a single kernel manage all of the resources solves a lot of the problems that hypervisors have, because a hypervisor is effectively two kernels not even trying to cooperate, just communicating with each other over a hardware interface. That works if you have enough resources, but when you put a hypervisor under resource pressure it falls down a lot faster than containers do.

The other thing that's really useful about containers is their scaling properties. You've probably all seen that with hypervisors, to take memory away from a guest you need to do memory ballooning: you inflate a balloon cooperatively in the guest and then pull memory pages out of that balloon into the host. The very act of inflating the balloon tends to annoy the guest, especially if it's under pressure itself. With containers, the mechanisms for controlling resources, the cgroup knobs, already exist in the kernel, so I can take memory away from a container just by writing a couple of values to a single file in the cgroup filesystem. It's as easy as that: I don't need to inflate a balloon, I don't need to do anything else, and the container will respond within milliseconds to microseconds, depending on how fast your machine is. And this can be done with any resource that's controlled by the cgroups: it can be done with CPUs, it can be done with memory, and because the operating system running in a container doesn't actually see physical CPUs or physical memory, I can do it entirely through the kernel subsystems themselves, which kernels are very used to doing. So containers have much, much better scaling properties than hypervisors; if I'm building a very powerful, very scalable system, it's actually much easier to do it with containers.

And obviously the shared-kernel approach, as long as the kernel is a good one, makes container resource decisions much more efficient than hypervisor ones, because there aren't two kernels fighting each other over the decision. There's a single kernel arbitrating all of the resources in the entire system, and it sees everything at the correct granularity. A hypervisor, when it wants to control memory, only really sees pages, and it doesn't even see the guest kernel's LRU list, which says which page is going to be reclaimed next, so under memory pressure it just takes a page out of the guest that happened to be somewhere down the LRU list instead of at the top. Then the guest evicts its own pages, eventually comes to the one the hypervisor just took away, and the hypervisor has to go through some horrible dance to swap that page back into the guest again. Whereas the kernel arbitrating containers not only sees all of the pages everybody is using, it itself controls the LRU list (because the container operating system only starts at init, it has no kernel piece), and it also sees all of the objects and how they're being used inside every container. It has much fuller information when it makes resource decisions, and this is why those decisions tend to be made much more efficiently with containers.
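To make the "writing a couple of values to a single file" point concrete, here is a minimal sketch of resizing a container's memory on a cgroup-v1 system. The machine/container1 group name is made up for illustration, and exact paths vary by distribution:

```bash
# Minimal sketch, cgroup v1 memory controller; "machine/container1" is a hypothetical group.
# Shrinking a container is just rewriting a limit file: no ballooning involved.
cd /sys/fs/cgroup/memory/machine/container1
echo $((512 * 1024 * 1024)) > memory.limit_in_bytes   # new hard limit: 512 MiB
cat memory.usage_in_bytes                             # the kernel reclaims down towards the
                                                      # new limit (it may refuse if it cannot)
```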
So now we come to the pain bit: the Linux container API. I really like this. Containers in Linux are controlled by things called cgroups and namespaces, and realistically that's all you need to know to control the kernel container system. For reasons best buried in history, the control plane for cgroups and the control plane for namespaces are radically different from each other. You wouldn't believe this, but open source development is a lot about personalities, and one personality developed the namespace isolation and a different personality developed cgroups, and they could not agree on the API, so we are stuck inside the kernel with two separate APIs. This is part of the pain; don't worry, I'll be showing you the differences later on, well, if I have time, which hopefully I will.

The point about this kernel layer is that everything that claims to be a container system, like our OpenVZ, like Docker, like LXC, is orchestrating this cgroup and namespace system. One of the key things is that the kernel API is the same for everything: LXC, OpenVZ, Docker, any other form (I could have put up a dozen different container systems on top of this), they all, at a base level, talk to exactly the same kernel API. There is method in this, because it came from an agreement at the Kernel Summit in 2011 which was actually driven by us, and it's one of the useful success stories of open source. I went to the Kernel Summit; I had just recently joined what was then Parallels, now Odin, and my job was to get Virtuozzo/OpenVZ upstream into Linux. Instead of trying to push it upstream by brute force, effectively forcing it in as a parallel subsystem of the kernel (which had been done before: this is what happened with KVM and Xen, where Xen was forced into the kernel as a completely separate subsystem from KVM, so even today when you choose a hypervisor, Xen talks to a different kernel API from KVM and we have two completely separate systems to support), instead of doing that for containers, we came to an agreement at the Kernel Summit with all of the in-kernel parties, which were LXC and cgroups, and the out-of-kernel parties, which were Virtuozzo on one hand and the Google container technology on the other. We agreed that we would merge all of our implementations into a single ABI. What this effectively meant is a show-and-tell: what do we have, what do we do that you don't (so we'll just shove that straight into the kernel), what do you do that we don't, and then, for the things we all do separately, who does it best, because that's the API we'll adopt. Nobody came out of this with the complete agreement they wanted, because obviously what I wanted was that everybody would just agree to use the OpenVZ ABI, and that didn't happen. What we got is this hybrid of cgroups and namespaces. Namespaces exist in OpenVZ, so they were a fairly easy port; cgroups have a parallel inside OpenVZ called bean counters, so effectively our agreement was that we would abandon the bean counters in OpenVZ and fully adopt cgroups, and as part of that adoption we would add all the missing pieces that were causing resource and performance problems in LXC; additionally we had extra security pieces in namespaces and so on. So as part of this agreement, the entire kernel got a kernel-level enhancement of the container system that gave us the ability to bring up fully secure, fully isolating, fully resource-controlled containers.
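As a rough illustration of those two separate control planes (hedged, since exact paths and controller layouts vary between distributions and cgroup versions, and "demo" is an arbitrary name), cgroups are driven through a filesystem while namespaces are driven through system calls:

```bash
# Sketch only, cgroup v1 layout assumed; run as root.
# cgroups: a filesystem API -- mkdir creates a group, echo into files configures it.
mkdir /sys/fs/cgroup/cpu/demo
echo 50000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us   # roughly half a CPU per 100ms period
echo $$    > /sys/fs/cgroup/cpu/demo/tasks              # move the current shell into the group

# namespaces: a system-call API -- clone(2)/unshare(2)/setns(2), usually reached
# through the unshare(1) and nsenter(1) wrappers.
unshare --net ip link show      # run one command in a brand-new network namespace:
                                # it sees only a lone, down loopback device
```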
In 2015 that program is almost entirely complete, apart from one particular piece, which is kernel memory accounting for dentries and inodes. I can see people's heads already beginning to spin; this is a very esoteric area of the kernel. But if you're using containers as a service provider, the reason you want these things controlled is that the kernel will just give you as many dentries and inodes as you like: once you open a file you usually get one inode and a bunch of dentries. There's a trick you can pull on a service provider where you make a directory, change into it, make another directory, change into it, and keep on doing that recursively, and what that does is run the entire kernel out of inodes and dentries. If this is not controlled, anybody buying root from the service provider can immediately run the entire kernel out of these resources, which means everybody else's containers on that system crash; that's why it's important to us to get this fixed. But that is the only piece that is missing. The rest of the security and isolation is already upstream in Linux from about, well, you could say 3.12, but you'd certainly be making a safe bet with 3.16, which has almost all of the safety features, and the enterprise kernels, which are based on 3.10, have backported most of these. A RHEL enterprise kernel, by the way, does not really look like 3.10; it looks mostly like 4.0 in today's world.

We organized all of these interests to converge on a single unified upstream ABI, partly because it was really the only way of getting everybody to agree to do this, since open source is about cooperation, but also because, in my opinion, the balkanization of the virtualization subsystems strategically set Linux back by several years and allowed VMware to effectively conquer the enterprise. We spent so much time fighting over what the hypervisor ABI would be, before finally putting both in, that we lost an awful lot of time in actually improving the hypervisor subsystems in Linux, and we were determined the same thing wouldn't happen for containers. In that sense it was very successful. Slightly less successful: I had hoped that by leading this effort our name would be up in lights as the people who did this, and unfortunately, if you ask most of the people in the room who the container people in Linux are, they'll say Docker; they won't say Parallels and Odin. So that was a slight failure. And of course it led directly to the ability of Docker to run on upstream containers; in fact Docker for a long time had no kernel team (I believe they're just spinning one up), primarily because they relied on the kernel enhancements from not just us but us plus Google plus a lot of people in all of the distributions: Canonical and Ubuntu, SUSE and Red Hat.

So if we look at these cgroups: what a cgroup does is control resources within the kernel. There's a cgroup for controlling I/O, which is how you do I/O partitioning between containers. There's one controlling the amount of CPU you use, which is how you partition the work among containers. There's one for devices, used mostly by containers that want to bring up hot-plug things like USB. There's a really important one for controlling memory, which is how we make sure that when one container exceeds its memory allocation and asks for more, it doesn't take it from another container unless we've actually authorized that; instead it gets tipped over into a swap situation and starts to bog down, so this memory cgroup is really, really important.
There's another one for networking, to make sure the network packet bandwidth and everything else is also per-container. And then there's this weird thing called the freezer, which was mostly used for suspend and resume. Realistically, all it is is a cgroup where you put a bunch of processes and then put them all to sleep simultaneously, without having to worry about the resource dependencies between them. If you've got a producer and a consumer running in separate processes, the problem is how you put them to sleep, because if they're really tightly coupled the producer would notice the consumer has gone away and the consumer would notice the producer has gone away. The freezer was the way we did that: we put both processes into the freezer and suspend both of them at the same time, instantly, so neither one has a chance to see that it lost the other.

And then on the other side there are these things called namespaces. Namespaces are a pure isolation layer inside the kernel; most of the time a resource can only belong to a single namespace. We have a namespace for networking, which means we can take network devices and place them into separate namespaces within the kernel; one network device can only belong to one namespace. We have the IPC namespace, because System V IPC has to be virtualized as well: you shouldn't be able to see the message queues of container Y from container Z, because that would be an information leak, so there has to be a separate set of IPC message queues for each container, which means that subsystem had to be virtualized. There is a mount namespace, because the filesystem tree of each container should be different (it doesn't have to be, but it should be), so this allows us to put a separate root filesystem into each and every container if we wish, or, by using the wonderful Linux thing called a bind mount, to move portions of the root filesystem into containers using exactly the same technology. There's a PID namespace, which exists primarily to satisfy init systems: if you're bringing up an operating system container, as in the diagram I showed earlier, an init has to run inside it, and it's amazing how annoyed the init process gets if it's not running as PID 1, so we had to virtualize the process subsystem in Linux so that PID 1 was available to all of these separate little inits running inside operating system containers. The PID namespace doesn't just do that; it was designed because that was the problem, but now it also isolates the process tree of each container from the others, so if you're running in separate PID namespaces you can't do ps in one container and see all of the processes in another, which is also important for separation between containers, because anything else could cause an information leak. There's a UTS namespace, based on nothing more complicated than the fact that each container needs a separate hostname, and for reasons best known to UNIX people the hostname is set by a system call, so that system call had to be virtualized too.
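A minimal sketch of what the PID and UTS namespaces buy you, using the util-linux unshare wrapper (run as root, or combine with a user namespace as shown further down; the hostname used here is arbitrary):

```bash
# New PID + UTS namespaces for one shell, plus a private mount of /proc so ps works sanely.
unshare --fork --pid --mount-proc --uts bash
hostname demo-container      # changes the hostname only inside this namespace
ps -ef                       # this shell shows up as PID 1; host processes are invisible
exit
hostname                     # back outside, the host's name is untouched
```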
And then there's the user namespace, which is really, really important, because this is the way we pretend to be root inside a container without actually being the real root of the system. Early on with LXC, one of the real problems was that if you ran root inside an LXC container, that root could actually break out of the container and become root on the host, which obviously has devastating consequences if you can achieve it, and the fact that early LXC containers were leaky meant it was actually fairly easy to do this breakout. So the user namespace was invented primarily to allow you to bring up fully unprivileged containers. The user namespace is effectively the way we do security in Virtuozzo today; we had a slightly different system, but what we used was merged into the user namespace, so it's all available to us now.

So, for those of you who are falling asleep: if you think that was the pain, we're just coming to it, because this API is completely toxic and very difficult to use, and because I'm who I am, I thought I'd give you an example of exactly how bad it is. I said there are namespaces; this is my own system, an openSUSE system running the 3.16 kernel, so I've got most of the namespaces and cgroups here. Let's see, I think this is about as big as this terminal will go; let me bring up another terminal... sorry, this is probably a systemd problem... oh no, it's because I'm on the root one with the wrong font. Okay, is this more visible? Here are all the namespaces this shell is in. Namespaces are represented basically as inodes, so all of these numbers are effectively the inode numbers of the namespaces I belong to; these are the six namespaces I have on this system. As a simple demonstration, I can enter one. If you look at my ID here, I'm myself on this computer, running not as the root user but as my own user ID. One of the interesting demonstrations is that I can just enter a new user namespace using unshare; the -r option means do all of the UID and GID mappings that make me root, and there's me as root. So, using a fully unprivileged, non-setuid executable, I can become root just by taking advantage of the properties of user namespaces.

Now, if I do an id, you'd think I'm root, but only in this user namespace. If I do an ls -l on /proc/self/ns you can see (and this is where it becomes really painful) that the user namespace entry, one up from the bottom, has changed from one horrible eight-digit number to another horrible eight-digit number, but all of the other namespaces are the same. So all I've done is enter a user namespace; I'm sharing everything else, I'm not namespace-separated on anything else. If I do a ps (it's obviously falling off the bottom of the terminal, let me bring it up) I can still see all of the processes running on the system, because I haven't entered a new PID namespace. However, one of the interesting things: if I do an ls -l on my home directory, it's no longer owned by me, it's all owned by the root group, and if I do an ls on something root would ordinarily own, let's have a look at the shadow passwords for instance, they're now owned by nobody. Now, in theory root can open a file owned by nobody, but my fake root inside the container cannot, because all the user namespace has done is remap my UID.
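Condensed, the commands in that part of the demo look roughly like this (an unprivileged user with a reasonably recent util-linux; output abbreviated in the comments):

```bash
id                      # uid=1000(me) gid=100(users) ...  -- an ordinary user
ls -l /proc/self/ns     # the namespace inodes this shell currently belongs to
unshare -r bash         # new user namespace, mapping my UID/GID to root inside it
id                      # uid=0(root) gid=0(root) -- but only inside this namespace
ls -l /proc/self/ns     # only the user: inode number has changed
ls -l /etc/shadow       # owner shows as "nobody": host root is unmapped in here
cat /etc/shadow         # still "Permission denied" -- fake root, not real root
```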
That remapping is controlled by mapping files called uid_map and gid_map. Let's not just ls them, let's cat them, and I'll show you what they contain. So I'm mapped to... there, the ID for nobody on my system; we've done a numerical UID mapping, and that mapping is inside /proc/self/uid_map. If you look at the way it's constructed: the first number is the ID I'm mapping to, which is 0, the root one; the second number is the ID I'm mapping from, which is my user ID, 1000, on my laptop; and the third one is the range, how many IDs go up from there. So I'm only mapping one ID, myself to root. If I had 10 there, it would map IDs 0 through 10 to IDs 1000 through 1010, so it allows for range mapping. And these files can be written with more entries as well, so I could, for example, map a couple more IDs so that some user, say UID 1001, appears as bin inside... thank you, musical accompaniment is always good... sorry, I'd need to be real root for that and I'm not real root, so let me get real root back. Actually, just pretend I've done this; I've got seven minutes left and I won't leave time for questions unless I move on. So, have your heads exploded yet? Because this can get an awful lot worse.
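For reference, the mapping-file format being described looks like this (a sketch; the outside UID 1000 is just whatever the demo user happens to be, and wider mappings normally have to be written from outside by root or by the newuidmap helper):

```bash
cat /proc/self/uid_map
#          0       1000          1
#  inside-ID  outside-ID     length   -> inside UID 0 is outside UID 1000, one ID only

# A range mapping, written from outside the namespace by a privileged helper:
#   echo '0 100000 65536' > /proc/<child-pid>/uid_map     # <child-pid> is a placeholder
# maps inside UIDs 0..65535 onto outside UIDs 100000..165535.
```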
I'm afraid you're probably going to have to put up with something rather small for this, because for my next trick I'd like to show you how cgroups work. I can show you the interface from the big terminal, but I'm actually going to need the root terminal to do things, so look at what my PID is here. If I go to /sys/fs/cgroup, which is where all of the cgroups in Linux live, and do an ls -l, you'll see there look to be about twelve of them, but in reality several of them are links, because the ABI for cgroups has changed over the years; all of the linking in this directory is the ABI changing. Each of these cgroups is separately mountable in Linux, so if I look at the mounts you'll see each of them listed separately. I'm going to be a wimp here, because if you look inside any of these cgroups you see all of its control files, and the simplest control interface is actually in the freezer cgroup; that's about all it has, so I'm going to wimp out and show you a really simple cgroup rather than a really complex one.

Now, the way cgroups work: I change directory into the freezer, and, by the way, I've just exited the user namespace and I am going to become root, because I can't do this without being root. The way you control cgroups is via filesystem calls, mainly mkdir. Right now, if I cat tasks while standing in the root cgroup of the freezer, every process in my system is a member of that root cgroup, as it is for every cgroup. So in order to move processes into different cgroups, I first of all have to create them. If I do a mkdir for a test cgroup, that is now a child cgroup of the root cgroup; there are no tasks in it at all, so it's completely empty, and if you look it also has additional files, one of which is freezer.state. The freezer state is what's actually controllable in this cgroup, and this cgroup is currently thawed. Now I'll bring back this other shell (it's actually root, but that doesn't matter); it has process ID 21154, so I can move that process into the cgroup simply by echoing its PID into tasks. That process is now inside the cgroup. Nothing looks different in that terminal at the moment, and if I cat tasks back you'll see the PID in there. Now what I can do is freeze the cgroup by echoing FROZEN into freezer.state. This, by the way, is why these things are so horrible. If I go over here, you can see I'm pressing enter on the keyboard and the process is frozen; any set of tasks I put into this freezer would now be frozen, and I can get all my key presses back simply by unfreezing it.

If you do a ps on this, the process actually looks like it's running; well, it's stuck in disk wait, and it's hard to tell the difference. If I'd done a kill -STOP you would have seen the stopped state there, but this doesn't necessarily look like a stopped process, so it's very difficult to tell from ps whether a process has been put inside a freezer or not; most of the time it gets accounted as disk wait, so it depends on what tools you're running. But remember, the way I'd usually be using this, especially in a container, is with a separate PID namespace as well, so I can render this process invisible to all the system tools doing that accounting, because accounting is also broken up along namespace lines.
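Condensed, the freezer walk-through above amounts to this (a sketch; cgroup v1 as in the talk, run as root, "test" is an arbitrary name and 21154 stands in for whatever shell PID you use):

```bash
cd /sys/fs/cgroup/freezer
mkdir test                         # mkdir creates a child cgroup
echo 21154 > test/tasks            # move a shell into it by PID
cat test/freezer.state             # THAWED
echo FROZEN > test/freezer.state   # every task in the group stops instantly
echo THAWED > test/freezer.state   # ...and they all resume together
echo 21154 > tasks                 # move the shell back to the root cgroup...
rmdir test                         # ...so the now-empty child cgroup can be removed
```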
I was also going to do a fairly clever demonstration with the network namespace, but I think I'm running out of time. For everybody: the ip netns command is the most useful thing here, because the network namespace allows you to build multiple networking stacks on the same box you're running on, so it gives you a way of playing with networking protocols without having to have multiple separate systems. I was going to do a veth demonstration, but I'm afraid I've run out of time, so we'll go back to the slides. The other necessary tool, because to match hypervisors we also need migration, is our migration project; in the interest of time I'll skip over it, but it's basically a way of migrating groups of processes, and it matches what vMotion does for hypervisors.

So I'll get on with the conclusions, which are that, thanks to a lot of upstream work, containers are here to stay. The native container control plane, as I hope I showed you, is excruciatingly complex, and remember that in that demo I walked you through two of the simplest namespaces and cgroups on the planet; there are much more complex ones you could have played with. But nobody's head exploded in this room, so the chances are this isn't actually as bad as most people think it is; it's just slightly excruciatingly painful instead of incredibly excruciatingly painful, and that's not an excuse for not using them. So what I'd like you to do, now that you understand the basics: the only things you really need to know to manipulate all of this are where to find the cgroup interface, under /sys/fs/cgroup; how to use unshare; and there's another tool called nsenter, which I didn't show you, that allows you to enter namespaces. With those three things you can manipulate any namespace you like, and you'll find it's much less complicated than you think. And if you only want to manipulate the network namespace, the ip netns command from the iproute2 tools is a very, very convenient way of doing it; it has its own really convenient control plane, if that's what you want to play with. So it's not as bad as I've been making out, but it is still pretty horrible.

With that, I'd just like to say that this presentation was done using impress.js; it's all written in HTML5 and CSS3, which makes me a web developer rather than a container developer. I'd like to say thank you, and I'll entertain questions. Does anyone have questions about the horribleness?

OK, at the back. So the question is: what is the benefit of running a container in a VM? And the answer is, there is no benefit. The reason people commonly claim you have to run a container inside a virtual machine is to benefit from the security properties of the virtual machine, but as I've just demonstrated, we can run containers natively in Linux, properly set up, so that the security is all present. It's perfectly possible, with a recent kernel and a good orchestration system, to run containers that are fully secure and fully isolating. But remember that the poster child for containers is Docker, and Docker began on the 3.8 kernel; that kernel did not have a lot of the container security features, so to get some of the earlier systems to work there was no choice but to run them in hypervisors. Nowadays there is a choice: you do not have to run containers in hypervisors. For service providers, for instance, we run a system where we use nested containers to run Docker. The only reason we use nested containers is that the current vogue among service providers is to give the end customer access to the full Docker control plane rather than keeping it with the service provider. If the service provider controls Docker, they can deploy the user applications into all of the containers, but if the user wants to control Docker, you have to virtualize Docker itself, so you run it inside nested containers. Does that answer the question reasonably well?

OK, how much longer do I have, timekeepers? OK, next question then. So the question, if I understand it right, is about same-page merging inside containers: can they do that? The answer is yes, it's perfectly possible. In our commercial Virtuozzo product we have a same-page merging algorithm that finds pages that look like they belong to different devices in the page cache and merges them into one, using knowledge obtained from the execution pattern of these containers. This is one of the things that's been plaguing Docker for a while, because as you bring up the cascaded namespaces in Docker, each one comes up with a different instance and a different device, and that sprays copies of the same page throughout the page cache. We have technology that can bring them back together again, but for us it's an add-on you'd pay for in Virtuozzo. Other people are working on the same thing in open source: we have a project with Clear Containers where we're looking to unify the reclaim techniques that KSM uses, along with DAX, which is used to control file-backed memory, and anonymous memory, communicating the LRU list between the guest kernel and the host kernel, which would give us a paravirtual memory interface that would solve a lot of the hypervisor performance problems I alluded to earlier. So there's an awful lot of work going on in this field.
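As an aside on the KSM mention above: the in-kernel samepage merging daemon is itself driven through sysfs, roughly like this (a sketch; note that KSM only considers memory an application has marked with madvise(MADV_MERGEABLE)):

```bash
echo 1 > /sys/kernel/mm/ksm/run        # start the ksmd merging daemon
cat /sys/kernel/mm/ksm/pages_shared    # number of de-duplicated pages currently in use
cat /sys/kernel/mm/ksm/pages_sharing   # number of additional mappings saved by sharing them
```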
So the question is: do I think virtual machines will be driven extinct by containers? Working for a container company, you might think I have a slight bias in giving this answer, so I will try to be neutral. If that question had been asked of me a year ago, I would have said yes, it's perfectly possible: operating system containers can do almost everything a hypervisor can do, and the only thing they can't do, the use case of completely separate operating systems brought up on the same machine, is going away, because as we know, everything in the cloud is becoming homogeneous. But if I look at what's happening today, there are certain companies that are very well funded, that don't have container technology, and that are feeling burned and left behind by the container bandwagon, so they have a vested interest in pushing hypervisors to match the container use cases. Obviously I'm thinking of VMware with their secret containers project; Intel is doing the same thing with Clear Containers, which is effectively hypervisor technology being pushed towards container elasticity; and Microsoft is doing the same with Hyper-V and Nano. So there are a lot of vested interests trying to make sure that hypervisors get the ability to do all of this. Sorry, I think we're almost completely out of time. So I think there will be hypervisors that can match operating system containers, and if I had to make a bet nowadays, I'd tell you that the operating system container business will probably be subsumed by hypervisors, just because they're a lot easier to play with, they have a lot of advantages, and if they match containers on density, why wouldn't you use them? But because containers are a virtualization of the operating system, there are a lot more use cases we can put those individual virtualizations to that hypervisors cannot match. For instance, look at Apache: it has a thread pool, and that thread pool is the source of most of Apache's exploits, because you can break out of it using a SQL injection attack or a CGI attack. One of the things we can do is make sure every one of those threads runs inside a container, so Apache is actually managing containers for its thread pool instead of threads, and that means that if you do a breakout in an Apache thread, you cannot get out of the container. It allows us to do much more security isolation in PaaS systems, and that's a use case for container technology that still cannot be matched by hypervisors. So I think both will exist; there will always be use cases. You've got two minutes to get to your next talk, so I think I should probably say thank you very much.
Info
Channel: Open Infrastructure Foundation
Views: 25,480
Keywords: OpenInfra, Open Infrastrucure, Open Source, Containers
Id: YsYzMPptB-k
Length: 48min 58sec (2938 seconds)
Published: Thu Oct 29 2015