Understanding the Difference Between Virtualization and Containers

Captions
Welcome, everyone. My name is Adrian Otto, and I've been with OpenStack since the beginning. I currently serve as the project team lead for the Magnum project, and today I'm going to talk to you about understanding the fundamental differences between containers and virtual machines. The thing I want you to remember when this session is over is that something special happened in 2013: a very simple idea that changed everything.

Before I get to that idea, I'm going to tell you about another one. In 1873, toothpaste came essentially in a jar, in powder form; you would shake it out onto your toothbrush. It wasn't until 23 years later, in 1896, that toothpaste came in a tube for the very first time. The idea came from the observation of paint being extruded from a similar tube, and the individual who observed this thought: if we can administer paint from a tube, we could do the same thing with toothpaste, and that would be better. It turned out they were right. In fact, it was so good that by 1908 Colgate had made it their marketing slogan that they couldn't improve the product, so they improved the tube.

In 1962, Colgate opened a facility called the Colgate Research Center, a place where they would work on better formulas of toothpaste, make different products, try them out, and see what works best. In 1978, somebody who worked in the Colgate Research Center as a lab assistant came up with a really interesting idea. She said, "I have an idea that will cost basically nothing and take pretty much no time to accomplish, and it will double the sales of the product." She approached management and said, "Allow me to try this idea; it will change Colgate, our sales will double, and you'll give me one percent for a year." They hesitated at first, but then they thought: if nothing happens to sales, we're out nothing, so why not? So they entered into a contract with each other, and once it was signed, they put a piece of paper in front of her and said, "OK, document the idea. What is it?" She said: it's six words. Make the opening twice as big. That's exactly what they did, and as predicted, they started selling twice as much of the product.

That story would be good if it were actually true, but it's not; it's partially a fabrication, for the purpose of highlighting the point that very simple ideas can be extremely powerful. Now, there is actually some science relating to toothpaste that is true: something called the toothpaste tube theory. It means, essentially, that you can't keep squeezing a tube beyond a certain point and get any additional benefit; it's also used to describe human behavior in legal negotiations. There's a second meaning of the toothpaste tube theory, which says there are diminishing returns after a certain point: if you have a bounded system where pressure keeps increasing and increasing, eventually something is going to go kaboom. That's the idea behind the toothpaste tube theory. But the idea you really want to hear about relates to containers, which I'll get to in just a moment.

Now, you all came here under the promise that I was going to explain the difference between something you understand very well and something that you understand a little bit less.
This difference breaks down into three main categories: containers can be more efficient, they can perform better, and they have different security characteristics than virtual machines. I'm going to cover each of these in detail, but before I do, I'll remind you that virtualization technology is nothing new. It's been around for longer than I've been on this earth, and it became commercially supported as open source software back in 2003, when Xen became the first open source hypervisor, and it has evolved since. KVM came around about ten years ago as part of the mainline kernel, in version 2.6.20 of the Linux kernel, and there have been many others.

Containers have also been around almost as long as me. Something resembling containers showed up in Unix back in 1979 with the introduction of a syscall that allowed you to create new roots; we call this chroot ("see-root" or "sha-root," depending on your pronunciation), and BSD added it in 1982. More container-like structures showed up in FreeBSD around 17 years ago: FreeBSD jails, in the spirit of what we currently conceive of as a container. And there were a bunch of others; around 2013, Google had something called LMCTFY, "Let Me Contain That For You," an effort that was later eclipsed by something even more compelling.

In 2013, a piece of software called Docker was born as an open source project. It was introduced by an organization called dotCloud, which was in the platform-as-a-service business before entering the software business. There have been numerous innovations since then, but 2013 is where something really interesting happened. This is the idea I've been alluding to: the concept of a Docker image, or container image. The image is the thing that was missing from the entire history I just covered. All of the container innovations that happened before 2013 did not include a way to encapsulate all of an application's requirements and dependencies in a portable, lightweight bundle. Before then, what you had to do was create a disk image, install what appeared to be an operating system file system, and then attach the container on top of it. Those things weren't very portable, and they tended to be very large, which made the creation of containers relatively slow until this innovation. So when I talk about Docker, I am NOT talking about the company Docker Inc.; I'm talking about the open source software formerly known as Docker Engine, now known as Docker Community Edition.

So let's explore what containers are made of. The first thing that all modern containers in Linux have in common is a concept called a cgroup. A cgroup is a feature of the kernel itself, and although I'm highlighting the Linux cgroup, there is an equivalent in the Microsoft operating system as well, recently added in version 10 of their operating system. This feature allows you to group a set of processes that run as a related unit, and that group of running processes can be controlled with respect to how much of the host it is allowed to consume: in terms of memory, in terms of CPU utilization, and in terms of how much I/O it does, both over the network and to disks and other devices. And cgroups may be nested, meaning that a cgroup can be the parent of another cgroup. This concept is important, and I'll explain why a little bit later.
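The talk doesn't show commands for this, but here is a minimal sketch of what a cgroup looks like from the shell, assuming a host with the legacy cgroup v1 hierarchy mounted at /sys/fs/cgroup (cgroup v2 systems lay this out differently):

```
# Create a child memory cgroup and cap it at 100 MiB.
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((100 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
# Move the current shell into the cgroup; everything it forks from now on is
# accounted against (and limited by) the 100 MiB cap.
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/tasks
```

This is the same mechanism a container runtime drives for you when you set resource limits on a container.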
The second feature that all modern container systems have in common is the idea of a namespace. A namespace is another kernel feature that allows a restricted view of the system: instead of showing you every aspect of a running system, it shows you a narrower perception of it, so you get the illusion that you're running on a system that has fewer interfaces. If you're just starting up a Linux box and logging in for the first time, no containers exist on it; you're in the root namespace, and you're going to see all the running processes, the full view of all the file systems, every network interface, every bridge, every tunnel interface. Everything is visible to you. But if you create a container and enter a namespace, that view can be restricted.

Now, there are a bunch of these. There's one called CLONE_NEWNS; this is the filesystem (mount) namespace, in the spirit of the chroot syscall, where you specify a new file system path that becomes the new root. This is the most basic one: if you understand chroot, you essentially understand all namespaces, because they all fundamentally follow the same concept, except that instead of limiting your view of a filesystem, they limit your view of some other resource. So CLONE_NEWNS is about the view of the filesystem.

There's a UTS namespace; this has to do with what your hostname is. It allows one container to have a different hostname than another, so when you call uname in a container, you can get a different answer back.

There's also a namespace for inter-process communication: semaphores and shared memory segments. Wouldn't it suck if you had two containers side by side, one decremented a semaphore, and all of a sudden the behavior of a neighboring application changed as a result, because they both chose the same name for the semaphore? That would really be awful. With a namespace for those, you can name your semaphores the same thing, or use the same constructs, without interfering.

There's another one for process IDs. If you have a PID namespace that's unique to you and you start a process in it, it's going to be PID 1; you start another process, and it's going to be PID 2, even though there's already a set of processes numbered 1 and 2 running on the host. These are processes that are mapped, through a simple mapping in the kernel, to give you the illusion that yours are the only processes running on the machine.

There's also one called a user namespace, which works the same sort of way as a PID namespace (I'll come back to networks in a minute). The user namespace gives you the illusion that you're a privileged user inside the container when in fact you're a non-privileged user with respect to the host: you might be UID 1000 outside and UID 0 inside. That's how these mappings work; it's a way of restricting the security exposure of the processes running within the container.

And there's a network namespace, which allows you to control what interfaces a given container can view. You can give a container, say, eth0 and only that, and not show it anything else; or you can give it a bunch of interfaces that are bridged.
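A rough way to see a couple of these namespaces in action, assuming a host with util-linux's unshare(1) available:

```
# Start a shell in new UTS and PID namespaces (--mount-proc remounts /proc
# so process listings reflect the new PID namespace).
sudo unshare --uts --pid --fork --mount-proc bash
hostname demo-container   # changes the hostname only inside this namespace
ps aux                    # shows just this bash and ps, as PID 1 and PID 2
exit                      # back on the host, the hostname is unchanged
```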
All of these different namespaces I'm describing can also be nested, meaning it's possible to run, with both cgroups and namespaces, a container that has another container inside. This is one of the differences between virtualization and containers, a performance difference that I'll talk about some more; it's important to recognize that this nesting is possible in order to appreciate that difference when we get to it.

OK, remember I told you there was something special about this concept of the container image, that this is the thing that makes containers special all of a sudden, starting in 2013. So what is a container image? It's not a virtual file system; it's not a file system; it's not a virtual hard drive. If you look at how it's actually composed, it's essentially a tar file with some additional metadata attached to it. One of the important pieces of metadata is an indication of what container image this one is derived from. Much like namespaces and cgroups can be nested, container images can also have a nesting relationship. You might have a base image that is, say, an Ubuntu distribution, and I might have a container image that says, "based on that, plus this other stuff too," and it's just that additional stuff that is in the tar file making up the container image. The piece of metadata stating what it's based on is the important part.

In this relationship, an image can have an arbitrary number of these dependencies going back up: you can have the concept of a base image, a child image, a grandchild image, and you can keep going down and down. I would argue that once you get down more than about two or three levels, it probably doesn't make sense to continue. There were limits in some versions of Docker, for example; I think there was a limit at one time of something like 40 or 60 layers, and you couldn't go any further. I think that has since been lifted, but if you can't describe your system in fewer than, say, four levels, you're probably doing it wrong.

This same hierarchy maps to something called the Docker registry, so let's talk about that. If I ask for a show of hands, how many of you use git? I'd call that about 90% of the audience. If you understand git, you already understand how the Docker registry works, because the semantics are the same: you pull to get a copy of something out of the registry; you can change it and do a commit; and then you push it back up into the registry to save a new version. It maps to the same hierarchy I was talking about before, this idea of an image being derived from another, and when we talk about Dockerfiles in a moment, this will become even clearer.

Now, if I went into this audience and started polling you individually with the question "what is a container?", I'm likely to get a number of different answers, maybe as many as 10 or 15. I would like you to converge on this idea of what a container is: it is the amalgam of a Linux cgroup, Linux kernel namespaces, a Docker image (generically referred to as a container image), and the related lifecycle. All of those things together make up a Docker container. If you're missing the namespaces, it's really not a container. If you're missing the cgroups, it's really not a container. If you're missing the Docker image, you could argue, but I'm saying that what we believe today is compelling about containers doesn't make sense without the container image. It is the differentiator.
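Before we get to Dockerfiles, you can see both the "tar file plus metadata" structure and the git-like registry semantics for yourself; a sketch, assuming a host with Docker installed (the image and repository names are just examples):

```
# An image really is a tarball with metadata: export one and list it.
docker pull ubuntu:16.04
docker save ubuntu:16.04 | tar -t | head
#   <layer-id>/layer.tar   the filesystem contents of each layer
#   <layer-id>/json        metadata, including which layer it derives from
#   manifest.json          image-level metadata

# The registry semantics mirror git: pull, change, commit, push.
docker run -it --name scratch-pad ubuntu:16.04 bash   # make changes, then exit
docker commit scratch-pad example/myimage:v2          # save a new version
docker push example/myimage:v2                        # assumes you own "example"
```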
So, where do babies come from? That's an easier question to answer than where containers come from. There is confusion between what a Dockerfile is and what a Docker image is, so I'm going to clear that up. A Dockerfile is the imperative instruction for creating a Docker image. If you know what a Makefile is: a Makefile is an instruction for compiling a binary; the input to make is the Makefile, and the output is an a.out binary. You can think of a Dockerfile as the same kind of thing. Like a Makefile, it has instructions in it, which I'll show you in just a second, that describe how to build the thing, and when it's done, if it's successful, you get a container image out the back; you get a Docker image. This is not to be confused with an orchestration artifact, like a pod file for Kubernetes; those are declarative descriptions of the deployment of an application, and that's not what a Dockerfile is for. A Dockerfile is just instructions for building a single image.

Let's see what they look like. This is a simple Dockerfile. It says: we're going to start with a CentOS 6 environment; we're going to label it with my name to show that I'm the maintainer (this is only persisted in the metadata for the container); we're going to install the Apache server; we're going to expose port 80; and we're going to add a script into the container image that will execute by default when I start the container, unless I specify something else. That's all this says. So FROM indicates what this image, what this container, is based on. And there is such a thing as a scratch image: you can define a new base image called scratch and put a statically linked binary into it, which gives you an environment with nothing underneath it; or you can define a container based on some existing environment.

Now, here's another misconception I want to clear up. When people first start to understand how containers work, they assume that whatever operating system is running on the host is the operating environment their container is going to enjoy. This is not true. The containers will all share the same kernel, and I'll explain why this works in a moment, but they can all have different environments. I could have one image that runs in a CentOS environment and another built on top of Ubuntu, and I could run them side by side on the same kernel: one application believes it's running in an Ubuntu environment, the other believes it's running in a CentOS environment, and they work happily side by side on the same host. It does not matter what Linux distribution is running on the host. I'll say it one more time, just for dramatic effect: it does not matter what operating system is running on the host. What matters is what's in the container image and what it asks for.

OK, so here's the build command: docker build -t webserver . The -t just means tag this build with a name, the same way I would tag a git branch, and the dot means build whatever is in the current directory. The current directory would have the Dockerfile in it, and it would have the start.sh script in it, and that's all that's necessary to build this container. Now, what if I wanted to make a child image based on the one I just created? I would say FROM webserver, install, say, a LAMP stack, and put in a different start script.
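The slides themselves aren't in these captions, but a minimal reconstruction of the two Dockerfiles being described might look like this (the start scripts and exact package names are illustrative assumptions):

```
# Parent image: CentOS 6 plus Apache, built with "docker build -t webserver ."
FROM centos:6
MAINTAINER Adrian Otto
RUN yum install -y httpd      # install the Apache server
EXPOSE 80                     # the container will serve on port 80
ADD start.sh /start.sh        # add the default startup script
CMD ["/start.sh"]             # runs by default unless something else is specified
```

```
# Child image: derived from the image above, built with "docker build -t lampstack ."
FROM webserver
RUN yum install -y mysql-server php php-mysql   # rough LAMP additions
ADD start-lamp.sh /start.sh                     # a different start script
CMD ["/start.sh"]
```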
When I run this child container, it causes the parent image to be loaded, and of course whatever the base image was underneath that, so now I've got a chain of three: this one; the parent, which is the webserver image I built a moment ago; and then the base image, which was the CentOS one. That's how the hierarchy works. When I build this one, I tag it with the word "lampstack" instead of "webserver". The reason I'd want to do this is that now they share a common base: I might have one that's a LAMP stack and another that's a Node.js stack, and they can still share the same base image, which means fewer base images are cached on each host as I'm starting up new containers. They'll use less storage and they'll be faster to start, and those are some of the reasons to use containers to begin with.

I promised I was going to explain these three areas in depth: efficiency, performance, and security. Let's get to efficiency. Many of you have seen this diagram before, except there's a slight difference between this diagram and the one you've seen: I have taken Docker out of the stack on the left. The reason is that Docker is not actually in the execution path of the application the way a hypervisor is in the execution path between the application and the hardware in a virtual machine. If I run an application within a container, once the process has started, its performance characteristics are "exactly" the same as they would be on a bare-metal machine. I'm putting "exactly" in air quotes; I don't like using absolutes, but for the sake of describing how it actually behaves, it is exactly the same, because the things that make a container different pertain to the maintenance of the namespaces and the cgroups. That affects process startup and process teardown, but it does not affect how the process interacts with the kernel while it is running.

So if I do something like an open() syscall and I read from it, should I expect the performance to be slower just because I'm running in a container? What do you think, yes or no? No. I should expect it to behave the same as if I weren't in a namespace and did the same thing, because once my process is running, the fact that it's in a namespace does not change the execution path between that running process and the equipment it's running on top of. Of course, that's different in the case of a virtual machine. If I'm running in a virtual machine, I've got my own kernel, then I've got either hardware-assisted or software-emulated virtualization, something imitating a machine, and underneath that you've got the actual hardware. In a virtualized environment, should you expect the behavior of your application to be different, or slower, or worse in some way than it would be running on bare metal? Of course; it would be different.

So when people say containers are faster, sometimes they're describing that containers are faster to start up than virtual machines, because of the nature of how container images work, the layering and hierarchy I described. Another reason they say containers are faster is that there's nothing interfering between the running application and the hardware itself, in the way that virtual machines can interfere, in many ways, with your access to the hardware.
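A quick, unscientific way to convince yourself of this, assuming a host with Docker installed; this is a sketch, not a rigorous benchmark:

```
# Run the same I/O-heavy workload on the host and inside a container; once
# the container process is running, the timings should be roughly identical.
time dd if=/dev/zero of=/tmp/testfile bs=1M count=1024
docker run --rm -v /tmp:/tmp centos:6 \
  sh -c 'time dd if=/dev/zero of=/tmp/testfile bs=1M count=1024'
```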
Now, some may argue that with things like PCI passthrough, mapping in an NVMe storage device with the file system on top, you don't have the same performance drawbacks running in a virtual machine, so it doesn't matter. But the truth is, those features behave differently from one kind of equipment to another. You can't generalize that it's going to be as good as bare metal on every single machine you run it on, whereas with a container I can offer you the assurance that it's going to behave the same way regardless of what the equipment is. So it's a stronger argument: there is a more universal performance benefit to running an application within a container than within a virtual machine.

The next difference is a security difference. Does anybody recognize this? This is Castillo de San Marcos in Florida. It's a fortress, and it was successfully defended for many, many years. How many soldiers do you believe it would take to successfully defend this fortress against an external attack? Just a guess; four soldiers could do it? How many think so? Let's be generous and say we need fifty to a hundred soldiers to defend it. Now, what if the problem looked more like this? That's a much bigger attack surface, right? What kind of a force would I need to defend against something like this? We've got a multiplication problem here, and the same problem exists when you're defending against hostile workloads sharing the same host. When those workloads are isolated in virtual machines, that's one risk profile. When they're separated only by containers, that's a different risk profile, a much more complicated one, which I'll get into in a minute.

In the hypervisor world, the number of things composing the attack surface between two virtual machines on the same host is a very short list; it's shown here on the screen. This is a relatively small attack surface. Is it possible to have two running virtual machines and have processes inside one escape through the hypervisor into, say, the memory space of a neighboring virtual machine? I sure hope I see nodding heads. Yes, it is possible. But it's also relatively straightforward to defend against, because the attack surface is narrow by comparison.

The Linux syscall interface is not quite as narrow: there are 397 system calls in the Linux syscall interface as of version 3.19 of the kernel. That is like the multitude of Castillos de San Marcos that I showed you; it's a much more difficult attack surface to navigate. If the only thing between two neighboring containers on the same host is something with 397 syscalls, you need a different strategy for defending against escapes than you need when you're defending virtual machines contending with each other. It's a fundamentally different problem. So this is a security difference, and I want you to remember it: the security isolation barrier between containers running on the same host is not as significant as the barrier between neighboring virtual machines on the same host. You can consider it a dashed line instead of a solid line, for this reason, and this is an attack surface argument.
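You can get a feel for the size of that interface on your own machine; a rough sketch, assuming an x86-64 Linux host with kernel headers installed (the header location varies by distribution):

```
# Count the syscall numbers defined for this architecture; each one is part
# of the attack surface shared by every container on the host.
grep -c '^#define __NR_' /usr/include/asm/unistd_64.h
```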
Now, there are ways to make containers as secure as virtual machines, or nearly as secure. Does anybody recognize what this is? Holler it out. Nobody knows what this is? Not a house key. Louder. What is it? This is a bump key. A bump key is an exploit for a vulnerability in almost every lock we use on the front doors of our homes and offices today. Here's how it works: you insert the bump key all the way into the lock, you back it out one notch, you twist it very gently to the side, and you hit it with something like the handle of a screwdriver. What happens is that all the pins it's lined up against jump at the same time, which allows them to line up with the shear line and lets you turn the lock. This is a fundamental security vulnerability in the physical lock, and this is the exploit for that vulnerability. Now, it's only vulnerable because the strategy we're using to open the lock is a key. You can change the design of the key: if you put millings into the side, now you have pins in the side as well as pins in the top. Put a bump key in and jump all the pins up, and the lock is not going to turn, because the shear line only lines up with the pins on the top and not the pins on the side. If you fundamentally change the game, you make the lock more secure. You need to do the same thing with containers.

You've got a bunch of techniques available to change the game, to limit the attack surface between neighboring containers. One of the most common ones you'll see in practice is a mandatory access control policy: SELinux or AppArmor is a way to produce a mandatory access control policy. This says: you are allowed to do nothing with this kernel except the things allowed by the policy, which is different from the way the system normally works, where you specify the things you're not allowed to do. If you think about it conceptually like a firewall policy, it's a default-deny policy for interacting with the kernel. Now, the problem with using SELinux or AppArmor as the only security mitigation strategy is that a generally useful policy has to allow an awful lot of stuff, because applications do a wide variety of things, and a default policy permissive enough to be generally useful is not very strong. For this technique to be effective, it needs to be tuned on a per-application basis. This is why we don't have a single SELinux policy that works for all applications: the whole idea is that it's default deny, not default allow.

You can also use something called seccomp, or secure computing mode. This was originally designed for batch processing applications: you would start your batch process, open the file handles for the input files you're going to process, and call seccomp, and from then on you're only allowed to call read, write, and exit, and no other syscalls. If you make any other syscall, you're going to get killed by the kernel immediately. It turns out that seccomp has since been expanded to let you specify a policy of which syscalls you're allowed to execute. So now you can say: my application requires this particular interaction with the kernel. That can be specified at the time your application starts; your application can call seccomp directly to set its secure operating mode, and if you do this on all of your neighboring workloads, it makes them much more secure.
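Docker exposes seccomp filtering without any application changes; a sketch, assuming a Docker version recent enough to ship a default seccomp profile (the profile path here is a placeholder):

```
# Default profile: Docker blocks a few dozen dangerous syscalls out of the box.
docker run --rm webserver

# A custom, tighter profile: allow only the syscalls your application needs.
docker run --rm --security-opt seccomp=/path/to/profile.json webserver

# For comparison only -- turning seccomp off widens the attack surface.
docker run --rm --security-opt seccomp=unconfined webserver
```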
You can also nest containers. Remember, I keep harping on this idea that these things can be nested; this is one of the reasons it's important. Containers can contain other containers. Now, in the world of virtual machines, can virtual machines contain other virtual machines? Sure, you can do that. So why don't we? Because there's a huge performance drawback: the hardware virtualization assistance only works for the first level, not for the additional levels below, so if you nest virtual machines you're probably going to have really bad performance outcomes. That drawback does not apply to containers: for the same reason that a running process on bare metal performs roughly the same as a running process in a container, a container within a container also behaves roughly the same way once its processes have started. So you can nest containers: put an SELinux policy on the one at the top, then create other containers underneath, and an escape now needs to be multi-stage. An escape exploit has to succeed not only in getting from one container into the next, but then from that container into the host. You're making the escape necessarily more complicated, which reduces the risk of exploit.

If you're using Docker, there's a plugin interface called authorization plugins, and you can use it to limit your clients' access to various features in the Docker server itself; for example, if you don't want to allow clients to run privileged containers, you can enforce that. There's a feature in the kernel called ASLR, which randomizes the way the memory address space is allocated; I think most distributions enable this by default now, and it makes escape exploits more difficult. And there are features within the hardware itself that both improve performance and make these kinds of escapes more difficult. Those won't protect you against defects in the hardware itself, but they will help you with defects within the kernel, for example.

So in this talk I outlined three key differences between virtualization and containers: their relative efficiency, their relative performance, and their relative security. In general, containers tend to perform better, and you can usually stack a lot more containers on a host than you can stack virtual machines, primarily because of the way they work. A container says: you're allowed to use at most a certain amount of a resource, but it doesn't pre-allocate that resource. If I'm memory-bound, for example, and I create a lot of virtual machines, I'll have the overhead of pre-allocated memory; I'm probably allocating a whole bunch of memory that I'm not using. Whereas with containers, I'm only saying: you're allowed to use up to a maximum. So the oversubscription rate is much higher, and from an efficiency perspective, more work ends up happening on the same host. And if you're doing less virtualization work, less mapping between the running processes and the equipment, there's less tax on the equipment, so it runs faster.
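That "limit, don't pre-allocate" point maps directly onto docker run's resource flags, which drive the cgroup settings sketched earlier; a sketch (flag availability depends on your Docker version):

```
# Cap the container at 256 MiB of RAM and half a CPU. Nothing is reserved up
# front: if the process only uses 10 MiB, only 10 MiB is consumed on the host.
docker run --rm -m 256m --cpus 0.5 webserver
# Contrast with a typical VM, where the configured guest RAM is backed by the
# hypervisor whether or not the guest is actually using it.
```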
Then I talked about security, and about it being a fundamentally different problem with respect to neighboring containers versus neighboring virtual machines. Although there are techniques for making that interface between containers more secure, of which I detailed a number, there's not one magic bullet that's going to solve all of those concerns, so you need to keep that in mind as you implement these. Now, when you choose which of these technologies to use, it's not an exclusive choice of "I'm only going to run this thing on bare metal in containers" or "I'm only going to run this thing in virtual machines." You can still put containers into virtual machines; it's a perfectly valid use case. If you care about that additional security isolation, you're still going to get the benefits of the portability of the container image and the microservices capabilities, and you may choose to use both in combination so that you reap both sets of benefits. The only time you need to feel like you're compromising is if you've got an application that is exceedingly performance sensitive and needs to be exceedingly secure, and that's where you're going to need those advanced techniques. I'll take your questions; there's a microphone in the aisle here for your questions.

Q: Howdy. Did I miss the part where you talked about how two things can be running under different distros and each feel like it's running under its own distro?

A: I did promise to explain that. When a container starts, it's sharing the kernel with the host. The container image specifies what the layered file system will look like when it chroots into it. So, in my example, I showed a CentOS image: it's going to have the file system of the CentOS distro plus the Apache server I added, laid on as a second, writable layer, and I can interact with that as if it's a CentOS host. I can create another Docker image based on, say, Ubuntu and start it, and it's going to be using the same kernel at the same time. Now, some people believe that just because a distro is set up a certain way, it requires a certain kernel. It turns out the Linux syscall interface is a stable API, and because it is stable (since roughly version 3 of the kernel; 3.19 and later for sure), the interface is not changing. So libraries like the glibc that runs in the Ubuntu environment have exactly the same interface to the kernel as the ones in CentOS; in that respect they are literally identical. Because that is a stable API, it does not matter which kernel you're currently running. Some may argue with me and say there actually are differences in the kernel, and to some extent that's true: you shouldn't expect an application designed for today's kernel to work on one built ten years from now; that's an unreasonable expectation. But in terms of what's running today, the general answer is that they are compatible. I have, for example, five or six different operating environments all running on the same kernel at the same time, and yes, this does work. It does not matter, as long as you're running a modern kernel that has all of the features that all of your applications need; they're going to work. You're welcome.
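A sketch of demonstrating this on any Docker host, whatever distribution the host itself runs (the image tags are just examples):

```
# Every container reports the host's kernel...
uname -r
docker run --rm centos:6 uname -r         # same kernel version string
docker run --rm ubuntu:16.04 uname -r     # same again
# ...but each one sees its own userland:
docker run --rm centos:6 cat /etc/redhat-release
docker run --rm ubuntu:16.04 cat /etc/lsb-release
```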
Q: Thank you. I had a discussion with somebody a while back, and they were making the point that containers, because of the file structure you spoke of, have a greater affinity with the DevOps model, and that this would ultimately be one of the things that accelerates the uptake of containers. Do you see any truth in that?

A: Well, look, what DevOps cares about is getting through CI/CD fast, so any tool that accelerates the process by which you go through your CI is going to be considered valuable. Docker has been shown to greatly accelerate most people's CI, because the containers start more quickly, they use fewer resources, and once you've created the image, it behaves the same in test as it behaves in production: that binary artifact you create is both what you tested and what you deploy into production, assuming you're following best practices. It's super attractive for that reason, so yes, I would say it would definitely catch on with that audience quickly.

All right, thank you everyone for attending. [Applause]
Info
Channel: Open Infrastructure Foundation
Length: 45min 53sec (2753 seconds)
Published: Tue May 09 2017