Kubernetes meets Linux - Vishnu Kannan (Google)

Video Statistics and Information

Captions
Hi everyone, I'm really excited to be here. Brendan gave a great introduction to what Kubernetes is and how the future is going to look with containers. I'm going to spend some time talking about what containers actually are, because there's a gap that I see in the community: people are used to virtual machines and their applications, and then they hear about Kubernetes and all the awesome things it does, and they go, wait, what is a container? What the hell is a pod? What do we do with all of these things? People don't really get that right away. So the point of this talk is to try to demystify what exactly a container is, what a pod is, and how you're going to move from your current world into this new, equivalent world.

A little bit about myself: I've been working with Linux containers for the past five years. I've seen the Linux world transition from what containers used to mean in the past to what they are today, and it's been an awesome journey, and the journey is ongoing.

If there's anything I want you to take away from this talk today, it's this. First, Kubernetes is built using standard Linux features as it is today. Kubernetes is getting into the Windows world too, but that's a process; it's happening. For now, Kubernetes is a Linux-based orchestration system, so understanding Linux is going to help you understand Kubernetes; that's the key thing. Second, I really think that everyone who's using Kubernetes out there, unless they're using a managed service, should probably have some understanding of what Linux is and how Kubernetes is consuming Linux. And third, cluster management with Linux, on Kubernetes, is actually a journey; we're not at the final destination. We've been making rapid progress, really amazing progress in the last three years, but we're not there yet. So I'm going to talk a little bit about some of the shortcomings that we have today and what we can do to improve them in the future.
Before we get into the technical aspects, I want to say one more thing. Often what I see is that people assume that free, open-source software also means free, guaranteed support. You go to GitHub, you download some software, you watch some videos, you run it, and then you hit some issue, and you think: I'm going to go file an issue, and the next day it's going to be fixed. That's not true; that's not how it works. Even though open-source software is awesome, there are economics involved. So if you want support, you probably have to use a managed service, or the other option is to actually understand the technology: spend time understanding it, and spend time understanding its dependencies. That's how you can make better use of any free, open-source software, including Kubernetes.

Now, a very brief, high-level overview. Brendan already talked about this a lot, but I'm going to be very focused on the pod and container aspects. The whole idea behind Kubernetes is that you are building containers. You hear the term container; I'm going to talk a little bit about what it means exactly. At the end of the day you want to run Linux processes, maybe one, or lots of them together. You package them in the form of a container image, and then you also have a runtime manifest, because you want these applications to behave the same way irrespective of which Kubernetes cluster they're running on. Whether you're running on a cluster on your laptop, or deploying into a cluster somewhere in Google Cloud or Azure or Amazon or your own data center, it doesn't really matter: it has to behave the very same way; it has to be predictable. So what you do is combine the container images that represent your application with a runtime manifest, and you submit that combination to the Kubernetes master, the API server.
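To make that concrete, a runtime manifest like the one described here is typically expressed as a Kubernetes pod spec. This is a minimal sketch; the image name, port, and limits are illustrative, not from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-java
spec:
  containers:
  - name: app
    image: gcr.io/example/hello-java:1.0   # illustrative image name
    ports:
    - containerPort: 8080
    resources:
      limits:
        cpu: "1"          # runtime attributes: how much the container may use
        memory: 512Mi
```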
What the API server then does is find an appropriate worker node for running your application. The worker node goes and fetches the workloads it has to run, and sets up a consistent runtime environment: it takes the container images, brings your application down to the node, and starts it up as Linux processes with appropriate container sandboxing. It also makes sure that the environment your application sees is consistent across Linux clusters.

So, what are containers and pods? You have probably heard these terms a lot of times today already, and you might have heard them in the past too. I'm going to start with what a container actually is before I talk about pods, because pods are essentially an abstraction over containers. What is a Linux container? I've asked this question to different people and gotten completely different answers; containers mean different things to different people. For some, it's just a fancy packaging scheme; for some, it's about runtime isolation; for some, a container is a lightweight virtual machine. I'm going to try to actually explain what a container is today.

At Google, for example, four or five years back, a container used to mean just a combination of two Linux technologies: you'd bring your application, set up a chroot, and set up some control groups, and that was a container. Today that's not true. Today containers employ a lot of different Linux technologies, and they also provide a great packaging solution. For example, some features that containers provide include improved utilization, because applications share the same Linux kernel.
And since the startup times are really low, you get new deployment primitives. Imagine scaling your application from zero instances to a thousand instances because it's Christmas Day and you're getting a huge sale; Kubernetes serves that scaling for you automatically, with literally no downtime. That sort of application deployment primitive is made possible with containers. Needless to say, you also get a simple packaging scheme, like Brendan was showing in his talk. You no longer have your dependencies built into the base Linux distro you're running; you no longer have to deploy, you know, Ubuntu on every single node. You don't have to care about the base distro you're running; you only care about your application and its dependencies, with the assumption that your applications are built using standard Linux APIs. In short, containers let you focus on your application and just its dependencies, and forget the infrastructure. The infrastructure is someone else's problem; the problem doesn't disappear, it's just that it's someone else's problem, and you keep throwing machines at solving it. That's sort of what Kubernetes does.

So let's talk a little bit about some of the common Linux technologies that are used to build containers today. One technology you hear about often is the overlay filesystem. What I'm trying to describe with this picture is that you have some base Linux distribution that you want to run your application on, but ideally you would not want to care about that base distribution, and overlay filesystems let you achieve exactly that. In this picture I have three containers: the first container is running a Java application, and the last two containers are running nginx.
All three of these containers are based off of a Debian container image. That image doesn't include a kernel; it just includes all the userspace utilities and libraries that are shipped with Debian by default. Now you can start layering applications: you can have distributors providing those basic utilities and packaging solutions, and then you start building your applications on top of that. You get a stacked filesystem: you have the Debian image that's provided by Debian, for example, then you add more packages that are Debian packages, and you can stack all of these together and put a writable layer on top. The end result is a unified filesystem, and to the application it seems as though it's running on a regular Linux filesystem. It doesn't really know that there are many different layers underneath, that some of those layers are shared, and that those layers cannot be changed by the container; it just gets a runtime environment that's consistent.

Another cool part is that now you can distribute these layers independently. You can say that my container image, in the case of the Java container here, has these layer dependencies, the Debian and Java runtime layers, but it doesn't have to care about downloading or distributing all those layers; that's all taken care of by the container distribution framework. And if there are three containers sharing the Debian base image, as in this case, they do not have to download the base layer three times; common pieces of data are shared across containers.

A number of Linux features and technologies exist that implement the overlay filesystem concept. The unfortunate side effect is that each Linux distro chooses its own default based on the use cases it's trying to optimize for.
So you have to choose the one that's most appropriate for you, or just go with the one your distro provides; there are pros and cons for each overlay technology. That's another place where you've got to understand what you're building on top of.

Now that I've talked about overlay images, it's probably a good time to touch on what a container image is. We talked about layers, and you start seeing layers in your container image specification. The example here is a container image manifest, commonly called a Dockerfile. In this example we're saying: start from Debian, which itself could be comprised of many image layers; then install a Java runtime on top of it; then add my application from my local host into the container; and then set what my container is going to do when I run it. So a container image manifest, a Dockerfile, is a combination of your image, your application data, as well as the runtime behavior: how your containers have to behave when they get installed and started. You also specify an entry point, where you say: when my container runs, this is what I want it to do. So in addition to having the data, you're also specifying how it should behave.
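A minimal Dockerfile matching that description might look like this (a sketch; the package name, jar file, and tag are illustrative, not from the talk):

```dockerfile
# Start from a Debian base image, itself made of several layers.
FROM debian:stretch

# Install a Java runtime as an additional layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends default-jre-headless && \
    rm -rf /var/lib/apt/lists/*

# Add the application from the local build context.
COPY hello.jar /app/hello.jar

# Entry point: what the container does when it runs.
ENTRYPOINT ["java", "-jar", "/app/hello.jar"]
```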
Moving on, the other technology that containers use is called control groups. This is a feature that Google originally introduced about ten years back. The primary purpose back then was to restrict the amount of CPU that a group of processes can consume, but since then control groups have become popular for a lot of other use cases. Control groups now apply different Linux features across groups of processes: you're basically grouping processes, and then you can do whatever you want with that group. One cool use is that in a container you no longer have to track sessions of processes; you just have control groups, and you can track all the processes that belong to a container. If you want to clean up a container, you go look at its control group and get rid of all the processes in there. That's a completely different use case from what control groups were originally meant for, but control groups enable it. The primary use cases for control groups, in addition to killing containers, are restricting the amount of CPU each container can use, restricting the amount of memory each container can use, and restricting the disk I/O each container can use.

The other really critical technology being used for containers is Linux namespaces. The easiest way to think of Linux namespaces is to think of having a virtual kernel context for each application. You're running multiple applications on the same host, but you want them to behave as though they're on different hosts. You don't want them to see each other; you don't want any coupling between containers running on the same host; and you also want to give them some extra parameters, for example, you want each container to have its own hostname. So you get a virtualized Linux kernel API: you get virtualized process trees, you get virtualized network interfaces with your own routing tables local to your container's virtual host, you get your own dedicated users and groups, which might or might not have any relationship to the user and group IDs on the host, and you get your own local filesystem view. Your filesystem is local to you; your container cannot see someone else's filesystem. So each container is in its own isolated jail.
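You can see these per-process namespaces on any Linux machine; as the demo later shows, each one is, in essence, just an identifier exposed under /proc:

```shell
# Every process has a set of namespace handles under /proc/<pid>/ns.
# Two processes in the same namespace see the same identifier.
ls -l /proc/$$/ns

# Each entry resolves to a name:[inode] identifier, e.g. net:[4026531840].
readlink /proc/$$/ns/net
readlink /proc/$$/ns/pid
```

Processes whose identifiers match share that namespace; a new namespace shows up as a new number.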
And you also get a control API that's virtualized as well.

Next, security. This is something I often find people not wrapping their heads around. As I've been saying all along, containers (sorry, in this case the slide shows pods; I'll get to them shortly), containers and pods all share the same Linux kernel. The side effect is that if there is a kernel vulnerability, it's possible for one container to go and compromise other containers. Or if one of the containers gets compromised and it's root on the host, then it's basically able to go and compromise every other application running on that host. There's also the question of what else you can do once you're root. But leaving that aside: if you want to create a safe environment where containers and pods share the same Linux kernel, you've got to make sure you restrict the amount of kernel API access that each container gets, and make sure you don't give more privileges than are absolutely necessary for each container. Several Linux security technologies exist for this; you've got to understand them. Just don't assume that the defaults you get out of Kubernetes, or out of any other orchestration system, are going to be adequate. Spend time understanding this; it's actually going to be helpful for you in the future.

Another term you'll hear with containers is this concept called volumes. "Volumes" is actually a misnomer: if you go and talk to a storage person in the Linux world, they're going to say volumes are backed by block storage. But that's not actually true with containers; a volume could mean anything. It could mean block storage, it could mean a file somewhere on the local host, or it could mean a file on some remote server. It's any data that's not within your overlay filesystem. Your overlay filesystem is very closely tied to your container image, and everything outside of that is called a volume.
Some of the reasons you would want a volume are sharing, or persistence of data: when your container dies, the data it cares about is still around.

Next, networking. What good is a Linux application if it cannot talk to the network today? Networking is one of the fundamental building blocks for containers. Unfortunately, Linux networking has accumulated a lot of features over the years, a lot of different ways to do the same thing, and so, needless to say, with containers there are different ways to achieve networking too. The one I'm describing here is the most common, the most prevalent way of setting up networking: you deploy Linux bridges, you use virtual Ethernet (veth) interfaces, and you use some iptables magic. To give a quick run-through of how it works: veth interfaces come in pairs, and if traffic flows in one end of the pair, it automatically comes out the other end, and vice versa. So what you do is place one end of the pair inside the container and the other end on the host, and you connect all the host ends together with a bridge. What happens is that, through the bridge, all containers can now talk to each other within the same host. If you want to go across hosts, then you set up some iptables and network address translation rules, where the packet goes out over the host network, and what happens from there depends on your underlying network fabric. This changes a little bit with pods, but this is how container networking works in general.
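The bridge-plus-veth wiring described here can be sketched with iproute2. This is a toy version built inside a throwaway user-plus-network namespace, so it needs no real root, assuming unprivileged user namespaces are enabled on your machine; the interface names and addresses are illustrative:

```shell
# Build a bridge and one veth pair inside a fresh network namespace,
# mirroring the host side of the container networking setup.
unshare --user --map-root-user --net sh -c '
  ip link add demo0 type bridge                # the shared bridge
  ip link add veth0 type veth peer name veth1  # veth1 would move into a container
  ip link set veth0 master demo0               # attach the host end to the bridge
  ip link set demo0 up
  ip link set veth0 up
  ip addr add 10.10.0.1/24 dev demo0           # gateway address on the bridge
  ip -brief addr show dev demo0
'
```

In a real setup, `veth1` would be moved into the container's network namespace and given an address from the same subnet.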
So now that we've talked about containers, let's talk about pods a little bit. Hopefully it's somewhat clear what exactly a container is, and now it's a good time to understand what a pod means. People might ask: what is a pod, and why do we even need the concept? For most scenarios, maybe you don't even care about pods, and that's okay. But there are some scenarios where you might start caring about them. Say you're a user in the virtual machine world today and you're running your MySQL server. Brendan was describing how you can have an overall, cluster-wide monitoring service that takes care of all applications; but what happens if you have a specialized service that needs its own monitoring, or you're adding some special monitoring just for your application? Or you have a very special logging agent that's trying to aggregate and parse logs from your application. Or what if you have, say, a front end, and you keep updating its content from some other process that downloads content from some other source? So there are use cases where you have your primary application, and then you have some secondary applications that help run your primary application. Those are the scenarios where pods come into play: you want all these applications to be deployed together, to be able to see each other, to run as one functional unit. This is very common in the VM world: in a virtual machine you can have one or more of these applications running together, they can see each other and each other's resources, they share the network, and so forth. So what if you want to move everything that's within your VM into the equivalent container world? That's where pods come into play.

So you get co-scheduling: every container within a pod is scheduled as one atomic unit.
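A sidecar-style pod of that shape might look like this. This is a hedged sketch: the image names, paths, and the content-puller helper are made up for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
  - name: content              # pod-level volume, visible to both containers
    emptyDir: {}
  containers:
  - name: frontend             # the primary application
    image: nginx
    volumeMounts:
    - name: content
      mountPath: /usr/share/nginx/html
  - name: content-puller       # secondary helper that refreshes the content
    image: example.com/content-puller:1.0   # illustrative image
    volumeMounts:
    - name: content
      mountPath: /data
```

Both containers are scheduled together, share the pod's network, and exchange data through the shared volume.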
You also get composability: you can pick up off-the-shelf containers, add your own logic on top, combine them, and call it a microservice. Essentially, pods let you build real-world microservices.

So how is a pod different from a container? A pod shares the Linux namespaces. Remember the virtual-host analogy I was talking about: all containers in a pod share the same virtual host. They share most of the Linux namespaces, with the exception of the mount namespace, so they behave as if they're on the same host. They share data: there's a concept of pod-level volumes, which can be tied to the pod's lifecycle or live outside of it, but in any case pods have common data that can be shared across containers. They also share a common network interface. This changes a little with Kubernetes, in that every pod within a cluster is expected to be able to talk to every other pod, so you no longer need fancy routing; every pod is part of the same address space, and they can all talk to each other. That changes how networking happens a bit, but in essence every container within a pod shares the same network interface, and they have the same IP and port range, similar to how applications behave in a VM world. In addition to this, there's also a control-group jail across all the containers. Imagine a scenario where you're running some memory-backed volumes, like tmpfs, and your container dies: what happens to all the memory that's tied to that volume? With pods you get another level of isolation, where two pods on the same node are isolated against each other and cannot break out of their jail by any means.

So now it's actually time for a demo. The point of this demo is that I will actually show the Linux technologies I talked about and build a container from scratch.
And to add some flavor, I'm going to create this container from within a Kubernetes pod, so it's actually two levels deep: I'm already running inside a container, and I'm going to create a container from in there. The cool part about building this demo into a Kubernetes pod is that now I can run it on any Kubernetes cluster, and it's hopefully going to behave the same way; at the end we'll also explore how the sandbox differs. So give me a second while I pull up my terminal. Here we go; hopefully everyone is able to see this.

I've scripted my demo, so I'm not going to stand here typing every single command; that lets me speak, because I'm not really good at typing and speaking at the same time. But it is a live demo, running on a Kubernetes cluster in one of the Google data centers. I'm creating a Kubernetes namespace here; it's basically a virtual context. I can group all the Kubernetes artifacts for the demo into a single namespace, and once I delete the namespace, every other artifact within it gets deleted, which is really useful for a demo. Here is a Kubernetes pod manifest. Remember, in the beginning I was talking about how you take your application in the form of a container image, specify some of its runtime environment attributes, and submit it to the Kubernetes master; this is what that specification looks like in reality. I also created a container image where I prepackaged all the Linux utilities that I'm going to use for my demo, because if I stood here downloading all of them, it would take too long. This is that container image; I'm not expecting you to follow every line in there, but that's what image manifests look like.

Moving forward, I'm going to create my base pod, which is going to serve as my test environment.
The base pod has already been created; I can see the base pods are running, and they're running because the images are already on my test nodes. Now I'm going to copy my demo script from my laptop onto the pod that's running somewhere in some Google data center, and once the demo script has been copied over, I'll go ahead and start it. This demo is based off of an Ubuntu container base image, with a lot of Linux utilities added on top. Here you can see that within the pod it looks like Ubuntu, but in reality all containers are sharing the same Linux kernel; the kernel stays the same, it's just that the distro here appears to be Ubuntu. If you look at the distro that the pod itself is running on inside Google, it's actually Google's Container-Optimized OS. So your base OS no longer matters: your applications behave the same way regardless.

The next step is to go ahead and create the container sandbox. The first step for that is creating the Linux namespaces. I'm going to use a Linux utility called unshare, and I'm creating a new set of network, PID, UTS, IPC, and mount namespaces. If you want to see what my current namespaces are: namespaces, in essence, are just identifiers. Unfortunately there are no fancy names associated with them, so you see a bunch of numbers here; these are the namespaces for my pod right now. Now let's go ahead and look at the new namespaces I just created. You'll see that for the new ones, the network namespace for example, there's a new identifier. That means you have a new network context, a new virtual context within the Linux kernel, and now we can move processes into this new virtual context, and their view will be restricted to it.

The next step is to set up networking.
I could set up networking in individual bits, but I chose to use this nifty tool called CNI, the Container Network Interface. CNI lets you build different sorts of container network primitives. In this case I'm asking CNI to set up a Linux bridge and allocate IPs for containers from within a subnet range. So I'll go ahead and set up networking for the virtual host that I just created, and the pointer to that virtual host is through the proc filesystem: I'm saying, use the network namespace associated with that specific process, because that process is the one anchoring all the new namespaces I created. So my virtual sandbox has gotten a new IP from the 10.10 subnet. Let's see what network interfaces exist. Within my pod I have all these different network interfaces: I have my demo Linux bridge, I have the pod's own interface, which is how my pod talks to the rest of the world, and now I also have a veth interface; if you remember the network architecture, that's the veth end for the container. And if you look at the network interfaces inside the new container sandbox, you see just one, because we added just one inside it. So that's one more dimension of isolation.

The next step is to create an overlay filesystem; I'm going to restrict the filesystem view for my container sandbox. I already have a BusyBox image prepackaged into my test environment; I chose BusyBox because it's really small, though there are other options that are small too. I have it in the form of a tarball, and I've exploded it into a base directory. Next I'm going to create the writable layer. If you remember the overlay filesystem slide, a writable layer is what lets a container think it has a whole filesystem of its own and can make changes to the underlying filesystem.
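For reference, the kind of network configuration handed to the CNI bridge plugin looks roughly like this. This is a sketch: the names are illustrative, with the subnet chosen to match the demo's 10.10 range:

```json
{
  "cniVersion": "0.3.1",
  "name": "demo-net",
  "type": "bridge",
  "bridge": "demo0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.10.0.0/16",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
```

The `bridge` plugin creates the Linux bridge and veth pair, and the `host-local` IPAM plugin hands out addresses from the subnet.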
So I'm going to create a writable layer for that. The next directory, the work directory, is just an implementation detail of the overlay filesystem I'm using for this demo. And I'm also going to create a rootfs directory, which is where the unified filesystem view will be presented to the container. Now I'll go ahead and create the overlay filesystem, and I'm creating it within the container's context, so this overlay filesystem is only visible within the container; no other container can see it. So here we go: we have the overlay filesystem inside the container. Next, I'm going to try to create a file within this overlay filesystem. What we expect is that whatever mutations we make to the overlay filesystem land only in the writable layer, while my base container image stays intact. That's what this part of the demo is about: showing that the new file you created exists only in the writable layer you added, and doesn't get propagated down to your base image.

The next step is to add control groups into the mix. I'm creating a new control group here called "test", using a utility called cgcreate. Then I'll show my current control group. You can see that control groups are hierarchies; they're sort of like directories, and Kubernetes already creates a whole bunch of control groups to achieve various resource isolation policies. For the sake of this demo, assume this is where your pod is currently running. Then I'm going to move myself into the new control group I just created, and at the end of this, my control group should have changed to be the new one. Since control groups are hierarchical, it's a sub-control-group within the existing control group.
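The "show my current control group" step can be reproduced on any Linux box without privileges, since a process's cgroup membership is published in /proc:

```shell
# On cgroup v2 this prints a single line like "0::/user.slice/...";
# on v1 there is one line per controller (cpu, memory, blkio, ...).
cat /proc/self/cgroup

# The cgroup hierarchy itself is exposed as a filesystem.
ls /sys/fs/cgroup | head
```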
Next step; we're almost done. The final step is to create a proc filesystem inside my container sandbox, because that's a really popular Linux API for most applications. So I've gone ahead and created a proc filesystem. Before entering the sandbox, I want to show what my process view looks like outside of the container sandbox: we can see a whole bunch of processes here. What we expect is that once you get into the container sandbox, your view is restricted to just the processes within that sandbox. So I'm going to go ahead and enter the sandbox, and let's look at the processes within it. You can see that you now get an isolated process view. Similarly, if you look at what mounts are available, those are also isolated, and if you look at what files are accessible from this sandbox, it's only the ones that were part of that BusyBox image. So if you replace BusyBox with your own application, say a Java application, then all you're going to see is what your application is supposed to see, which is what's part of the container image. And since we set up networking, your container can also talk to the rest of the world. So that's the demo; let's get back to the rest of the slides.

Remember I said cluster management on Linux is a journey, and there are lots of future opportunities. The thing is, Google is running containers at really massive scale; I think the number is over two billion containers per week, and that data is really old at this point. Every single application inside Google runs in the form of containers. That required changing Linux in ways that might not be appropriate for the rest of the Linux user base. Linux has had to transition from its desktop and embedded worlds into this high-performance server and container world,
a world where you want high flexibility in the policies for how you utilize your nodes, and where more and more parts of the Linux kernel have to be virtualized so that you can run more and more applications in containers. Some of the aspects that have to improve include namespacing more of the Linux APIs. For example, /proc/cpuinfo, which is a very common API, doesn't actually work inside containers; it's a pitfall for many users, because they're not aware that not all APIs are namespaced. Similarly, there are more security and resource isolation primitives that have to be improved. That's part of the journey, and you can become part of the journey now, because it's all happening in the open-source world.

To summarize: Kubernetes is built using simple, standard Linux features; understanding Linux is what will help you understand Kubernetes; and cluster management with Linux is still a journey. If you want to learn more about this, I would recommend creating containers from scratch, or using runc, rather than doing it the way I did. Also, try to identify the minimum privileges required for your container; this is a good exercise, because then you'll actually start understanding Linux security, and you'll know how to run containers at scale in production. Kubernetes is open and, as I was saying, Kubernetes is literally everywhere: it supports a lot of diverse applications, it's running on diverse hardware, from Raspberry Pis to high-end servers, and it's running on Windows now as well as Linux. So become part of the community: get to know the people behind the project, and learn more about the technology before you start
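That /proc/cpuinfo pitfall is easy to see for yourself: compare what the kernel reports with what the scheduler actually grants. Inside a CPU-limited container the two numbers can disagree, because /proc is not namespaced:

```shell
# CPUs the kernel reports: inside a container this still shows
# every CPU on the host, because /proc/cpuinfo is not namespaced.
grep -c ^processor /proc/cpuinfo

# CPUs this process is actually allowed to run on (respects affinity).
nproc
```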
deploying it in production. Thank you. I'm going to be hanging out here for the rest of the day, so feel free to find me and ask me any questions. [Applause]
Info
Channel: Container Camp
Views: 5,654
Rating: 4.9736843 out of 5
Keywords: container camp, container technology, containercamp, kubernetes, google, linux, pods, infrastructure, containers, google borg, vishnu kannan
Id: Slce9Nu-NB0
Length: 32min 36sec (1956 seconds)
Published: Fri Jul 21 2017