Lab: Linux Container Internals - Scott McCarty & Marcos Entenza Garcia, Red Hat

Captions
All right, is everyone here for Linux Container Internals? If you're not, well, we're flying to Boston, so you're stuck on this airplane. This presentation has the wrong names on it, I realize; those are my partners in crime from a Red Hat Summit a while back. This is a lab we have run before, we're running it again here, and we'll run it again at Red Hat Summit in May or June.

How many of you have played with and understand Docker? Good, because this class should go deep. How many of you have played with and understand Kubernetes? Good; that's more than most times I've run this.

How many of you have laptops? Good, because you'll need one to follow along with the labs. Is everyone able to connect to the Wi-Fi? Anybody without Wi-Fi access? Okay; if you didn't raise your hand, I'm assuming you know how to connect. We're going to attempt three of these four labs. The labs are available online, so depending on how far we get, you can complete them yourselves afterwards, and I'm going to go through the presentations and present some material live.

We don't have any more lab guides; we're out. You don't need them, though; it's not critical. Some of you got lab guides; I was hoping to give one to everyone, but we didn't have enough. I had no idea we were going to get moved to a bigger room and draw more people. Those of you who do have one will get stickers to put in your lab guide, and you can take it home. Honestly, all the drawings and material are available online, and I'll show you how to get to this presentation; you can even watch a recorded version of it with the system we're going to use, called Katacoda. So not to worry, you'll be okay without a lab guide.

My guess is we're going to end up getting through three of these labs; I'd set out four, but we only have an hour and a half. We'll do a small presentation, then go into a lab, and then walk around and help anybody that has problems. You shouldn't have problems, though: it's a dedicated environment with virtual machines configured in a very specific way, so nothing should break, and the tool is really cool and easy to use. How many of you have done the tutorials on kubernetes.io, in the Kubernetes docs? It's the same system they use: Katacoda. It's an interactive environment where you read some text, click on a command, and it types it into a terminal on a pre-configured virtual machine.

Before I jump into the presentation, is there anything I missed? Any questions? Say that one more time. Yes, the lab guide is available online; I'll give you all the links too. The easiest way is probably this; I will just bring it
up on the screen. Not that I'm trying to get more Twitter followers, but if you want to follow me, I'll post all the links there. I have no idea why this is doing this... yes, everyone has my phone number now, if you recorded that. My handle is @fatherlinux, so if you want to follow me on Twitter, I will post all the links there, and everything we're going to do here is available online: the lab guide, and then the Katacoda environment. I'll show it to you real quick. It's under my profile, katacoda.com/fatherlinux, and under the Introduction to OpenShift course there are these four labs; we're going to go through one, two, three, and four. These are the actual interactive labs for container internals that we'll go through today. Does that answer your question? I think so. Cool, any other questions? Oh, you want to see it bigger? How do we do that... it will only enlarge the page, not the URL. Let me do this; I've got an idea. How about that? We'll add the links in here as we go, for anybody that needs to type these things in, and as we find things that maybe aren't documented that well, I'll add those too. There we go.

All right, if there are no other questions, I'm going to jump into presenting. Any other questions? Okay. I want to get through at least three of these. Actually, these links are probably not right; they're older, so I'll send out the correct ones afterwards. I'm going to jump right into the actual presentation.

We start with architecture: an overview that resets what you understand about containers. Then we move into the single-host toolchain, what a single container host looks like; then what a multi-host environment looks like; and then a distributed-systems environment, where you're troubleshooting things the way you would in production. It's a progressive setup: we reset your understanding of how container internals really work, how containers get created, and how to troubleshoot them. The idea is that this should help you build better container images and run containers in a better way. Then we'll move into the host toolchain so you get a feel for the lay of the land, because it's a lot more complex than people think. You run a single Docker command and you think it's easy, and then when you go to architect your own environment, you have to learn a lot more than you think you do. Honestly, a lot of it is just bridging gaps; that's the whole idea of this presentation.

First and foremost, and you'll see this again in the lab as you go through it: the Internet is wrong. If you look at all of these architectural drawings, they almost all show a blue line (this slide is not in color, but there would be a blue line)
that says "Docker," with containers running on top of Docker. That is incorrect. Containers do not run on Docker. Docker is a daemon, an API daemon that accepts requests from a user; it translates those requests, talks to a bunch of other daemons, which we'll get into, and eventually that comes down to a system call, clone, used instead of fork or exec, which creates another process on the Linux system. People see one of two things. They see the blue line and conclude, "if I just get Docker, my containers will run, because they run on Docker." They do not. The other drawing you'll always see has Docker running on a Linux system with containers on top, but you will never see the other processes running side by side: no other user-space daemons, no regular processes. In reality they're all equal; they're all just user-space processes. So there are two main ways all of these drawings are wrong, and they all lead to the conclusion that containers run on Docker. Docker is essentially a daemon that makes it really easy to run containers, but it's not actually what runs them.

At the end of the day, containers are really two things. When they're not running, they are Linux files: when you pull them down and export them, they're nothing more than tar files. When they're running, they're nothing more than processes; fancy processes, as I always say, with a bit more isolation, sandboxed so that they have the illusion of running on their own system. But they're not. What you usually interact with is user-space libraries that create these containers. All the definitions of a container that we have now, LXC, LXD, Docker, rkt, CRI-O (I don't know how many of you know about CRI-O, but it's a runtime that works with Kubernetes), each has its own definition of what a container is. For example, if you're running Docker and rkt side by side and you list processes in Docker and list processes in rkt, they will not show you each other's containers, because those are user-space definitions that each of them holds and tracks. At the end of the day, they're all using the same Lego blocks in the kernel to create a container. "Container" is a word we use, but it's not really a kernel object; it's defined at a higher level. You either have a Docker container running, or a CRI-O container, or a runc container, or a runv container (which is actually a VM), et cetera.
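To see that a container really is just a fancy process, here is a minimal sketch, assuming util-linux's unshare is installed; it builds a "container" out of nothing but namespaces, no container engine involved:

```
# Create a process in new PID, mount, and UTS namespaces.
sudo unshare --fork --pid --mount-proc --uts bash -c '
  hostname fancy-process   # only visible inside the new UTS namespace
  ps -ef                   # only bash (PID 1) and ps are visible in here
  sleep 30
'

# Meanwhile, from another shell on the host, the "container" shows up
# as an ordinary user-space process:
ps -ef | grep sleep
```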
So this drawing is really important to understand. The user comes in; normally, you type a command in the shell and a process gets created. It could be created with cgroups or SELinux applied, or maybe not; it depends on whether you have those things enabled. With a containerized process, though, you typically come in through one of these libraries: LXC, systemd-nspawn, libcontainer, or libvirt. That's if you're a programmer; you would write some code that talks to one of these libraries to create a container. The typical kernel technologies used are namespaces, cgroups, and SELinux. Cgroups and SELinux have been used for a long time; what really changed with Docker was an easy way to access the namespaces, using the clone system call instead of fork and exec. In a normal shell, when you type a command and hit Enter, the shell either execs into another program or forks another process. Does everyone understand the basic Unix internals of fork and exec? If you need more, I'll answer any questions.

Now, going a little deeper into what most people have been exposed to: when Docker got famous four and a half years ago, it made it really easy to create a container, so we all internalized the idea that "I just use a container." But this is what it really is. (I don't have my glasses on... oh, there they are. No laser pointer, though.) If you look at the big box, you'll see dockerd, containerd, and runc: it's really three different processes firing off to create a container. dockerd is a daemon; containerd is a daemon; and runc is just a process that gets created and actually creates the container. runc is what talks to the kernel: it makes the clone syscall, which creates the namespaces; runc is also smart enough to create the cgroups; and, depending on how things are configured, libselinux creates a new SELinux context. So runc is responsible for the communication between user space and the kernel, but there are two other daemons at work, and most people don't realize this is what's happening behind the scenes, all of it in user space. You'll see that containerd... actually, not yet: as of the last time I checked, this is the future architecture. The logic for pulling container images is currently embedded in dockerd, but it's moving into containerd, if it hasn't already. And runc? All it does is take a config file, config.json, and a directory. You literally pass it a file and a directory and it fires up a container from whatever is in that config file; dockerd and containerd basically create the entries in that config file. In the case of CRI-O, it's the same thing: it produces a config.json, runc gets called with that config.json and a directory, and it creates a container. I think only a very small percentage of people have real clarity about how that works. You can see it for yourself with nothing but runc:
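A minimal sketch of driving runc by hand (it assumes runc and Docker are installed, and uses Docker only to obtain a root filesystem):

```
mkdir -p mycontainer/rootfs && cd mycontainer

# Borrow a root filesystem from an existing image.
docker export $(docker create centos:7) | tar -C rootfs -xf -

# Generate a default OCI config.json in the current directory.
runc spec

# runc takes nothing but that config.json and the rootfs directory
# and turns them into a running, namespaced Linux process.
sudo runc run mydemo
```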
Building on this even further (in this drawing I've still kept dockerd), this is what a typical Kubernetes environment looks like. Now we've added some more user-space daemons: we've added the node, which is responsible for talking to dockerd, and we've added a master, which is responsible for talking to the node. I've also added systemd, to show that systemd is what actually starts the Docker daemon, the OpenShift node, the OpenShift master, and etcd. Those are the four things you configure on a system to actually start, and that's the entire toolchain: that's what a fully configured Kubernetes environment, or an OpenShift environment (which is just a distribution of Kubernetes), looks like in its full glory. Again, it starts containerized processes; there are other regular processes running on the same system; and the containerized processes use these kernel technologies, while the regular ones don't necessarily. containerd goes and pulls the image; or, with CRI-O (I have a drawing where CRI-O replaces dockerd and containerd; imagine them as one box), CRI-O uses its own library to pull the images and expand them onto disk. That's the full picture. Does that make sense to everyone? Any questions? Here's roughly what that toolchain looks like from the host's point of view:
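A minimal sketch, assuming an OpenShift Origin host; the systemd unit names vary by distribution and version, so treat these as illustrative:

```
systemctl status docker          # dockerd, the API daemon
systemctl status origin-node     # the node/kubelet process (on nodes)
systemctl status origin-master   # the API server (on masters)
systemctl status etcd            # the cluster's key-value store

# containerd and runc don't appear as units here: containerd is managed
# by dockerd, and runc is a short-lived process, one invocation per container.
```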
To answer the question: in the latest releases of Docker, it definitely has containerd, but they're moving functionality between dockerd and containerd. dockerd is becoming just the API endpoint, and containerd is doing the logic of pulling the container images down, expanding them onto disk, and creating the directory that runc needs to go create a container. Say it one more time? That's correct: runc is basically libcontainer nowadays.

Part of the problem, and a lot of the confusion, is that this has changed very quickly over time; in four and a half years it has changed immensely. The original architecture was that Docker just did everything; dockerd did everything, and it was slowly broken out into different pieces. If I remember correctly, at first it didn't even have libcontainer; it used liblxc, an existing technology, to talk to the kernel and create containers. Then they broke out the logic and made their own library, libcontainer, because they wanted flexibility; I don't remember all the architectural reasons. Then at some point the world changed and we wanted to create what's called the OCI: a standard runtime, a standard way to pull images, a standard way to explode them onto disk, and a standard way to run them. That's what drove the creation of runc: Docker Inc. separated runc out as its own thing and contributed it to the Open Container Initiative, and now runc is the middleman. It knows how to take that config.json and directory on disk and turn them into a Linux process in a very specific, particular way governed by the OCI standard. That's important, because all the runtimes are basically using it now: CRI-O uses runc, Docker uses runc, and I believe rkt uses runc, or can; I haven't kept close to it. Anyway, does that make sense? Any other questions?

All right, now I want to show you how this would look in production, with multiple masters. Let me walk through how a user would come in and create a container in a Kubernetes environment. It's multi-node, so you wouldn't come directly to a node and create a container; you come to the API. In an OpenShift environment, the installer creates an HAProxy that load-balances between multiple masters. Those masters keep track of all the different nodes with etcd; etcd keeps track of everything in the Kubernetes environment, and a node just happens to be another object inside Kubernetes. Each node looks like this: its own dockerd, its own containerd, and its own instances of runc, multiple ones, one per containerized process. Some of the nodes would run a registry server, maybe one, maybe several; it depends. In this scenario (it's an old drawing) I used NFS as the backing storage for the registry; the new ones use Gluster out of the box, I think, but either way it's the same concept. Whether you use OpenShift, which is opinionated about how to create all of this and has an installer, or plain Kubernetes, this is the stuff you would have to set up yourself to create a real production environment. Kubernetes gives you some of these pieces, but it doesn't necessarily give you the load balancer or the registry storage, it doesn't necessarily give you an easy way to add nodes, and some other niceties. OpenShift also does a few things in a slightly different way, which we'll get into. Kubernetes is like Linux in a lot of ways: there are a lot of different distributions of it, so where you put config files and things like that can differ between distributions, and Red Hat's OpenShift is one of those distributions.

Finally, I want to walk through another point of confusion. I tried to make a super-drawing of what OpenShift is versus what is governed by the CNCF and the OCI, which are both part of the Linux Foundation. containerd, fluentd, Kubernetes, and the Technical Oversight Committee (which is a group of people, not a piece of software) are all part of the Cloud Native Computing Foundation. They help govern and manage these projects, and the TOC includes people from Red Hat, Docker, CoreOS, and a bunch of other companies; basically every company you've heard of is involved in driving these projects at this point. Then there's the Open Container Initiative, which is another important thing to know about: it governs the runtime spec and the image spec, and it also releases a piece of software, runc, which is an implementation of the runtime spec. The runtime spec is what knows how to take that directory and that config file, as I mentioned, and turn them into a container. There's also governance around how you communicate with a registry server and around the format of the container image you pull down, which is actually a bunch of layers: it's not an image, it's a repository of image layers, plus rules for how you pull that down, expand it onto disk, and present it to runc. You can see that repository-of-layers structure for yourself:
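A minimal sketch, assuming skopeo is installed; it inspects a repository's manifest without pulling anything:

```
skopeo inspect docker://docker.io/library/centos:7
# The JSON that comes back includes "RepoTags" (every tag in the repository)
# and "Layers" (digests of the blobs that get pulled and stacked on disk).
```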
OpenShift's upstreams are essentially all of these things: Kibana, Elasticsearch, Open vSwitch, which we use in RHEL as the underlying piece that creates the flat virtual network in an OpenShift environment. That, again, is something the installer does, but the point is that getting to a working environment requires more than just Kubernetes: there's the installer, there's configuring the network, there's logging, et cetera. Does that make sense to everyone? Any questions?

With that, I'm going to get us into the first lab. For those of you that have the lab guides, you're going to do one quick thing that everybody else won't: put your stickers in. You should now know which order those stickers go in, because I've presented the architecture to you. I'm going to check as you go out the door, and if you get it wrong, no gold star. Here's the URL: katacoda.com/fatherlinux. If you can just get there, you can click through; you'll land on my page, click on Introduction to OpenShift for Developers, and you'll see labs one, two, three, and four. We're just going to do lab one, and then we'll break. We ran a little longer than I wanted, but we're about where I want to be; I'll give you about 12 to 15 minutes to get through it. Fifteen; I shall set a timer.

To get into this environment, you have to create a username and password, which is described in the lab guide, but let me show you real quick. I'll log out so you can see, and then I'll start the timer. If you go to katacoda.com/fatherlinux, you can click on Intro to OpenShift (you'll notice it no longer shows the scenario as completed for me), click on the scenario, and it brings up this page. You're welcome to use a throwaway email address and create a password, or you can sign up for free with GitHub; it takes two seconds. I just use my GitHub, because I was already developing in Katacoda and it pulls content from GitHub anyway, so for me it made sense, but you can sign in with anything. Once you create an account,
it actually gives you access to everything on katacoda.com, and they're all free tutorials. We'll walk around and help anyone that has problems, but I'm going to start a timer of about 15 minutes for the first lab.

Oh, one other thing before I turn you loose. I'm going to sign in and show you something. We're going to skip the video on the first page, because I basically just presented it. If you get home and want to do this entire thing, you can watch the video, it's about the same length as what I just presented, and then you hit Start. It brings up a virtual machine on this side, and on this side is the lab information you read through. Each of these commands, you can just click on, and it will automatically put it into the terminal and type it, so there are no typos. Honestly, this should go pretty quick; the first ones are easy. Although it looks like some of you are already getting in, this is taking a while for me; I've never seen it take this long. So we're going to have to wait and see how fast Katacoda is.

Say it one more time? Is OpenShift open source? Yes; like everything else Red Hat does, it's open source. The upstream of OpenShift is called OpenShift Origin, and Origin is built off Kubernetes and all of those other projects I showed, like Kibana. Think of it as a distribution that pulls all these tools together, very similar to the way Fedora pulls things together: Origin changes quickly, pulls everything together, proves it out, and makes sure it works; then our enterprise distribution, OpenShift Container Platform, is built off Origin. And this lab here, if it ever comes up, is built off OpenShift Origin. Although now I'm getting worried.

Yours is working? Okay, has anyone else gotten in? Good, so it's only mine. If not, just refresh; I think it's using Amazon on the back end, so it should give you a different VM if one doesn't work. Red Hat's paying for the cost of it, so I don't care, just create another VM. Mine's really hanging; let me try this completely from scratch. This is a good stress test for Summit. I think it'll work now; we'll see, I've had to do this before. It's not working for you either? I was worried about this many people connecting at once.

On the networking question: in a later lab, I think under the single-host toolchain, we dig in a little bit. I have more material I want to add about the way Kubernetes does networking, because it's interesting and important to understand, but I haven't added it yet. You can bug me on Twitter and say "hey, go create that content," and under social pressure I will do work. This podium is not that comfortable. How many of you are still waiting to get in? A decent
number. It worked in seconds for you? Let me try another one. I don't know if it's my browser; I know it's not the Internet, for sure. He's asking: five minutes for what? How long is this supposed to go? It's supposed to be an hour and a half; I don't think that's right. Just when I thought the lab couldn't go any worse, we get kicked out. True, it can always get worse: if there was a fire right here, I mean a big fire, not a little one, I would run away.

Oh, here we go, this one got through; so just reload a couple of times if you're still waiting and impatient, because my other one got through too. Does running it twice slow it down? It could, but these are all separate virtual machines in Amazon, so it should work; it shouldn't be broken. Well, I got one through, so either way, for those of you that are dying to see it: basically, you literally just click on the command and it types it in; you'll see this one is pulling the image down. It is definitely going slower. We're all getting crushed; apparently having this many people is tough for Katacoda.

It's probably because the environment hasn't finished configuring itself. There's a script at the beginning that gets copied into /usr/bin, so it is the right script and it should be working. What I think is happening is that, since the back end is so slow, some of the setup steps haven't completed yet. You can see it right here: in intro-openshift, in the container internals lab 1, this file is what configures the VM as it comes up. Since there are so many of us hitting it, it copies this script into /usr/bin, and I think it just hasn't gotten there yet, even though I put that step before all the others; it may be the git clones. They're just not done yet. I was afraid this would happen.

Can we make Katacoda work better somehow? No, that won't help, because we'd have to cut and paste all of this and connect a bunch of people, and it would be a nightmare; we'd have to describe to everybody how to get in, and it would end up not working. Everyone's on different networks, and these commands are configured for only this environment; there is a ton of configuration on the back end to make this all work, and none of these commands would work on a generic OpenShift cluster. I'm thinking maybe we just have everybody get out of Katacoda. Let's have a vote: would it make more sense for me to just run through the labs and show them to you, rather than having everybody try to do them? Anyone massively against that? All right. I think the problem is that it's so slow the configuration
isn't finishing, because there are so many of us in it. I guess everyone hit Logout: get to the Logout button and click it, because that seems to kill the VMs. Let's try that; everyone log out. Is everyone logged out? Did you actually click Logout? Don't just shut your laptop, because the VM will keep running. Now let's see if this works; this is an experiment. It worked! Beautiful. All day as I tested this, this part was so fast. Ah, that looks better.

I have two versions of this lab under my profile. One is a GitHub repo, but you would have to create your own OpenShift environment; it walks you through the same things, it's the older version, but you have to set everything up yourself, and it's a decent amount of setup to get OpenShift working the way you want for this lab. This Katacoda version, when it works, is beautiful, because everything is already configured the right way and you don't have to mess around. Honestly, if you do it by yourself later, it will probably just work, because there won't be fifty of us connecting at the same time. Although this is still slow; we may be falling back to plan C, where I present more material and we discuss it interactively, and then you do the labs later on your own. If there's anything in the lab you don't understand, feel free to email me or tweet at me, or basically call me; you have my phone number now. Remember, it's +1 to dial the United States; no collect calls, please. Yeah, it's dying; I don't even know what it's doing anymore. It's definitely dead. I'm going to go complain.

All right, here's what we'll do: we'll dig into the next lab and talk through this material. I have enough to fill the time; when does this go until? Five forty-five? That's just about perfect, actually, now that we've killed a bunch of time.

So, container images are the next thing I dig into; strictly it's the single-host toolchain, but container images with it. I get this question all the time, and I have different versions of this drawing that are easier or more complex; I'm hoping that since you're in a container internals class, most of you are technical enough to handle the harder version. At the end of the day, in a traditional environment, whether it's virtualized or not (you can add or remove the bottom VM layer in this drawing, it doesn't matter), we optimize for two competing things: with the application and its dependencies, we optimize for agility; with the kernel, we optimize for stability. Those are two competing engineering paradigms. Red Hat, for example, with Fedora, moves very quickly: every six months, a new version, no problem. RHEL, on the other hand, lives ten or thirteen years and we backport changes, which is a ton more work, but the nice part is that every time you run yum update in RHEL, it works. Not every time in Fedora; not every time in any operating system I've used, actually. There are exactly two that have never burned me. I'm a Red Hat person, but I'll fully admit I've never been
burned by SUSE Enterprise Linux either; I've been burned by every other distribution of Linux that exists, because they're mostly optimized for speed, not stability. In a production environment, you would typically use something like RHEL or CentOS or SUSE, something that moves slowly and backports changes, but that only covers the platform; the application still wants to move fast.

In a containerized environment, this is basically what we have in the container: the application and all of its OS dependencies. You'll see this in the lab when you run ldd on binaries and look at all of the dependencies, something a lot of people who've forgotten Unix and Linux probably don't remember; you'll see how those OS dependencies get bundled into the user space inside the container image. In this drawing I also show multiple container hosts; they can still be VMs, it doesn't matter. If you run OpenShift on AWS, they're VMs; they could be bare metal or VMs either way.

So this is the part we're going to dig into. Think of an operating system as a stack: there are user programs, and those user programs are linked against libraries and interpreters. A Python script relies on an interpreter; a C program relies on libraries. And glibc, this is a nuanced thing I think we've forgotten over time: glibc is the standard set of interfaces we use for system calls. It essentially defines the set of API calls we can make into the kernel, and those are all documented: open, read, fork, exec, all of them. If you've programmed, you've probably used these functions without ever fully understanding that they're part of the interface to the kernel. Those are system calls; they're special. They're different from something like .NET, where there are higher-level functions, GetStream and so on, that sit far above the core kernel interface. The system calls are published and documented in glibc, and only the ones in glibc are considered the public interface. There are other undocumented ones, I don't want to call them secret; those are liable to change, and they aren't governed by the same stability requirements, although Linus does beat people up if anyone breaks the syscall interface.

A lot of people have forgotten this. It's knowledge that's common among people who really understand Unix and come from that background, but we've forgotten it with containers: we think "I'll run a container, it's abstracted, it'll just work," and the real answer is "it depends." We had a birds-of-a-feather discussion two days ago about how many syscalls a web server actually uses: 20? 30? 50? It doesn't use the roughly 300 that are part of this interface; I think I counted something like 381 or 387 documented system calls in RHEL. Most applications don't use every system call. You can poke at this yourself:
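A minimal sketch of inspecting that stack, assuming a RHEL/CentOS-style host; package names and counts vary by distribution:

```
# Shared libraries a binary is linked against -- the OS dependencies
# that end up bundled inside a container image:
ldd /usr/sbin/httpd
ldd /bin/bash | grep libc     # almost everything links against glibc

# The audit package ships a syscall table you can count; the exact
# number depends on kernel version and architecture:
ausyscall --dump | wc -l
```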
But if you think about what an interpreter is, or even a bash script, it can execute any system call; it's what we call a Turing-complete problem. That means that until the program is running, you can't know what it will do. Inside bash, you can execute anything you want: anything the user's imagination can come up with, including undocumented syscalls; they could write a little C program that puts the kernel into a different mode. When all you're running is a web server, that is not Turing-complete: there's a finite set of syscalls that code can make. But the moment you put a form in that web server that lets somebody type commands which the web server then runs, you've introduced a Turing-complete problem.

So there's a balance. Most of the time, things just work when you put them in a container, but you do have to be careful about mixing and matching user spaces and kernels. And as workloads expand beyond web servers, you hit the problem that certain programs access /proc and /sys and expect things to be in certain places. If your app is funky, a.k.a. not a web server, say it's an HPC application that needs something in /sys, accessed in some funky legacy way, you can end up with container images that are quite incompatible with the container host, if you're not running, say, RHEL 7 containers on a RHEL 7 host. People need to think through this entire stack as they build their applications. Here's one way to see how small a well-behaved program's syscall footprint really is:
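A minimal sketch using strace to count the distinct syscalls one command actually makes; any URL will do, the point is the size of the summary table:

```
# -c prints a summary table of syscalls; -f follows child processes.
strace -c -f curl -s https://example.com -o /dev/null
# The table at the end typically shows a few dozen distinct syscalls,
# nowhere near the ~380 documented in the full interface.
```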
Going further into the image side, there's a ton of nuance that most people haven't fully absorbed. You do a docker run and it seems super easy: docker run rhel7 bash, and it works its magic, and you don't think about what's actually happening. The Docker image format, which became the OCI image format, has the concept of tags, and there are a bunch of layers in these repositories. We say "container image" all the time, but it's not an image; it's a repository, made of layers and tags. We usually use the tags to represent versions of the software, because that's a natural thing to represent with layers and tags, but that's not required. You could have two different configurations, configuration A and configuration B, that both use version 3.0. Using layers and tags as software versions is purely a de facto standard: it's what most people do, but it's not mandated, and it's not part of the image spec in any way, shape, or form. This will come up as you roll containers out to developers: people will argue about what tags, layers, and repositories should be used for and how to break them down. In fact, we had this internal debate at Red Hat when we were building our registry, about whether to put RHEL 7 in a separate repository.

The next piece that adds to the complexity is the URL. It looks like this: registry.access.redhat.com/rhel7/rhel:latest. "latest" points to whatever the current dot-release of RHEL is, and rhel7, the major release, is part of the namespace. And of course you pull the images from Red Hat's registry. But these are arbitrary conventions; they're not mandated by anything, and you have to remember that when you architect your own systems. It seems easy when you pull: instinctively, you look at the URL and understand what it means. It stops being intuitive when somebody in your group uses the pieces to mean something else. I'll give you an example: go to Docker Hub, and CentOS 5, 6, and 7 (or maybe just 6 and 7) are all in the same repository. That seems like insanity to me, something you should never do. In my opinion, you want separate major releases in different namespaces, because you don't want to run docker run centos and have it suddenly be CentOS 8 one day, breaking everything, because you had no idea it was going to roll to version 8 on Tuesday. But that's how it will work right now: the way Docker Hub is configured, when CentOS 8 comes out, unpinned pulls are all going to break. You really have to think this through and understand what these things mean. It's so easy to use, but it's hard to design a new system with the same tools; you have to understand them about five times as well as when you first started using them. All of this comes from internal arguments and debates we've actually had. The practical defense is pinning:
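A minimal sketch of pinned versus floating pulls; the exact tag names here are illustrative, so check the registry for what actually exists:

```
# Pinned to a specific release -- won't silently jump versions:
docker pull registry.access.redhat.com/rhel7/rhel:7.5
docker pull centos:7

# Floating -- follows wherever the repository owner moves "latest":
docker pull registry.access.redhat.com/rhel7/rhel:latest
docker pull centos            # same as centos:latest; risky across majors
```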
The next thing, getting deeper with images: these tags all point to image layers, but something this drawing doesn't show is that there can be a bunch of unnamed image layers in between the tags: layer, layer, tag, layer, layer, tag. When you pull, you get the layers; they're essentially blobs of data, and there's a JSON file called a manifest. The Docker daemon (I should say containerd, or whatever library does the pulling) grabs that manifest, and it doesn't even crawl anything; it just reads the JSON and says, "to build up all the layers for this tag, I need to pull this, this, this, and this," and pulls them all down. That's what you're seeing as the little progress bars go across; all that magic is happening in the background for you. Then, once it's on disk, it explodes everything into a directory, creates a config file, and hands that off to runc, and it gets run. Exploding it onto disk uses what's called a graph driver, which is the translation between all those image layers and a single directory on the system. All of that happens invisibly, every time.

And you don't pull the whole history, only the path of layers that builds up to the tag you asked for, because there can be dead-end branches; I show you that in the lab, sadly. You can have a tree structure, and if you pull a tag down one branch, Docker, and all the libraries that pull images, are smart enough to pull only the set of layers needed to reach that tag; they don't pull the other branches. You can also pull without a tag, which is another thing people don't realize: you can pass the big layer ID and it will pull just that, down to an image, and when you run it, it can be completely broken. Tags are how you, the image builder, communicate to the end user: here's how I think you should use this repository. If you want version 4.0, pull the 4.0 tag; if you want 4.0.1, pull that; and latest always points to the newest, which again insinuates that tags should be version numbers, but that's not enforced. Any image layer in between could be half-baked: if you look at a Dockerfile, every line basically builds a layer, so if the Dockerfile adds a user here, adds another user there, and the software hasn't been copied into place yet, you could pull down a half-broken shell of a container, get inside it with bash, look around, and it won't work. A non-functional image; very possible with Docker.

Is a tag just a pointer to a layer that already exists? Yes: the layer and the tag are two different things. Every layer has an ID; a tag is a named ID, one you expect a human to use. You're communicating, "this is a layer I expect you to use." It's kind of like Git: you could check out a commit halfway through somebody's work, before the few more commits that make it function again, in exactly the same way; normally you pull the latest, which should point to something that works. You can see the anonymous in-between layers yourself:
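A minimal sketch; docker history shows the per-line layers behind a tag, and pulls by digest are possible too (the digest below is a placeholder, not a real value):

```
docker pull centos:7
docker history centos:7          # one row per layer; most rows have no tag
docker images --digests centos   # the manifest digest behind each tag

# Pulling by digest instead of by tag also works:
# docker pull centos@sha256:<digest-from-the-previous-command>
```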
Another question? Yes; I can actually demo that, too; it's in the lab, with a tool called dockviz. All right, let me repeat the question, or at least what I think you're asking: there's only a single level of namespace, so how would you represent other kinds of groupings? That's again one of those de facto things. On docker.io there's one level of namespace; on Red Hat's registry server there's one level of namespace. There are plenty of registry servers that let you create multiple namespaces internally, which creates a whole other cluster of problems, because now developers, or architects, or whoever, will invent arbitrary meanings for each namespace, and nobody settles it for you, which can be a pain in the butt; I've run into it.

To be clear, that's different from nesting. Any Docker registry can have different namespaces side by side; you just can't have multiple layers of namespaces, like /rhel/rhel6/rhel6.4; you can't keep adding slashes. But side-by-side namespaces, registry.access.redhat.com/rhel7 and registry.access.redhat.com/rhel6, we have that now, and so does every registry. docker.io just has a default namespace. Let me pull it up rather than fumbling around: hub.docker.com... I thought it needed the underscore; there it is, the /_/ is the default namespace. What it means is that everyone shares that one repository. The problem is, look at these tags: 6, 6.7, 6.8, 7.1... when this rolls to CentOS 8, you're getting CentOS 8. In this scenario I really recommend using a tag; if you don't use a tag, you're in deep doo-doo. Does that make sense to everyone? You should pin tags anyway, but here you're really in deep doo-doo, because compatibility is only maintained within a major version, not across major versions. They could have chosen to have not just a default namespace but multiple namespaces defining the different major versions; personally, I think mixing majors is a bad architectural decision. They wanted public simplicity, for marketing purposes; in your private registry, you probably care more about clarity than about cute, easy-to-use defaults like that.

Yes, correct, exactly. And if you do this against Red Hat's registry, you're less screwed, though I would still argue you should never do it, because it's a bad idea and you will still probably get burned; but you are less screwed, because we have a rhel7 namespace, so the worst that happens with the latest tag is you go from RHEL 7.4 to 7.5. As most people who have used RHEL know, it's pretty stable; that user space doesn't change much. There's an API/ABI compatibility guaran... I always get yelled at for saying "guarantee"; essentially we publish a document saying we attempt to maintain API/ABI compatibility. We have different tiers of user-space
tools, I think they're called Group 1 and so on, and all the core ones, glibc and things like that, are very, very stable. So the chances of an app breaking, even with the latest tag, are far lower than when major versions share a repository. But this is something to think through yourself; you have to decide how to architect it if you're building your own environment. Again: it's so easy to type docker run, and then when you go to build this stuff yourself, if you just willy-nilly build things without thinking, they break in strange ways. I'm just highlighting problems we've seen. Any other questions?

All right, the next thing that always comes up, going deeper into images; this is one I run into with every customer I talk to. They ask, "so now the developer just controls everything, right?" Well, not exactly. A user space in an operating system is a bowl of soup, a stew, and we have always argued about what goes in the soup: should there be garlic, more pepper, salt? Developers and sysadmins have argued about this forever. (I keep hearing a high-pitched noise; I thought it was my machine, but it's not.) User space has always been a collaboration. Think about what configuration management is: automation to help you configure the user space. Think about what yum and RPM are: automation to help you configure the user space, with progressively more advanced tooling. RPM came first; think of RPM as the way a package maintainer transfers knowledge to you: here's how you should install and run this thing. Then yum manages dependencies: here's everything this package requires to install itself. Then think of Ansible as the layer above that, the stuff we add that makes it all run the way we want. There are other tools to build images, and Dockerfiles become one of them, but really this is all about user-space collaboration and controlling what happens in that user space, and putting it in a container does not change that at all. Everything has changed, and nothing has changed. Here's the progression in miniature:
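A minimal sketch of those tooling layers (httpd is just a stand-in package):

```
rpm -q --requires httpd    # the package maintainer's declared knowledge
yum install -y httpd       # dependency resolution layered on top of rpm

# Desired-state automation layered on top of yum:
ansible localhost -m yum -a "name=httpd state=present"
```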
You still end up with the same problem: historically, middleware people want to do crazy stuff. Every Java person I know is like, "pull down a tarball, run it, works," and as a sysadmin you're like, okay, that doesn't make me happy; that's crazy. And they're like, "yeah, I pulled down six different JVMs, they're all running in /usr," and you're like, why /usr? Put them in /opt at a minimum, or /usr/local, or something. Either way, this still happens in a container; you're still going to have this argument. And the app developers are like, "whatever, we have a WAR file, we don't care."

The nice part is that with a container image, you're at least speaking a standard language. Say the operations team pulls down the RHEL 7 image and modifies it: adds security tooling, scanning tools, whatever they want, maybe simple things like libssl. They add that stuff because of DRY, do not repeat yourself: if you're going to have glibc, you don't want different versions of glibc in every single middleware build. If you have a Ruby image, a Python image, and a Perl image, you don't want each of them carrying its own copy of the same stuff; you don't want the SSL libraries installed three times at three different versions, everybody with their own, because that creates massive problems. It's a DRY problem. So whatever the operations team builds, maybe with a Dockerfile, maybe with something like ansible-container, which builds images much like Dockerfiles do but using Ansible code, maybe it's called rhel7-core-build (that's actually what I create in some demos I do); maybe that's what the operations team puts in the registry server as the single source of truth, with all the stuff your company wants: the magic that records commands you type, or whatever weird things your operations team adds that make your life harder or easier. Then they hand it off to the middleware team, and the middleware team says, "well, we pull down our tarballs," and you say, fine, make a Dockerfile that pulls down those tarballs; oh, and by the way, pulling them from an external web server is not going to fly, so copy all of it locally, and so on. At the end of the day you get to negotiate that stuff, and at least it's all codified, in Ansible or a Dockerfile or whatever you use to build the image. Then maybe you create a standard JVM image, a standard Java-based web application server image, a standard Perl image, a standard Python image, and you have your experts in databases or Ruby or Python craft those images, and everybody with a Ruby app uses that version, branching and layering as necessary. But you shouldn't think that because you have layers, the whole problem of collaborating with other people magically goes away. You don't just let the end developer build whatever they want and end up with 50,000 different permutations of images in your environment; that's still a bad thing.

And then the next thing people get confused about: say you do create 50,000 different images, all with different versions of libssl in them, and one day the security team says, "we need libssl version XYZ everywhere." You go, okay, developers, have fun patching your 50,000 different images. Here is that codified layering in miniature:
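A minimal sketch, with hypothetical image and registry names, of the ops-owned core build with middleware layered on top:

```
# Ops team: the single source of truth, built from the vendor base image.
cat > Dockerfile.core <<'EOF'
FROM registry.access.redhat.com/rhel7/rhel:7.5
RUN yum -y install openssl && yum clean all    # company-wide additions live here
EOF
docker build -t registry.example.com/ops/rhel7-core-build:1.0 -f Dockerfile.core .

# Middleware team: layers its runtime on the ops image, never on the vendor's.
cat > Dockerfile.ruby <<'EOF'
FROM registry.example.com/ops/rhel7-core-build:1.0
RUN yum -y install ruby && yum clean all
EOF
docker build -t registry.example.com/middleware/ruby:1.0 -f Dockerfile.ruby .
```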
And the next thing people always get confused about: okay, we'll just create 50,000 permutations and let developers do whatever they want — then how do you solve this problem? Say you create 50,000 different images, all with different versions of libssl in them, and one day the security team says, by the way, we need libssl version XYZ. You go: okay, developers, have fun patching your 50,000 different images. That's never going to fly in a production environment. So at some point you hope you have this model set up between the operations team and the different middleware teams — I call Perl and Python middleware, which a lot of people get mad about, Java people especially, but I consider anything that sits on the OS and doesn't do anything yet, that needs an app to run, to be middleware. Basically, at some point you're going to want a cascading version of this: maybe you have ten different types of middleware, but when you update the core build, all the middleware gets rebuilt and then all the applications get rebuilt. That is really painful for people to understand, but you have to be crystal clear on it: you have to be ready to rebuild this stuff at any time, and it has to have cascading builds. Okay, so the question is: would all of the tags get rebuilt if you change the core image? The answer is no. You would end up building a new tag, and that new tag would be a new version. You wouldn't go back and patch — I mean, you could set up an environment that would do that, but it would be insanity, and the chance of making it work 100% is near zero. Imagine a five-year-old application where you try to rebuild every historical version of it off the new image: I can almost guarantee there are timing constraints in there where things will get misaligned and won't work. The way I've done it is to always just build the latest again — build a new version, so it'll be 4.0.0-135. You know how Red Hat does build numbers: bash will be something like 2.6.5-275, or the kernel will be 3.10-something — there's always a build number. I would use a build number and always roll it forward. And this gets into the infrastructure that's necessary for CI/CD: you have to be able to do that, and you have to think it through before you can get to CI/CD. It is something OpenShift can do as long as you use BuildConfigs — I have a demo, a GitHub repo I've built, that shows this working with cascading builds. BuildConfigs are an object in OpenShift — another Kubernetes-style object, basically — and they have triggers and interact with what we call ImageStreams. It's a lot to explain without drawings, but basically ImageStreams are kind of a spiderweb: any time an image changes, it can trigger other things to happen. It's a way for event-driven automation to happen inside OpenShift. So whenever the core build gets rebuilt, it sends off a trigger: hey, the core build's been rebuilt, go rebuild everything that depends on it; after those are built, all the applications built on top of them get built, and you cause this cascading wave of images. Of course you would never want that in production — you'd do it in a dev environment, rebuild all the images, have smoke tests, have CI/CD ready, run the apps, run all the smoke tests, and make sure everything works.
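A sketch of that build-number convention — roll the number forward and let a floating tag drive the cascade. The image names are hypothetical, and the oc set triggers form shown is one way OpenShift can wire up an image-change trigger:

    # Rebuild the core image under a new build number (names hypothetical).
    docker build -t registry.example.com/ops/core-build:1.0.1-136 .
    # "latest" floats to the newest build so downstream rebuilds pick it up.
    docker tag registry.example.com/ops/core-build:1.0.1-136 \
        registry.example.com/ops/core-build:latest
    docker push registry.example.com/ops/core-build:1.0.1-136
    docker push registry.example.com/ops/core-build:latest

    # On the OpenShift side, a BuildConfig can rebuild whenever that image
    # stream tag moves (sketch; project and object names are hypothetical).
    oc set triggers bc/ruby-app --from-image=ops/core-build:latest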
But people don't think this through. They think: I'll just move to container images, it'll stay static forever, and I'll be fine. That's not the case. In fact, it's going to be more painful if you haven't thought through all this, because you'll end up with a bunch of different images all stuck in some specific state for a long time. They can end up with security issues; two years later the developer doesn't know how to rebuild them; and if there are no tests, you're in a really bad state. You'll be firing up a version of the container, manually patching it, saving it, exporting it — you end up in the same crufty situation you had with the old UNIX system that nobody wants to touch. Yep — I know your question, so I'll repeat it. He's saying: you build a new version of the core build and tag it with a new version, and the developer did what they should do, which is use a specific tag — say it was 1.0.0, and you roll the core build to 1.0.1, but the developer has specifically pinned 1.0.0 — so the cascading build will never pick up the new version, right? Well, it'll happen; it'll just rebuild with the old version, because the BuildConfig will still trigger a new build and all those cascading builds will run — you'll just never get the new version. That is true, and that is a problem, so you have to think it through (see the sketch after this answer). The core build will definitely want to use a tag, to pull only the specific version from Red Hat that you want; but maybe the middleware team should use the latest tag, and maybe the developer should use the latest tag, to cause those cascading builds to happen. If they don't want that, they can pin a tag — but then they're on their own. So now it becomes policy: okay, fine, if you don't want to use latest, if you don't trust the internal software supply chain, go figure it out yourself. Sorry, this guy was first. I don't quite understand the question yet — what image are you pulling; can you give me the actual example? Is it something you pulled from external? So basically you're extending an existing image: you're using our base image, standing up your container with your own data and whatever you applied, and when that base image changes, you need to rebuild — is that the case? Okay, I think it clicked for me; I'll repeat what I think your question is and you tell me. Whatever configuration file they used in their Dockerfile, they didn't deliver along with the Dockerfile: all you have is the Dockerfile, and you can see it pulling some external configuration file, but you don't have access to that stuff. That's one I've seen — we actually have that problem with Software Collections, where we don't deliver some of the missing pieces, so if you go to rebuild the Software Collections yourself, you'll be in a world of hurt, because you don't actually have everything you need to rebuild them. That's a common pattern problem I've seen. No worries, we can chat offline. So that's the end of that one — that is images. I tried to tackle some pretty deep problems there.
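Pulling the tag-policy answer from above together in one sketch — all names and tags hypothetical:

    # Tag policy sketch (all names hypothetical):
    # - the core build pins the exact upstream tag it trusts:
    #     FROM registry.access.redhat.com/rhel7:7.4-120
    # - middleware and applications track "latest" so cascading rebuilds
    #   reach them automatically:
    #     FROM registry.example.com/ops/core-build:latest
    # - a developer who pins core-build:1.0.0 instead opts out of the
    #   cascade and owns the result.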
The idea is that this stuff is so easy to use, but you have to go back to all the UNIX stuff you already understand — the Linux stuff doesn't change. You still have a dependency manager like yum, you still have package managers like rpm; you've added new tools like docker and Buildah — and Red Hat has Ansible Container and things like that for building new images — but all these business problems, which are basically how people collaborate to build a user space the way you want, still exist inside container images. Sorry, go ahead — yeah, in fact I can demo some of it if you want. Is it working? All right, sweet, there it is. Mine never started earlier; it's way faster now, so you get to see what it's like normally, which is actually really nice. The question you asked is what happens when you pull an image. I have it in a lab — I can't quite remember exactly where, but this lab goes exactly through what I said: the image isn't pulled yet, we pull the image, and we do a little experimentation on it to show you what's happening. And if you get into Katacoda, did you notice the videos are embedded right at the beginning? I think labs one and two are there; I don't think I've added three and four yet, but I will soon. All right — oh, this is some weird thing I've been seeing lately; you've got to go back, I don't know why it does this. Long story short, you can do a couple of things. You can look at the history of an image with docker, which basically shows you what got created by the Dockerfile. Now, with Red Hat images we do what we call squashing, so you don't see all these layers. We have a tool — I think it's called ImageFactory — that takes the exact same process that creates our ISO images, our VMDK images, and our AWS images, and just creates a docker image, so it's squashed and you won't see much. If I changed this image, you'd see a bunch of entries. What you're asking about — seeing the different image layers — I do cover in this exercise; the challenge is it would take a long time to build right now, because you have to build from a Dockerfile with multiple lines so that it creates multiple layers, and then I show you the tree structure. If you pull one tag, you'll see it pull down all the image layers it needs to get to that tag; but if you later pull a different tag — say you pull 7.4 and then later pull 7.5 — you'll see it pull some more data down, because it's pulling the different image layers needed to traverse the tree down to the other version. Does that make sense? I can't really demo it here because I'd have to do a lot of setup to really show you. That was your question, right?
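Two commands show this without the full lab — the tags are assumptions and may differ on the real registry:

    # Show how an image was assembled; squashed Red Hat images show few layers.
    docker history registry.access.redhat.com/rhel7:7.4
    # Pull one tag, then a sibling tag: the second pull fetches only the
    # layers that differ and reports "Already exists" for the shared ones.
    docker pull registry.access.redhat.com/rhel7:7.4
    docker pull registry.access.redhat.com/rhel7:7.5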
Yeah, all right, I'll get through this quick. I'll try to get through the third lab — that was as far as I was going to get anyway — and this is all interactive, so you can do the rest yourself. When you push, you push everything — well, you push all the image layers that you've created. Think about what pushing is: pulling is pulling a tag down, which technically pulls all the image layers necessary to create that tag; then you add some stuff, and you only push those differences back. Actually, you don't even have to push those differences to the same place — you could push them to a completely different repo. You can retag the image: tagging in docker actually changes the registry, the namespace, the repository name, and the tag name. That's something I only just thought through for the first time — again, these are all arbitrary definitions for things — but the docker tag command lets you retag an image with a different registry server, a different namespace, a different repository, and a different tag; you can relabel all of that stuff I showed you. You could pull an image, docker tag it with another name — changing the registry, the namespace, the repository, and the tag — then push it somewhere else, and it has to push all of the layers. You can do that; I've done it: pull from one registry, retag, push to another, and it pushes all the exact same layers to the other registry server. Yep, exactly. There are a lot of ambiguous defaults like that that you don't realize until you go to do it.
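That round trip as a sketch — the mirror registry name is hypothetical and the source tag is an assumption:

    # Pull from one registry, relabel every component of the name, and push
    # to another; all the same layers get pushed to the new registry server.
    docker pull registry.access.redhat.com/rhel7:7.4
    docker tag registry.access.redhat.com/rhel7:7.4 \
        registry.example.com/mirror/rhel7:7.4
    docker push registry.example.com/mirror/rhel7:7.4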
Another really common one, which I cover in the lab, is name resolution. You can type docker pull and what happens? It uses the default namespace, it uses docker.io, and it pulls the image down. That's easy — until you go to do something else. How do you set the default to your own registry? Better than that: if you pull namespace/repository, there are cases where it won't find the image — normally it would pull latest, but it doesn't know to do that once you've specified the namespace and the repository, so you actually have to specify latest explicitly. There are all these weird, arbitrary resolution problems. And that's something I forgot to tell you: always use the full URL, because if you don't, you will end up in a world of hurt at some point. It will cause you pain, because you won't know exactly what you've got. Short names are really nice for docker pulls when you're playing, but when you go to build something real, use the full URL, or you will end up in URL hell. In the lab I show you the different orders in which things break — it's insanity. It's like DNS that doesn't work right: DNS is very specific and you know exactly how it will resolve; this doesn't have that. And the same is true with tagging: the tag argument can be the whole thing, or just a tag, or just the repository. In fact, I'm guessing you'd hit namespace resolution problems if you don't specify the whole URL — just taking a stab off the top of my head, I wouldn't even try it. Let's put it that way: it's dangerous.
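The contrast, sketched — the explicit path and tag below are assumptions about what the registry serves:

    # Ambiguous: the daemon fills in the registry (docker.io), the
    # namespace (library), and the tag (latest) for you.
    docker pull fedora
    # Explicit: registry / namespace / repository : tag all spelled out,
    # so you know exactly what you've got.
    docker pull registry.access.redhat.com/rhel7/rhel:7.4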
So this is the next step. I showed you the first step, where I dug into user programs and interpreters and what's inside the container image; now I'm digging into the system calls and the kernel space — we're going down into the container host. You can think of the top two gears as the container images and the bottom two gears as the container host, and really nothing changes because of containers: it's the same tools and toolchain you're already using. I always say the container images are fancy files and the processes are just fancy processes. If you do a file open from a process — say you write a bash script that opens a file, or you cat a file — cat does an open syscall, then reads; you could strace it and watch. If you do it in a container, the same thing happens: if you cat /etc/redhat-release or /etc/hosts, it does a file open, reads all the data out, and writes it to the terminal. The only difference is that the process was created with a different SELinux context, in a cgroup, with different namespaces. And what people often don't understand — this came up very crisply in the birds-of-a-feather we did two days ago — is that the clone syscall allows you to choose which namespaces you want. Docker by default uses all the namespaces it's configured to use: the process ID namespace, the network namespace, the mount namespace, and — what's the one for the hostname? UTS, that's right, it's a weird one — the UTS namespace. But you can turn each one of them off when you do a clone syscall. So you can create a container — and again, what is the definition of a container? — you can create another process that is only in the network namespace of that container, but isn't limited by its cgroups, isn't limited by its SELinux context, isn't even in the same process ID namespace: if you do a ps, it will show everything on the box. And you realize: wait a minute, a container isn't a real thing. It's a construct, a user-space construct, yet we use the word as if it's real. Really it's just a fancy process, and you decide how thick the wall around it is: you can isolate just the network, or not the network; just the process ID table, or not the process ID table; and there are a bunch of different namespaces you can use — I don't remember how many, seven or eight or ten. When you start multiple processes, this is what it looks like: there's a global PID data structure, a process ID table inside the kernel, and if you think about what a process is, starting one just adds another entry to that table. When you do a ps, it just shows you the information in the process ID table — it's like doing stat on a file, just dumping the contents of that table. Inside a process ID namespace, the kernel simply creates another index, a separate list of information apart from the global process ID table. It's no different from understanding the difference between the contents of this file and that file: it's essentially a sandboxed version of the process ID table. In the Red Hat world we then use cgroups, sVirt, seccomp, and SELinux to add further isolation — but those are arbitrary constructs, not a standard. Not every Linux distribution does that: some use all of them, some use a few, some don't use anything — LXC, LXD, Docker, rkt are all left to choose which of these technologies they use when they create a container; there's no definition for that.
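You can see the pick-and-choose nature of namespaces with util-linux's unshare — a minimal sketch that creates a new PID namespace and nothing else:

    # Start a shell in a new PID namespace only; network, mounts, hostname,
    # cgroups, and SELinux context are all still shared with the host.
    sudo unshare --pid --fork --mount-proc bash
    # Inside, ps sees only this shell and itself: the namespace is just a
    # second index over the kernel's process table.
    ps -ef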
One last thing I should point out is execve: if you run ps from bash and strace it, execve is the syscall that creates the ps subcommand — it execs into ps and then returns. With a clone syscall you create a new process, and in this particular example I'm only showing it creating a process ID namespace — this would be like a little block of C code you wrote yourself that creates just a PID namespace, to show how it works. And then this one would be like a docker container, where it creates all of the namespaces: these are all the global data structures, and these are the namespaced ones. You get to pick and choose which data structures in the kernel you want to virtualize at the moment you start the process. In fact, here's something people don't know about namespaces: there's no easy way to list all the namespaces on a kernel. You can't just run some "get namespaces" command that shows you every namespace — that doesn't exist in the Linux kernel. I went back and forth with Eric Biederman trying to understand why, because as a longtime UNIX admin it made sense to me that I should just be able to see the namespaces — it seems like it should be a data structure, right? But it's not: there is no concept of a container in the kernel, so there's no way to just list them all. But namespaces do get created, and you can add another process to an existing namespace. So when people ask, how do I get inside of a container — you don't actually get inside of a container; you add another process to the same namespaces as the existing one. That's a really mind-blowing concept for a lot of people: when you do a docker exec, or a Kubernetes kubectl exec, to get into a container, you're actually just starting another process with a clone syscall and adding it to the same namespaces, and then you just happen to be in those namespaces. And what most people don't realize is that you can use programs like nsenter — namespace enter — to enter only part of the namespace set. You can enter just the network part and run network traces and things like that, while not being limited by the container's cgroups, not limited by its SELinux rules, and not limited by its PID namespace — you can still see other process IDs. That's pretty mind-blowing for a lot of people. Yes — it essentially just creates another process in the same namespaces, and in the docker world, by construct, exec chooses all the same namespaces, because that's what's most logical to a human being.
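A sketch of that partial-entry trick — the container name web is hypothetical:

    # Find the container's init PID on the host (container name hypothetical).
    PID=$(docker inspect --format '{{.State.Pid}}' web)
    # Enter only its network namespace: run network tooling inside the
    # container's network while keeping your own PID view, cgroups, and
    # SELinux context.
    sudo nsenter --target "$PID" --net ip addr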
With a clone syscall you have the potential to enable and disable different things, so nsenter, for example, is more granular and lets you control which namespaces you enter, while a docker exec enters you into all of them — but that's by construct, not by necessity. So here's the full rundown of what this looks like, because I think people have forgotten this stuff. When you cat a file you don't even think about it — you've forgotten the hundreds of lines of code that cat runs to do it; probably far more than hundreds once you get into the filesystem, the VFS layer, and everything else that's happening. This is what it looks like: you do an open syscall; the open syscall talks to the virtual filesystem layer; the VFS layer has a driver for XFS; XFS has a driver for whatever the block device is; and it finally accesses the blocks. There's nothing different with a container. The only difference is that the mount namespace is virtualized, so the list of mounts looks different from the host system itself — it's a virtualized list of mount points, and it just happens that /var/lib/mysql in the container is mounted on some other volume. But it still goes through the VFS layer, the XFS layer, and the block device; it uses the exact same storage subsystem in the kernel. There's really no difference — it's just a fancy process. I'll admit that a couple of years ago I had a crisis over this, because people would ask me these crazy architectural questions, this drawing didn't exist, and I didn't have it in my head. We all end up in these crises around storage and networking — and the same problem exists across everything containers touch: network, storage, RAM, CPU — where we're not quite sure how something works, and you have to go back and say: wait a minute, let me think this through again; I know how processes work on a Linux system, so I know how this works; I just need to think properly and explain it to the people around me. And then, going full bore, here's what happens. These are two different RHEL 7 systems, and — this is another thing people forget — these are image layers. Say you pull down MySQL: you make an HTTP connection to a registry server, you pull all the image layers down, and they get cached on the host. The host then smashes all those layers together using what's called a graph driver. There are a bunch of graph drivers, but there are two main ways this happens in Red Hat Enterprise Linux — and honestly, I'd argue two main ways the whole universe does it: overlay2 and device mapper. Those are the two Red Hat uses; the default has historically been device mapper, and we're moving the default to overlay2.
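You can ask the daemon which one it's using — the overlay2 path below is where that driver keeps its read-only layers by default:

    # Which graph driver is smashing the layers together on this host?
    docker info --format '{{.Driver}}'
    # With overlay2, the exploded read-only layers live here by default:
    sudo ls /var/lib/docker/overlay2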
Long story short, there are a couple of different ways on a UNIX or Linux system to map a bunch of image layers into something that looks like a single directory — because at the end of the day that's what you need: something that looks like a single directory, so that runc can go fire up a container. That's basically what's happening. The thing that takes all those image layers and maps them onto disk so they look like a single directory is the graph driver. The term "graph driver" never made sense to me; I had one of these crises and had to go study containers/storage to understand what was happening. Pulling the image layers down is one job — that's containers/storage, a library that Buildah and Skopeo and all these Red Hat tools use — and the graph drivers are what explode those layers onto disk. And what people don't understand is that the layers are exploded onto disk read-only: when you look at the result through an overlay filesystem, it's read-only, but if you create a file in that filesystem, the write goes into a copy-on-write layer. What most people don't realize is that this top layer is writable by default — to disable the copy-on-write layer you have to pass docker the --read-only flag, and most people don't, because things break randomly when you do. I had a guy yesterday asking me about this. He said: we're having a problem with metadata not being fast enough inside the container. I asked why, and he said: I don't know, we're writing a bunch of files — fifty thousand files, it uses Yocto or something. It took even me a second to click, and I had to snap back to what I tell everyone: think through the UNIX basics of this stuff. I asked: you're bind mounting it, right — writing through? He said: no, we're writing into the container. Well, that's using the copy-on-write layer, so every metadata change goes through copy-on-write, which is way slower than if you just did a bind mount. A bind mount is basically the same code path you'd use if this were a regular process; but in a docker container, writing into the container without the --read-only option means writing into the copy-on-write layer, and that layer is slow by default — we're trading slow writes for convenience and branching. If you run, say, three versions of this container, they share one read-only set of layers and each gets its own copy-on-write layer, so we're saving space and making this easier to use, but it's much slower. Does that make sense?
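The three write paths, sketched — the short image name rhel7 assumes a Red Hat host with that registry configured, the host path is hypothetical, and the :Z suffix relabels the mount for SELinux:

    # Default: writes land in the ephemeral copy-on-write layer (slow for
    # metadata-heavy workloads; discarded when the container is removed).
    docker run --rm rhel7 touch /tmp/cow-write
    # Bind mount: writes go through the normal kernel path to the host disk.
    docker run --rm -v /srv/data:/data:Z rhel7 touch /data/fast-write
    # --read-only disables the copy-on-write layer entirely; this write fails.
    docker run --rm --read-only rhel7 touch /tmp/should-fail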
Hold on — I don't know who was first. Okay, that is correct, which leads me to another thing. Your comment-slash-question is: if you use this bind mount, run a container, and then create a new version of the container and push it to the registry server, that data is still stuck on that node, right? It doesn't travel with the image — which is a good thing and a bad thing. The problem is that now you need to think through basic DR for that bind mount. It's the same thing you've always had — I'm actually working on a talk about this — it's basic disaster recovery: transaction replication, file replication, or block replication. There are three options; that's it, that's how the universe works, it's basic UNIX. If it's a MySQL server, it should know how to replicate its transactions to another server, and then you shouldn't care: with a container here and a container over there doing replication, you don't have to worry about the container layer. If you're doing file replication, maybe you use something like Gluster with geo-replication for those bind mounts — okay, now I can ship that data off asynchronously, great. Or maybe I'm using SRDF with an EMC array, which already does the block replication. You still have to think that stuff through. Hold up — he had a question first. Okay, so your question is — and again, this is exactly why this stuff is so hard, because it's hard to even ask the questions — is this the last layer of an image? If you pull down a container image, is the read/write layer the last layer? The short answer is no — well, kind of. It's not a layer of the image; it's an ephemeral layer that gets deleted by default, so it's gone if you do nothing. But with a docker commit you can actually capture it: it does become a new layer, and if you tag that and push it, it becomes the next layer. So you can do that — that's correct. And I think we're getting kicked out, but that's it anyway, so we're good. [Applause]
Info
Channel: The Linux Foundation
Views: 7,686
Keywords: containers, open source, openstack, open source summit, Cisco DevNet, linuxcon, Open API initiative, technology, Google, API, unikernels, cloud, linux, embedded systems, sdn, open community conference, IBM, cloud computing, apache spark, containercon, open source community, decoding, devops, containerization, quantum computing, cloudopen, Intel, kubernetes, technologists, nfv, open daylight, openvswitch, docker, system containers, red hat, CPU performance scaling
Id: KawKGsLR1V8
Length: 99min 18sec (5958 seconds)
Published: Fri Oct 27 2017