Running Containers: Behind the scenes with Docker, Podman, runC, Umoci, Namespaces & cgroups

Video Statistics and Information

Captions
Hey, what's up, and welcome back to the fifth and final video of this mini series, "What exactly is a container?". In this video we're going to focus a lot more on answering exactly that question: we're going to look into the details of what exactly makes up all the magic that is containers. More specifically, we're first going to look at the roles of container engines and container runtimes. We're then going to build and start our own container using just the container runtime, and start to appreciate exactly what it is that container engines do for us. Finally, we'll take a much deeper dive into Linux kernel features such as namespaces and cgroups. Alright, let's get into it.

Okay, so last video we pulled our container image down onto the production host from the IBM Cloud Container Registry. Let's now start by running our hello-world web app container image and then dig a little deeper to see what's happening behind the scenes to make what we know as a container. Let's check the image name again. Alright, so immediately we're reminded that Docker is not installed on this machine; if you've been following along with the series, you already knew that. We do, however, have access to a very similar tool called Podman. Podman is a container engine, just like Docker, and it allows us to run containers on a host, amongst other things, pretty much just like Docker does. Podman is a very useful and important tool for troubleshooting issues with containers running in a Kubernetes environment where Docker, and therefore the Docker CLI, may not be available, which is fairly common now with the rising popularity of the CRI-O container engine, which is designed specifically for use with Kubernetes. In terms of capability, Podman is everything Docker is and more; like I mentioned in an earlier video, you can create an alias for docker pointing to podman and all your docker commands will continue to execute just fine.

Now, this isn't a deep dive on Podman, so I won't go into a whole heap of detail about the differences, because there are a few, but I will show you one of the biggest differences in regards to running containers on a host, so you can see why and how a tool like Podman is useful. I've got two tabs open here: on the top we have the production machine with Podman installed, and on the bottom we have a machine called docker, which has the Docker daemon installed. Let's spin up two containers on each host; we can just use the small busybox image from Docker Hub and tell it to sleep for an hour. Okay, and let's check the running containers. Everything looks pretty much the same, right? The only real difference is that the registry address isn't specified on the Docker host, but we already know that's because Docker defaults to the docker.io registry. If you look closer, though, you'll notice that it's not actually the same, so let's take a look at the process trees on both hosts. At the top, where we're using Podman, each container has spawned from its own conmon (container monitor) process. Down the bottom, where Docker is running, you can see that both containers have spawned from a parent process called containerd. This is probably the biggest difference between Podman and Docker: Podman is daemonless, whereas Docker isn't. So if I restart the Docker service and then check the running containers again, you can see that the containers are now exited; if anything happens to the daemon, all containers are negatively impacted.
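A rough sketch of that side-by-side comparison, assuming the busybox image from Docker Hub and the default docker service name:

    # on the Podman host
    $ podman run -d docker.io/library/busybox sleep 3600
    $ podman ps
    $ ps -ef --forest | grep -B1 conmon      # each container hangs off its own conmon process

    # on the Docker host
    $ docker run -d busybox sleep 3600
    $ docker ps
    $ ps -ef --forest | grep -A2 containerd  # containers hang off the containerd daemon
    $ sudo systemctl restart docker
    $ docker ps -a                           # the containers now show as Exited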
If I come back to the Podman host, you can see that if I kill the parent conmon (container monitor) process for any of the running containers, the container remains running. You can see here that the container process itself continues to run, and we can prove that a little further by getting the container ID and executing a shell inside that running container; so you can see I can still access the running container. Now, if I kill the container itself, there's the small issue of an orphaned process, because the exited container hasn't been able to report back through the container monitor process, so it looks like it's still running, but I guess that's better than a non-running container. In short, Podman doesn't need a daemon process to operate like Docker does, so it's a useful way to query and troubleshoot containers that have been started by other OCI-compliant container engines like CRI-O.

Okay, so let's continue to use Podman and start our hello-world app with the podman run command. Again, what we're doing here is running the container in detached mode with the -d option; we're mapping port 7000 on the host to port 3000 inside the container, where the web server is running; we're setting the background color of the app to yellow using an environment variable called BG_COLOR; and finally the container name is set to hello-world. We can check the browser to confirm that the app is running as expected, and there we have it. As always, it's that easy to start our containerized app. We've seen that a couple of times now, but you probably still have at least a few questions, like: what actually happened behind the scenes to get that to work? How is the container isolated from other containers on the host, and from the host itself? Or maybe even: what actually does the mapping of the host ports to the container ports? In my opinion these are all valid questions, and I believe knowing the answers can really help with troubleshooting container-related problems when things don't take the happy path.

So let's start to answer some of these questions. In terms of what happened behind the scenes, it ultimately depends on the tools used to start the container. We've been using Docker and Podman throughout these videos, so I'll stick to just those two, but the good news is that regardless of how you get there, the flow is always something similar to what you see here. When it comes to starting a container, the goal of the container engine is to pass something called a runtime bundle to a container runtime; in this case that container runtime is runc. runc is actually the OCI's reference implementation of a container runtime, and it was originally donated to the OCI by Docker, back when it was called libcontainer. What I'm pretty much saying here is that each container engine has its own features and functions, its own ways of doing things: regular container engine things like providing an API for humans or container orchestration software to use, providing functionality to pull and push images, preparing container storage (like setting up the copy-on-write layer and prepping any mounts when you start running a container), and also sorting out the networking for the container. But no matter what, preparing this runtime bundle to pass on to the container runtime of choice is something each container engine supports in the same way, because it's the container runtime software where a lot of the magic happens in terms of turning a container into a running environment.
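For reference, a rough sketch of that podman run invocation; the registry path, image tag and exact environment variable name are assumptions carried over from earlier in the series:

    $ podman run -d \
        -p 7000:3000 \
        -e BG_COLOR=yellow \
        --name hello-world \
        us.icr.io/<your-namespace>/hello-world:1
    $ podman ps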
Okay, so we know now that the container engine generates a runtime bundle, but what does this runtime bundle look like, and how do we actually get one? Well, first we need our image available in OCI format, so we just want to get that hello-world version 1 image into OCI format. From the last video we know that we can do that pretty easily using skopeo, and we can get it from local storage too, rather than having to download it again. Okay, now that we have our OCI-formatted image, we can generate a runtime bundle just like the container engine does. For that we can use a tool that's recently been incorporated into the OCI called umoci. As per the umoci site, the umoci unpack command allows us to take an image and extract its root filesystem and configuration into a runtime bundle, which is what we now know runc, the container runtime, is expecting. You can see there are lots of files in our runtime bundle. If we go into the runtime bundle we can see that there's a config.json, and the config.json contains a lot of the necessary config to start the container: you can see here that we've got the args, we've got environment variables, we have the capabilities of the container on the host, and we also have some namespaces listed that we'll talk more about in a moment. You can also see that the root filesystem has been extracted, and this obviously contains all the files made up from the various layers of the container image. Okay, so let's now use the container runtime, runc, to run the runtime bundle and start the container. And there we have our container running: you can see that the web server has started inside the container on port 3000.
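Pulling those steps together, a minimal sketch of going from the image in local container storage to a running container using only the runtime; the image names, tags and bundle directory here are assumptions:

    # copy the image from local container storage into an OCI layout directory
    $ skopeo copy containers-storage:localhost/hello-world:1 oci:hello-world-oci:v1

    # unpack the OCI image into a runtime bundle (config.json plus a rootfs/ directory)
    $ sudo umoci unpack --image hello-world-oci:v1 hello-world-bundle
    $ ls hello-world-bundle

    # hand the bundle to the container runtime
    $ cd hello-world-bundle
    $ sudo runc run hello-world-1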
Now, we can't access the app like this, because a function of the container engine is to actually do the networking setup for you, and I skipped that part, because I think it starts to become more of a virtual networking video rather than a video focused on containers; also, the specifics change a lot between the different container engines that you may end up using. So instead I created this slide that points out, at a high level, what each container engine does to network the containers it starts. First of all, when the engine is installed, it creates a bridge on the host with a new network address range for all the containers that will run on that host. Then, when a new container comes along, it creates a new virtual interface and connects that virtual interface to the bridge. The container engine then creates a new network namespace and adds the virtual interface to that namespace, and this namespace is later associated with the new container; this is effectively like plugging a NIC that's already connected to the host bridge into the container, so there's connectivity between the container and the host. Finally, the container engine assigns the virtual interface inside the container an IP address on the host's container network, configures the default gateway to hit the bridge, and of course brings that interface up. So yeah, the container engines are definitely handy, doing all this behind the scenes for every container that you create.

You can actually see some of this with Podman, so let's get a new tab. If you take a look at the interface config, you can see that there's this cni-podman0 network (CNI, by the way, stands for Container Network Interface), and you can also see that there's a virtual ethernet device that would be assigned to our running container. We can have a look at this by checking the interface config of the running container, and we can see that the interface inside the container is running on the same 10.88 network as the cni-podman0 network. If we were to stop that container, we'd see that the virtual interface is removed.

Okay, now amongst all of that I mentioned namespaces a couple of times, and I realise you may have no idea what that means at the moment, but just park that for one more minute, because I've got some good examples coming up that will explain it. For now I just want to finish up on the topic of the container runtime, runc. We now know that runc is most often the tool that will actually run our containers, regardless of the container engine we're using, so it's also a good tool for checking on containers when you're having issues with the container engine itself. We previously stopped one of the containers that we started with Podman, so let's just start that up again, so we can use runc to perform some checks on our running containers directly. We can use runc to look at all the containers started by runc by running the runc list command: here we can see the container that we started directly with runc, and we're also seeing the container that was started with Podman. We can also have a quick look at the processes running inside a container with runc; you can see we've got our application running. If we want to, we can also get a shell into a container using runc, and you can view events directly from the container; by default that will display events every five seconds, but you can get them as a one-off as well with the stats option.

Okay, cool, so now we know what a container engine does to start a container: it calls out to a container runtime with a runtime bundle that it prepared, after preparing some storage and network resources. The next question would be: how does the container runtime then ensure that the process that is the container runs in an isolated environment, with its own hostname, network, filesystem and all that sort of stuff? The answer is that it uses a Linux kernel feature called namespaces. Namespaces, as you'd expect from the name, limit the visibility of system resources. To give an example: if you're in a custom network namespace that only has one NIC assigned to it, you'll only be able to see, and therefore interact with, that one NIC, even though there may be ten other NICs that exist outside of the namespace on that host. First of all, we can read more about the different types of namespaces available by looking at the man page for the nsenter command; you can see a short description of each of the available namespace types there. We can look at the namespaces associated with a process using the lsns (list namespaces) command: first we need to get the PID of the container process, and then we can list its namespaces. So our container process is a member of these namespaces: we have the cgroup namespace (we'll talk more about cgroups in a minute), the user, network and mount namespaces, then the UTS namespace, which handles things like the hostname, the IPC namespace, which deals with inter-process communication resources like message queues, and finally the PID namespace, which gives the process its own PID tree. It's these namespaces that make the container feel like a separate, functioning system; they're created anew for each new container, and they mean that each new container can only see the system resources that have been placed in its own namespaces.
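A sketch of those runtime- and namespace-level checks; the container name and IDs are placeholders:

    $ sudo runc list                           # containers known to the runtime
    $ sudo runc ps <container-id>              # processes running inside a container
    $ sudo runc exec -t <container-id> sh      # shell into a container via the runtime
    $ sudo runc events --stats <container-id>  # one-off resource stats rather than every 5 seconds

    # find the container's PID, then list the namespaces it belongs to
    $ PID=$(podman inspect --format '{{.State.Pid}}' hello-world)
    $ lsns -p $PID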
It's also really a combination of them all, because one namespace here and there really wouldn't give the same effect. For example, at the moment we're in the host's namespaces; we can see them by listing the namespaces for process ID 1, and we can see that we have different network namespaces. So if I look at the interfaces on the host we get four different interfaces, but if I enter the same network namespace as our container, things look a little different: you can see that I can now only access the network resources available in the container's namespace, so we only have the container's network interface and the loopback interface. You can also see that we're still in the UTS namespace of the host machine, because our prompt is still showing production; let's move into the UTS namespace of the container as well, and there we go, our hostname has now been updated to the container ID. Okay, so we can exit out of those namespaces. That's just an example of getting into a couple of namespaces, but an easier way to enter all of the namespaces is with the --all option (or -a for short); this time though, because we're also entering the mount namespace, we'll need to specify a shell. And there we go, we've entered all the namespaces of the container, and we can see the container's filesystem again. Okay, so hopefully that gives you a better understanding of the power of namespaces. If you're familiar with the term pods, perhaps from Kubernetes, or Podman too for that matter, you can probably see how different container processes can be joined to the same namespaces to share the resources of those namespaces.

Anyway, that's enough on namespaces; the other significant piece of the puzzle is control groups. Control groups, or cgroups for short, can be used to limit the usage of system resources like CPU, memory, disk and network. You can see the system cgroups available here, and more specifically we can see the cgroup settings applied to our container. What we can do is set resource limits using the container engine, in this case obviously Podman, which will then create these cgroup configs for us and impose the limits on the container. Let's start up another container on a new port, with a different background color so it's easier to tell the difference, and also limit the amount of memory the container can use to one gigabyte and the CPU quota to 0.5 of a CPU. Okay, so we now have another container (e7e...) started. So what have the -m and --cpus options actually done, and how can we see them work? Well, in terms of what they've done, we can have a quick look at a few things. First of all we can look at the memory, with podman inspect: here we can see the memory configured for the container, which is in bytes, so that makes approximately one gig. If we have a look at our other container, which hasn't had any memory limit specified, we can see that we get a value of zero. Now let's look at the cgroups; I'm going to open up another tab because it makes this a little easier. First let's just check out the host. This is the cgroup setting that the host abides by for the memory limit in bytes; it's just a really big number that effectively means unlimited. Let's take a look at the same thing inside the container, and here we can see that, inside the container, the exact same file shows that we have a one-gig limit.
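A sketch of the namespace-entering commands and the resource-limited run described above; the host port, color, container name, interface names and image reference are illustrative assumptions:

    # peek into individual namespaces of the container process
    $ sudo nsenter -t $PID -n ip addr       # network namespace only: just the container's interface and lo
    $ sudo nsenter -t $PID -n -u hostname   # add the UTS namespace: the hostname is the container ID
    $ sudo nsenter -t $PID --all sh         # enter every namespace; a shell is needed once we enter the mount namespace

    # start a second container with memory and CPU limits
    $ podman run -d -p 7001:3000 -e BG_COLOR=blue \
        -m 1g --cpus 0.5 --name hello-world-limited \
        us.icr.io/<your-namespace>/hello-world:1
    $ podman inspect --format '{{.HostConfig.Memory}}' hello-world-limited   # roughly 1 GiB, shown in bytes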
So back on the host, we can actually see where this container limit is configured. You can see there's this machine.slice directory, and this directory houses all the cgroup config for each one of the containers that we spin up with Podman. We can go into our memory-limited container's directory and have a look at the memory limit in bytes again, and you can see this is where the one-gig limit is specified on the host. Okay, now we can look at the same sort of information for CPU as well. We can see that our CPU-limited container is set to 50000, which is the 0.5 of a CPU that we specified (it's a CFS quota in microseconds, measured against a period of 100000 microseconds). If we take a look at the container that doesn't have its CPU limited, we can see that there's no limit imposed. Again, let's take a look at the cgroups: first we can see the cgroup of the host itself, and we can see the host has a configuration of -1, meaning that there's no limit set. Let's take a look at the exact same file in the container, and once again the view from inside the container shows that the container believes it only has access to that 50000, or 0.5 of a CPU. And lastly, we can take a quick look at where this configuration is specified in the context of the host, and there we have the same 50000 limit set. So you can see that if I wanted to increase the amount of CPU quota available to the container, I could update that config, and we can see that it's reflected inside the container. Now, I'll just change that back for the next example. And finally, we can check that these limits actually can't be breached. When we built the container image we included a tool called stress-ng that will help with this, and we've already got a shell into the container down the bottom, so let's test the CPU first. This time we'll just check and confirm that this machine is configured with only one CPU, and we can see there's only one CPU core available. Now let's run a stress test in the container, trying to use all of that one CPU, keeping in mind that we specified that only 0.5 of a CPU is available when we created the container. You can see that the process is limited to pretty much exactly 50% of the CPU. So what happens if I stress test with, say, two workers? Again, all the running processes in that container are limited to the 0.5 CPU that we set, so they evenly share the usage.
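A sketch of where those cgroup (v1) files live and of the stress tests, including the memory test that follows; the libpod scope path and the stress-ng parameters are assumptions:

    # host-side view of the limits Podman wrote for the container (cgroup v1 layout)
    $ cat /sys/fs/cgroup/memory/machine.slice/libpod-<full-container-id>.scope/memory.limit_in_bytes
    $ cat /sys/fs/cgroup/cpu/machine.slice/libpod-<full-container-id>.scope/cpu.cfs_quota_us

    # inside the container, exercise the limits with stress-ng
    $ stress-ng --cpu 1 --timeout 60s                 # pinned to roughly 50% of the single host CPU
    $ stress-ng --cpu 2 --timeout 60s                 # two workers share the same 0.5 CPU quota
    $ stress-ng --vm 2 --vm-bytes 2g --timeout 60s    # resident memory stays capped around the 1 GiB limit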
Cool, so let's now take a quick look at memory too. Now, the difference between CPU and memory is that it's a lot easier to keep a CPU in line without getting the container killed off, but if a container keeps asking for more memory it will be OOM (out-of-memory) killed. Anyway, let's give it a go and see what happens. I've now got the top command running, with the focus on memory, in the top panel. Here we can see that we have about 3682 megabytes of RAM available on the host, but we know that our container is configured to use at most one gigabyte of that. Now, before I continue, I should just mention that I've had to increase the CPU allowance from 0.5 of a CPU up to a whole CPU, because I found that when doing the memory stress test the container wasn't able to make full use of the one gig of memory with only half a CPU. So let's start the stress test. We can push this a little more by running two workers, and that's a better example: we know that the stress test is trying to gain access to four gig of memory, but we can see from the resident size, which tells us how much physical memory is being consumed by the process, that it's capped at about one gig.

Alright, and there we have it, that wraps up this mini series. I hope that you enjoyed the series, but most importantly I hope that you learned something new. If you did, please consider hitting the like button, and also the subscribe button so that you can stay up to date with the latest videos that I release. Awesome, thanks again, and bye for now.
Info
Channel: Ryan Hay
Views: 987
Rating: 4.92 out of 5
Id: uMGPJ1qrDFw
Length: 23min 0sec (1380 seconds)
Published: Tue Sep 22 2020