Docker Internals Session 2: Docker Storage Drivers (Copy on Write, UFS & Docker Images)

Video Statistics and Information

Captions
Hey guys, welcome back. In the previous session we did a deep dive into the fundamentals of Docker containers, including cgroups and namespaces. In this session we will be covering the next important concept of Docker internals: the Docker storage driver. Building an isolated runtime environment, or container, only solves part of the problem. Without Docker storage drivers, launching containers would be so slow and inefficient that it would have been unfeasible to use Docker at the scale we see today. Docker storage drivers are without a doubt one of the main reasons for its success as a containerization technology.

To launch any application inside a container you require a long list of OS libraries and dependencies. Ideally you would have to make a separate copy of all those libraries for each containerized process that you launch. That is not only highly inefficient but also unfeasible, as launching containers would take so much time that people would see no advantage in switching to Docker over traditional virtual machines. The Docker storage driver figured out a way to avoid making these full copies, using mechanisms like copy-on-write and union filesystems.

The idea of copy-on-write is not new. It is also sometimes referred to as implicit sharing or shadowing, and it has historically been used as a resource management technique. It is heavily used in RAM resource management and is also used in some storage and snapshotting technologies. Docker uses this technique to solve the problem at hand, that is, to devise a mechanism to effectively share files and libraries among containers.

During the initial days of its development, when Docker was still called dotCloud, the Docker development team were on a continuous lookout for a copy-on-write system that would fit the requirement. I believe AUFS, the Advanced Union File System, was the first copy-on-write storage driver used to build Docker containers. However, at that time none of the mainline Linux kernels inherently supported any copy-on-write functionality, so it had to be externally patched and managed by the Docker team, which was a bit of a tedious and messy procedure. Once Docker grew and its popularity increased, the Linux distributions acknowledged the importance of integrating Docker with their OS and started to contribute and develop copy-on-write functionality in the Linux kernel.

Over time many copy-on-write storage drivers were developed and supported by Docker, namely AUFS (the Advanced Union File System), Btrfs, device mapper (the same technology used by LVM), OverlayFS and ZFS. These storage drivers can mainly be categorized based on whether they operate at the file level or at the block level, as you can see in this chart. The default storage driver today for Red Hat and CentOS 7 systems is overlay2, as you can see here in the output of docker info, and that is what we will be covering in this session.

OverlayFS is the modern version of the union filesystem. The diagram shown here is a great visualization to help you understand how union filesystems work. In union filesystems there is a layered approach to storing files and directories, which are still basically stored on your native OS filesystems like XFS. The job of the union filesystem, however, is to create a merged view from the top, where it accumulates all the underlying layers into one unified view.
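The session does not demonstrate this part on the terminal, but as a minimal sketch of the same idea at the OS level (assuming a Linux box with OverlayFS support, and using throwaway directory names of my own), a union mount can be created by hand like this:

# Create the layer directories: one read-only lower layer, one writable upper layer,
# a work directory that OverlayFS needs internally, and the merged mount point
mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
echo "from the lower layer" > /tmp/lower/base.txt

# Mount the unified view (needs root)
mount -t overlay overlay \
      -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work \
      /tmp/merged

# The merged directory shows the contents of all layers combined
ls /tmp/merged

# Writes through the merged view land only in the upper (writable) layer
echo "new file" > /tmp/merged/new.txt
ls /tmp/upper

This lowerdir/upperdir/merged split is exactly the structure we will meet again in Docker's overlay2 backend later in the session.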
OverlayFS is also a type of union filesystem: it treats individual files and directories as branches or layers, and then allows a virtual merge of these layers to provide a unified view. Remember that these underlying layers can be on different volumes or even on different filesystems. I understand that at this moment you will have a lot of questions, the first one being: what is the use of such an approach? Don't worry, things will start to get clearer as I show you some of the practicals. For now, just know that this layered approach is essential because it allows us to share layers across containers.

Before we move ahead I want to introduce one more important topic in Docker: the Docker image. A Docker image is nothing but a reference to the union of one or more read-only layers. A Docker image is essentially built using line-by-line instructions, where each line is an instruction on how to build a layer. Docker lets us build these images using a Dockerfile, which we will see in a few moments; a Dockerfile is basically a text file that contains the line-by-line layer creation instructions.

Before I move on to my Linux terminal, I want to show Docker Hub, which is a cloud-based Docker image repository. It is used by the Docker community to create, test and distribute container images. Here you will find Docker images for almost all well-known Linux distributions and applications. By default, Docker will automatically try to download images from Docker Hub over the Internet if they are not available locally.

Here I have a fresh CentOS 7 installation where I have installed and started the Docker daemon. Currently I do not have any Docker images available, which you can see from the docker image ls command. I will now download the official CentOS 7 Docker image from Docker Hub; the command for this is docker image pull followed by the image name and the tag. Docker image tags can be thought of as a versioning technique for Docker images. Every image that you create will by default be assigned the tag "latest", but you can apply your own tags to convey useful information about that particular version of the image to your users. You can see here the list of tags, or versions, of the CentOS image available for download. I have now downloaded the CentOS 7 image, and I can confirm that using the docker image ls command.

On a side note, Docker CLI commands usually follow a basic syntax: the keyword docker, followed by the Docker component, options, and then the command or action to be performed along with its arguments. By Docker components I mean the basic building blocks of your Docker environment, that is, Docker images, containers, volumes, networks and so on. You can always see the list of actions available for a component using the --help option; here I use the --help option for docker image, and similarly for docker network, and so on.

Now that we have downloaded a base CentOS 7 image, I am going to build my own customized Docker image on top of it. Before I do that, I want to show how Docker has created the base layer in the backend for the image that we just downloaded. For this I will go to the directory /var/lib/docker, which is the default Docker storage directory. Remember, you are not supposed to ever directly interact with this directory; never try to change or update any file under it, as it is fully managed by Docker, or else you risk corrupting your Docker environment. I am only moving into this directory because I want to show you what happens behind the scenes in Docker, which you do not normally see when dealing with Docker containers.
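Before we dig into the backend directory, here is a quick recap of the terminal steps so far; the image name and tag follow the session, and the output will of course depend on your environment:

# Pull the official CentOS 7 image from Docker Hub
docker image pull centos:7

# Confirm the image is available locally
docker image ls

# General CLI pattern: docker <component> <action> [options] [arguments]
docker image --help
docker network --help

# Check which storage driver the daemon is using (overlay2 in this session)
docker info | grep -i "storage driver"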
Here I will move into the overlay2 directory. You will find a directory for each overlay layer that is created; layers are identified by applying a hash algorithm to their contents, hence the long, strange names. This layer is the one that was created for the CentOS 7 image we just downloaded. If I move into this directory you will see a directory named diff and a file called link. The link file simply contains a shortened symbolic-link identifier for that layer; this shortened identifier exists just to avoid name-length limitations with the mount options. The important thing here is the diff directory, which contains all the files and directories that are part of that overlay layer, which in this case is all the CentOS 7 libraries and directories. The name diff will start making more sense when we create a new image layer.

To create our own Docker images we will be using a Dockerfile. In simple terms, a Dockerfile is just a text file that contains a list of instructions on how to build a Docker image. As mentioned earlier, these instructions are written in a layered format, where each line of instruction is converted into an overlay filesystem layer in the backend. The standard format of a Dockerfile is an instruction followed by its arguments. Docker supports a list of instructions for the Dockerfile which you can easily find online; I will cover those details in later sessions. For now I will be using only the FROM and the RUN instructions to build our image.

Here on my Linux system I have already created a sample Dockerfile, which is basically just a text file named Dockerfile. A Dockerfile must start with the FROM instruction, which specifies the base image on which we will be adding our own layers; in this case I will be using the CentOS 7 image that we just downloaded. The next instruction is to RUN two shell commands that create a file which says hello inside a directory named /test1. To create this image, let me save and close this file and execute the command docker image build with -t to name and tag the image, so let me name it test-image-1, and then provide the path where it can find the Dockerfile.

There you go, our image has been built. You can see that each instruction is executed step by step, and an associated overlay layer is created in the backend if it does not already exist. You can confirm that the image was created using the docker image ls command. Now if we take a peek inside our Docker overlay backend directory, you can see that a new layer has been created for the image we just built. If you look at the diff directory inside this layer, you will find only the test1 directory which we instructed to be created, because each layer only has to store the difference from its underlying layer, just like what is shown in this diagram. Here you can also see a new file called lower, which contains the shortened name of the layer below it, which in our case is the CentOS 7 base image layer. The overlay2 driver will use all of this information to build a unified view at the end.

Remember that layers themselves have no notion of belonging to an image; they are merely collections of files and directories. An image, however, stores the collation information of its layers, which you can see with the docker image inspect command and the image name, that is test-image-1, which shows the different overlay directories that are part of the image.
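The Dockerfile and build steps from the demo look roughly like this; the exact RUN commands are an assumption reconstructed from the narration (two shell commands that create a file saying "hello" inside /test1):

# Dockerfile for test-image-1 (FROM the CentOS 7 base image, one RUN instruction)
cat > Dockerfile <<'EOF'
FROM centos:7
RUN mkdir /test1 && echo "hello" > /test1/file.txt
EOF

# Build and tag the image; "." is the build context directory holding the Dockerfile
docker image build -t test-image-1 .

# Confirm the image exists and see which overlay2 directories back it
docker image ls
docker image inspect test-image-1

# Peek at the backend layer directories (look, but never modify anything here)
ls /var/lib/docker/overlay2/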
Similarly, the docker image history command with the image name will show you the layered instructions that went into building that image.

Now let me create one more Docker image, called test-image-2, that contains three lines of instructions. We will use the same Dockerfile; the first two instructions will be the same, but I will add a third instruction to create another directory called /test2. Let us build this image: docker image build -t, and let me name it test-image-2. There you go. Note that something interesting has happened here, where it says "Using cache". Docker uses something known as a build cache, or layer caching. This is to avoid making duplicate layers during a build and to optimize the process of building Docker images. Docker layer caching mainly works on the COPY, RUN and ADD instructions. What this means is that Docker stores a historical cache of previous images and, during the build process, compares the new image's instructions with it to see if any layer is being built with the same exact instructions in the same sequence. If a match is found, Docker uses the existing layer instead of creating a new one. In our case Docker uses the existing layers for the first two instructions, which are shared between both images, and goes about creating a new layer for the third instruction. If we peek inside our overlay backend, we can see only one additional layer created, which contains the test2 directory in its diff; the other two layers are shared between the images and are not recreated. Here is a diagrammatic representation for better understanding, where I have shown the first two layers shared between the images. This mechanism greatly optimizes our build process.

This does, however, have drawbacks in cases where the output of a RUN instruction is not the same every time you execute it, for example a git pull where you need to pull new code every time; Docker will treat the instruction as repetitive and will reuse the existing layer instead of creating a new one. Here is an example I created just to mimic such a scenario. In this Dockerfile I have put two instructions, one to use the CentOS 7 base image and a second to create a file with some random text in it. Let me build the image, named test-image-3. Remember, the random text generator will generate unique random text every time it is executed, so technically it should create a new layer every time. So let us try and build another image with the same Dockerfile, named test-image-4. As you can see, Docker treats the instructions as repetitive and uses the cache information to reuse the existing layer instead of creating a new one. Here is a diagrammatic representation of what was expected and what we got instead.

There are ways to work around this problem. One of them is to use the --no-cache option, which forces Docker not to look up the build cache and to create new layers for each line of the build instructions. The design of your Dockerfiles plays a key role in using the full potential of the Docker build cache and also affects the overall optimization of your builds. As a thumb rule, repetitive instructions that produce similar layers should be placed at the beginning of your Dockerfile, and the non-repetitive ones should be placed at the end as much as possible; this ensures that there is effective sharing of layers across images. Also remember that sequencing is important when you write Dockerfiles, as Docker will recreate all layers after the first mismatch is detected.
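Here is a rough sketch of the cache behaviour described above; the session does not show the exact Dockerfile, so the "random text" command below is just an illustrative stand-in:

# Dockerfile whose RUN output differs on every execution
cat > Dockerfile <<'EOF'
FROM centos:7
RUN head -c 16 /dev/urandom | base64 > /random.txt
EOF

# First build creates a fresh layer for the RUN instruction
docker image build -t test-image-3 .

# Second build from the same Dockerfile reports "Using cache" and reuses that layer,
# so test-image-3 and test-image-4 end up with identical /random.txt contents
docker image build -t test-image-4 .

# Workaround: ignore the build cache and rebuild every layer from scratch
docker image build --no-cache -t test-image-4 .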
Now, this session is not intended to be a Dockerfile tutorial; we will cover that in later sessions with good examples. The intent of this session was just to make you understand how the layered approach of overlay drivers helps in optimizing Docker image builds.

Now that we have finished building our Docker images, it is time to launch containers from them. In the previous session I explained what Docker containers are from an OS perspective: they are just processes running in an isolated runtime environment created using cgroups and namespaces. To launch a container, Docker first creates this environment and then mounts the overlay layers, with the Docker image as the shared read-only layers at the bottom plus an additional writable layer on top that is specific to each container. Any operation at the container level is stored only in that topmost writable layer. The container-specific writable layer is lost once the container itself is deleted, and so is the data in it. This diagram is a good representation of this: it shows two containers launched from the test-image-2 that we created earlier; both share the same read-only image layers, which are then merged with a separate writable layer on top and mounted inside the containers.

Now let us try this out on our Linux terminal. Similar to the docker image command, the CLI command to manage Docker containers is docker container; here the --help output shows all the available actions. To launch a container, the simple command is docker container run. I will be launching the bash process inside the containers, which requires an interactive terminal, hence I put the option -it for an interactive terminal, then the image name, that is test-image-2, and the process that I want to run. There you go. Let me now launch one more container from the same image in another tab. We have now launched two containers, which you can confirm from the docker container ls command.

If I move into my overlay directory, you can see the topmost container-specific writable layers created for each container; you can identify them because they contain the merged directory. Currently these layers have empty diffs, as I have not stored anything inside my containers. There are also layer directories with a trailing -init suffix, which I failed to mention before; these are nothing but container-specific read-only layers that are created to initialize some mandatory files and directories for each container, like the hosts file, hostname and so on. Remember, until we launch the containers, these layers are nothing but individual directories stored on your native Linux filesystem; only at the time of launching the containers does the overlay driver kick in to create a unified, merged view of these layers to be mounted inside the containers. If we peek inside the merged directory of one of the containers, you can find the merged view of all its underlying layers: here you can see the accumulation of all the directories and files from the underlying layers, which includes the CentOS base image directories along with the test1 and test2 directories, similar to what we saw in the visualization. If I check the mount output and filter only for overlay, you can see that it is this merged directory that is mounted inside our containers, with the filesystem type overlay. Any file operation done inside our containers will be limited to the topmost writable layer. Let me just create a test file inside one of my containers; if I now check the diff directory inside the writable layer, you can see the file is available there.

The last and most important concept that I wanted to show is the copy-on-write operation, which I have mentioned many times before but have not yet explained; that is because I wanted to show it in action rather than just giving a theoretical definition. It has already been mentioned that all file operations at the container level are maintained in the topmost writable layer, and that containers cannot modify the underlying read-only image layers. This ensures that changes performed inside a container are not propagated to other containers that share the same image. But what if there is a situation where I need to update a file that is part of an underlying read-only layer? This is where the copy-on-write mechanism kicks in: as soon as you try to modify a file or directory that is part of an underlying read-only layer, in this case let us consider the file /test1/file.txt, that file or directory is copied up to the topmost writable layer of the container, so that the container now has a local copy of that file for its operations. In one of our containers, let me try to modify the file /test1/file.txt, which is part of a read-only image layer; let me write "hello Karthik" and save and close the file. The copy-on-write operation must have taken place in the background, invisible to us as users. This is evident because, from the other container that we launched, the contents of the file are still the same and have not been updated, even though both containers share the same underlying image layers. Inside our backend overlay directories, if I check the writable layer of the container, you can see that a local copy of /test1/file.txt has been made available.
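Putting the container and copy-on-write steps together, here is a rough sketch of the demo; the container names and the detached sleep commands are my own shorthand, whereas the session simply ran docker container run -it test-image-2 bash in two terminals:

# Launch two containers from the same image; each gets its own writable overlay layer
docker container run -d --name c1 test-image-2 sleep 3600
docker container run -d --name c2 test-image-2 sleep 3600
docker container ls

# Each container's merged overlay view is mounted with filesystem type "overlay"
mount | grep overlay

# Modify a file that lives in a shared read-only image layer from inside c1;
# OverlayFS first copies the file up into c1's writable layer, then applies the write
docker container exec c1 sh -c 'echo "hello Karthik" > /test1/file.txt'

# c2 still sees the original contents, because the shared image layer was never touched
docker container exec c2 cat /test1/file.txt

# The copied-up file appears in c1's writable layer under the Docker storage directory
grep -rl "hello Karthik" /var/lib/docker/overlay2/*/diff/test1/ 2>/dev/null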
As a thumb rule, you should design your Docker image layers in such a way that your containerized processes never have to update or write to an underlying image layer, because copy-on-write operations on relatively large files can be resource intensive and might impact performance. It is also important to note that writable container layers have the same lifetime as their containers and will be lost when the containers are deleted, so they are rarely used as a means of storage. Most of the time you will see persistent and non-persistent Docker data volumes mounted to store data outside the containers by one means or another, details of which we will cover in later sessions. However, understanding the Docker storage layer is a key part of getting familiar with Docker internals and how Docker functions, and it will surely enhance your overall experience of using Docker. That is the end of the session, and I hope you all found it interesting and learned something new. If you liked the video, don't forget to like, share and subscribe. See you next time.
Info
Channel: TheITNoob
Views: 6,719
Keywords: Docker internals Session 2: Docker Storage Drivers (Copy on Write, UFS & Images), Docker Storage Drivers, Copy on Write, Union Filesystem, Overlay Filesystem, Overlay2, Docker Image, Dockerfile, Docker Internals, Docker operations, Docker, Docker layers
Id: 3BkCaBxq5Ag
Length: 22min 54sec (1374 seconds)
Published: Sat Jul 27 2019