Top 8 Docker Best Practices for using Docker in Production

Video Statistics and Information

Captions
In this video we're going to talk about eight best practices for using Docker in production. Docker is obviously a technology that has become a standard and that everyone is familiar with; however, not everyone is using Docker according to the best practices. So in this video I want to show you eight ways to use Docker the right way in your projects: to improve security, optimize the image size, take advantage of some useful Docker features, and write cleaner, more maintainable Dockerfiles.

The first best practice is to use an official and verified Docker image whenever one is available. Let's say you are developing a Node.js application and want to build and run it as a Docker image. Instead of taking a base operating system image and installing Node.js, npm and whatever other tools you need, use the official node image as the base for your application. This not only makes your Dockerfile cleaner, it also lets you build on an official and verified image that was itself built according to best practices.

Okay, so we have selected the base image. But if we build our application image from this Dockerfile now, it will always use the latest tag of the node image. Why is this a problem? Because it means you might get a different image version than in the previous build, and the new image version may break things or cause unexpected behavior. The latest tag is basically unpredictable: you don't know exactly which image you are getting. So instead of a random latest image tag, you want to pin the version, and just like you deploy your own application with a specific version, you want to use the official image with a specific version. The rule here is: the more specific, the better. This also gives you and your team the transparency of knowing exactly which version of the base image is used in your Dockerfile.

Now, looking at all the image tags or versions, you see that for Node.js there are multiple official images, not only with different version numbers but also with different operating system distributions. So which one do you choose? That's an important point. If the image is based on a full-blown operating system distribution like Ubuntu or CentOS, which has a bunch of tools already packaged in, the image size will be larger, and you don't need most of those tools in your application image. In contrast, smaller images mean you need less storage space in the image repository as well as on the deployment server, and of course you can transfer the images faster when pulling or pushing them.

In addition to the size, there is another issue with images based on full-blown operating systems with lots of tools installed inside, and that is security. Such base images usually contain hundreds of known vulnerabilities and create a larger attack surface for your application image, so you end up introducing unnecessary security issues into your image from the very beginning. By using smaller images with leaner operating system distributions, which only bundle the necessary system tools and libraries, you minimize the attack surface and make sure you build more secure images.

So the best practice here is to select an image with a specific version, based on a leaner operating system distribution like Alpine. Alpine has everything you need to start your application in a container but is much more lightweight, and for most of the images on Docker Hub you will see a version tag with the Alpine distribution in it; it is one of the most common and popular base images for Docker containers. In short: if your application does not require any specific system utilities or libraries, choose the leaner and smaller image from the selection.
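To make that concrete, here is a minimal sketch of the difference. The exact node version tag (17.0.1 below) is only an illustration; pin whichever release your application actually targets.

    # avoid: FROM node         (unpinned "latest" tag, full-size base image)
    # avoid: FROM node:latest  (same problem, just written out)

    # prefer a specific version on a lean distribution:
    FROM node:17.0.1-alpine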
The next best practice is optimizing the caching of image layers when building an image. So what are image layers, and what does caching an image layer mean? It's actually very simple. A Docker image is built from a Dockerfile, and in the Dockerfile each command or instruction creates an image layer. Let's look at a simple Dockerfile based on a node Alpine image. Every Docker image is made up of layers, which means that when we use a node Alpine base image like in this example, it already has layers, because it was itself built from its own Dockerfile. You can actually see this in the documentation: when you go to the image page on Docker Hub and click on one of the tags, you will see the layers that make up the image. So that is how the base image was built, and on top of that our Dockerfile has a couple of additional commands that each add a new layer to the image. Again, this is how every Docker image is created, from multiple layers, and once you build your own application image you can see all the layers of the final image on the command line using the docker image history command with the image name. This displays the image layers together with the command that created each layer.

Okay, now what about caching? Each layer gets cached by Docker, so when you rebuild your image and your Dockerfile hasn't changed, Docker will just use the cached layers to build the image. This of course makes building the image much faster. Caching is also useful and important when pulling and pushing an image: if I pull a new version of the same application image and, let's say, two new layers have been added in the new version, the whole image doesn't need to be downloaded. Only the newly added layers are downloaded; the rest are already cached locally by Docker and are reused from the cache.

Going back to our simple Dockerfile example: what we're doing there is copying all the files from the project into the image using the COPY command, and then executing npm install to install all the project dependencies. Now what happens if we make some code changes in our application? Since we are copying everything into the image, the COPY command is executed again, because it needs to copy the changed files. But the next line, npm install, is also re-executed, even though we didn't change anything in the dependencies. Why is that? Why isn't it reused from the cache? Because once a layer changes, all the following, or downstream, layers have to be recreated as well. In other words, when you change the contents of one line in the Dockerfile, the caches of all the following lines or layers are invalidated, and each layer from that point on is rebuilt. Instead, we want to take advantage of the Docker cache and have everything that hasn't changed be reused from the cache, so that the image gets built fast. In our case, we don't want to rerun npm install every time some file in the project changes; we only want to run it when the contents of the package.json file, which lists all the application dependencies, have changed. So let's restructure the Dockerfile to copy only the package.json file first, then run npm install, and only after that copy all the other files.
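As a rough sketch of that reordering (the node version tag and the server.js entry point are illustrative, and your project may need additional steps such as a build command):

    FROM node:17.0.1-alpine
    WORKDIR /app

    # copy only the dependency manifest first ...
    COPY package.json ./
    # ... so this layer stays cached as long as the dependencies don't change
    RUN npm install

    # copy the rest of the project; code changes only invalidate the layers from here on
    COPY . .

    CMD ["node", "server.js"]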
In this case, if we add or remove a dependency in the package.json file, it will be copied and npm install will be executed. If we change any other file in the project, those two layers are reused from the cache and npm install is not re-executed. You can see this in the output of the docker build command as well, which tells you whether a layer was reused from cache or rebuilt. So the rule and the best practice here is: order your commands in the Dockerfile from the least to the most frequently changing, to take advantage of caching and optimize how fast the image gets built.

Before moving on, I want to give a shout-out to Kasten, who made this video possible. Kasten's K10 is a data management platform for Kubernetes. K10 takes most of the load of doing backup and restore in Kubernetes off the cluster administrators. It has a very simple UI, so it's super easy to work with, and it has intelligent logic that does all the heavy lifting for you. With my link you can download K10 for free and get 10 nodes free forever for your Kubernetes backups, so make sure to check out the link in the video description. And now let's continue.

Usually, when we build the image, we don't need everything in the project to run the application inside it. We don't need auto-generated folders like the target or build folder, we don't need the README file, and so on. So how do we exclude such content from ending up in our application image, in order to reduce the image size? That's our next best practice: use a .dockerignore file. It's pretty straightforward: we create the .dockerignore file, list all the files and folders we want to be ignored, and when building the image Docker will look at its contents and leave out anything specified inside.
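For a typical Node.js or Java project such a file might look something like the following; the entries are just examples, so list whatever your own project generates or doesn't need inside the image:

    # .dockerignore - excluded from the build context sent to Docker
    node_modules
    build
    target
    .git
    .gitignore
    README.md
    *.log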
But now let's say there are some contents in your project that you need for building the image, so during the build process, but that you don't need in the final image to run the application. While you build an image from a Dockerfile, many artifacts get created that are required only at build time: development tools and libraries needed for compiling the application, dependencies needed to run unit tests, temporary files, and so on. If you keep these artifacts in your final image, even though they are absolutely unnecessary for running the application, it again results in an increased image size and an increased attack surface. A specific example is a package.json or pom.xml or any other dependency file, which lists all the dependencies of the project and is needed to install them; once the dependencies are installed, however, we don't need these files in the image to run the application. Another, more interesting use case is building Java-based applications: we need the JDK to compile the Java source code, but the JDK is not needed to run the Java application itself. In addition, you might be using tools like Maven or Gradle to build your Java application; those are also not needed in the final image. So how do we separate the build stage from the runtime stage, in other words, how do we exclude the build dependencies from the image while still having them available while building it?

For that you can use what's called multi-stage builds. The multi-stage builds feature lets you use multiple temporary images during the build process but keep only the last image as the final artifact. Let's see how that works with an example Dockerfile that has two build stages. The first stage, which we name build, is used to build the Java application using Maven, and the second stage, which starts with the FROM tomcat directive, takes the files generated in the previous build stage and copies them into the final image. So the final application image is created only in the last stage, and all the files and tools used in the first stage are discarded once it completes. And remember what we said about layers: here, only the commands of the last stage create the layers of the final image; all the previous steps are discarded. This lets us separate the build tools and dependencies from what's needed at runtime and gives us images with far fewer dependencies that are much smaller in size.

Now, when we create this image and eventually run it as a container, which operating system user will be used to start the application inside? By default, when a Dockerfile does not specify a user, it uses the root user, but in reality there is mostly no reason to run containers with root privileges, and it is also a bad security practice. When the container starts on the host, it will potentially have root access on the Docker host, so running the application inside the container as root makes it easier for an attacker to escalate privileges and get hold of the underlying host and its processes, not only the container itself, especially if the application inside the container is vulnerable to exploitation. To avoid this, the best practice is simply to create a dedicated user and a dedicated group in the Docker image and run the application with them. You create the user and its group with the Linux groupadd and useradd commands, and then use the USER directive with that username before starting the application. Conveniently, some images already come with a generic user bundled in, so you don't have to create a new one; for example, the node.js image already bundles a generic user called node, which you can simply use to run the application inside the container.
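Here is a rough sketch of what such a two-stage Dockerfile could look like, folding in the dedicated non-root user discussed just above. The Maven and Tomcat tags, the app.war file name, and the appuser/appgroup names are illustrative assumptions, not details from the video:

    # ---- build stage: compiles the application, discarded afterwards ----
    FROM maven:3.8-openjdk-11 AS build
    WORKDIR /src
    COPY pom.xml .
    COPY src ./src
    RUN mvn package

    # ---- runtime stage: only these layers end up in the final image ----
    FROM tomcat:9.0-jre11
    # create a dedicated group and user instead of running as root
    RUN groupadd -r appgroup && useradd -r -g appgroup appuser
    # copy only the build artifact produced in the first stage
    COPY --from=build /src/target/app.war /usr/local/tomcat/webapps/
    # give the runtime user ownership of the directories Tomcat writes to
    RUN chown -R appuser:appgroup /usr/local/tomcat
    USER appuser
    CMD ["catalina.sh", "run"]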
Finally, how do you make sure and validate that the image you build has few or no security vulnerabilities? The next and final best practice is, once you have built the image, to scan it for security vulnerabilities using the docker scan command. Note that you have to be logged into Docker Hub to be able to scan your images, so you do a simple docker login on the command line and then execute the docker scan command with the image name as a parameter. In the background, Docker actually uses a service called Snyk to do the vulnerability scanning of the images. The scan uses a database of vulnerabilities that gets constantly updated, so newly discovered vulnerabilities are added all the time for different images. In the example output of the docker scan command you see the type of each vulnerability and a URL for more information, but also, which is very useful and interesting, which version of the relevant library actually fixes that vulnerability, so you can update your libraries to get rid of these issues.

In addition to scanning your images with the docker scan command on the command line, you can also configure Docker Hub to scan images automatically when they get pushed to the repository, and you can see the results of that scanning in Docker Hub or Docker Desktop. And of course you can integrate this check into your CI/CD pipeline when building your Docker images.

So these are eight best practices that you can apply today to make your Docker images leaner and more secure. Of course there are many more best practices related to Docker, but applying these will already give you great results when using Docker in production. Do you know other best practices that you think are super important and have to be mentioned? Please share them in the comments for others as well. And finally, make sure to check out the video description for more learning resources. With that, thank you for watching, and see you in the next video.
Info
Channel: TechWorld with Nana
Views: 92,641
Keywords: docker best practices, docker security best practices, docker production best practices, docker tagging best practices, docker, dockerfile best practices, dockerfile, dockerignore file, dockerignore, docker cache, docker multistage build, docker reduce build time, docker reduce image size, reduce docker image, docker security scanning tools, techworld with nana, docker image, docker tutorial, best practices docker, docker in production
Id: 8vXoMqWgbQQ
Length: 18min 26sec (1106 seconds)
Published: Thu Nov 11 2021