Dockerfile Best Practices

Captions
Okay, welcome everybody, it's definitely time to get started and grab a seat for this amazing forty minutes: we're going to hear about Dockerfile best practices. I'll be your MC for these 40 minutes. My name is David Butler and I'm on the product marketing team at Docker. It's great to see everybody here, but you're really here to hear two great engineers from the Docker team: Tibor Vass, a maintainer of Docker Engine and BuildKit, and Sebastiaan van Stijn, also a maintainer of Docker Engine. We've got lots of great tips about the Dockerfile here, so with that, guys, you can get started.

All right, thank you very much, and welcome everyone, thank you for coming. So Sebastiaan and I would like to show you some of the best practices when you write Dockerfiles, especially in the context of the latest improvements with BuildKit.

First of all, Dockerfiles are a very simple blueprint to build your images, and they're very popular: there are over a million Dockerfiles on GitHub. The reason for their popularity, I think, is their simplicity, but also the fact that the Docker builder gives you build caching when you build, so you have much faster iterative development with Docker builds. Some of you may know that the current builder has some limitations and that it could be improved, and that's exactly what we did. There's a new version of the builder called BuildKit. It's its own project, but we incorporated it into Docker 18.09, and it has a bunch of new features; it's the next-generation builder. Can I ask a question: please raise your hand if you've heard of BuildKit before. Okay, that's a few, thank you. And some of you may have used it already; can you keep your hand up? All right, thank you very much. So for those of you who don't know, BuildKit brings a lot of performance improvements and also unlocks a bunch of new
capabilities, some of which we'll cover later in the talk. I just want to point out that Windows support is still a work in progress, so some of the things I'll talk about only apply to Linux and not Windows, as of today at least. I invite you to use the latest Docker, 18.09, which works fine. To enable BuildKit, you have two ways: on the client side, with the environment variable DOCKER_BUILDKIT=1, or, if you don't want to do that on all the clients you have deployed, you can enable it on the Docker daemon with the { "features": { "buildkit": true } } config. Our intention is to make BuildKit the default, but right now it's opt-in, which is why I invite you to try it out and let us know if there are bugs when you build your Dockerfiles. For an exhaustive list of features I put a link here; obviously we won't cover a lot of them, but this is just for reference. With this, I'll hand it over to Sebastiaan, who will talk to you about improving Dockerfiles.

Thank you, Tibor. So today we're going to be looking at improving your Dockerfiles, and we'll show some examples of ways to apply best practices. We'll be looking at build time, image size, and maintainability of your Dockerfiles, and also touch on security and repeatability. Some of these things may already be known to you, so we'll try to skip the basic parts or go through them as fast as possible. We'll start with an example application; sorry, it's a project: a Java Spring hello-world web app. It has a Dockerfile, it has some documentation, a pom.xml, a readme, a source directory, and a target directory containing a pre-built version of the application. Let's look at the Dockerfile. It's not too complicated: it uses a Debian image, copies the source into the /app directory, installs Java, and sets the default command for the image to run.
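Before going further, here is a recap of the two ways to enable BuildKit that Tibor mentioned, sketched out (the daemon-config path assumes a standard Linux install):

```shell
# Option 1: per invocation, on the client side
DOCKER_BUILDKIT=1 docker build .

# Option 2: for all builds, in the daemon config
# (/etc/docker/daemon.json), then restart the daemon:
#   { "features": { "buildkit": true } }
```

Both require Docker 18.09 or later.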
Well, there's one easy fix already in this Dockerfile, so let's get that done and then get on with the real work: incremental build time, because that's what it's all about. When you're developing, you build the image, you make some code changes, and then you need to build it again, and it's important here to make the build cache your friend. There are some things to take into account when you're writing your Dockerfile. First of all, order is important: if you make changes to any line in any stage of your Dockerfile, then the cache of all subsequent steps will be busted, so order your steps from least to most frequently changing to optimize your caching. In addition, when you're copying files into your image, make sure you're very specific as to what you want to copy, because any changes to the files you're copying will bust the cache. In this case we have a pre-built application, so we only copy that into the image; that way, unrelated file changes will not affect the cache. Sometimes you want things to be cached together, so identify cacheable units. For example, if you use a package manager such as, in this case, apt-get, you want the cache of the package manager itself, the package index, to be updated together with installing the packages, so that if you add or remove a package, they are cached together or not cached at all. Image size can be important: smaller images mean faster deploys and a smaller attack surface. So don't install things you don't need in your image; don't install debugging tools, you can always install them later if really needed. While we're at it, we can also use the --no-install-recommends flag, which basically makes sure that you don't install dependencies you're not really using. And finally, clean up the cache files from your package manager: you don't need them after installing the packages, so why keep them in the image? Just remove them afterwards.
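A sketch of these apt-get practices combined into a single cacheable unit (the package name is illustrative):

```dockerfile
# Update the index, install, and clean up in ONE RUN step, so the
# package index and the installed packages are always cached together
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      openjdk-8-jre-headless \
 && rm -rf /var/lib/apt/lists/*
```

Splitting `apt-get update` into its own RUN line would let a stale cached index be reused when the install line changes, which is exactly the cache-busting problem described above.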
So we optimized the Dockerfile for caching and size, but now it has become a bit complicated; look at it, it's way more complex than it used to be. So how can we optimize this? Let's have a look at maintainability, because you want your Dockerfile to be as simple as possible. If possible, have a look at the official images. In this case we were installing Java, and there are many more people doing so, and there are official images that have all the installation steps already done for you. It can save you a lot of time on maintenance, they are written with containers in mind with all the best practices applied, and in addition, if you have multiple projects, they can share those layers, because they use exactly the same base image. Applying that to our Dockerfile, we can remove all these steps and have a very simple Dockerfile. But be sure not to use latest: if you're using tags from Docker Hub, there's always a latest tag, but that's a rolling tag, so you never know what's going to be there today or tomorrow. Be specific as to what you want to base your image on. In this case we're using OpenJDK 8, but there are many more tags available, and you should have a look at the Docker Hub documentation, which lists exactly which variants exist. That also means there are sometimes smaller variants: maybe you don't need all the stuff that's in the bigger images, and there might be a smaller one. In this case, if we just switch the base image, we can reduce the image size by over 500 megabytes, which is a quick win. Now let's have a look at reproducibility, because one thing stood out in this application: we're copying a pre-built binary into the image. So how did that binary come into existence? Consider building your application as part of the whole build process: your Dockerfile should be the blueprint of your image, and the source code should be the source of truth for your application.
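Put together, the simplified Dockerfile from this part of the talk would look roughly like this (the jar path is illustrative, and the pinned tag reflects what was current at the time of the talk):

```dockerfile
# Pin a specific official image variant instead of a rolling "latest" tag;
# the -jre-alpine variant is much smaller than the full JDK images
FROM openjdk:8-jre-alpine
COPY target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
```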
In addition, doing the build inside the Dockerfile makes it easy to get a reproducible build environment, and the Dockerfile becomes the whole blueprint of your application. So, modifying the Dockerfile to do just that: we switch to a different base image that has Maven installed, we copy the pom.xml and the source code, and we build the application. Okay, this is a good start, but there's one thing: every time we make a code change, all the dependencies will also be fetched, each and every time, and that's not something you want. So again, identify cacheable units: in this case, split resolving the dependencies into a separate step, so that that part can be cached, and building the application takes place after that. Now there's one problem, because this is all great, we have an image that works and we're able to build it reproducibly, but the image has become quite a lot bigger, and not only that, we're also shipping all the development tools and build tools in the final image. You don't want to be deploying your build tools, because you don't need them at runtime. So how to deal with this? There's a solution: multi-stage builds. If you haven't been using them yet: it's basically still a Dockerfile, but you specify multiple FROM commands, and each FROM line starts a new stage. You can name those stages, as in this example, where we have a build stage and a final stage, so you have a separation of concerns. We have the build stage in which we build the application, then a second stage that starts with just the Java runtime, and we copy the artifacts from the builder stage to the final stage. Only the final stage is what you will be pushing to the registry and what will be deployed; all the other parts will never be shipped. This is only one example of using multi-stage builds, but there are many other cases where multi-stage builds are very useful.
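The multi-stage flow described above could be sketched like this (stage names, tags, and paths are illustrative; `mvn dependency:go-offline` is one common way to make dependency resolution its own cacheable step):

```dockerfile
# Stage 1: build environment with Maven and the JDK
FROM maven:3-jdk-8 AS builder
WORKDIR /app
# Resolve dependencies first, so code changes don't re-fetch them
COPY pom.xml .
RUN mvn -B dependency:go-offline
# Then copy the sources and build
COPY src ./src
RUN mvn -B package

# Stage 2: minimal runtime image; only this stage gets pushed and deployed
FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
```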
With that, I'm handing back to Tibor, who is going to talk about those. Thank you, Sebastiaan. Yeah, like Sebastiaan said, multi-stage Dockerfiles are not just for reducing image size, and that's part of what this talk is about. Some projects, like Moby and BuildKit, have sixteen, or in the case of BuildKit forty-four, stages, and you could be wondering how come there are so many stages in a Dockerfile when the most popular use case is to reduce the final image size. So I listed some of the use cases here. The first one is the one most of you may know and that Sebastiaan talked about; there are other ones, and we'll cover some of them, except maybe the last one, unless you're interested at the end, in which case I can talk a little bit about it in Q&A. Before explaining some of those use cases, a quick refresher: when you name a stage, you can build only that stage and all of its dependent stages with the --target flag. If you do docker build --target stagename, it will build the stage named stagename. The second use case in our list was customizing your image, or having different image flavors. Here, instead of having only the Alpine-based image for our app, we also want, in addition to that, a Debian jessie-based image. You know, Alpine uses musl libc, Debian jessie uses GNU libc, there can be differences, or the QA team asked for it, so we have both. That's what it looks like. The problem is, you don't want to repeat yourself: you could literally copy these two lines, and in a bigger Dockerfile these lines can be more than just two lines. So what you can do is use a variable, in this case what we call global build args. If you define a global ARG, that is a build arg at the top of the Dockerfile before the first FROM, then you can use it in FROM lines as well, which is exactly what we're doing here.
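A sketch of the global build-arg pattern (the stage name and jar path are illustrative):

```dockerfile
# A global ARG declared before the first FROM can be used in FROM lines
ARG flavor=alpine

FROM openjdk:8-jre-${flavor} AS release
COPY target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]

# Build the default (alpine) flavor:
#   docker build --target release .
# Build the jessie flavor instead:
#   docker build --target release --build-arg flavor=jessie .
```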
By default, flavor is alpine and you're building on openjdk:8-jre-alpine, but if you specify --target release with --build-arg flavor=jessie, then that same stage will be built based on the jessie base image. Another use case is how to get various containerized environments for your development. I came up with these examples; they are absolutely not strict at all, just patterns I noticed, and there are variations on this, so don't take them literally. One example is a builder stage, where you have all your build dependencies; then build, which is exactly the same but with your build artifacts built in; cross, if you do multi-platform builds; dev, which is basically your build artifacts plus your dev or debug tools; and lint, which is actually the first example here, a simple linter for Java. Essentially, every stage wants to be minimal and have the minimal amount of dependencies, so if you don't need the whole JDK to do linting, you can just use the JRE. This is just a simple example for lint. For debugging or developing your application, you might want to use strace, a simple editor, or tcpdump; you can customize it the way you want, but essentially the dev environment can also be defined in the Dockerfile itself, and every time you can just use --target to build that specific image. For tests, for instance: this is very Maven-specific, but by default when you do mvn package it will run your tests, and you could decouple that as well. You could say, you know what, I just want to build my jars, and I'll run the tests in a separate stage. For integration tests, you can have all the dependencies you need for integration tests; in this case I just gave curl as an example, but it can be anything. Note that for integration tests here, you're based off of the release stage, so you don't have any JDK or build-time dependencies; it's all just your release artifacts, on top of which you add your integration test dependencies.
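One way these per-environment stages could be laid out (stage names and tooling are illustrative, not a canonical pattern):

```dockerfile
FROM maven:3-jdk-8 AS builder
WORKDIR /app
COPY pom.xml .
COPY src ./src
# Skip tests here; they can get their own stage
RUN mvn -B package -DskipTests

FROM openjdk:8-jre-alpine AS release
COPY --from=builder /app/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]

# Dev image: the release artifacts plus debugging tools
FROM release AS dev
RUN apk add --no-cache strace tcpdump vim

# Integration tests: based on release, not on the builder,
# so no JDK or build-time dependencies are present
FROM release AS integration-test
RUN apk add --no-cache curl

# e.g. docker build --target dev .
```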
And now is the time I will kindly ask you to pay extra attention, because we're going to talk about concurrency. This is what it looks like when you start a BuildKit concurrent build. By default, a linear Dockerfile looks like the following: all the stages are linear, you start from the top, it goes all the way to the end, and all stages are executed in sequence. When you don't use BuildKit, unneeded stages are still executed and just discarded, which is a huge waste. So instead of having all your stages linear like this, what you want is essentially to create a graph of your dependencies, and BuildKit will traverse it from the bottom, which is the stage you named with --target, to the top, and in the process it won't even look at unneeded stages; they're not even considered. I made a very simple graph here, but you can imagine how your Dockerfile graphs can be much more complex. The key point here is that the s2, s3, and s4 stages will run concurrently, and only when all of them are done will s6, stage 6, be executed. So how do you use this, and what use cases are there? I was thinking of assets as an example. Sometimes, if you're building a web server or whatnot, you might want to put your assets in the final image. You don't need to wait for your entire app to build before building your assets: you can totally build your assets in a separate stage and have that done in parallel. The way you make this parallel is by having those COPY --from lines near each other, usually stacked all at once. In this case, both the builder stage and the assets stage will be running in parallel, and the final image will be done building when both of those stages are done.
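A sketch of the assets pattern (the asset toolchain and paths are made up for illustration):

```dockerfile
# These two stages have no dependency on each other,
# so BuildKit can execute them concurrently
FROM node:12-alpine AS assets
WORKDIR /assets
COPY web ./
RUN npm ci && npm run build

FROM maven:3-jdk-8 AS builder
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -B package

# Stacking the COPY --from lines together pulls in both stages;
# the final stage starts once both have finished
FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/app.jar /app.jar
COPY --from=assets /assets/dist /static
CMD ["java", "-jar", "/app.jar"]
```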
So basically, what you just saw about assets can be generalized to any part of your Dockerfile. Usually it's the builder part that takes most of the time. In simple cases that's fine, but you might have a lot of build dependencies, and you don't necessarily want to wait for all of them to build in sequence; you might realize that, hey, actually that part over there could be built in parallel. In this example I was alluding to a C library that you might need a fork of, and then another one with a C++ library. The way you do this is: you take your builder stage and call the common part builder-base, you split out your build dependencies, like building your C library and your C++ library, into their own minimal Dockerfile stages with their own minimal dependencies, and you build them. This will build both the C library and the C++ library in parallel, and now your builder stage is basically inheriting the builder-base part and waiting for all your libraries to build. So this is the pattern, essentially: having those multiple COPY --from lines is how you create concurrency, and BuildKit leverages that concurrency. I also want to point out a little trick that's quite useful: if you are able to specify an install prefix, and I think you can do that with the Alpine package manager and I'm pretty sure with others as well, then in the case of libraries that you build, you can just set a prefix like /out and install them there, and at the very last stage you just take everything from /out into your builder image. The nice thing about this is that it's very much scoped to that one folder you specified as the prefix, and you don't need to do a COPY --from of all the different files that are scattered around; for instance, when you install a package through the package manager, there are many files installed all over the place, so the prefix is useful.
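A rough sketch of splitting build dependencies into parallel stages with a /out prefix (library names and build commands are invented for illustration):

```dockerfile
# Shared toolchain for all build stages
FROM alpine:3.9 AS builder-base
RUN apk add --no-cache build-base

# Each library builds in its own stage; BuildKit runs these in parallel.
# Installing with --prefix=/out keeps all installed files under one folder
FROM builder-base AS lib-c
COPY vendor/libfoo ./libfoo
RUN cd libfoo && ./configure --prefix=/out && make && make install

FROM builder-base AS lib-cpp
COPY vendor/libbar ./libbar
RUN cd libbar && ./configure --prefix=/out && make && make install

# The real builder waits for both libraries, then copies just /out,
# instead of chasing files scattered across the filesystem
FROM builder-base AS builder
COPY --from=lib-c /out /usr/local
COPY --from=lib-cpp /out /usr/local
COPY src ./src
RUN make -C src
```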
A couple of benchmarks to give you an idea. 18.03 is basically without BuildKit at all, the old builder, the legacy builder. This was done on the Moby repo, and you can see BuildKit is twice as fast from an empty state: you don't have anything cached, you just do docker build on both machines, and it's twice as fast. A repeated build with a matching cache, which basically means you built once and rebuild right away so it hits the cache all the way, is about seven times faster with BuildKit. And this is the most important one: you make a source code change and rebuild, and the improvement there is about two and a half times faster with BuildKit. And this is without any new Dockerfile features or improvements; this is just enabling BuildKit while using multi-stage, so it's all thanks to multi-stage. And now we're going to talk about new Dockerfile features. Oops, I guess I forgot to finish those slides. There's a directive we put at the very beginning, a syntax directive like # syntax = docker/dockerfile:experimental; it's a way to enable the features I'm going to talk about: this one line makes it possible to parse the Dockerfile and understand the new features. What I forgot to put in that box is what experimental means here: it doesn't mean that we'll break compatibility, it just means that we're not sure about the syntax yet; we might want to let it mature a little before we take it into the mainline stable Dockerfile syntax. If you want to learn more details about the various new features and syntaxes, you can look them up in the documentation on the moby/buildkit page. The first feature I'm going to talk about is context mounts. Again, you need 18.09 with BuildKit enabled, and you need the syntax directive at the top. What this does is, instead of doing a COPY of your source code into the stage, you can have a specific RUN line get parts of your build context bind-mounted.
There's zero copying here; it's literally just a bind mount. The way you do that is with --mount, where you specify the target, and in this case, for Maven, we need to specify a different output directory than the current one because of the bind mount. It looks a little cleaner. This is basically the slide that Sebastiaan had earlier, without BuildKit, and I wanted to point out that you can use a cache mount, which is a new mount type, to do something a little better. It essentially allocates a cache folder, and you specify what the target is; in this case, for Maven, the target is the .m2 folder, and every time you run the build it will bind-mount the cache folder onto that .m2 folder. It's best-effort, you can't rely on it being there, but if it's there your build will be much faster, because in the case of Maven it means it won't have to pull down all your dependencies every time. This is of course not Maven-specific: all the languages and package managers have their cache folder, and you can essentially use this cache mount type and specify the cache folder for apt, Go and Go modules, npm, and pip; I listed some of these here.
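A sketch of both mount types together for the Maven example (the flag for redirecting Maven's output directory is illustrative; the mounts require the experimental syntax directive and BuildKit):

```dockerfile
# syntax = docker/dockerfile:experimental
FROM maven:3-jdk-8 AS builder
WORKDIR /app
# Bind-mount the build context instead of COPYing it (zero copy),
# and keep the local repository in a cache mount so dependencies
# are not re-downloaded on every build. The bind mount is read-only,
# hence the separate output directory.
RUN --mount=type=bind,target=. \
    --mount=type=cache,target=/root/.m2 \
    mvn -B package -DoutputDirectory=/output
```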
Now, secrets: please don't do this, putting credentials in ENV lines. The reason you don't want to do this is because it will put your environment variables, basically your secrets, in the image. You could say, yeah, it's all right, I have my private registry, it's all closed, I control everything, but actually that's not good practice in the first place, and it can leak, even if it's private. So don't do this. Some people do this instead, using build args rather than environment variables, and please don't do that either. What build args allow you to do is to not commit the environment variable to the final image; however, all the RUN commands have the value of those variables in the docker history, so you still have it in the image. What you want to do instead is use the new secret feature in BuildKit. The way you do this is: you use the secret mount type, you give it an ID, in this case we just call it aws for our credentials, and you give it a target file; you can also specify that this secret is required for your build to run. And the way you use it is by doing docker build --secret, where you give it the ID and the source file on your local client. This will bind-mount the credentials only into that one RUN line, in that container; no other build steps will see those credentials, so it's much more scoped. Another feature people want is to build from, or git clone, private Git repos. Some of you may have put your private keys in the image for the same reason, and that's a very bad idea; don't do this. Instead, what we allow you to do now with BuildKit is to mount the SSH agent socket. It's that easy: it's just a mount of type=ssh, and if it's required, you add required.
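A sketch of the secret and SSH mounts (the ID, file paths, packages, and the S3 fetch step are invented for illustration):

```dockerfile
# syntax = docker/dockerfile:experimental
FROM alpine
RUN apk add --no-cache openssh-client git
# The credentials file is mounted only for this single RUN line;
# it never ends up in a layer or in `docker history`
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials,required \
    ./fetch-assets-from-s3.sh
# Forward the client's ssh-agent instead of baking keys into the image;
# github.com still has to be added to known_hosts
RUN mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
RUN --mount=type=ssh git clone git@github.com:example/private-repo.git

# docker build --secret id=aws,src=$HOME/.aws/credentials --ssh default .
```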
Then on docker build, you pass --ssh default, which is the default agent configuration. You do need to add github.com, in this case, to the known hosts, but that's it. Okay, I think I forgot to give you the demo of the concurrent stages, so I'll give you the recap first and then we can go to the demo. Essentially, we went from inconsistent build and test environments to consistent ones; from bloated images to much more minimal images; from slow build and incremental build times, due to cache busts and whatnot, to very fast build and incremental build times; and to building more securely. There are some blog posts you can read from Tõnis Tiigi, who is the creator and a maintainer of BuildKit, and maybe I'll just get to the demo real quick if we have time. Do we have time? All right, perfect. Is it big enough? I can make it bigger; line wrapping won't help though. I just want to show that these are two different hosts; sorry about that, two separate hosts, but they have the same content and the same power. And I want to show you the Dockerfile that we're going to build. I am using tmux; this is not virtual reality, just two terminals running at the same time. Essentially, we're going to build this stage here, and you can see the parallel thing, where it first takes runc, then buildctl, then buildkitd; these are three binaries, and this is what we're going to build. On the left pane we don't have BuildKit enabled, and on the right pane we do: I disable BuildKit on the left with DOCKER_BUILDKIT=0, I enable it on the right with DOCKER_BUILDKIT=1, I specify the Dockerfile and the target, and I hit run. Nothing special. I won't have you wait for the whole thing to build, but this is just to show you the difference when it's all cached: you can see that most of the bottleneck here is uploading the build
context. BuildKit has lazy context uploading, so if it doesn't need to upload anything, it won't send much; in this case it sent something like 37 bytes. Now let's do something more interesting, which is modifying the source code on both machines and rerunning. This is without the cache-mount improvements in BuildKit: these are literally the same Dockerfiles on both left and right, Dockerfiles that don't use BuildKit syntax features. Actually, this might take a little more time than I thought. The problem here is that it's literally building everything. It figured out that nothing changed in runc, so the runc part is all cached, and on the right the BuildKit one is done. I don't really need to wait for this one to finish either; it will take more time. What I also want to show you is an optimized version of the Dockerfile that uses BuildKit features. We're essentially going to build this stage here; you can see the variables I mentioned, and so on, and what I want to show you is the --mount target: mounting the cache for both the Go build cache and the Go modules cache. So this uses the two new features. Let's resync these two, let's modify the Dockerfile again, add a couple more exclamation marks, and this time I'm building with DOCKER_BUILDKIT=1 on both machines, but the left one uses the non-optimized Dockerfile and the right one uses the optimized Dockerfile. And done: eight seconds on the right. I don't think we want to wait for the left one. So that was the demo, and I'm happy to take any questions, together with Sebastiaan. Thank you very much for your
attention. [Applause]
Info
Channel: Docker
Views: 30,782
Rating: 4.9595962 out of 5
Keywords: Docker for Devs
Id: JofsaZ3H1qM
Length: 39min 35sec (2375 seconds)
Published: Mon May 13 2019