dotScale 2013 - Solomon Hykes - Why we built Docker

Captions
So I'm going to talk to you about Docker, which is an open-source project we released a few months ago. It's a portable container engine; I'll talk about what that means. We've released quite a few open-source projects at dotCloud, it's kind of a regular thing we do, but Docker is pretty special, because almost immediately it got a lot of buzz. People got really excited about it, and we didn't really expect that. We thought it was great; I personally am really excited about Docker, and it's probably the most important project I've worked on. But because of the buzz, because of the excitement, I get a lot of questions where people come to me and say: what the hell is Docker in the first place? I heard about it, I don't get it, what's the big deal? And since it's all about containers, I also get this other question, which is basically: what the hell is a container, and why should I care? Since I get that question a lot, at the beginning I would just demo Docker; that would be my presentation, just get a terminal and show it. But I thought, okay, let's take some time and talk about why I'm excited about it, and why I think it's a big deal. And it is a big deal, I'll start by saying that. I think the concept of containers, and the implementation behind it, will be a big deal for all of us. I think it will affect the way we work, the way we create software, and the way we distribute it, in big ways, in the coming months and years, and I think that's really exciting. It all has to do with the problem of shipping software: specifically, moving it from machine A to machine B. Whenever you're getting your code to work on machine A and machine B in the same way, and you're making an effort to know that it will behave the same way on both machines, you're shipping. That's a pattern we use all the time, and the problem is, we want it to be reliable, we want it to be automated, and a lot of the time it's not. And so
examples of what I mean by shipping: if you're sharing your development environment with a colleague, or a contributor to your open-source project, you're shipping. If you're deploying from your development machine to a staging server, you're shipping, same thing. If you're scaling out from one server to multiple servers, then you're shipping the same code to lots of different servers. And again, if you're migrating from one hosting provider to the next, or moving from a private cloud to a public cloud, or moving into a new data center, you're shipping the same code to even more machines, and again you hope that it behaves the same way, and hopefully you can do that in some sort of reliable, automated way, because otherwise your life becomes really difficult. And that's the problem: our lives are, I think, really difficult right now, because that stuff is way too hard. It's really hard for a developer to get his code, all of it, to run reliably on all the machines it has to run on, and the reason, I think, is that our applications got really complicated. At some point in the last few years, our software stacks started looking really, really complicated. I tried to make a drawing of that: it used to be this really nice, simple stack, and now it's a service-oriented architecture with loosely connected components that interact with each other over the network. You may have all sorts of different languages working together, software components that you don't even know what language they're written in, you just have to use them. So you've got this really complex software stack, and it's running on an increasingly complex hardware infrastructure. You're developing in a VM on your laptop, you're moving that out to a staging server, and the whole point of cloud infrastructure is that you can have more machines more easily: you make an API call and boom, you've got a server.
Increasingly we're seeing organizations deploy on private clouds like OpenStack and then on EC2; increasingly we're seeing companies spread out loads across multiple hosting providers. What that means is that the same software stack, which can be pretty complicated to ship in the first place, now has to be shipped more and more and more, so the difficulty is multiplied. I threw together an example of what that stack might look like, to be sure we're talking about the same thing. A typical application, one I've actually seen: there might be a static website hosted on your own instance of nginx; a Ruby on Rails frontend; a REST API endpoint written in Python using Flask, which sends jobs to a Redis queue, which is consumed by another Python worker using Celery; and that worker may have all sorts of special libraries built in to do special work, in this case ffmpeg. And the list goes on; that stack just kind of sprawls out, and as a developer, all of it is now your responsibility. These components use different languages: if you're using Hadoop for analytics, that's Java, so you need to figure out which version of the JVM you want and which system libraries you're going to depend on. Every time there's an application server, you have to pick a version, eventually you have to pick patches, you have to tweak the configuration, and all of that is your application, the whole thing. And again, that's going to move around from machine to machine. Soon enough your customers are going to say: hey, I love that software, but I don't want to use your hosted version of it, I want to run it on my own private servers, can you give me an appliance? And again, you have to ship it. So what you end up having to deal with is a sort of matrix: every software component in your stack, multiplied by every element of your infrastructure that it has to run on, and every intersection of that
matrix has to work. If it doesn't work, something's going to break. You're going to test on Python 2.7 and then it's going to run on Python 3 in production (usually it's the other way around), and something weird will happen. You'll rely on the behavior of a certain version of an SSL library and another one will be installed; you'll develop, or you'll run your tests, on Debian and production is on Red Hat. All sorts of weird things happen, and the underlying hardware properties are not always the same either: the network topology might be different, the security policies might be different, storage is different, everything's different, basically, yet somehow the same software has to run everywhere. So how do we deal with that? How do we fix the current situation caused by this matrix, which is basically a very brittle, very labor-intensive process? Well, the first thing we can do is look for examples of other people who have had the same problem before and fixed it, and we happen to have such an example in the shipping industry. The shipping industry, obviously, is in the business of moving physical things across the world from point A to point B, and it has been around for centuries, and for the longest time it actually operated in a way similar to the way we ship software. You had all sorts of physical goods: furniture, boxes, barrels, bags of stuff, etc. And then you had the infrastructure necessary to ship them: boats and trains and cranes and warehouses and all that stuff. And the process of shipping something, let's say I want to ship coffee, I've got bags of coffee, meant I actually had to worry about all sorts of intricate details about how it was going to be shipped. Is it going to be shipped along with a piano, and is the piano going to be sitting on the coffee beans, crushing them in the process? Does the staff in Rotterdam that I'm going to employ know how to handle bags
well? Some of them will be lost. So every step of the way you've got this combination of infrastructure for shipping and things that are being shipped, and you just have to figure it out as you go, and the result, again, is this complicated matrix and a very labor-intensive, unreliable process. That was the way it was for a long time, and then one day in the 1950s a few people in the shipping industry got together and agreed on a standard box. They agreed on the dimensions, they agreed on the weights, they agreed on the way the doors would work; they agreed on a format, and they agreed on a standard set of operations. Basically, they agreed on an API, and started using it, and so the shipping container was born. It's a pretty ugly box, and I think everyone knows what that box is, but that box literally changed the world. It changed the shipping industry first of all, because what it enabled was separation of concerns. All of a sudden, if I want to ship coffee, all I have to do is put it in my container, any way I want, with other stuff if I want to, it doesn't matter. Then I close the door, I seal it, and I put a tag that identifies the container, and from that point on, getting the container to the other side of the world is no longer my problem; my infrastructure provider will take care of that. All I have to do is wait for the container to show up on the other side: I break the seal, I open the door, and it's my problem again. In between the two, I don't need to know what happens; I don't need to know where it goes, through what harbor, on what infrastructure, it doesn't matter. And conversely, the infrastructure provider does not have to worry about what's in the box. They have standardized cranes, standardized boats, standardized trains, etc., all of which are interchangeable, all of which can deal with any container in the world. I think there's something like 50 million shipping containers in the world today, and every single
one of those containers can be loaded by the same cranes, onto the same trains, the same boats, etc. That's actually very powerful, because with the separation of concerns comes automation, and with automation comes reliability and low cost. Basically, what happened is that the shipping industry exploded. By the end of the 60s and the 70s, there were so many more ways to ship things to places where it had not been practical to do so before, that it marked the beginning of global trade, and it literally made the world smaller. It's one of those few inventions that really changed the world very, very rapidly. And this brings us to the embarrassing situation where the automation for carrying coffee across the world is actually better and more reliable than the kinds of tools we use to ship software between computers. I think that's pretty embarrassing, and so I think it's important that we copy that idea and come up with a shipping container, or something similar, for software. Going back to my example from the beginning: as a developer, I should have a standardized way to take any software component, put it in some sort of box, and then hand it to infrastructure providers without worrying about how it's going to be deployed. Separation of concerns. And the operations team in charge of infrastructure should be able to do all sorts of things with that container without worrying about how I built it, what language I used, all of that stuff. Now, the next question is: why hasn't this been done before? After all, it sounds a lot like sandboxing, and you guys write code, so you're thinking, well, I write Java and I use jars, isn't that a container? Or: I write Python, I use virtualenv, isn't that a container? And yes, those tools let me sandbox my code; they let me put something in a container. But I can't put everything in it. In other words, the sandboxing is incomplete, and I think very quickly, when
you keep working on an application, you realize: I can put all of my Python dependencies in the virtualenv, but I can't put my system libraries in it. There's always a dependency at some point that goes outside of that box. In the modern application environment, the whole point is that you can use components from this huge community of developers, written in all sorts of languages. We're supposed to be polyglot, and what's the point of being polyglot if we can't reuse each other's code? So these tools are great, they're practical, I use them every day, but they're not sufficient. Okay, then, option two: what about VMs? If a Python package is not enough, if a jar is not enough, then let's just take the whole machine, let's put the software in it and ship the computer along with it. That way we are guaranteed to have the same context for everyone. And that is actually a very good idea, and I believe the only way to ship software in a truly reliable, repeatable way is to ship the whole system with it, because really, the system is part of the application. Your choice of distribution, your choice of system libraries, all of those choices, even if you didn't make them consciously, maybe you just used the system that was lying around, all of that is affecting the behavior of the application, and if you swap it out, things will change. The problem with virtual machines, though, is that they bundle too much. You do want the whole system, but as a developer you do not want to package things like a whole hard drive, a whole virtual set of processors and network interfaces. You don't want to be deciding, as a developer, this is how storage is going to work for this application everywhere, this is how networking is going to work, this is how much RAM there's going to be, this is the kind of processor you're going to use. You can't do that, because then you're breaking separation of concerns: the infrastructure provider is not free
to make those decisions based on the current machine. To take the shipping container metaphor again: the shipping company should be free to choose the crane, it should be free to choose the boat. The fact that I'm giving them coffee bags to ship doesn't mean I can tell them which crane to use; that's the whole point. Another problem with virtual machines is that they are extremely heavy. Anyone who's tried to simulate a whole stack of maybe 10 components on their laptop by booting 10 VMs probably agrees with me; if you've ever tried to do computationally intensive work on a cluster of VMs, you also agree with me. There's overhead: it takes CPU, it takes RAM, it takes a long time to boot, it's a machine. So it's not practical as a unit of software delivery. Okay, so what other options do we have? This is where the set of discoveries that led to Docker comes into play. There is a way to get the best of both worlds, and the best way to describe it is: I want to sandbox the entire system, so that as a developer I know everything that's going on and I have a guarantee that what I ship is going to be repeatable, but I don't want to ship the machine details, because that's too much, and I don't want the performance hit of a VM. That's kind of the pipe dream, and it's been the pipe dream for a long time. The good news is that now it's possible to have that, and to have it be so fast that it's actually scary how cool it is to use. And that is all thanks to the Linux kernel, and the kernel hackers around the world who have finally implemented namespacing that works. What that means is that using a modern Linux kernel, you can now isolate any process from the others and basically make that process believe that it has its own VM, when really it doesn't. That includes isolating the filesystem, the network interfaces, memory, resource access, the whole thing. And it works in the Linux kernel; it works
well enough to run in production, and that's what we've been using at dotCloud for a long time, testing it, dealing with its quirks, and now it's ready, and the result is the best of both worlds. So, enter Docker. The reason Docker exists is that these capabilities, which we did not develop (the countless developers of the Linux kernel did), are raw capabilities. They are not readily usable by everyone, and that's not the point; the point is to use them as building blocks to build something else. What's missing is a standard container format that developers can use, and that's what Docker provides. We do three things. The first thing we do is define that format (and by the way, this is our new logo, I just found out about it today, we did a contest; these little boxes are shipping containers, obviously). So we define the format, we standardize how to ship software. Then we give the developer simple tools to build his source code into a container, regardless of the language, regardless of the build tool, regardless of all of that. And separately, we give the ops team, the people running the infrastructure, simple tools to take that container, without having to know what's in it, and run it, hopefully on as many machines as possible. We're working very hard to make Docker run on as many machines as possible. So that's how Docker works, in essence. One thing we've tried really hard to do is not come in and say: hey, if you want to use Docker, you've got to stop using these other tools. And that goes for developers and ops. We're not telling developers: you've got to learn this new packaging system, stop using make, stop using pip install, stop using jars, learn this thing, throw everything out. We're not saying that. And we're not telling the ops people: forget about Chef, Puppet, and all that stuff, forget about your current system packages. What we're
saying is: here is an ingredient which is very lightweight, which is designed to not get in the way, designed to improve your existing tools, and if you integrate it into your tools, then all of a sudden things will start getting really awesome, and you'll have all these nice properties. Specifically, you'll be able to ship software from machine A to machine B reliably and automatically, which is the whole point. It's open source; it's written in Go, which has the nice property of compiling down to a static binary, so you can drop it onto any server and it just runs; it has all sorts of nice properties like that. Really, you can think of it as the result of the last five years at dotCloud trying to figure this problem out, and finally reaching a point where we feel like we've found a good solution that is simple and elegant enough to share with everyone, and that's what we're trying to do. So I want to encourage you to check it out if you're interested. The thing we need most right now is feedback: Docker just reached version 0.4, it's very young, it doesn't work perfectly yet, and we need more people to try it, break it, find bugs, and adapt it into their own workflows. We've got an amazing community of people from all over the world who are building incredible things on top of it and breaking it in ways that still blow my mind, and we need more of that. So if you're interested, come see me, I'll give you a demo. Thanks a lot for having me.
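As a rough illustration of the developer/ops split described in the talk, here is a minimal sketch of the developer's side: a build recipe for the hypothetical Python/Celery worker from the example stack. The base image, file names, and package list are illustrative assumptions, not taken from the talk.

```dockerfile
# Hypothetical build recipe for the example Python worker.
# The developer owns everything inside the container: the
# distribution, system libraries like ffmpeg, the language
# runtime, and the application dependencies.
FROM debian:stable
RUN apt-get update && apt-get install -y python3 python3-pip ffmpeg
# requirements.txt would list e.g. celery and redis
COPY requirements.txt /app/
RUN pip3 install -r /app/requirements.txt
COPY worker.py /app/
CMD ["python3", "/app/worker.py"]
```

The developer would build and tag the image with something like `docker build -t worker .`, and the ops side would run the same sealed artifact on any Docker host with `docker run worker`, without knowing what is inside: the separation of concerns the talk describes.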
Info
Channel: dotconferences
Views: 32,278
Rating: 4.9802632 out of 5
Keywords: solomonstre, cloud, docker, containers, dotcloud, dotscale, dotconferences
Id: 3N3n9FzebAA
Length: 20min 48sec (1248 seconds)
Published: Thu Aug 01 2013