RailsConf 2018: Containerizing Rails: Techniques, Pitfalls, & Best Practices by Daniel Azuma

Captions
(techno pop music) - Wow, well, thank you all for coming. I know it's the last session of the afternoon, and usually for me, if it's the last session of the afternoon, I wanna be doing something other than sitting in a session. I'm really grateful that you've all come. This session is Containerizing Rails. My name is Daniel Azuma and we'll get started.

Basically, during this hour, we're gonna be talking about container best practices. We'll talk about how to wrap your Rails application in Docker containers. We'll talk about how to optimize your containers for size and for performance. We'll talk about what should and what should not be in your containers. Basically, we'll talk about how to make your Rails application work well in a container-based production system.

We're gonna start off with a little bit of background on Docker. I'm not going to make assumptions about how much experience you've had with Docker or with Kubernetes or related technologies. You should be able to follow this even if you don't have a lot of experience, but, that said, this is not meant to be a first-time tutorial. It's not really a beginner thing. I kind of have to apologize for this. I know that the program says there's going to be a walkthrough for deploying an application to Kubernetes, and I ended up cutting the walkthrough for time. I didn't get a chance to update the program. There are good tutorials on Kubernetes that you can find online. What we're going to be focusing on here today is best practices: tips for making better containers. That's pretty much what we'll do.

Let's start off with just a little bit about me, about your speaker. My name is Daniel Azuma. I've been developing professionally for about 20 years or so. About 12 of those years have been spent with Ruby and Ruby on Rails. I remember going to my first RailsConf in 2007. Was anyone here at RailsConf 2007 or before? A few. Okay, that's actually really good. I was hoping to see that most of you are newer than that, 'cause that really means a community that continues to grow and continues to evolve. I'm glad to see that. So, welcome.

I've been doing a variety of things in Ruby over the past 12 years. I did some geospatial work. Did a bunch of startups. Currently, I'm part of a little boutique west coast startup at Google. I'm part of a team that is focused on programming language infrastructure. We call it Languages, Optimizations, and Libraries, or LOL for short, as you can probably imagine. It's a team that, again, focuses on making language infrastructure better, both internally at Google, as well as externally for our customers, for you people working on Google Cloud. I'm part of the team that handles Ruby and Elixir. I'm just really excited to work on stuff that helps you, people who are using languages that I love. Glad to be here. This is a sponsored talk. I do want to thank Google Cloud Platform for enabling me to be here and to share some of our ideas on containers and what makes them tick.

Let's get started. We'll get started with some basic concepts, again, just to make sure we're all on the same page. First, containers. What is a container? I think, if you talk about containers, there's one word that really should come to your mind when containers are mentioned. That is isolation. Containers are about isolation. In particular, take Docker for example.
A Docker container is basically an isolated file system, slice of the CPU, slice of the memory, network stack, user space, and process space. Again, all of this is isolated. It's set up so that how the inside of the container interacts with the outside is very controlled. Now, at first glance, this might look kind of like a virtual machine, right? Kind of like a VM, but it's not. This is not a VM. There's no hypervisor involved here. Containers are actually a feature of Linux. Containers live inside a Linux machine. They share the Linux kernel and the CPU architecture with anything else that's running in Linux. That includes other processes in Linux, and it includes other containers. You can run multiple containers on a Linux machine. Again, they are isolated from each other. Really basic on containers.

The next concept that we need to cover is images. What is an image? An image is basically a template for creating a container. Most of that consists of a file system snapshot, so imagine the files and the directories that you need to start your container: the files that comprise your application, the files that comprise your operating system, any dependencies of your application. All of that lives in the image. In Docker, you create an image using a recipe called a Dockerfile.

This is basically what a simple Dockerfile might look like for a Rails app (sketched below). This is somewhat simplified, so there aren't a lot of best practices here. It's here so you can see what the different parts are. The first line here is what's known as the base image. This is the starting point for a Dockerfile. It's another Docker image. Again, it has files and it has all the information you would need to start a container. There are different kinds of base images. This one, in particular, refers to the official Ruby image, which includes an operating system and an installation of Ruby. From that base image, there are commands that you would run in a Dockerfile. Some of those commands do things like copy files into the image. You might copy your Rails application, for example, into the image. You can also run shell commands to do things in the Dockerfile. For example, you can run bundle install to install your gem dependencies, install your bundle in the image. You can also set different properties of the image. This particular property is the command that is run by default to start a container based on this image. Those are the different parts of a Dockerfile. Again, they're basically commands that are run in order by Docker when you build an image from this file.

Now, we're gonna start by looking at that first line, the base image. When you go through a getting-started tutorial for Docker, you'll often be directed to use some specific base image. There are a variety of those base images. One example is, we saw, the official Ruby image. There are different variants of that image using different operating systems, different versions of Ruby. There are lower-level images that just install an operating system for you, so you might have an Alpine Linux image or a Debian or an Ubuntu image. There are also a variety of third-party images. Some good ones from Phusion, for example. I'm not going to advocate for a specific base image here. Each one has its own goals, its own philosophy behind its design. Rather, what I'm going to do is talk about the things that go into an image, what makes an image effective.
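(For reference, here's a minimal sketch of the kind of simple Rails Dockerfile being walked through above. The Ruby version and the server command are illustrative assumptions, not taken from the talk's slides.)

    # Base image: the official Ruby image, which includes an OS plus Ruby
    FROM ruby:2.5

    # Copy the Rails application into the image
    WORKDIR /app
    COPY . /app

    # Run a shell command: install the gem dependencies into the image
    RUN bundle install

    # Set an image property: the default command used to start a container
    CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]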
These are tips that you can use if you're creating a base image yourself, or if you're just creating an application image based on someone else's image. That brings us to our first pro tip, which is that it's important to understand what's going on in any base image that you use. Don't treat a base image like a black box. It's very tempting to do that, but to make effective use of a base image, it's actually very important to understand what it's doing and why. Read your base image's Dockerfile. Dockerfiles generally are not that long; it'd probably take just a few minutes to read one, and most of them are available on GitHub. Take a look at that Dockerfile, get to know what operating system it installs, how it sets up the environment, and whether that really matches how your application wants to be set up. You can also learn a lot of good Docker practices by reading other people's Dockerfiles. A good practice: read base images.

As you get familiar with Dockerfiles, you'll notice that there are some properties that a lot of good Dockerfiles have. One of the important ones is size. Size matters a lot with images. It matters at runtime, in terms of how many resources your application uses, but it also matters at build time and at deployment time, because if your image is large, you have to upload and download it to and from your CI system and into your production system. Maybe you're building things locally. However you do it, some of these images can be fairly large, so it's good to try to minimize that size. There are actually a lot of techniques out there on how to optimize the size of your image. Let's take a look at a few of those, because they're really interesting.

One of the most common things that you'll do in a Dockerfile is install stuff. You're running Rails, and it'll probably use certain libraries. You'll maybe use LibYAML. You might use libxml. Heaven forbid, you might use ImageMagick. There are various things that you'll end up using. If you're on Debian or Ubuntu, the tool that you use to install them is apt-get. One thing to know about apt-get is that it's not really designed for Dockerfiles; it's been around a lot longer than that. It actually leaves a bunch of files in your file system. It downloads package lists, package archives, certain manifest files. These files are not necessary for your application at runtime, so it's a good idea to get rid of them after you're done installing. Oftentimes, when you look at a Dockerfile, you'll see a line saying okay, let's go ahead and delete all these temporary files, all these cache files that apt-get uses. This is good, this is important to do, but it's also important to do it correctly. For example, don't update, install, and then clean up in a series of separate RUN commands. Instead, combine those steps in a single RUN command, as in the sketch below. That's very important. The reason for that is how Docker images are represented. A Docker image is represented as a series of layers. Imagine you have a base image that you use, and then in your Dockerfile you have a series of commands. Each major command that you run will add a diff on top of those layers. As you run a series of Docker commands, you get this series of diffs. The image is that entire set of layers.
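(A sketch of the wrong and the right version of that cleanup, with illustrative package names. In the first version, the files downloaded by apt-get update are baked into their own layer before the cleanup ever runs; in the second, they're gone before the layer is finalized.)

    # Don't do this: each RUN command creates its own layer, so the
    # package lists and archives are already part of the image before
    # the cleanup command runs.
    RUN apt-get update
    RUN apt-get install -y libxml2 libyaml-0-2
    RUN apt-get clean && rm -rf /var/lib/apt/lists/*

    # Do this: update, install, and clean up in a single RUN command,
    # so the temporary files never appear in any layer.
    RUN apt-get update \
        && apt-get install -y libxml2 libyaml-0-2 \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*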
For example, if you run apt-get update in one command in your Dockerfile, that will download a bunch of package lists from the Debian repositories. Now those are in your image. Those are in that diff, in that layer. If you continue to run additional commands, they'll add additional layers. Later, if you run apt-get clean, that will remove those files from a later layer, but those files are already part of your image. They were added in an earlier layer, so you actually haven't gained anything here. The image comprises the entire set of diffs, (audience member coughing) one for each command that you run in the Dockerfile. It's important, again, to do it right: apt-get update, install, and clean all in the same RUN command. What that does is it makes sure that those temporary files apt-get update installs get removed before that layer gets finalized. They never appear in the layer. If you go out and look for Dockerfile best practices, this is one of the key ones that you'll see a lot. A related tip that you'll see is to minimize the number of layers as well, and that's also important, but I think it's more critically important to understand what's being done in each layer. If you install something in a layer, it's there. It's part of your image at that point. It has to be downloaded whenever the image is pulled. So, very important. So, next concept, again: combine your installations.

That's how it works with apt-get. If you build from source, if you're installing from a different source, for example: download the source, configure, make install, and then delete your source directory, all in the same RUN command, so that the source files, which you don't need at runtime, don't end up in the layer. Similarly, Alpine Linux. This is a great distribution to use for Dockerfiles, because it's tiny and it has a lot of really useful features. One of those is its virtual package feature. It's kind of like a virtual environment in Python. Basically, you can install stuff using apk temporarily: install things, use them, and then remove that entire environment. But again, it's important to do all of that within the same RUN command, so that those temporary packages never show up in the layer. Again, very important: combine installations with cleanup. (Both variants are sketched below.)

Here's another optimization technique. Some gems come with C extensions. If you're running Rails, one of the gems that will probably be part of your bundle is Nokogiri, which has C extensions as part of it. So in order to install that bundle, you need a C compiler. In fact, you need a bunch of things: you need Make, you need libraries, you need a whole set of build tools to install that. Now, you need these build tools to install your bundle, but you probably don't need them at runtime. And those build tools are actually pretty large. I tried installing the build-essential package on top of Ubuntu last night just to see how big it is. The Ubuntu base image by itself is about 100 megabytes. With build-essential, it triples in size. So this is not small. It'd be nice to be able to install your bundle but not have these build tools in your final image. Is there a way to do that? Yes, there is. There's a powerful technique that not a lot of people are using yet, but it's very useful for this and a whole class of similar problems, and that is multi-stage Dockerfiles. This is a feature that's been around for about a year in Docker. It seems like not a lot of people are using it yet, but you should, you should use this feature.
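(For reference, sketches of the two combine-with-cleanup variants described above: building from source, and Alpine's virtual packages. The URL, package names, and versions here are hypothetical placeholders.)

    # Building from source: fetch, build, install, and delete the source
    # tree in one RUN, so the sources never land in a layer.
    # (https://example.com/libfoo-1.0.tar.gz is a placeholder, and curl
    # is assumed to be present in the base image.)
    RUN curl -sSL https://example.com/libfoo-1.0.tar.gz | tar xz \
        && cd libfoo-1.0 \
        && ./configure && make && make install \
        && cd .. && rm -rf libfoo-1.0

    # Alpine: install build tools as a named virtual package, use them,
    # then delete the whole group in the same RUN.
    RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
        && bundle install \
        && apk del .build-deps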
It's really, really useful. The basic idea is like a multi-stage rocket. You have an initial stage that does the heavy lifting for you in the build process, and then once you're done with it, you just discard it, and only your final stage, which is gonna be much smaller, is used at runtime. So this is how this might look. This is, again, an illustrative Dockerfile; there are some commands that you would normally find that are missing here. But the idea is that you have multiple images in a Dockerfile, multiple stages, and it's only that last image that is finally used at runtime. The earlier stages are not.

So this is how this would work. We start with a base image. I call this my-base because I'm not sure this exact image actually exists as a public image. So imagine a base image that contains Ruby, but no build tools. Normally, I think, the official Ruby images do have all the build tools, because they expect you're gonna install gems, but that image is less useful in production because you don't need those build tools at runtime. So imagine you have a base image with Ruby and no build tools. You start there, you copy your application into the image. Now you need to install your bundle, so let's install those build tools. So we'll do our apt-get update, install, and clean, right? Now we bundle install. So now we've got this image which has Ruby, has your operating system, has your application, has all of your gem dependencies including those built C extensions, and has all the C compilers and build tools. That's 200 megabytes of stuff that you needed at build time but don't need at runtime. So that's our first stage.

Then we start over. We start over with a new base image. And then what we can do is copy the application directory from that first stage into the second stage. Notice what we did with the bundle install: we did --deployment. That, among other things, vendors your gems. It means it installs those gems in your application directory, and that includes all those C libraries that got built. So when you copy that application directory, it includes your application and all of those installed gems. So now we've got a new image, and that's what we want at runtime. We have a base image with Ruby. We have our application. We have all the gems with all the C extensions built and ready, and we have no build tools. And again, when you run this Dockerfile, that first stage just goes away. So, tip number three: use a separate build stage like this to create smaller runtime images. (A sketch follows below.) Really, really useful technique.

Okay, so we've talked a lot about the size of your image. Let's dig into the contents of your image. What should go into a Docker image? What should be present in your containers? I have a few tips based on my experience with Docker images and Rails apps. I'm gonna cover some of the things that I think aren't covered enough, basically things that are often overlooked but are still very important.

First, encoding. I remember back in 2007, when I first started working with Rails, encoding was a big problem. (mumbles) wasn't as widely used as it is now. We ran into encoding issues all the time. We had this very specific checklist that we set up when we deployed things, to make sure that things were set up properly.
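(A sketch of the multi-stage Dockerfile described above. As in the talk, my-base stands in for an imagined image with Ruby but no build tools; it's not a real public image, and some commands you'd normally include are omitted.)

    # --- Stage one: the build stage, which does the heavy lifting ---
    FROM my-base AS build
    WORKDIR /app
    COPY . /app
    # Install the build tools, combined with cleanup as before
    RUN apt-get update \
        && apt-get install -y build-essential \
        && apt-get clean && rm -rf /var/lib/apt/lists/*
    # --deployment vendors the gems, compiled C extensions and all,
    # into the application directory (vendor/bundle)
    RUN bundle install --deployment

    # --- Stage two: the runtime image; the build stage is discarded ---
    FROM my-base
    WORKDIR /app
    # Copy the app, including its vendored, already-built gems
    COPY --from=build /app /app
    CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]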
Nowadays, Ruby strings have built-in encoding support, but the operating system locale still has some odd effects on the way that Ruby handles encodings. The rules are a little bit subtle. In general, if you don't set the locale in your operating system, if you don't set the encoding, you might sometimes get Ruby strings that are US-ASCII rather than what you probably want, UTF-8. So it's very important in your Dockerfile to set the operating system encoding, if that's not already done in your base image. This is roughly what that might look like on Debian (see the sketch below). So, again, next tip: make sure that locale information is set. Often overlooked, but still very, very important.

There's another thing that's often overlooked. It seems obvious when we first look at it, but it's something we don't often think to do when we're using Docker. And that is: in production, who are we running as? I mean, what is root? I hope we're not running Rails as root. There's no reason to run Rails as root, and there are of course a number of security issues that could come of it. But when you're running containers, remember that containers isolate your user space, and the default user in a container is a superuser in that container. So unless you explicitly set the user, set an unprivileged user in your container, you are running as root. You're running Rails as root in that container. So it's good practice to create an unprivileged user in your container and use that when you're running Rails. So, again, next tip: create an unprivileged user.

Now you might say, okay, is this really necessary? I mean, containers are supposed to be isolated, right? Containers are isolated, users are isolated. Does it matter if I'm running as a superuser in a container? The answer is generally yes, it still matters, and that's because of security. The best way to secure your systems is really defense in depth. If you don't need to run as a superuser, then don't. Set up the unprivileged user. Suppose, for example, your Rails application gets hacked. Now the intruder might have superuser access in that container. What could that user do? Could they install something nasty in your container and cause your container to do unpleasant things? Worse, how confident are you that Docker itself will never have a security flaw that could allow a superuser in the container to get out of that container and get access to the rest of your system? That would be kind of catastrophic. So just use an unprivileged user. It's best to put as many layers of security in front of your application as possible. Again, often overlooked but very important.

Next, let's move on to the entrypoint and talk about something else that's often overlooked. If you've used Docker before, you've probably seen these two different forms of a command or an entrypoint, right? There's exec form, which is basically a list of strings, the words that form a POSIX command. Then there's shell form, which is a single string that is passed to a shell. You might have heard that it's generally recommended to use exec form, and yes, there are various reasons why this is true. One of the less commonly cited but very important reasons has to do with signals. So, when you need to stop a running Docker container, for example when you call docker stop, or when you're running Kubernetes and Kubernetes needs to upgrade your app or scale some things, and so it needs to stop and start containers.
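(Sketches of those two tips as they might look on Debian. The locale value and user name are illustrative, and your base image may already handle one or both.)

    # Set the OS locale so Ruby strings default to UTF-8, not US-ASCII
    ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

    # Create an unprivileged user and stop running as the container's
    # default root user
    RUN useradd --create-home appuser && chown -R appuser /app
    USER appuser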
What it's gonna do is send a signal to the first process, the main process, process ID one, in your container. Let's say a SIGTERM or a SIGINT, whatever that signal is. If you use shell form to start your container, that first process is the shell. That is not your Rails app. It's the shell. And fun fact about shells: by default, they don't propagate signals into things that they start. So what's gonna happen here is that the shell's gonna receive the SIGTERM, but your Rails application is not. It's gonna continue on, not knowing what's happening. It's not gonna clean up after itself, and your container is not gonna exit. So eventually Docker, or Kubernetes, is gonna have to go in and force-kill your container. You don't want that. You want nice cleanup for your processes. So this is a very common, often overlooked problem with shell form. So, again, our next tip: prefer exec form. If possible, use exec form.

Now, I know there will be cases when you'll find shell form to be really useful. Maybe you need to do shell substitutions in your Dockerfile or something like that. If that's the case, there is a workaround: insert exec in front of your process. Exec is a Bash built-in. It basically tells Bash to replace itself with your process, so your process receives those signals directly. So, another pro tip: if you do use shell form, prefix your command with exec. (Both forms are sketched below.)

All right, let's see. Yes, I think we have time for one more tip. Docker includes this feature called ONBUILD. It lets you define commands that run when a base image gets used by a downstream image, by an application image. So, for example, you might write a base image that looks like this, and then when the application image builds from this image, these two ONBUILD commands get run implicitly, right at the beginning. Seems convenient. Seemed like a good idea at the time. It turns out that in practice it's usually not worth it. So, tip number eight: avoid ONBUILD. There are several reasons for this. First, ONBUILD makes assumptions. It amounts to the base image making assumptions about what's being done downstream, about your application structure, about what it means to run. For example, if it copies the application directory, where is that directory? Where is that application? There are assumptions being made here. So it turns out that ONBUILD really isn't as useful as it might first seem. Another thing: for a build process, it really is best to be very explicit and very transparent about what's going on when you build your application. ONBUILD removes that. It's running things implicitly in your Dockerfile, things that are defined by the base image, which is not necessarily part of your application itself and not present in your source code. So generally, I recommend just forgetting that ONBUILD exists at all. You don't really need it.

So far we've covered some tips regarding optimizing for size. We've covered tips regarding what goes into an image. Let's take a step back and take a broader view of your application running in production and how those containers should look. Your real-world application is probably more complex than just a single Rails server. You might have background processes, Sidekiq workers running.
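(Sketches of the entrypoint forms discussed above. The commands are illustrative alternatives; a real Dockerfile would contain only one CMD.)

    # Exec form: a list of words. Your process is PID 1 and receives
    # SIGTERM/SIGINT directly.
    CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]

    # Shell form: runs under a shell, which becomes PID 1 and does not
    # forward signals to your Rails process...
    CMD bundle exec rails server -p $PORT

    # ...unless you prefix the command with exec, which makes the shell
    # replace itself with your process.
    CMD exec bundle exec rails server -p $PORT

    # And the ONBUILD feature to avoid: these run implicitly when a
    # downstream image uses this image as its base.
    ONBUILD COPY . /app
    ONBUILD RUN bundle install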
You might have other services; you might have memcached running. You might have multiple replicas of your application running for scaling. So do you run all of that in a single container? Do you split it out into multiple containers? And if you do, how do you do that? There are a lot of interesting architectural questions around this. I don't have time to go through all of them, but I'll touch on a few here.

So first, remember: containers are about isolation. This is important for running in production because it enables predictability. It lets you remove unknowns so you can understand the behavior of your application. Predictability is the key to a stable production system. Here's an example. Again, containers, isolation. Containers isolate resources like CPU and memory. You can tell Docker, when it runs a container, to give it just this many cores and this much memory to run, and that will prevent that container from spinning out of control and taking down the rest of your system. It's very important to do this in production. Always specify those resource constraints. Take advantage of this feature. It's really crucial to maintaining a stable, predictable production system. Similarly, if you use Kubernetes, set those resource constraints in your configs. It's also very important for Kubernetes because it allows Kubernetes to do some interesting things like bin packing. If it knows the size of your containers, it knows how many of them will fit on your system, and it is able to do that in a smart way. So it enables one of the really powerful features of your orchestration system. (Sketches below.)

Now, some might say that's great in theory, but in practice, occasionally we have containers where it's difficult to come up with a static, fixed size for that container, a static, fixed resource constraint. If that's the case, what I would say is: if you're having trouble coming up with static, fixed resource constraints, that's actually a sign that maybe your container might be doing too much, and maybe it would be useful to think about how you break up that container. So it's a useful tool for deciding what should be in your containers and how you should structure those containers.

Here's an example of that. Some app servers, like Unicorn, might let you pre-fork workers, and some of them will even do things like auto-scaling, scaling that worker count up or down based on traffic. Now, opinions will differ on things like this, but in my experience, doing this in a container is generally not that great an idea, once again because it makes the resource usage in that container less predictable. Even if you fix the number of workers and you have copy-on-write memory, copy-on-write memory can be tricky to predict, especially with a language like Ruby where there's so much dynamic stuff going on. So in general, I recommend not pre-forking in your container. Don't try to scale up inside a container. Just run one worker in the container. It's fine for it to be multi-threaded; that I've generally found to be okay. But forking off worker processes tends to make those resource constraints a lot more difficult to handle. So then how do you scale? You scale by adding containers. So, again: containers are best used as the unit of scaling. Each container should have static, predictable resource constraints, and if you need more resources, just add more containers.
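(Sketches of setting those resource constraints. The numbers are illustrative, not recommendations, and my-rails-image is a placeholder.)

    # Docker: cap a container at one CPU core and 512 MB of memory
    docker run --cpus=1 --memory=512m my-rails-image

    # Kubernetes: declare the container's size in the pod spec (this is
    # a fragment of a container definition), which also lets the
    # scheduler bin-pack containers onto nodes
    resources:
      requests:
        cpu: "1"
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 512Mi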
It's pretty simple. One more: logging. It's one of the basic elements of running your application. By default, Rails logs to a file in the application directory. Don't do this in a container. It makes it difficult to access your logs, because the container is isolated; you have to get into the container to get access to them. And what happens to those logs when the container goes away? Additionally, Docker's file system, again, is designed for layering. It is not designed for high-throughput data. So if you have big logs, you might run into resource (mumbles). So instead, direct your logs outside the container. There are various ways to do this. The easiest is just to write to standard out. That will let Docker and Kubernetes handle your logs for you, and give you (mumbles). One of the easiest ways to do that is just to set this environment variable (sketched below), which tells Rails, by default in production, to write logs to standard out. You can of course use a more sophisticated solution; if you have a logging agent like (mumbles), go ahead and use that. Again, it's probably a good idea to run that in another container. However you do it, make your logs useful by directing them outside your application container.

Okay, so we've covered a bunch of tips. I hope you've learned a few things. If you didn't catch all of them, and I saw some of you trying to snap photos of a lot of these, I will post all these tips, along with the slides and some of the examples, at this URL. It's not up yet, but I believe it will be by the end of the day. So this is the slide to photograph. (mumbles) Okay, thank you all for coming. This has been great. Again, I'm part of Google Cloud Platform (mumbles). If you're interested in talking about containers, about Kubernetes, Docker, or any of the fun things we're doing at Google Cloud, machine learning and various hosting options, I'll be at the booth for most of tomorrow, and we have a whole team of my colleagues who are there to answer your questions. So come down and hang out. Thank you very much. (crowd applauding)
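(The logging tip above, sketched out. RAILS_LOG_TO_STDOUT is the variable checked by the production config that Rails 5 and later generate; if your config predates that, pointing the logger at STDOUT does the same job.)

    # In the Dockerfile: have Rails write its production logs to
    # standard out, where Docker and Kubernetes can collect them
    ENV RAILS_LOG_TO_STDOUT=true

    # Or equivalently, in config/environments/production.rb:
    #   config.logger = ActiveSupport::Logger.new(STDOUT)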
Info
Channel: Confreaks
Views: 6,713
Id: kG2vxYn547E
Length: 40min 25sec (2425 seconds)
Published: Sat May 19 2018