(techno pop music) - Wow, well, thank you all for coming. I know it's the last
session of the afternoon, and usually for me, if it's the last session of the afternoon, I wanna be doing something
other than sitting in a session. I'm really grateful that you've all come. This session is Containerizing Rails. My name is Daniel Azuma
and we'll get started. Basically, during this hour, we're gonna be talking about
container best practices. We'll talk about how to wrap your Rails application
in Docker containers. We'll talk about how to
optimize your containers for size and for performance. We'll talk about what should and what should not be in your containers. Basically, we'll talk about how to make your Rails application work well in a container-based production system. We're gonna start off with a little bit of background on Docker. I'm not going to make assumptions about how much experience you've had with Docker or with Kubernetes or
related technologies. You should be able to follow this even if you don't have
a lot of experience, but, that said, this is actually not meant to be a first time tutorial. It's not really a first
time beginner thing. I kind of have to apologize for this. I know that the program says that there's kind of going to be a walk through for deploying an
application to Kubernetes, and I ended up cutting
the walkthrough for time. I didn't get a chance
to update the program. There are good tutorials on Kubernetes that you can find online. But what we're going to
be focusing on here today is best practices, is tips for making better containers. That's pretty much what we'll do. Let's start off with just
a little bit about me, about your speaker. My name is Daniel Azuma. I've been developing professionally for about 20 years or so. About 12 of those years have been spent with Ruby and Ruby on Rails. I remember going to my
first Rails Conf in 2007. Was anyone here at the
Rails Conf 2007 or before? A few. Okay, that's actually really good. I was hoping to see most
of you are newer than that, 'cause that really means a
community that continues to grow and continues to evolve. I'm glad to see that. So, welcome. I've been doing a
variety of things in Ruby over the past 12 years. I did some geospatial work. Did a bunch of startups. Currently, I'm part of a little boutique west coast startup at Google. I'm part of a team that is focused on language, programming
language infrastructure. We call it Languages,
Optimizations, and Libraries, or LOL for short, as you
can probably imagine. It's a team that, again, focuses on making language infrastructure better, both internally at Google, as well as externally for our customers, for you people working
on the Google cloud. I'm part of the team that
handles Ruby and Elixir. I'm just really excited to work on stuff that helps you, people who are using
languages that I love. Glad to be here. This is a sponsored talk. I do want to thank Google Cloud Platform for enabling me to be here and to share with you about stuff, some of our ideas on containers
and what makes them tick. Let's get started. We'll get started with
some basic concepts. Again, just to make sure
we're all on the same page. First, containers. What is a container? I think, if you talk about containers, there's one word that you really should, should come to your mind when
containers are mentioned. That is isolation. Containers are about isolation. In particular, take Docker for example. A Docker container is basically
an isolated file system, slice of the CPU, slice of the memory, network stack, user
space, and process space. Again, all of this is isolated. It's set up so that it's very controlled how the inside of the container interacts with the outside. Now, at first glance, this might look kind of like a virtual machine, right? Kind of like a VM, but it's not. This is not a VM. There's no hypervisor involved here. This is, containers are
actually a feature of Linux. Containers live inside a Linux machine. They share the Linux kernel
and the CPU architecture with anything else
that's running in Linux. That includes other processes in Linux, and it includes other containers. You can run multiple
containers on a Linux machine. Again, they are isolated from each other. Really basic on containers. The next concept that we
need to cover is images. What is an image? An image is basically a template for creating a container. Most of that consists of
a file system snapshot, so imagine the files and the directories that you need to start your container. Your application, the files
that comprise your application. The files that comprise
your operating system, any dependencies of your application. All of that lives in the image. In Docker, you create an image using a recipe called a Dockerfile. Dockerfiles, this is basically what a simple Dockerfile might
look like for a Rails app. This is kind of somewhat simplified, so there aren't a lot
of best practices here. It's here so you can kind of see what the different parts are. The first line here is what's
known as the base image. This is a starting point for a Dockerfile. It's another Docker image. Again, it has files and
it has all the information you would need to start a container. There are different kinds of base images. This one, in particular, would refer to the Ruby, the official Ruby image, which includes an operating system and an installation of Ruby. From that base image, there are commands that you would run in a Dockerfile. Some of those commands do things like copy files into the image. You might copy your Rails application for example, into the image. You can also run shell commands to do things in the Dockerfile. For example, you can rundle bundle install to install your gem dependencies, install your bundle in the image. You can also set different
properties of the image. This particular property is the command that is run by default to start a container based on this image. Different parts of a Dockerfile. Again, they're basically commands that are run in order by Docker when you build an image from this file. Now, we're gonna start by
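To make that concrete, a minimal Dockerfile along these lines might look like this (the Ruby version tag and the app layout here are assumptions, and it deliberately skips the best practices covered later):

```dockerfile
# Start from the official Ruby base image (version tag is illustrative)
FROM ruby:2.6

# Copy the Rails application into the image
WORKDIR /app
COPY . /app

# Install the gem bundle
RUN bundle install

# Default command run when a container starts from this image
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
```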
Now, let's start by looking at this first line: the base image. When you go through a getting-started tutorial for Docker, you'll often be directed to use some specific base image. There are a variety of those base images. One example is, we saw
the official Ruby image. There are different variants of that image using different operating systems, different versions of Ruby. There are different, there
are lower level images that just install an
operating system for you, so you might have an Alpine Linux image or a Debian or an Ubuntu image. There are also a variety
of third party images. Some good ones from Phusion, for example. I'm not going to advocate for a specific base image here. Each one has its own goals, its own philosophy behind its design. Rather, what I'm going to do is I'm going to talk about the
things that go into an image, what makes an image effective. These are tips that you can use if you're creating a base image yourself, or if you're just creating
an application image based on someone else's image. That does bring us to
our first pro tip here, which is that it's important to understand what's going on in a
base image that you use. Don't treat a base image like a black box. It's very tempting to do that, but it's actually very important in order to make effective
use of a base image to understand what it's doing and why it's doing that. Read your base image's Dockerfile. The Dockerfiles generally
are not that long, it'd take probably just
a few minutes to read, and most of them are available on GitHub. Take a look at your Dockerfile, get to know what operating
system it installs, how it sets up the environment, and whether that really matches how your application wants to be set up. You can also learn a lot
of good Docker practices by reading other people's Dockerfiles. A good practice, read base images. As you get familiar with Dockerfiles, you'll notice that there are some properties that a lot of
good Dockerfiles have. One of those, one of the
important ones, is size. Size matters a lot with
Dockerfiles, with images. They matter in terms of runtime, in terms of how much
resources your image uses, your application uses at runtime, but it also matters at build
time and at deployment time, because if your image is large, you have to upload and download that to and from your CI system
into your production system. Maybe you're building things locally. However you do that, some of these images can be fairly large. It's good to try to minimize that size. There are actually a lot of techniques that you'll see out there on how to optimize the size of your image. Let's take a look at a few of those because they're really interesting. One of the most common things that you'll do in a
Dockerfile is install stuff. You have Rails, you're running Rails. It'll probably use certain libraries. You'll use maybe LibYAML. You might use libxml2. Heaven forbid, you might use ImageMagick. There are various things
that you'll end up using. If you're on Debian or Ubuntu, the tool that you use to
install that is apt-get. One thing to know about apt-get is that it does, it's not
really designed for Dockerfiles. It's been around a lot longer than that. It actually leaves a bunch
of files in your file system. It downloads package lists, downloads package archives,
certain manifest files. It's important, these
files are not necessary for your application at runtime, so it's a good idea to get rid of them after you're done installing. Often times, when you
look at a Dockerfile, you'll see something that looks like this. You'll see a line in the Dockerfile saying okay, let's go ahead and delete
all these temporary files, all these cache files that apt-get uses. This is good, this is important to do, but it's also important
to do this correctly. For example, don't do it like this. Don't run, don't update, install and then clean up
in a series of run commands. Instead, do it like this. Combine those steps in
a single run command. That's very important. The reason for that is because of how Docker images are represented. A Docker image is represented
as a series of layers, so imagine you have a
base image that you use, and then in your Dockerfile, you have a series of commands. Each major command that you run will add a diff on top of those layers. As you run a series of Docker commands, you have these series of diffs. The image is that entire set of layers. For example, if you run apt-get update in one command on your Dockerfile, that will download a
bunch of package lists from the Debian repositories. Now, those are in your image. Those are in that diff, in that layer. Now, if you continue to
run additional commands, they'll add additional layers. Later, if you run apt-get clean, those will remove those
files from that layer, but those files are
already part of your image. They were added in an earlier layer, so you actually really
haven't gained anything here. The image comprises
the entire set of diffs from every command that you run in the Dockerfile. It's important, again, to do it like this: apt-get update, install, and clean, all in the same run command. What that does is it makes sure that those temporary files that apt-get update downloads get removed before that layer gets finalized. They never appear in the layer. If you go out and look for Dockerfile best practices, this is one of the key ones that you'll see a lot.
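Spelled out in a Dockerfile, that combined form might look something like this (the package names are just examples):

```dockerfile
# Update, install, and clean up in ONE run command, so the
# package lists and caches never land in any layer of the image
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       libyaml-dev libxml2-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```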
One other thing you'll see people talk about is minimizing the number of layers as well, and that's also important, but I think it's more critically important to understand what's being done in each layer. If you install something in a layer, it's there. It's part of your image at that point. It has to be downloaded whenever you pull the image. So, very important. So next concept, again,
combine your installations. So this is how it works in apt-get. It's the same if you're building from source — if you're installing a different tool from source, for example: download the source, configure, and make install, and then delete your source directory, all in the same run command, so that the source files, which you don't need at runtime, don't end up in the layer. Similarly, Alpine Linux. This is a great distribution to use for Dockerfiles, for example, because it's tiny and it has a lot of
really useful features. One of those is the virtual package feature. It's kind of like a virtual environment in Python. Basically, you can install stuff using apk temporarily — install things, use them, and then remove that entire environment. But again, it's important to do all that within the same run command so that those temporary packages never show up in the layer. Again, very important: combine installations with cleanup.
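As a sketch, an Alpine virtual package install might look like this (the package names and the .build-deps label are just examples):

```dockerfile
# Install build tools into a named virtual package group,
# use them, then delete the whole group in the same layer
RUN apk add --no-cache --virtual .build-deps build-base \
    && bundle install \
    && apk del .build-deps
```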
Here's another optimization technique. Some gems come with C extensions. So if you're running Rails, one of the gems that will probably be part of your bundle is Nokogiri. It has C extensions as part of it. So in order to install that bundle, you need a C compiler. You need a bunch of things — in fact, you need Make, you need libraries, you need a whole set of build tools to install that. Now, these build tools: you need them to install your bundle,
but you probably don't need those at runtime. And those build tools are
actually pretty large. I tried installing the build-essential package on top of Ubuntu last night just to see how big it is. The Ubuntu base image by itself is about 100 megabytes. With build-essential, it triples that size. So this is not small. So it'd be nice to be able
to install your bundle but not have these build
tools in your final image. But is there a way to do that? Yes there is. There's a powerful technique
that not a lot of people are using yet. But it's very useful for this
and kind of a whole class of similar problems, and that
is multi-stage Dockerfiles. This is a feature that's been around for about a year in Docker. Seems like not a lot of
people are using it yet, but you should, you
should use this feature. It's really, really useful. The basic idea is like
a multi-stage rocket. You have an initial stage that kind of does your heavy lifting for
you in the build process. And then once you're done with it, you just discard it. And so only your final stage
which is gonna be much smaller is then used at runtime. So this is how this might look. This is again kind of an
illustrative Dockerfile. There are some commands that you would normally find that are
kind of missing here. But the idea is that
you have multiple images in a Dockerfile — multiple stages. And it's only that last image that is finally used at runtime. The earlier images are not. So this is how this would work. So we start with a base image. I call this my-base because I'm not sure if this base image actually
exists as a public image. So imagine a base image
that contains Ruby, but no build tools. Normally I think the official Ruby images do have all the build
tools because they expect you're gonna install gems. But that image is less
useful in production because you don't need those
build tools at runtime. So imagine you have a base image with Ruby and no build tools. So you start there, you copy
your application to the image. Now you need to install your bundle. So let's install those build tools. So right here, we do our apt-get update, install, and clean, right? Now we bundle install. So now we've got this image, which has Ruby, has your operating system, has your application, has all of your gem dependencies including those built C extensions, and has all the C compilers and build tools. That's 200 megabytes of stuff that you needed to build but
you don't need at runtime. So here's our first stage. Then we start over. We start over with a new base image. And then what we can do is we can copy the application directory from that first stage
into the second stage. Notice what we did here with the bundle install. We did dash dash deployment. That, among other things, vendors your gems. That means it installs those gems in your application directory. That includes all those C libraries that got built. So when you copy that application directory, it includes your application and all of those installed gems. So now we've got a new image, and that's what we want at run time. We have a base image with Ruby. We have our application. We have all the gems with all the C extensions all built and ready, and we have no build tools. And then again, when you
run this Dockerfile, that first stage just goes away. So basically, tip number three: use a separate build stage like this to create smaller runtime images. Really, really useful technique.
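Putting those pieces together, a multi-stage Dockerfile along these lines might look like this (my-base, the paths, and the package names are all illustrative assumptions):

```dockerfile
# ---- Stage 1: build stage, with compilers and build tools ----
FROM my-base AS build
WORKDIR /app
COPY . /app
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# --deployment vendors the gems into the application directory
RUN bundle install --deployment --without development test

# ---- Stage 2: runtime stage, with no build tools ----
FROM my-base
WORKDIR /app
# Copy the app, including the vendored gems, from the build stage;
# everything else in the build stage gets discarded
COPY --from=build /app /app
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
```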
Okay, so we've talked a lot about the size of your image. So let's dig into maybe the contents of your image. What should go into a Docker image? What should be present in your containers? I have a few tips based on my experience with Docker images and Rails apps. I'm gonna cover some of the things that I think aren't covered enough — basically, things that are often overlooked but I think are still very important. First, encoding. I remember back in 2007,
when I first started working with Rails,
encoding was a big problem. We ran into — UTF-8 wasn't as widely used as it is now — we ran into encoding issues all the time. We had this very specific checklist that we set up when we deployed things, to make sure that things were set up properly. Now, Ruby strings have built-in encoding support, but the operating system locale still has some kind of odd effect on the way that Ruby handles encodings. The rules are a little bit subtle. So in general, if you don't set the locale in your operating system, if you don't set the encoding, sometimes you might get Ruby strings that are US-ASCII rather than, probably, what you want: UTF-8. So it's very important in your Dockerfile to set the operating system encoding, if it's not already done in your base image. This is what that might look like, approximately, in Debian.
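Approximately, in Debian, that might look something like this (the exact packages and locale names vary by distribution, so treat this as a sketch):

```dockerfile
# Generate and set a UTF-8 locale so Ruby defaults to UTF-8 strings
RUN apt-get update \
    && apt-get install -y locales \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && echo "en_US.UTF-8 UTF-8" > /etc/locale.gen \
    && locale-gen
ENV LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8
```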
So again, next tip: make sure that locale information is set. Often overlooked, still, but very, very important. There's another thing that's often overlooked. It seems obvious maybe when we first look at it, but it's something we don't often think to do when we're using Docker. And that is: in production, who is root? I hope we're not running as root in Rails. I mean, there's no reason to run Rails as root, and there are, of course, a number of security issues that could happen. But when you're running containers, remember that containers isolate your user space. And so the default user in a container is a super user in that container. So unless you explicitly set the user, set an unprivileged user in your container, you are already running as root. You're running Rails as
root in that container. So it's good practice to create an unprivileged user in your container and use that when you're running Rails. So again, next tip: create an unprivileged user.
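A sketch of what that might look like in a Dockerfile (the user and group names, and the app path, are just examples):

```dockerfile
# Create an unprivileged user and hand it the app directory
RUN groupadd -r app && useradd -r -g app app \
    && chown -R app:app /app
# All following commands, and the container itself, run as this user
USER app
```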
Now you might say: okay, is this really necessary? I mean, containers are supposed to be isolated, right? Containers are isolated, users are isolated. Does it matter if I'm running as a super user in a container? The answer is generally
actually, yes, it still matters — and that's because, with security, the best way to secure your systems is really defense in depth. If you don't need to run as a super user, then don't. Set up the unprivileged user. Suppose, for example, your Rails application gets hacked. Now the intruder might have super user access in that container. What could that user do? Could they install something
nasty in your container and cause your container
to do unpleasant things? Worse, how confident are you that Docker itself will never have a security flaw that could allow a super
user in the container to get out of that
container and get access to the rest of your system? That would be kind of catastrophic. So just use an unprivileged user. It's best to put as
many layers of security in front of your application as possible. So, again, often overlooked
but very important. Next, let's move on to
entry point and talk about something else that's often overlooked. So if you've used Docker before, you've probably seen
these two different forms of a command or an entry point, right? There's exec form, which is basically an array of strings that form a POSIX command. Then there's shell form, which is a single string that is passed to a shell. You might have heard that it's generally recommended to use exec form, and yes, there are various reasons why this is true.
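For reference, the two forms might look like this (puma as the server here is just an example):

```dockerfile
# Exec form: a JSON array, run directly with no shell in between
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]

# Shell form: a single string, run via /bin/sh -c
CMD bundle exec puma -C config/puma.rb
```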
One of the less commonly cited, but very important, reasons for this has to do with signals. So when you need to stop a
running Docker container — so, for example, when you call docker stop, or you're running Kubernetes, and Kubernetes needs to upgrade your app or it needs to scale some things, and so it needs to stop and start containers — what it's gonna do is it's gonna send a signal to the first process, the main process, process ID one in your container. So let's say a SIGTERM or a SIGKILL, whatever that signal is. If you use shell form to start your container, that first process is the shell. That is not your Rails app. It's the shell. And fun fact about shells: they, by default, don't propagate signals into things that they start. So what's gonna happen here is the shell's gonna receive the SIGTERM, but your Rails application is not. It's gonna continue on; it's not gonna know what happened. It's not gonna clean up after itself. And your container is not gonna exit. And so eventually Docker, or Kubernetes, is gonna have to go in and force-kill your container. You don't wanna do that. You want nice cleanup for your processes. So: very common, often overlooked problem with shell form. So again, our next tip: prefer exec form. If possible, use exec form. Now, I know that there will be cases when you'll find shell
forms to be really useful. Maybe you need to do shell substitutions in your Dockerfile, or something like that. If that's the case, there is a workaround: insert exec in front of your process. Exec is a Bash keyword. It's a built-in; it basically tells Bash, this process takes over as the main thing that's going on, so signals get through to it. So another pro tip: if you do use shell form, prefix your command with exec.
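As a sketch, the workaround looks like this (the PORT substitution is just an example of why you might want shell form in the first place):

```dockerfile
# Shell form, but exec replaces the shell, so signals reach Rails
CMD exec bin/rails server -b 0.0.0.0 -p ${PORT:-3000}
```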
All right, let's see — no, I think we have time. One more tip. Docker includes this
feature called ONBUILD. It lets you define commands in a base image that run when that base image gets used by a downstream application image. So, for example, you might write a base image that looks like this — and then, when an application image builds from this image, these ONBUILD commands, these two commands, get run implicitly, right at the beginning.
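An illustrative sketch of a base image using ONBUILD (the Ruby tag and paths are assumptions):

```dockerfile
FROM ruby:2.6
# These do NOT run when this image builds; they run implicitly
# when a downstream image uses this image as its base
ONBUILD COPY . /app
ONBUILD RUN bundle install
```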
So it seems convenient — seems like a good idea at the time. Turns out, in practice,
it's usually not worth it. So, tip number eight: avoid ONBUILD. There are several reasons for this. First, ONBUILD makes some assumptions. It basically amounts to the base image making assumptions about what's being done downstream — about your application structure, about what it means to run. So, for example, if it's copying the application directory: where is that directory? Where is that application? There are assumptions being made here. So it turns out that ONBUILD really isn't as useful as it might first seem. Another thing: generally, for a build process, it really is best to be very explicit and very transparent about what's going on when you build your application. ONBUILD basically removes that. It's running things implicitly in your Dockerfile — things that are defined by the base image, which is not necessarily part of your application itself, not present in your source code. So generally, I just recommend forgetting that ONBUILD exists at all. You don't really need it. So far, we've covered some tips regarding optimization of size. We've covered tips regarding
what should go into your image. Let's take a step back and take a little bit of a broader view of your application running in production and how those containers should look. Your real-world application is probably more complex than just a single Rails server. So you might have background processes — Sidekiq workers — running. You might have other services; you might have Memcached running. You might have multiple replicas of your application running for scaling. So do you run all that
in a single container? Do you split it out into
multiple containers? And if you do, how do you do that? There are a lot of interesting questions, interesting architectural
questions around this. I don't have a lot of time to go through all of those, but I'll touch on a few of them here. So first, remember: containers are about isolation. Containers are about isolation. This is important for running in production because it enables predictability. Enables predictability,
lets you remove unknowns so you can understand the
behavior of your application. Predictability is the key to
a stable production system. So here's an example. Again, containers, isolation. Containers isolate resources
like CPU and memory. You can tell Docker when
it runs a container, just give it this many cores
and this much memory to run, and that will prevent that container from spinning out of control and taking down the rest of your system. It's very important to
do this in production. Always specify those resource constraints. So take advantage of this feature. Really crucial to maintain
a stable, predictable production system. Similarly, if you use Kubernetes, set those resource constraints in your configs.
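In a Kubernetes config, those constraints might look something like this fragment of a pod spec (the numbers are arbitrary examples):

```yaml
# Per-container resource constraints
resources:
  requests:
    cpu: "500m"      # half a core
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```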
This is also very important for Kubernetes, because it allows Kubernetes to do some interesting things like bin packing. If it knows the size of your containers, it knows how many of those will fit on your system, and it's able to do that in a smart way. So it enables one of the really powerful features of your orchestration system. Now, some might say
that's great in theory, but in practice, occasionally we have containers where it's difficult to come up with a static, fixed size for that container — a static, fixed resource constraint. If that's the case, what I would say is: if you're having trouble coming up with static, fixed resource constraints, that's actually a container design smell. That's a sign that maybe your container might be doing too much, and maybe it would be useful to think about how you break up that container. So it's kind of a useful tool to decide what should be in your containers and how you should
structure those containers. Here's an example of that. Some app servers, like Unicorn, might let you pre-fork workers. And some of them will even do things like auto-scale — scale that worker count up or down based on traffic. Now, opinions will kind of differ on things like this. But in my experience, doing this in a container is generally not that great an idea, once again because it makes the resource usage in that container less predictable. Even if you fix the number of workers, and you have copy-on-write memory — copy-on-write memory can be tricky to predict the behavior of, especially with a language like Ruby where there's so much dynamic stuff going on. So in general, I recommend not pre-forking in your container. Don't try to scale up inside a container. Just run one worker in the container. It's fine for it to be multi-threaded — that's generally, I've found, okay. But forking off worker processes tends to make those resource constraints a lot more difficult to handle. So then how do you scale, if not by pre-forking? You scale by adding containers. So again, containers are best used as the unit of scaling. Each container should have static, predictable resource constraints, and if you need more resources, just add more containers. It's pretty simple. One more. Logging. It's one of the
basic elements of running your application. By default, Rails logs to a file in the application directory. Don't do this in a container. It makes it difficult to access your logs, because the container is isolated — you have to log in to the container to get access to them. And what happens to those logs when the container goes away? Additionally, Docker's file system, again, is designed for layering. It is not designed for high-throughput data. So if you have big logs, you might run into some resource issues. So instead, direct your logs outside the container. There are various ways to do this. The easiest is just to write to standard out. That will let Docker and Kubernetes handle your logs for you. One of the easiest ways to do that is just to set this environment variable. That tells Rails, by default in production, to write logs to standard out.
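In a Dockerfile, that might look like this (it relies on the RAILS_LOG_TO_STDOUT check that recent Rails versions include in their generated production config):

```dockerfile
# Tell Rails to log to standard out in production
ENV RAILS_LOG_TO_STDOUT=true
```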
You can, of course, put together a more sophisticated solution. If you have a logging agent like (mumbles), go ahead and use that. Again, it's probably a good idea to run that in another container. So again, however you do it,
directing them outside your application container. Okay, so we've covered a bunch of tips. I hope you've learned a few things. If you didn't catch all of those — I saw some of you trying to snapchat a lot of these — I will post all these tips, along with the slides and some of the examples, at this URL. It's not up yet, but I believe it will be by the end of the day. So this is the slide to snapchat. (mumbles) Okay, thank you all for coming. This has been great. Again, I'm part of Google
Cloud Platform (mumbles). If you're interested in
talking about containers, about Kubernetes, Docker, or any of the, you get a lot of fun things
we're doing at Google Cloud, machine learning and
various hosting options. I'll be there at the booth
for most of tomorrow. And we have a whole team of
my colleagues who are there to answer your questions. So come down and hang out. Thank you very much. (crowd applauding)