(electronic music) - Yes, time to start? Hi all. My name is Kir and today I'll talk about running Rails in Kubernetes. I work at a company called Shopify, and over the past year or so we've moved hundreds of Rails apps within the company to Kubernetes, as well as our main monolith, which is known as one
of the largest and oldest Rails apps in the community. We learned quite a bit about
running Rails efficiently in Kubernetes and I
decided to make this talk to share some of the
things that we learned. So today, we'll start
from getting a quick intro into Kubernetes for those who haven't been exposed to it yet. Then we'll talk about what
makes Rails a bit special to run on orchestrated
platforms like Kubernetes. And then I'll share some of the things that helped us to migrate all these apps. First of all, please raise
your hand if you've ever played with Kubernetes or
container orchestration? Oh, it's quite a lot. So, in 2018 ... Almost everyone agreed that
containers are awesome, because they provide
this universal interface to run any app in basically any environment that you want. But the problem of running and scheduling containers is still there. You need to run these
containers somewhere. Just as a note, I'm not going
to talk about containerizing Rails, because there will be a great talk tomorrow at 3:30. If you're interested in
hearing about containerizing Rails itself, please
(mumbles) this talk by Daniel, and I'll talk about
running it in production and orchestrating containers. So ... You have the container with the app, and you're going to run it somewhere. In the static world, where
servers are configured with something like Chef, you would have a bigger server that would handle the fatter containers that require more memory and CPU, and a server with a bit less memory where you would decide to run some other containers. All of that math is done by humans, assigned by us, maybe configured with some scripts, but it's still pretty manual. And if we think about this process, quite a lot of resources can actually be wasted, because there would still be some CPUs unused, some memory left. Nothing really stops us from the desired state, where every CPU is used and all the resources are efficiently scheduled, so that we achieve the same results, have the same capacity with fewer resources consumed, and save some energy. What Kubernetes solves
is efficiently scheduling the resources on your servers in a very dynamic way, bin packing the containers that you want to run in the best possible way. So, if we want to define it in just one sentence, it's smart container scheduling for better utilization. Two things here. First, scheduling. I want to emphasize it, because you no longer have a defined list of servers that you bootstrap. It's all scheduled dynamically. If one server crashes or the power dies, the same unit of work would be rescheduled on another machine, and you
wouldn't even notice that. The second is utilization: making the best use of the resources that you have, which is especially important as you grow, because you would have more servers, more unused CPUs, more unused memory left, which of course you don't want to just have sitting there. For the next step, I just wanted to get some shared vocabulary and talk about the concepts that Kubernetes brings. First, the very basic concept is a pod. A pod is basically a running container, one instance of something. So if we run one process of Sidekiq, it would be just one pod. And obviously, one instance
of something is not enough to run a whole app or service. So we come to the next
concept, a deployment, which is a set of pods. A typical app would have maybe two deployments, one with the web workers, and another with the job workers. And the number of instances in the deployment, the number of pods, is very dynamic: it can be adjusted, you can scale it up, you can scale it down. You can even set up autoscaling. If you ever worked with
Heroku, you probably remember this concept of dynos and the dyno count that you can adjust and scale up. It's the same with the
deployment in Kubernetes, which can scale up and down. This all sounds great,
but how do you actually describe all these resources? If you used Chef or Capistrano, you probably had a Ruby DSL. And as any DSL in a dynamic
language, it comes with good and bad things. Good, it can be very expressive. You can describe lots of
things there, but sometimes, it comes as a disadvantage
too, because you can do basically anything that
you can do with Ruby. And sometimes, you want
a DSL to be as minimal as possible. So Kubernetes leverages
YAML files as a way to describe resources. You would have a YAML
config of maybe 20 or 30 lines per resource. This is just an example of a config for a Rails app.
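The slide itself isn't in the transcript, but a minimal sketch of such a spec could look like this, with a hypothetical app name, image, and port:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                  # the web workers deployment
    spec:
      replicas: 3                # number of pods; scale this up or down
      selector:
        matchLabels:
          app: myapp-web
      template:
        metadata:
          labels:
            app: myapp-web
        spec:
          containers:
            - name: web
              image: registry.example.com/myapp:latest  # the container with the app
              ports:
                - containerPort: 3000                   # the Rails server port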
Then, you would apply that config to a Kubernetes cluster and store that same YAML file in the repo, which I think is a great benefit, because it's just a couple of configs that live in the same repo, not in another repo with cookbooks or whatever. At least for me and some
of the people that I know, this came as a kind of shift in the mindset, because we had to move away from controlling servers when deploying code or provisioning new apps. When you deploy resources
with Chef or with Capistrano, at the end, it was just sequentially applying commands over SSH and controlling servers. You would always have an
output of exact SSH commands, and see what's going on, see what fails, see what commands are stuck. And so on. With Kubernetes, it's
quite different, because you just take a YAML
file and tell Kubernetes to apply it, and that
is the desired state, which will be rolled out
there in a few seconds or a minute; if you applied a very big set of configuration or resources, it would maybe take more. At least for me, I had to move from this concept of controlling servers, exact machines, to describing configuration. If we take controlling
servers, it would be running commands remotely,
comparing their output, while in contrast, when you
describe the configuration, you just push it, and
then poll for it to apply. Which comes with the advantage of being abstracted from physical
machines, which is great for things like
self-healing, if one server goes down, the same work
would be rescheduled somewhere else, while controlling servers manually is not very resilient to failures. For instance, a (mumbles) file would have a Capistrano config with more than 100 hosts, and eventually, once in a couple of months, some hosts would die, just because it's that many servers. We had to fix that by hand; it would have self-healed if the configuration had been described with orchestrated containers. And yeah, if we talk about
tools and technologies, examples of controlling servers are Capistrano and Chef. In contrast, platforms
like Kubernetes and Mesos, let you describe the
configuration, describe the desired state, and the
platform would roll out the state for you. So, containers. Kubernetes takes a container and runs it for whatever number of
instances that you specified, and it's very easy to
run a plain container, but Rails, essentially, is a bit more than just a process. Many Rails apps work as a monolith with many things embedded into them that make them sometimes quite special to run
as a simple container. One thing, if you use Heroku, you probably were familiar with the
concept of the 12-factor app, which is a methodology for building software-as-a-service apps that promotes declarative formats and minimizing the difference between production and development. Apps that follow the 12-factor manifesto are usually easy to scale up and down with no significant changes to the architecture. As you have guessed, there are 12 factors, and we'll go through a couple of them that I think can sometimes be forgotten when we work on Rails apps,
but they're nevertheless quite important, especially
if you want to run the app in Kubernetes successfully. One of them is disposability and termination, which, in other words, is about what happens when you want to restart or shut down a process. For something like web requests, it's as easy as waiting for the request timeout. If you know that a request will not take longer than 30 seconds, you stop accepting any new requests, wait for 30 seconds, and then you're safe to shut down the worker without losing any live requests.
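In Kubernetes terms, that shutdown math maps to the pod's termination grace period; a minimal sketch, assuming a 30-second request timeout:

    spec:
      # Kubernetes sends SIGTERM first and SIGKILL only after this grace
      # period, so giving the pod at least the request timeout lets
      # in-flight requests finish before the worker is killed.
      terminationGracePeriodSeconds: 30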
The same goes for background jobs: you have to wait for the current jobs to finish, and then you're safe to shut down the process without losing any work that is going on. However, this might be a bit trickier for long-running jobs. This is one of the examples of a very simple job that can become long-running: it iterates over some records in the database and calls a method on each Active Record object.
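The code on the slide isn't in the transcript, but it was something along these lines; the model and method names here are hypothetical:

    class DeactivateStaleUsersJob < ApplicationJob
      # Looks trivial, but its runtime grows with the size of the table.
      def perform
        # Iterates over records in the database in batches...
        User.where("last_seen_at < ?", 1.year.ago).find_each do |user|
          user.deactivate! # ...and calls a method on each record
        end
      end
    end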
If you have just a few users, this job would complete within seconds, maybe a minute. But as it grows to a size like ours, with millions of records in a table, we would have jobs very similar to this example, and it would take them weeks to iterate over all the records and do something with them. So, how do we shut down these workers? We must keep in mind that
long-running workers will be aborted and re-enqueued, which in this example would mean that the job can be aborted in the middle and then run again, which is essentially what Sidekiq does. Here, we come to the concept of idempotency: the code that is called there should not produce extra side effects, and should be safe to execute more than once.
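Sticking with the hypothetical job from above, a sketch of what that could mean: if the worker is killed halfway through and the job is re-enqueued, the guard makes the second run skip the records that were already processed.

    class DeactivateStaleUsersJob < ApplicationJob
      def perform
        User.where("last_seen_at < ?", 1.year.ago).find_each do |user|
          # The guard makes re-runs safe: no extra side effects for
          # users that were already deactivated on the first pass.
          next if user.deactivated?
          user.deactivate!
        end
      end
    end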
Another aspect of 12-factor apps is concurrency, which allows your app to scale with a process model. They have this illustration, which shows that you have web workers and some job workers, which you can scale up and down. To be able to successfully scale these workers, they should not share any resources, because if they all had a bottleneck of just one shared resource, they would not scale very successfully. We've talked a bit about the 12 factors. Now, some things about Rails
to know when deploying to Kubernetes. First is assets. When you used something like Capistrano, it would probably run assets:precompile on every server that you wanted to serve requests from, which was a bit of a waste of resources, when you could precompile assets only once and then distribute that image to all servers, instead of precompiling them on each one. So, the efficient way of doing that is to embed assets into the
container with the app, so that when the app starts, it's already got all of its dependencies, like assets. Another part that can
sometimes get a bit messy is database migrations. In the Rails community,
we're very much used to migrations as a part of deploy. Maybe as a hook at the
end of deploy, you deploy the code, and then you apply
the migrations right away. This step of the deploy
process makes the deploy a bit fragile, because what do you do with the code change,
if the migration failed? Do you roll back the code,
or do you keep running it? If you rolled it back, you already had the new code in production,
for like, 30 seconds or a minute. It might not be very safe to roll it back. So, we tried to avoid
migrations as a part of deploy, and instead have developers write code that is compatible with both the old and the new schema, because in the middle of the rollout, you would always have some workers on the older version and some workers on the newer version. We try to make the migrations asynchronous, which helps to establish this contract with developers that the code may run on both versions of the schema. So instead of changing code and applying the migration in the same step, the first step could be to add a migration, for instance one that adds a column, and only then would you update the code to interact with the new column, once you're sure that all schemas have that new column.
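As an illustration of that two-step contract (the column and class names are hypothetical), the first PR ships only the migration, and a later PR ships the code that starts using the column:

    # PR 1: schema change only; the old code keeps working without the column.
    class AddNicknameToUsers < ActiveRecord::Migration[5.2]
      def change
        add_column :users, :nickname, :string
      end
    end

    # PR 2, merged once the column exists everywhere:
    # user.update!(nickname: "kir")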
Usually, these asynchronous migrations are applied within a few minutes after the deploy, which we make a bit easier for developers by announcing it in Slack and giving them a notification when their migration is applied. Another part of Rails
is secrets. Well, I think none of the modern apps run in isolation; basically, every app now interacts with some kind of third-party API, whether that's S3 buckets or the Facebook API, and all these third parties and APIs require tokens and API keys, which Rails has to be aware of. One approach is secrets
in environment variables, the approach
that Heroku promotes. This is very easy, but as
you grow, you would have hundreds of tokens, and
you probably don't want to run the app with hundreds of variables that the app is dependent on. You may think about
putting secrets right into the container with the app, which is not the most secure approach
that you can take, because anyone who gets
the container also gets the secrets. Fortunately for us, Rails 5.2 ships with the credentials feature, which allows you to put encrypted credentials right into the repo and edit them there; it's fully safe to commit and store them in the repo. All you need to read and change them is the Rails master key. As a result, you run the container with just one environment variable, which is the key to the rest of the secrets.
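For illustration, assuming the aws-sdk-s3 gem and hypothetical credential keys, reading them looks roughly like this:

    # config/credentials.yml.enc is encrypted, so it's safe to commit.
    # Edit it with: bin/rails credentials:edit
    # At runtime, Rails decrypts it with ENV["RAILS_MASTER_KEY"]
    # (or config/master.key in development).
    s3 = Aws::S3::Client.new(
      region: "us-east-1",
      access_key_id: Rails.application.credentials.dig(:aws, :access_key_id),
      secret_access_key: Rails.application.credentials.dig(:aws, :secret_access_key)
    )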
To recap, following the 12 factors makes it easier to run Rails apps in orchestrated environments, and being mindful about worker termination also helps. Migrations as a part of deploy, as a hook after deploy, can be fragile and make the rollout process not very safe, so asynchronous migrations can help solve that. And credentials, which ship with
Rails 5.2, make the process of sharing keys a bit easier. At Shopify, we've had hundreds of apps running in different environments. Some of them were in
Heroku, some of them were in AWS, some of them were on physical hardware managed with Chef, and what we wanted for our developers was to stop being exposed to all that infrastructure and just have a platform to run Rails apps somewhere. So we decided to invest in something like Kubernetes, which would allow us to scale containers in the best way, and also to utilize the resources in the best way. As I said, if we wanted the apps to run in Kubernetes, they had to have their resource specs in YAML, which is a pretty easy format, no more than 20 or 30 lines per resource, but still, we didn't want every developer to learn that YAML declaration. What we did instead is we
created a bot that opened a PR on GitHub, based on the stuff that you use in production. If you use Sidekiq, it would generate you a YAML config for that unit of work in Kubernetes, and the first item in that PR description would be a checklist recommending that you check whether that config makes sense for this app. If that looks good, just merge, and your app is ready to run. The next step is to apply the
config with the kubectl CLI tool, and if you've ever tried kubectl apply with a YAML file, it returns immediately, because it just lets Kubernetes know about the desired state, and then it takes the system some time to provision all those containers, to find a server that has some CPU available and schedule the work there. And this process is not very visible.
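For reference, with plain kubectl the flow looks roughly like this, where the apply returns right away and polling the rollout is a separate command (file and deployment names are hypothetical):

    kubectl apply -f config/deploy/production.yml   # returns immediately
    kubectl rollout status deployment/web           # polls until rolled out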
If you're used to Capistrano, you probably want some kind of progress to monitor, to see how many of your servers already run that new container, what the progress of the rollout is, and things like that. So we've made a gem
called kubernetes-deploy that provides visibility into the changes that are applied to the Kubernetes cluster. This is an open source project, and it's been adopted by other companies as well. Just like Capistrano, it applies the configuration and ... Not working. There was a little video preview. It applies the config and tracks the progress. So, robots help humans migrate apps by generating YAML configs. Developers didn't have to
write YAML configs anymore, and kubernetes-deploy brought visibility into the rollout progress. Overall, I think the steps that Rails has been taking towards running in the cloud and in container environments, just like on Heroku, were in the right direction, and they help us now to run Rails in Kubernetes. A lot of that is thanks to Heroku, which has been pushing Rails in that direction, to make it run smoothly in containers. For us, and for many other
companies, Kubernetes helps to schedule the work efficiently, save resources, and stop caring about which server some container has to run on. In the end, it's not magic, it's just a technology that helps to schedule the work. There are some things that you have to know about Rails when running it on orchestrated platforms, to make it run smoothly. Before, it took me hours
to set up a new app in production with Chef and Capistrano: I had to find an instance, provision it, write some cookbooks, or do something else to set up the whole environment, all the packages that were needed there to run Rails. Now, with orchestrated
containers, it's a matter of just a couple of YAMLs. I think it becomes very standardized in terms of getting started with any app: if the app is using Kubernetes, you can just read through the resource specs and see how the deployment is organized, which reminds me of what Rails did more than ten years ago, because before, every app used its own structure, and it took you some time to understand how it worked. Now, you can get started with any Rails app within hours, just because you know that all the controllers are in app/controllers and config/routes has all the routes that the app has. So, Kubernetes brings this abstraction. It collapses this complexity,
which is what DHH talked about in the keynote this morning. You might have a
question: when is it worth getting started with Kubernetes, moving on to orchestrated environments? I would say that if you want to stop caring about the physical machines where something runs, if you want just a platform to run a container, that's a good solution. You can follow me on Twitter. If working on the things that I mentioned
in this talk, from Rails to the infrastructure behind Kubernetes, sounds exciting, please hit me up, and thank you for coming to the talk. (audience applauds) So the question is, what's the easiest way to organize asynchronous migrations? One way is to just add some checks for pull requests, so that developers ship pull requests separately: one PR with the migration,
another PR with the code change, because that also makes it easier to revert something if you really need to. It also makes it possible to revert the code without reverting the migration, because you wouldn't really
want to reverse the migration. And ... Yeah, does that answer the question? - [Man In Audience] I
was just thinking about how you actually run the migration. - Yes, how we run it. We have a recurring job that runs every five or ten minutes, checks for any pending migrations, and applies them. And that works as a background job.
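A minimal sketch of such a job; the Slack notifier is hypothetical, and the migration_context API shown here is from newer Rails versions:

    class ApplyPendingMigrationsJob < ApplicationJob
      def perform
        context = ActiveRecord::Base.connection.migration_context
        return unless context.needs_migration? # nothing pending, bail out
        context.migrate                        # apply all pending migrations
        # Hypothetical helper that announces the result in Slack.
        SlackNotifier.announce("Pending migrations were applied")
      end
    end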
I have a blog post about that; you can find it on my Twitter. How do we deal with stateful resources? We don't run things like MySQL in Kubernetes yet. With things like Redis, I know it's been a bit painful, because Google Cloud or any other provider would diagnose a server as unhealthy, reschedule Redis to another node, and it would be down for those 30 seconds while it's being rescheduled. So it's something that
we're actively looking into. I would say that it's not as smooth yet, but for stateless things, it's getting better. So, the question is, do
we use Kubernetes secrets to store credentials? Yes, we do. That Rails master key that I had a slide about, you can put that into Kubernetes secrets, and it just works very smoothly; you just mount it. I was surprised that it just worked.
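As a sketch of that wiring, assuming a Secret named rails-master-key: the Secret stores the key, and the deployment exposes it to the app as an environment variable.

    apiVersion: v1
    kind: Secret
    metadata:
      name: rails-master-key
    stringData:
      RAILS_MASTER_KEY: "<contents of config/master.key>"

    # And in the deployment's container spec:
    # env:
    #   - name: RAILS_MASTER_KEY
    #     valueFrom:
    #       secretKeyRef:
    #         name: rails-master-key
    #         key: RAILS_MASTER_KEY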
So the question is, how do I manage configuration for different environments? By environments, you mean like staging and production? We don't have a classic staging. We use feature flags, and ... But something like redeploys would be interesting to look into. Thank you all so much for coming. (audience applauds)