RailsConf 2018: Operating Rails in Kubernetes by Kir Shatrov

Captions
(electronic music) - Yes, time to start? Hi all. My name is Kir and today I'll talk about running Rails in Kubernetes. I work at a company called Shopify, and over the past year or so we've moved hundreds of Rails apps within the company to Kubernetes, as well as our main monolith, which is known as one of the largest and oldest Rails apps in the community. We learned quite a bit about running Rails efficiently in Kubernetes, and I decided to give this talk to share some of the things we learned.

Today, we'll start with a quick intro to Kubernetes for those who haven't been exposed to it yet. Then we'll talk about what makes Rails a bit special to run on orchestrated platforms like Kubernetes. And then I'll share some of the things that helped us migrate all these apps.

First of all, please raise your hand if you've ever played with Kubernetes or container orchestration. Oh, it's quite a lot. So, in 2018, almost everyone agrees that containers are awesome, because they provide a universal interface for running any app in basically any environment you want. But the problem of running and scheduling containers is still there: you need to run these containers somewhere. Just as a note, I'm not going to talk about containerizing Rails, because there will be a great talk tomorrow at 3:30. If you're interested in containerizing Rails itself, please check out that talk by Daniel; I'll talk about running it in production and orchestrating containers.

So, you have the container with the app, and you're going to run it somewhere. In the static world, where servers are configured with something like Chef, you would have a bigger server to handle the fatter containers that need more memory and CPU, and a server with a bit less memory where you'd decide to run some other containers. All that math is done by humans, maybe configured with some scripts, but it's still pretty manual. And if we think about this process, quite a lot of resources can be wasted, because there would still be some CPUs unused, some memory left over. Nothing really stops us from aiming for the desired state, where every CPU is used and all the resources are efficiently scheduled, so that we achieve the same results, the same capacity, while consuming fewer resources and saving some energy.

What Kubernetes solves is efficiently scheduling the resources on your servers in a very dynamic way: bin packing the containers you want to run in the best possible way. If we want to define it in just one sentence, it's smart container scheduling for better utilization. Two things here. First, scheduling. I want to emphasize this, because you no longer have a fixed list of servers that you bootstrap; it's all scheduled dynamically. If one server crashes or the power dies, the same unit of work is rescheduled on another machine, and you won't even notice. Second, utilization: making the best use of the resources you have, which is especially important as you grow, because otherwise you'd have more servers with more unused CPUs and more unused memory, which of course you don't want to just sit there.

Next, I just wanted to establish some shared vocabulary and talk about the concepts that Kubernetes brings. The most basic concept is a pod. A pod is basically a running container, one instance of something; if we run one process of Sidekiq, it would be just one pod. Obviously, one instance of something is not enough to run a whole app or service, so we come to the next concept, a deployment, which is a set of pods. A typical app would have maybe two deployments: one with the web workers and another with the job workers. The number of instances in a deployment, the number of pods, is very dynamic: it can be adjusted, scaled up, scaled down, and you can even set up autoscaling. If you've ever worked with Heroku, you probably remember the concept of dynos and the dyno count that you can adjust and scale up; a deployment in Kubernetes is the same, it can scale up and down.

This all sounds great, but how do you actually describe all these resources? If you used Chef or Capistrano, you probably had a Ruby DSL. As with any DSL in a dynamic language, it comes with good and bad sides. It can be very expressive, and you can describe lots of things with it, but sometimes that's a disadvantage too, because you can do basically anything you can do with Ruby, and sometimes you want a DSL to be as minimal as possible. So Kubernetes uses YAML files to describe resources. You would have a YAML config of maybe 20 or 30 lines per resource. This is just an example of a config for a Rails app.
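The slide itself isn't captured in the transcript, but a minimal Deployment manifest for a Rails app's web workers might look roughly like this; the names, image, and replica count are invented for illustration:

```yaml
# Hypothetical Deployment for a Rails app's web workers.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shop-web
spec:
  replicas: 3            # number of pods; scaled up and down like a dyno count
  selector:
    matchLabels:
      app: shop
      tier: web
  template:
    metadata:
      labels:
        app: shop
        tier: web
    spec:
      containers:
      - name: web
        image: registry.example.com/shop:v42   # assets baked into the image
        ports:
        - containerPort: 3000
        env:
        - name: RAILS_ENV
          value: production
```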
Then you would apply that config to a Kubernetes cluster and store the same YAML file in the repo, which I think is a great benefit, because it's just a couple of configs living in the same repo as the app, not in another repo with cookbooks or whatever.

At least for me and some of the people I know, this came as a kind of shift in mindset, because we had to move from controlling servers to describing configuration. When you deploy with Chef or Capistrano, at the end it's just sequentially applying commands over SSH and controlling servers. You always have the output of the exact SSH commands, and you can see what's going on, what fails, which commands are stuck, and so on. With Kubernetes it's quite different, because you just take a YAML file and tell Kubernetes to apply it; that's the desired state, which will be rolled out in a few seconds or a minute, or maybe more if you applied a very big set of resources. At least for me, I had to move from the concept of controlling servers, exact machines, to describing configuration. Controlling servers means running commands remotely and comparing their output; describing configuration means you push it and then poll for it to be applied. That comes with the advantage of being abstracted from physical machines, which is great for things like self-healing: if one server goes down, the same work is rescheduled somewhere else, whereas controlling servers manually is not as resilient to failures. For instance, one of our Capistrano configs had more than 100 hosts, and every couple of months some hosts would die, just because it's that many servers, and this wouldn't self-heal the way it does with orchestrated containers. In terms of tools and technologies, Capistrano and Chef are examples of controlling servers. In contrast, platforms like Kubernetes and Mesos let you describe the configuration, the desired state, and the platform rolls that state out for you.

So, containers. Kubernetes takes a container and runs it with whatever number of instances you specified, and it's very easy to run a plain container. But Rails is eventually a bit more than just a process. Many Rails apps are monoliths with many things embedded into them, which makes them sometimes quite special to run as a simple container.

If you've used Heroku, you're probably familiar with the concept of the 12-factor app, a methodology for building software-as-a-service apps that promotes portability and minimizing the differences between production and development. Apps that follow the 12-factor manifesto are usually easy to scale up and down with no significant changes to the architecture. As you may have guessed, there are 12 factors, and we'll go through a couple of them that I think can sometimes be forgotten when we work on Rails apps, but that are nevertheless quite important, especially if you want to run the app in Kubernetes successfully.

One of them is disposability and termination: in other words, what happens when you want to restart or shut down a process. For web requests, it's as easy as waiting for the request timeout. If you know that a request will not take longer than 30 seconds, you stop accepting new requests, wait 30 seconds, and then you're safe to shut down the worker without losing any live requests. Same with background jobs: you wait for the current jobs to finish, and then you're safe to shut down the process without losing any work in progress. However, this can be trickier for long-running jobs. Here is an example of a very simple job that can become long-running: it iterates over some records in the database and calls a method on each ActiveRecord object.
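The code slide isn't captured in the transcript, but the job being described was presumably something like this sketch (the model and method names are made up):

```ruby
# A deceptively simple background job. With a handful of records it
# finishes in seconds; with millions of rows, one run can take weeks.
class BackfillJob < ActiveJob::Base
  queue_as :default

  def perform
    # find_each batches the queries, but the job itself is still one
    # long-running unit of work from the scheduler's point of view.
    User.find_each do |user|
      user.recalculate_stats! # hypothetical per-record work
    end
  end
end
```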
If you have just a few users, this job completes within seconds, maybe a minute. But as you grow to a size like ours, with millions of records in a table, jobs very similar to this example would take weeks to iterate over all the records and do something with each one. So, how do we shut down these workers? We must keep in mind that long-running jobs will be aborted and re-enqueued, which in this example means the job can be aborted in the middle and then run again, which is essentially what Sidekiq does. Here we come to the concept of idempotency: the code being called should not produce extra side effects, and should be safe to execute more than once.
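One way to get there, sketched below under the same made-up names, is to make the per-record step idempotent and checkpoint a cursor, so that a re-enqueued run resumes instead of starting over (Shopify later open-sourced this pattern as the job-iteration gem). The `shutting_down?` check is a hypothetical hook, for example a flag set from a TERM signal handler:

```ruby
# Resumable version of the same job: progress is checkpointed by
# primary key, and the per-record work is idempotent, so running
# the job more than once produces no extra side effects.
class BackfillJob < ActiveJob::Base
  def perform(cursor = 0)
    # start: resumes the batched iteration at the given primary key.
    User.find_each(start: cursor) do |user|
      user.recalculate_stats! # hypothetical, and idempotent

      if shutting_down? # hypothetical flag, e.g. set by a TERM handler
        # Re-enqueue ourselves to continue after the last processed row.
        self.class.perform_later(user.id + 1)
        return
      end
    end
  end
end
```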
Another aspect of 12-factor apps is concurrency, which lets your app scale with the process model. The manifesto has an illustration showing web workers and job workers that you can scale up and down independently. To scale these workers successfully, they should not share any resources, because if they all had a bottleneck on one shared resource, they would not scale well.

That was a bit about the 12 factors. Now some things about Rails to know when deploying to Kubernetes. First is assets. With something like Capistrano, you would probably run assets:precompile on every server that serves requests, which is a bit of a waste of resources when you could precompile the assets only once and then distribute the resulting image to all servers. The efficient way is to embed the assets into the container with the app, so that when the app starts, it already has all its dependencies, like assets.

Another part that can sometimes get a bit messy is database migrations. In the Rails community, we're very used to migrations as a part of the deploy: maybe as a hook at the end, you deploy the code, and then you apply the migrations right away. This step makes the deploy a bit fragile, because what do you do with the code change if the migration failed? Do you roll back the code, or keep running it? If you roll it back, you've already had the new code in production for 30 seconds or a minute, so it might not be very safe to roll back. So we try to avoid migrations as a part of the deploy, and instead ask developers to write code that is compatible with both the old and the new schema, because in the middle of a rollout you'll always have some workers on the older version and some on the newer version. We make the migrations asynchronous, which helps establish this contract with developers: the code may run against both versions of the schema. So instead of changing code and applying the migration in the same step, the first step could be a migration that, for instance, adds a column, and only then do you update the code to interact with the new column, once you're sure all the schemas have it. Usually these asynchronous migrations are applied within a few minutes after the deploy, and we make it a bit easier for developers by announcing it in Slack and notifying them when their migration has been applied.
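In practice, the two steps might look like this pair of changes, shipped as separate PRs (the table and column names are invented for illustration):

```ruby
# PR 1: ship only the migration. The code deployed alongside it never
# references the new column, so both old and new schema keep working.
class AddDiscountToOrders < ActiveRecord::Migration[5.2]
  def change
    add_column :orders, :discount_cents, :integer, default: 0, null: false
  end
end

# PR 2, deployed after the migration has been applied everywhere:
# start using the new column from application code.
class Order < ApplicationRecord
  def total_cents
    subtotal_cents - discount_cents
  end
end
```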
Another part of Rails is secrets. I think no modern app runs in isolation; basically every app now interacts with some kind of third-party API, whether that's S3 buckets or the Facebook API, and all these third parties require tokens and API keys that Rails has to know about. One approach is secrets in environment variables, the approach Heroku promotes. This is very easy, but as you grow you end up with hundreds of tokens, and you probably don't want to run the app with hundreds of variables it depends on. You might think about putting the secrets right into the container with the app, which is not the most secure approach you can take, because anyone who gets the container also gets the secrets. Fortunately for us, Rails 5.2 ships with the credentials feature, which lets you put encrypted credentials right into the repo, edit them, and safely commit and store them there. All you need to read and change them is the Rails master key. As a result, you run the container with just one environment variable, which is the key to the rest of the secrets.
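A minimal sketch of that credentials workflow in Rails 5.2 (the `aws` entry is a made-up example):

```ruby
# Edit the encrypted file with: bin/rails credentials:edit
# That opens a plaintext view like:
#
#   aws:
#     access_key_id: AKIA...
#     secret_access_key: ...
#
# config/credentials.yml.enc is safe to commit; only config/master.key
# (or the RAILS_MASTER_KEY env var) must be kept out of the repo.

# Anywhere in the app, read the decrypted values:
access_key = Rails.application.credentials.dig(:aws, :access_key_id)
secret_key = Rails.application.credentials.dig(:aws, :secret_access_key)
```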
To recap: following the 12 factors makes it easier to run Rails apps in orchestrated environments, and being mindful about worker termination also helps. Migrations as a hook after deploy can be fragile and make the rollout process unsafe, and asynchronous migrations can help solve that. And the credentials that ship with Rails 5.2 make the process of sharing keys a bit easier.

At Shopify, we've had hundreds of apps running in different environments. Some were on Heroku, some on AWS, some on physical hardware managed with Chef, and what we wanted for our developers was to stop exposing them to all that infrastructure and just give them a platform to run Rails apps. So we decided to invest in something like Kubernetes, which lets us scale containers and utilize resources in the best way.

As I said, if we wanted the apps to run in Kubernetes, they had to have their resource specs in YAML, which is a pretty easy format, no more than 20 or 30 lines per resource, but we still didn't want every developer to have to learn that YAML declaration. What we did instead was create a bot that opens a PR on GitHub based on what you use in production. If you use Sidekiq, it generates a YAML config for that unit of work in Kubernetes, and the first item in the PR description is a checklist that asks you to check whether the config makes sense for this app. If it looks good, you just merge, and your app is ready to run.

The next step is to apply the config with the kubectl CLI tool. If you've ever tried kubectl apply with a YAML file, you know it returns immediately, because it just lets Kubernetes know about the desired state; it then takes the system some time to provision all those containers, find a server with available CPU, and schedule the work there. And this process is not very visible. If you're used to Capistrano, you probably want some kind of progress to monitor: how many of your servers already run the new container, what's the progress of the rollout, and so on. So we made a gem called kubernetes-deploy that provides visibility into the changes being applied to the Kubernetes cluster. It's an open source project, and it's been adopted by other companies as well. Just like Capistrano, it applies the configuration and tracks the progress. (The video preview didn't play.)

So: robots help humans migrate apps by generating YAML configs, developers don't have to write YAML configs anymore, and kubernetes-deploy brought visibility into the rollout progress. Overall, I think the steps Rails has been taking towards running in the cloud and in container environments like Heroku were in the right direction, and they help us now to run Rails in Kubernetes. A lot of the thanks goes to Heroku, which has been pushing Rails in that direction, to make it run smoothly in containers. For us, and for many other companies, Kubernetes helps schedule the work efficiently, save resources, and stop caring about which server a given container runs on. And it's not magic; it's just a technology that helps schedule the work. There are some things you have to know about Rails to make it run smoothly on orchestrated platforms.

Before, it took me hours to set up a new app in production with Chef and Capistrano: I had to find an instance, provision it, write some cookbooks or do something else to set up the environment and all the packages needed to run Rails. Now, with orchestrated containers, it's a matter of just a couple of YAMLs. I think it becomes very standardized. In terms of getting started with any app, if the app uses Kubernetes, you can just read through the resource specs and see how the deployment is organized. That reminds me of what Rails did more than ten years ago: before, every app used its own structure, and it took you time to understand how it worked. Now you can get started with any Rails app within hours, just because you know that all the controllers are in app/controllers and config/routes has all the routes the app has. Kubernetes brings that same kind of abstraction; it compresses the complexity, as David Heinemeier Hansson discussed in the keynote this morning.

You may be wondering when it's worth getting started with Kubernetes and moving to orchestrated environments. I would say that if you want to stop caring about the physical machines where something runs, if you just want a platform to run a container, it's a good solution.

You can follow me on Twitter. If working on the things I mentioned in this talk, from Rails down to the infrastructure behind Kubernetes, sounds exciting, please hit me up. And thank you for coming to the talk. (audience applauds)

So, the question is: what's the easiest way to organize asynchronous migrations? One way is to add some checks on pull requests so that developers ship them separately: one PR with the migration, another PR with the code change. That also makes it easier to revert something if you really need to, to revert the code without reverting the migration, because you wouldn't really want to reverse a migration. Does that answer the question?

- [Man In Audience] I was just thinking about how you actually run the migration. - Yes, how we run it. We have a recurring job that runs every five or ten minutes, checks for any pending migrations, and applies them. That runs as a background job. I have a blog post about that; you can find it through my Twitter.

How do we deal with stateful resources? We don't run things like MySQL in Kubernetes yet. With things like Redis, I know it's been a bit painful, because Google Cloud or any other provider will diagnose a server as unhealthy and reschedule Redis to another node, and it'll be down for the 30 seconds while it's being rescheduled. It's something we're actively looking into. I'd say it's not as smooth yet, but for stateless things, it's getting better.

The question is: do we use Kubernetes secrets to store credentials? Yes, we do, and that Rails master key I had a slide about, you can put that into Kubernetes secrets. It works very smoothly; you just mount it. I was surprised that it just worked.

The question is: how do I manage configuration for different environments? By environments, you mean staging and production? We don't have a classic staging; we use feature flags. But something like redeploys would be interesting to look into. Thank you all so much for coming. (audience applauds)
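For reference, the recurring migration runner described in that first Q&A answer might look roughly like this sketch (using ActiveRecord's migrator API as of Rails 5.x; the every-few-minutes scheduling is assumed to come from a cron-style scheduler):

```ruby
# Scheduled every five or ten minutes by some recurring scheduler.
# Checks for migrations that landed since the last run and applies
# any that are pending.
class RunPendingMigrationsJob < ActiveJob::Base
  def perform
    return unless ActiveRecord::Migrator.needs_migration?

    # Runs the same machinery as `rake db:migrate`.
    ActiveRecord::Tasks::DatabaseTasks.migrate
  end
end
```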
Info
Channel: Confreaks
Views: 4,442
Id: KKtS0QD5ERM
Length: 31min 56sec (1916 seconds)
Published: Tue May 15 2018