Kubernetes Security Best Practices - Ian Lewis, Google

Captions

All right, I think that's my cue to get started. Hi, everybody. This is the talk on Kubernetes security best practices. It's going to be a little bit about what simple, and hopefully easy, things you can do to help make your Kubernetes cluster more secure.

My name is Ian Lewis. I work for Google as part of Google Cloud, and I'm based in Tokyo, Japan. I focus mostly on Kubernetes and Google Container Engine, and also on container security related features.

I like to think about Kubernetes in terms of what type of piece of software it is. Most people will call it a container orchestrator, but I like to think of it as something like a framework, basically something that provides infrastructure in the form of an API. As part of that it provides features, and many of these are security-related features that you can take advantage of in order to make your applications and your infrastructure more secure.

In terms of topics and the agenda for the talk today: with container and Kubernetes security you could probably talk for an entire day. I don't have time for that, but I am going to cover these topics. I'm going to do a little bit of security 101, just to give a little bit of background. We'll talk a little bit about runtime security and host security, as well as network security, and provide some tips on how to make those a little bit more secure. There are obviously many other topics related to container and Kubernetes security, like threat detection, determining if you've been compromised, making sure that your build and CI/CD pipelines are secure, making your images more secure, and how to do security operations. Those are topics that I'm not going to be able to cover in this talk, but hopefully you'll get some things that you can take home and do tomorrow to improve the security of your cluster. So, first, just some
background, some security 101. I just want to get everybody on the same level, to understand the background for what we're trying to do here.

The first thing that I want everybody to understand is that security is not a linear thing. It's not a single scale where you change one thing and make your entire cluster more secure. There are many different pieces of a Kubernetes cluster, and many different pieces of all kinds of software, and each of those has its own security posture or security context, so you need to think of it as a holistic picture.

Another thing that we need to understand is that attackers can pick their targets. This is part of the idea that there's an attacker's advantage in security; people usually say something like "attacker's advantage, defender's dilemma". Essentially that means attackers have an advantage over defenders when it comes to software security. Attackers can pick which things they attack and the time that they attack, so they can pick a time in the middle of the night when you're sleeping, so that you won't notice. These are things that you have to think about as you are securing your cluster. Attackers can attack whichever thing is the weakest link in your cluster, so you need to make sure that the holistic picture of the security of your cluster is thought about and addressed.

Part of how you do that is by applying something called layered defense, or defense in depth. This means that you make the pieces of security in your cluster redundant, so that an attacker has to do multiple things in order to get data that's interesting, like user data or credit cards or anything else that is important to them. Most
programmers are familiar with the idea that you don't want to repeat yourself; you don't want to have multiple copies of the same code in your codebase. Security is very much the opposite. We want to be redundant; we want to make sure that we have multiple things that we're doing in order to secure something.

We also want to limit the attack surface at each of these levels. That means we want to give the attacker as few options as possible for attacking a cluster. Usually this means reducing the amount of software that's installed on the machine, or inside of your images, or inside each of your applications, so that there's less software that you have to patch and less software that is vulnerable to attack.

Finally, the last concept that I want to talk about is least privilege. This is the idea of providing each of the applications that you are running with the least amount of privilege that it needs in order to run. That way, if somebody is able to take the credentials of that application, they don't get more permissions than are absolutely necessary, because extra permissions mean that they could further attack or get more information, escalate their privileges, and cause a wider and wider problem.

This is the application that I'm going to be using as my example as I talk about the different security ideas. It's just a guestbook application that has a few different services. We have a web front end, which is just a web service where users can post a guestbook message; that message gets saved, using the message service, to a Redis database. We also have an authentication service called the user service. You don't really have to think about this too much. This is just an example application that I'm going to be using as we talk about the different
security-related problems and things we can improve. I'm going to talk about a few different ways of attacking a cluster, and then about some of the ways that we might mitigate those problems.

The first thing I want to talk about is attacking the Kubernetes cluster itself, or the API server itself. By default, every pod in a Kubernetes cluster gets what's called a service account. Particularly if you're not using RBAC, each of those service accounts has a number of permissions, or a role, associated with it that allows the pod to talk to the API server and make requests to it. If you're not careful and you have given too many permissions to this particular service account, an attacker can use that service account to do things with the API server. So we can imagine that our web front end has some vulnerability in it that allows us to read a file off of the filesystem. That would allow the attacker to read a token file, which the attacker could then use to send requests to the API server with that token, and then perhaps create pods, maybe read some secrets, that sort of thing. We want to be able to prevent those types of attacks and mitigate them.

The first thing that you absolutely, positively need to do is make sure that you are using RBAC in Kubernetes. This has been the default since, I think, about 1.4 or 1.6, something like that, so it's been the default for a while, but it's still possible to disable it if you don't know what it is. This is something that you absolutely, positively must be using if you want to have a secure cluster. What it does is give roles to each of the users in your cluster, as well as to each of the service accounts that are running in your cluster. This is the service account that gets added to your pod. Each
role in RBAC contains a number of permissions, things that the user or the service account can perform. A permission is usually a verb and an entity, a noun, so something like "get secrets" or "update config map". These are things that a user can do. Those roles can then be assigned to people: you can assign the same role to multiple people, and each of those roles can have multiple permissions. Another thing of note is that RBAC settings apply to namespaces, so you can assign roles to a user that are allowed in one namespace but not in other namespaces.

Roles look something like this. In this case we have a ClusterRole and a ClusterRoleBinding, and what this is doing is just assigning a role to a user. Each user can have many roles associated with them. In this case we're applying a role that applies to the entire cluster, but if you use just a Role and a RoleBinding, that applies to a single namespace. That allows the API server to decide whether we can perform certain types of operations.

Another thing that we can do is firewall the API server, so that the attacker hopefully does not even have access to connect to the API server from, say, the Internet. We can do this using regular firewall rules or port firewall rules, or, if you're using something like GKE, you can use the master authorized networks feature to limit the IP addresses that can access the API server.

Another way to mitigate further attacks on the cluster is to use something like network policy. If we look back, we were saying that an attacker could continue on to read secrets and then attack other parts of the cluster. Things like network policy can help restrict what you have access to from a single pod. Say our front end doesn't talk directly to Redis, but we want it to talk to the
message server instead. We can set up network policy to make sure that our front end is not able to send requests to the Redis server directly. Network policy needs an implementation, which is usually Calico or Weave or some other network policy enforcement mechanism. It looks something like this: you specify a label selector to tell which pods the policy applies to, and then which pods those pods are able to accept requests from or send requests to. In this case we're saying that this applies to our Redis pod, and that ingress, so incoming network requests, can only come from our message application and not from the web front end.

Another problem that typically happens in clusters, and a way that somebody can attack a cluster, is to get access to the cluster components. One typical example: if you've exposed the etcd that the API server uses to the cluster, then an attacker could interact with etcd and bypass the API server completely. So you need to make sure that your etcd is secured. The things that you need to do are to set up etcd so that it requires authentication, and then firewall it so that you cannot send network traffic to etcd from inside the cluster. This makes sure that your cluster objects, the things that the API server saves in etcd, are secure, because if a user is able to modify those objects directly in etcd, that's the same as if they had full access to the entire cluster. They can create pods, or they can read secrets, and things like that, and we don't want them to be able to do that.

Another thing that you can do is encrypt the data that gets saved in etcd. The API server has the ability to encrypt data that's coming in and being saved to etcd. This allows you to make it so
that if somebody tries to read from etcd directly, they're just reading encrypted data, and not data that's saved in plain text.

Another way that you might attack a cluster would be to break out of the container and try to get access to the host itself, which is a single worker node in the Kubernetes cluster, and then read data from other containers or the configuration of the kubelet in order to further attack the cluster and escalate privileges. Typically people will break out of a container by using something like a kernel bug, or some other container or Kubernetes bug; there have been several that allow container escapes. So an attacker might be able to break out of a container and then read files from the filesystem, or otherwise execute applications on the host outside of the container. This would allow them to see other containers that are running on the host and read the secrets that are associated with those containers, as well as read the kubelet's authentication credentials and certificates. What attackers can then do is use those credentials from the kubelet to talk to the API server, pretending to be the kubelet, and try to get more information about other pods or other secrets.

One thing to do to mitigate this is to make it harder for people to break out of a container. One way to do that is to run applications inside the container as a user rather than as the root user. Typically in Docker people run applications as root inside the container, and this just makes it one step easier for folks to break out of containers, so it's much better practice to run them as a user. Typically, to break out of a container an attacker first has to get privileges as root inside of the container in order to break out of the
container. So the first step, the first low-hanging fruit, is to run your applications as a user inside the container.

Another good practice is to make the root filesystem read-only for your application. Sometimes applications need to write some data to disk, to the filesystem, so you can create either a temporary directory or an external volume that you allow the container to write to, but the root filesystem itself you can make read-only. This prevents applications or attackers from overwriting files on the filesystem that you don't expect them to be able to write to. Maybe your application allows users to upload files, and you have a bug in the application that allows them to upload a file to anywhere. That makes it possible for them to overwrite some binary and then trick your application into executing that binary, and then they have full privileges inside the container. This is one way of trying to prevent that type of attack.

Another thing to do is to not allow new privileges, that is, to set the no-new-privileges flag on your application. This makes it so that if your application executes another binary, extra privileges can't be attached to that particular process, so it's harder to escalate privileges inside of the container. If you do all of these together, that makes it even harder for attackers to get privileges inside the container that they can then use to break out of it.

Another idea is to use sandboxed pods. This is basically using a different runtime from the normal container runtimes in Kubernetes. What these do is give you an extra layer of isolation between the container and the host system, which allows you to run applications that you don't trust as much. Typically these might be used for something where users upload code
or where you're executing third-party applications. Things like Kata Containers or gVisor are solutions for that, and they make it really hard to break out of a container. gVisor looks something like this: it intercepts syscalls and handles them in user space rather than using the host's, and that makes it really hard to take advantage of bugs in the host operating system in order to break out of the container.

Another way to further lock down containers is to use seccomp, AppArmor, and/or SELinux. If you think of a container as a jail for your application, it's a little bit leaky, in the sense that you can still do a lot of things; there's a large surface area that you can attack. Seccomp allows us to add another layer on top of that container isolation layer, and AppArmor and SELinux allow us to add another layer on top of that, which adds more policy and more options to further lock down the container and make it hard to escape.

One thing that I really recommend is to enable the runtime default seccomp policy on your containers. This is woefully still not applied by default. What this does is apply the default Docker runtime seccomp policy to a container, so at the very minimum you should be doing something like this. You can also use AppArmor, which thankfully, if it's installed, will apply the default AppArmor policy to your containers. If you need to develop your own profile, or you want to really make sure that you're running with an AppArmor policy, you can specify that as an annotation on your pod as well. SELinux is still the only one that is currently GA as a feature, and that's part of the security context, so inside there you can specify SELinux options to set the options that you want to use to
lock down the container. The downside of all of these is that they require you to know what your application does, in order to specify the right options to lock down the container properly.

After an attacker has broken out of the container, we can try to mitigate things like them reading your kubelet credentials and then being able to do other things in the cluster. We do that by using RBAC for the kubelet, which is enabled by adding some options to your kubelet or to your API server. This creates a special RBAC role for a node that restricts things like reading secrets on the node to only the secrets that are attached to pods assigned to that node, for instance. So a node can't just read any secret in the cluster; it can only read the secrets that it's allowed to see, and this limits a little bit what you can see from the node. Another thing to do is to rotate those certificates, so that even if a certificate is stolen, it's only valid for a short period of time before it becomes invalid.

Another issue is unsecured pods. I've shown you a bunch of options that you can set in order to make pods more secure, but if you have many, many people using your cluster and you want to make sure that everybody is doing the right thing and setting all the features that they need to on their pods, you want to be able to set some sort of policy that says, for instance, that you can't create pods that run as root. You can use something like Open Policy Agent to block those types of pods: you create some policy that specifies what you want to allow, and it uses the admission controller features in the API server to block those pods from being created. This is something to investigate if you want to help secure a cluster
against multiple teams that are using it.

Finally, I want to talk a little bit about listening to the traffic that goes between the services and doing things like request forgeries. Request forgery means sending a request to another service as if you were a different service, and not the one that you are. In this case we're sending requests directly to Redis instead of from the message service. We can mitigate that by using things like Istio or other service meshes, which provide an identification or authentication mechanism for your services, so that you know which service is sending the request and which service is receiving it, and you can block services that are not allowed access. This is very similar to the way that network policy works, except it works at a service level rather than at a network level, so it's another way of adding that defense in depth.

What this does is put a proxy between services and enforce service policy that way. It also encrypts traffic between the services, and it rotates, it continually updates, the certificates so that they aren't valid forever. Much like network policy, if the web front end tries to send requests to services that it's not allowed to send requests to, then those requests will be blocked by the proxy on the other end of the request. You also can't just listen to traffic on the network that your cluster is running on, because all the traffic is encrypted between the services, so listening to traffic on the network means that you're just listening to encrypted traffic. This helps secure your cluster overall in case it's been compromised.

So I'll just finally finish up by
giving a few extra tips, some general tips. One is to update your Kubernetes cluster early and often. Kubernetes itself has some security issues, so you want to make sure that you're using the latest and greatest Kubernetes, and make sure that your nodes have up-to-date software as well. Don't use admin privileges for everything in your cluster. Use admin to, say, deploy applications only from your CI/CD, your Jenkins or your deployment server. Don't use admin from, say, your laptop, because if you lose your laptop you don't want somebody to be able to access your Kubernetes cluster with admin privileges. Try using benchmarking tools such as kube-bench, which will inspect your Kubernetes cluster and give you suggestions on how to secure it better. Also use managed services like GKE. They're not perfect, of course, but they'll give you a little bit better defaults in terms of your security posture, just by using the service.

All right, so that's all that I had time for. Hopefully that was useful to you guys. I don't think I'll have too much time for questions up here, but I'll be hanging around outside, and if you have any questions you can grab me and we can chat. Thanks a lot. [Applause]
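
The slides the speaker points to are not part of the captions, so the following is only a rough sketch of what two of the recommendations could look like, written as Python that builds Kubernetes manifests as plain dictionaries. The first function covers the pod hardening options from the talk (run as a user, read-only root filesystem with a writable temporary directory, no new privileges, runtime default seccomp); the second covers a network policy that lets only the message service reach Redis. The names, image, user ID, labels, and port are assumptions for illustration, not taken from the talk. The `seccompProfile` field comes from Kubernetes releases newer than the talk, when seccomp was still configured through an annotation.

```python
import json


def hardened_pod(name, image, uid=1000):
    """Build a Pod manifest with the container hardening options from the talk.

    The name, image, and uid are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "securityContext": {
                # Run as a regular user rather than root.
                "runAsNonRoot": True,
                "runAsUser": uid,
                # Apply the container runtime's default seccomp profile.
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    # Root filesystem is read-only; writes go to the volume below.
                    "readOnlyRootFilesystem": True,
                    # Kubernetes' way of setting the no-new-privileges flag.
                    "allowPrivilegeEscalation": False,
                },
                "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
            }],
            # Temporary directory the application is allowed to write to.
            "volumes": [{"name": "tmp", "emptyDir": {}}],
        },
    }


def redis_ingress_policy(redis_app="redis", allowed_app="message", port=6379):
    """Build a NetworkPolicy that only lets the message service reach Redis.

    The label values and port are assumptions about the guestbook example.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "redis-allow-message"},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": {"app": redis_app}},
            "policyTypes": ["Ingress"],
            # Only pods with the allowed label may connect.
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": allowed_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to
    # `kubectl apply -f -` against a test cluster.
    print(json.dumps(hardened_pod("web", "example.com/guestbook/web:1.0"), indent=2))
    print(json.dumps(redis_ingress_policy(), indent=2))
```

As the talk notes, the network policy only has an effect if the cluster runs an implementation that enforces it, such as Calico or Weave.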

Info

Channel: CNCF [Cloud Native Computing Foundation]

Views: 28,006

Id: wqsUfvRyYpw

Length: 28min 53sec (1733 seconds)

Published: Mon Dec 09 2019