Securing Kubernetes: Best practices and effective strategies

Captions

All right, welcome back from lunch, everyone. We'll start with the first session post lunch. I'm very happy to welcome Nilesh. Nilesh is going to give a talk on securing Kubernetes: best practices and effective strategies. He has over eight years of industry experience, and I think we'll be getting a lot of insights from him. Thank you.

Thank you, and hi everyone. I'm here to talk about securing Kubernetes and the effective strategies and best practices that you can apply. I'm not from around here; I'm from Sri Lanka, so I flew in yesterday, and I also run a similar community called Cloud Native Sri Lanka back in Sri Lanka.

So let's look at the Kubernetes attack surface. Before I do this, I just want a raise of hands: who runs Kubernetes, as in manages their own Kubernetes clusters? So, hands. Right, there's a few of you. What do you use: OpenShift, or Rancher, or just kubeadm? Rancher, all right. Okay, so this is actually relevant to a lot of people.

The people who run in the cloud on managed GKE, AKS, or EKS only have to manage their application vulnerabilities, but for the people who run their own clusters, the first entry vector is your control plane components. You have the API server running, you have the etcd API; all of your control plane components are over there, so you have to protect those. The next one is application vulnerabilities. If somebody hacks into one of your containers, they can execute a remote shell, talk to your services, and mess up your cluster quite badly. Then, let's say your VMs are on a public network and connected over the internet or something like that. The kubelet API is another place that you have to take a look at and protect, because that's where all the containers, the Docker daemon, and everything that runs are managed by the kubelet. You can do quite a lot of damage if you get kubelet access. (There's a lot of light coming from there.) Then, finally, accessing your virtual machines: if somebody hacks in and gains
SSH access or something into your virtual machines, they can again ping your services, because the network is still there, so they can do quite a lot of damage on that front as well.

Okay, so let's start with protecting your applications first, because that applies to everyone here, and then let's look at how to manage the cluster-level components. The first thing when you are doing application security is securing your CI/CD pipeline. In a CI/CD pipeline there are a few stages: there's a source stage, then there's build, then there's test, and then you deploy, at a very high level. Across all these stages you have auditing and monitoring; then at build and test you have static security testing; across, again, all the stages you have vulnerability testing; and then at the deploy stage you have runtime security.

I'll just talk about auditing and monitoring for a second, because I don't have a slide on that. At the source level it's basically developer best practices: putting in proper PRs and doing all of that, which enables you to do proper auditing and monitoring. At the build level and test level it's the way you run your static scanning, log the results, and create Git issues if there are problems or if you ignore issues, the whole process that you follow. And at the deploy level it's basically the logs of your runtime security.

Okay, so static security testing. The first one I'm going to talk about is image vulnerability scanning. I believe all of you know how images work: an image has a base layer, and each image is built with layers on top of it. So if at least one layer is compromised or has a vulnerability, it carries on to the other layers, as you can see from the example. By exploiting a vulnerable image that is running, you can do quite a lot of damage: you can do privilege escalation, you can get remote shell access, and information can be leaked. For example, I've seen certain
companies bake credentials into the image rather than mount them. I've seen that quite a lot. I see some of you are smiling; maybe you are doing that. Don't do it. Then DDoS attacks and all of that, right?

So to avoid information leaks, scanning is basically not going to help you. It's a developer practice that you have to follow: don't bake credentials into images. But other vulnerabilities you can quite easily scan for and fix using tools like Clair, Trivy, Snyk, and a few others. Also, a rule of thumb is to use small images, like Alpine images, to start with, or official images; they have the smallest set of vulnerabilities. And there are scenarios where there are vulnerabilities which have not been fixed yet. In my company, what we do in such cases is create a Git issue, ignore it and move forward, and then fix it later if it's not critical. But it depends on the processes that you have and the risk assessment and all of that for your particular company.

The next one is code vulnerability scanning: basically, scan your entire code. There are tools like SonarQube, etc., which do quality gates and scan whether you have hard-coded passwords and stuff like that in it, and which allow you to figure out whether there are any vulnerabilities in the packages that you have imported into your code, and how to fix those. Again, this goes with the security practices and policies that your company has. There are certain things you will obviously have to ignore and go forward with, and there are scenarios where you can't do that and you somehow have to fix it and then push it.

Then, finally, configuration scanning. There are tools like Checkov, kubesec, Conftest, etc., which basically scan your configuration files, like the YAMLs that you import into Argo CD. They test those against a bunch of policies or best practices and tell you what to change. For example, let's say that you haven't set
security context on your deployment; it would tell you that you have to do this as a best practice, etc. So they allow you to enforce security standards, and in an enterprise you can have your own bunch of rules and apply them, so that there are no surprises when you deploy YAMLs to the cluster.

I thought I'd talk about the Kubernetes admission controller here. This is how it works. If you are using something like Open Policy Agent, you can say, for example, that you can only run pods with images from a certain registry and nothing else. Admission control has two webhooks, one called the validating webhook and another called the mutating webhook. What the mutating webhook does is, according to a given logic and a bunch of annotations, actually change the content of the YAML before it gets applied and executed. The validating webhook, on the other hand, just scans your YAML and makes a decision on whether it should go forward and get applied or get denied. These plug into third-party policy controllers like Open Policy Agent. So that's how the Kubernetes admission controller works, and if you do employ something like Open Policy Agent, this is how it would look once you enforce certain rules on Kubernetes configurations.

The next one is container hardening: how do you harden your containers? Pretty basic stuff. Removing bash and the shell is one of the best ways to stop people from exec-ing into your container and doing stuff. If you look at the Kubernetes API server and all of that, you can't really SSH into them, because they have removed bash and the shell. The next one is making your root file system read-only. That makes sure that your container is stateless and that no intruder can come and write things into your file system. And let's say that you are running something like nginx, or something that actually does write to certain paths in your
application; then you use emptyDirs and mount them at those paths, which allows you to do writes, but you cannot write to the base container image's file system. Then, finally, run as a non-root user. Certain images do need privileged users; for example, if you're running Falco you need a root user. But other than that, for normal applications, your Java app, your Python app, etc., which are non-system apps, make sure to run them as non-root, so that even if somebody SSHes in and attacks your container, they will be restricted within it and will not get to the VM level.

Then let's say that the image is already built and you don't have control over the image; all you control is how it is run. There are a few things you can do there as well. You can use a startup probe and execute a shell command to remove bash from your container; that's quite easy to do. Then there are security context settings you can set, such as runAsGroup, runAsUser, and runAsNonRoot. And, as I mentioned earlier, enforce these rules using Open Policy Agent. So those are the strategies you have for container hardening.

Container runtime security. This is for when we do all our scans and everything is fine, but at runtime there is some malicious code running that we don't know about, maybe Bitcoin mining or something like that, which we haven't captured at the static scanning level. A little intro on how containers work: as you can see, physical hardware runs below, then we have the Linux kernel, then we have the Linux container layer, LXC, running, which is like the virtualization, and then you have containers running. So basically all of these containers are talking through syscalls to the kernel of your VM to execute processes and commands. If there's a vulnerability in your kernel, the containers can actually exploit it and then attack other containers in your VM. In a single-tenanted scenario this is
not a big issue, but if your cluster is, let's say, multi-tenanted, like you have outsourced certain namespaces to third-party software vendors or whoever is building on your infrastructure, this could be a problem. That's where container sandboxing comes into play. Container sandboxes like gVisor, Kata Containers, etc. isolate your container processes in such a way that your kernel cannot be exploited through them to attack other containers.

If container sandboxing is not an option, the other option you have is using either AppArmor or seccomp profiles. Using these, you can restrict the processes or syscalls happening in your kernel: you can say these are the syscalls that are allowed, and anything other than that is not allowed. There are standard default seccomp and AppArmor profiles available, so you can use those to run your normal applications.

Then there are other tools that continuously monitor what happens in your runtime processes. One tool is called Sysdig Falco, which is open source. It has a default rule set as well, so if somebody has executed a remote shell, or if someone is writing into the file system, it automatically gives an alert and you can act accordingly. So these are the strategies for you to manage container runtime security.

This is a general rule of thumb: basically, don't use environment variables for secrets; always mount them as files. If you are managing secrets, it's recommended to use an external vault like HashiCorp Vault, or in the clouds they already have secret managers available. And as a developer discipline, do not log sensitive information. I've been with certain organizations where there have been PII issues and other issues because they've logged tokens, etc. when an error was thrown. So that's a developer discipline that you have to follow. So that's it, and then this is
for if you are running multiple tenants on your Kubernetes cluster: isolating your network. One way is namespaces, but I think a previous speaker already said that namespaces are a very weak isolation, so you have to harden them. One way of hardening is using network policies. Using network policies, you can quite easily say that the containers in this namespace cannot talk to the containers in the other namespace; it's like a firewall. And if you have API gateways, you can make sure even the internal services go through the API gateway with authentication; if not, there's nothing much you can do. Or use mTLS with a service mesh, or I think Cilium now has mTLS with eBPF. But here's the thing: just because you installed Linkerd or Istio and enabled automatic mTLS doesn't mean you are secure. It basically identifies who's talking to whom, that's it. You have to attach network policies too, so that you can govern and say only this can talk to this one. mTLS would just encrypt your network traffic, which is good, but you need the whole solution. That is it on isolating your tenants in the network.

So here's a small example of how network policies work. As you can see, there's an ingress allow policy for the simple app namespace, and the requests come to the web server. Within it we have added network policies so that the web server cannot talk to the database: the web server can only talk to the Python backend, and the Python backend can talk to the database. Because the web server is the publicly exposed one, if somebody hacks into the web server and gains shell access through it, they can only attack the Python backend; they cannot directly attack the database. So that is how you do security.

Right, so the next one is protecting your control plane. I think I'm about at time as well, right? So, hardening your Kubernetes cluster. I heard from some people that they have created clusters using Rancher. I think by default Rancher does enable
audit logging, but just to make sure: I think in the default way they do it, they don't mount the audit policies and then mount the logs back into your VM so that you can take a look later. So that's very important. When you enable audit logging via RKE or something else, you have to make sure you give a path to log the audit to. And the path alone is not enough: you have to provide a path on the host and mount it into the API server, so that even if the API server dies and restarts, the logs are retained on your master node. That's very important, because I've seen a lot of enterprises where, when I ask, they say, "Oh, I have audit logs enabled." "So can I see them?" And then when you SSH into the VM, there are no audit logs that you can see. They have actually enabled the audit logs, but they're writing them within the API server container, not to the VM. So you have to mount a hostPath volume so that you can retain them.

The other one is to use CIS hardened node images. I think even if you are running on Azure or Google, and you are running your own clusters rather than managed services, they have CIS hardened images in their marketplace; use those to spin up your nodes. Even on premises, I think there are CIS hardened Ubuntu images and so on available, so you can use those.

Then another one is encrypting etcd. I don't think Rancher encrypts etcd by default, I'm not sure, but it's a good practice for you to encrypt etcd as well. If somebody hacks into your VMs, let's say an ex-employee, or somebody from Mission Impossible comes into your enterprise, they hack into your VM but when they look at etcd it's encrypted. Of course, if it's Mission Impossible they decrypt it, but in the real world I think they'll find it hard.

And I already mentioned this: mount your secrets as files and not as environment variables, because if they are environment variables, what happens is if somebody
hacks into your VM, they are able to list the environment variables via the process; there's a way you can do it. But if they are files, then you can't do that. That's why it's recommended to mount your secrets as files. And also, don't just keep plain Kubernetes Secrets; use something else like HashiCorp Vault, Key Vault, or something else, or even Bitnami Sealed Secrets is fine.

I'll just do a quick round-off on Kubernetes RBAC. I think there's another talk coming later on OIDC and how to connect it with Kubernetes, so just the basics. Kubernetes doesn't have a concept called users; users are managed externally. Kubernetes doesn't have a resource called users. It's a matter of creating a key pair and then authorizing it with the CA of the cluster, signing it off, and then the person who has the key pair can access the cluster. Kubernetes does have a concept of service accounts: you can give service accounts some roles, etc., and they can be mounted into your containers, and then you can use the Kubernetes API.

When it comes to permissions, we have roles and role bindings, and then we have cluster roles and cluster role bindings. As the name suggests, a role is bound to a namespace. So if you have a certain role like read-secrets, it's only available in that namespace, and when you do a role binding for a user or a service account, they can only exercise it in that namespace. A cluster role, on the other hand, is cluster level. Let's say there is a cluster role for reading secrets: if I do a cluster role binding to me, Nilesh, then I can read secrets in all namespaces, but if I do a role binding with the namespace attached, I can only read from that namespace. There's no way in Kubernetes to say Nilesh can only read these, these, and these namespaces. If you want to do something like that, you have to do role bindings per namespace. That's how you
do it. And I just gave an example too, so I'll quickly go through it. There are two secret roles: a read-secret role and a read-write-secret cluster role. On the foo namespace, John has the read-secret role bound to him through a role binding, so he can read secrets in the foo namespace. If you look at Jane and admin, Jane is connected to the cluster role with a namespace-bound role binding, and admin has a cluster role binding to the cluster role. So, as you can see with the red markers, admin can read all the namespaces, whereas Jane can only read and write in the foo namespace. So that's it.

All right, so we are at the end of my presentation. Final thoughts: Kubernetes is not new now, but for a lot of security engineers, etc. it's still new. They are figuring it out, and there are vulnerabilities exploited every day. If you are running cloud-managed Kubernetes, you only have to worry about your application security; everything else, the control plane, is managed by the cloud. And if you run with proper best practices and proper developer self-discipline, you should be all right. So that's the end of my presentation. Thank you so much. Any questions? [Applause]

Right, so her question was about Pod Security Policies versus Open Policy Agent. As far as I know, from Kubernetes 1.23 or 1.22 onwards, Pod Security Policies were deprecated, and they are also enabling you to work dynamically with something like Open Policy Agent, to have some set of rules and then go through that. I believe Pod Security Policies were a bit limited in scope; that's why they deprecated them. So now the Kubernetes community is also recommending that you use something like OPA to have a set of rules, and then use its Gatekeeper to enforce those with the webhooks that I explained in the slide. All right, any other questions?

Yeah, the intermission between the replications and the rapid requests comes
law will be very high Global. Okay, so in that cases, when we do for each request is more, so will it reduce the load of performance integrate? So how we can increase the entry more if the performance is very important to their English report?

Right, so, interesting question. His question was: when you have multiple etcd nodes and you enable encryption, and you have a lot of load coming in, and performance is your core requirement, how do you do it, and is there a problem? Personally, as far as I know there are limits in etcd as well, and of course encryption and decryption would give you a slight performance bump. I actually managed the Sri Lankan government election Kubernetes clusters, and we did encounter that as well, because on election nights there are a lot of requests coming in, polling and so forth. But here's the thing: it's a trade-off. It's either performance or security. In the government scenario it was security for us, so the performance bump was a trade-off that we had to endure. Sometimes there are trade-offs; there's no magic solution for stuff like that. One more question here.

So, these security things, right? Let's say the read-only and all these things: even if I set them, how can I monitor that across all my nodes? Let's say I'm running a big cluster or multiple clusters. How can I monitor it? Are there any tools from which we can see what the open security issues are? Because, let's say, some of the teams have implemented them and some haven't. How can I manage that?

Yeah, so this is where the build-versus-buy thing comes into play. There are tools that actually allow you to do that, but you've got to pay for them. I think I can talk about Sysdig, right? It's a CNCF one. Sysdig is one such tool, and then there are others, NeuVector and a few others as well, that do it for you. But if you were to do it
yourself using only open source, then you probably need to have a good platform engineering team with a kubesec setup, an Open Policy Agent setup, and continuous scanning. Because, let's say I scanned my image today and it passed, and it's now running on the cluster, and then you haven't done a deployment for about a week or a month. Remember the Log4j incident? Log4j was an exploit that had existed for a while in Java. So you have to continuously run static scans against your stuff as well.

Yeah, but Sysdig, will it automatically do the live monitoring? So in Sysdig, can we see live monitoring? Will it automatically keep looking for that?

As far as I know, it does. Okay, so don't take my word for it, check it out, but as far as I know they do.

Got it. Okay, thank you.

Okay, yeah, thank you. All right, thanks. Thank you so much. [Applause]
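
The network policy example from the talk (web server may reach only the Python backend, and only the Python backend may reach the database) can be sketched roughly as follows. This is a minimal illustration in Python, not the speaker's actual manifests: the namespace name `simple-app`, the `app` labels, and the port numbers are all assumptions made up for the example, and `is_allowed` is a deliberately simplified model of how ingress rules are evaluated (it ignores ports and namespace selectors). The manifests use the standard `networking.k8s.io/v1` `NetworkPolicy` shape.

```python
import json

NAMESPACE = "simple-app"  # hypothetical namespace name, not from the talk's slides


def network_policy(name, target_app, ingress=None):
    """Build a NetworkPolicy manifest (as a dict) for pods labelled app=<target_app>.

    target_app=None selects every pod in the namespace.
    ingress=None means no ingress rules at all, i.e. deny all ingress.
    """
    selector = {"matchLabels": {"app": target_app}} if target_app else {}
    spec = {"podSelector": selector, "policyTypes": ["Ingress"]}
    if ingress is not None:
        spec["ingress"] = ingress
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": spec,
    }


def from_app(source_app, port):
    """Ingress rule admitting only pods labelled app=<source_app> on one TCP port."""
    return {
        "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
        "ports": [{"protocol": "TCP", "port": port}],
    }


# Labels and ports below are illustrative assumptions.
POLICIES = [
    network_policy("default-deny-ingress", None),
    # An empty rule ({}) admits traffic from any source: the public entry point.
    network_policy("allow-all-to-web-server", "web-server", ingress=[{}]),
    network_policy("allow-web-to-backend", "python-backend",
                   ingress=[from_app("web-server", 8000)]),
    network_policy("allow-backend-to-db", "database",
                   ingress=[from_app("python-backend", 5432)]),
]


def is_allowed(source_app, target_app, policies=POLICIES):
    """Simplified ingress check: matches on the `app` label only, ignores ports."""
    selected = False
    for policy in policies:
        spec = policy["spec"]
        labels = spec["podSelector"].get("matchLabels", {})
        if labels and labels.get("app") != target_app:
            continue  # this policy does not select the target pod
        selected = True
        for rule in spec.get("ingress", []):
            if "from" not in rule:
                return True  # rule without peers admits every source
            for peer in rule["from"]:
                peer_labels = peer.get("podSelector", {}).get("matchLabels", {})
                if peer_labels.get("app") == source_app:
                    return True
    # Pods not selected by any policy are not isolated, so traffic is allowed.
    return not selected


if __name__ == "__main__":
    print(json.dumps(POLICIES, indent=2))
    print("web-server -> database allowed?", is_allowed("web-server", "database"))
```

In this model, a compromised web server can reach the Python backend but not the database directly, which is the point the talk makes. On a real cluster these rules only take effect if the network plugin in use enforces NetworkPolicy.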

Info

Channel: CNCF [Cloud Native Computing Foundation]

Views: 1,082

Id: b4xNAlVsdxg

Length: 30min 0sec (1800 seconds)

Published: Thu Aug 17 2023