Attacking & Defending Managed Kubernetes Clusters

Captions
Hello and welcome to another episode of Cloud Security Podcast, with Virtual Coffee with Ashish. We are back again; once every week we talk about cloud security problems. If you are new to this channel, consider subscribing if you are into Kubernetes security, cloud security, and anything related to cybersecurity, really. I see Magno already here on the YouTube stream; thank you for coming in, Magno. Before we get into the stream and start talking about attacking and defending managed Kubernetes clusters, which we will get into shortly, I need to thank a few people who have been responsible for making this show happen. I'm going to start with Axonius and then Bridgecrew, and I'll see you back in a second.

Hey Ashish and Cloud Security Podcast listeners, thanks for giving Axonius the opportunity to sponsor the show. Axonius does exactly three things. By connecting to existing data sources, Axonius gives customers a comprehensive asset inventory, both cloud and on-prem. It then uncovers security gaps, and finally it automatically validates and enforces policies. Thanks again, and check us out at axonius.com.

Bridgecrew is the all-in-one cloud security platform for developers. Bridgecrew analyzed thousands of open source Helm charts to bring you the newly released State of Helm Security research report. Learn their security insights and more at bridgecrew.io/csp.

Awesome, and thank you so much for the sponsorship. It really lets us run the show and bring in some amazing guests, so I do appreciate it. Now it's the moment you were waiting for. I'm going to put the music on. [Music]

Hey Brad, welcome.

Thanks for having me.

I truly hope people got the song. In case people are wondering what the song was, it's "We Didn't Start the Fire". If you are a young one who probably doesn't know the song, you should definitely Google it. Probably one of the greatest songs of our times, I guess.
Well, welcome, first of all. It's a tradition, so cheers, man. Thanks for coming in.

Yeah, cheers. Thank you.

And what about the other cup? What are you drinking?

So this is my son Henry's souvenir from visiting Orlando. It was just a fun cup, and it's just some iced tea. It's not really iced anymore, it's more like just cool tea, but yeah, I got a little caffeine before.

I appreciate that. We definitely need a bit of caffeine to get going. I know a bit about you, I've been online stalking you for some time, but a lot of people may not. So who is Brad, where are you professionally now, and how did you get there?

I guess the easiest way to say it, in short, is that I've been in the space for almost 20 years now: SOC engineer, pen tester, security architect, sales engineer, security consultant, and in the last five years or so, building ethical hacking training scenarios. Earlier it was on VMware, but then on Kubernetes, and that was in 2016. Ever since then I've been in the cloud native, container security space, specifically with a lot of Kubernetes focus. For this past year and a half now I've been co-founder of Darkbit, doing cloud security posture assessments for companies running and focusing on Kubernetes, specifically in GCP and AWS and those managed offerings. So just a small group doing some deep-dive expertise stuff.

Awesome. I'd definitely recommend checking out the Darkbit website as well; you guys did a great job of making the attacker-first mentality stand out there, so kudos to you guys for that. What I've realized after running a whole month of Kubernetes interviews is that the last four interviews over the last four weekends have had amazing guests. We've all spoken about Kubernetes, and I
kind of wanted to get some of your perspective, because you also bring a bit of a holistic perspective. As you said, you've done SOC, you've been a pen tester, a lot of different things. I'm going to ask about two different sides of the same puzzle. What does cloud security mean for you?

Cloud security... Being a pen tester in the 2008-and-before era, Windows 2008 was the latest at the time, so I sort of stopped there. There was a specific set of responsibilities and it was kind of common: everybody's running everything, they're running their runbooks to build their Windows servers and things. Cloud security, to me, is having tiers of shared responsibility for different workloads. So as a security professional, as a defender, or as an attacker, it's not just "here's the infrastructure, here's the data center, and this is how you get there". There are different models for each one: things that cloud providers are responsible for, and things that you're responsible for. And if you're running a VM, versus a container in something like Kubernetes, versus something in serverless, I don't want to say they're worlds apart, that's not fair, they share a lot of substrate. But from a "what am I defending, what am I looking for, what am I trying to run a response process on" view, they're worlds apart. They're very different. So to me it's a combination of all those things, depending on what you're running, and adapting all your processes to the various sets of shared responsibility.

I'm glad you brought up the various processes and shared responsibility, because I find so many similarities between that and Kubernetes security. So what about Kubernetes? For people who don't know Kubernetes, and after running a month of Kubernetes on Cloud Security Podcast I'd be surprised if people don't know this, what is Kubernetes and what is its relevance to cloud
native?

Yeah, it's a great way to run other people's code next to your secrets and data as root. That's a quote that Ian and I came up with at our RSA slash KubeCon talk, because when you think about it, depending on how you run it, it could be considered remote code as a service. If you're not taking care of things, that's what it is. But from a Kubernetes perspective, the question that fits in here too is: why would you use Kubernetes? You're running infrastructure and you have patterns that keep repeating, and because you are running enough infrastructure, you tend to be at some scale where you say, "We're doing this differently, but for the same types of apps across our infrastructure. We should be doing this in a more common way." That's typically when you're ready for something like a container orchestration platform that says: this is how we do load balancers, this is how we build images and ship artifacts, this is how we deploy, this is how we test, and this is how we validate. All the things that come with it have to adapt, and so all the security aspects, the policy, the response, and so on follow suit. That's what I think of when I think of distributed containers: distributed bundles of processes and their shared dependencies on multiple hosts.

Is that what would become cloud native as well?

I think that's one aspect. I think people use that definition quite loosely, but to me cloud native goes back to the shared responsibility. If you step back from it, it's leveraging that level of control versus shared responsibility and ownership to the maximum advantage for that given workload. So if I have this thing that just needs to respond with the IP of whoever talks to it, like icanhazip, I could do that in a serverless container for very
I don't have ops, I don't have to work, I just push to it and let it run. I don't have to do as much. But then there's something like, "This is my core business. I have a very custom driver, or I have a thing that is different, that is my secret sauce. I need control over it." You might take that down to a containerized infrastructure or even a VM infrastructure just to have maximum control over it, and that granularity.

Oh, I love it. By the way, I just want to give a quick shout-out to your support crew that came in: we've got Dan Pop in here, and I've got Magno as well out here supporting you. I saw your question as well, Magno; I do have that question, so I'm going to come back to it in a few minutes. But we're talking about attacking and defending, and I do feel the best way to learn security is to learn how to attack it; then you think, "Maybe this is how I should defend, because this is how it's attacked." So, starting with attacking Kubernetes: what's usually your approach for pen testing Kubernetes clusters?

You know, it's funny, using the literal word "pen test". This is going to sound self-serving, but I argue that a security posture assessment is akin to a vulnerability assessment that you might do ahead of a full-on pen test to validate that you fixed a lot of the low-hanging fruit. A security posture assessment, where you're looking at the config and metadata, is step one, and then pen testing is a validation of that process. From a config standpoint, I'm looking at the external attack surface: how are the nodes exposed, how is the API server exposed, or the components that run your cluster, and which apps are exposed via load balancers. Then, stepping inside the cluster, it gets very interesting, because running as a container, what
things do you have? What permissions do you have? Do you have RBAC access to the API server from the pods? Do you have network policy that's preventing you from moving laterally inside the cluster? How well are you preventing or defending access to the metadata typically associated with cloud instances? There are credentials in that metadata that give you access to the broader ecosystem. So there are different scenarios. The outside-in is pretty straightforward: whatever apps are exposed, you move into your standard web app or application security models from that point, because you're looking at the API interfaces, authentication, authorization, that kind of stuff. But when you're inside the cluster, I always envision the "what if": what if a container is compromised and I have a shell on, just pick a pod, that pod? What's my world view? What things can I see? What things can I do? Can I gain access to credentials that are sitting there? Can I hit other services inside the cluster that belong to different teams or a different namespace? And just seeing where that lateral movement happens, because it's quite common, unfortunately.

Oh, I love what you said. It's almost like giving you a viewpoint from the pod. But to your point, there are two views, and one is from the outside in, which is where you hear about that Tesla API server issue, or I guess not an issue really, but the use case that happened over there. Offline we were talking about that TeamTNT one as well. Was that similar?

Yeah. Shout-out to Magno and folks at Trend for a really nice blog post a couple of days ago, where they're talking about, I wouldn't say incredibly sophisticated, but more sophisticated than what I've seen in the past, attacks on the kubelet: the old kubelet exploit, which isn't really an exploit, it's just
the kubelet not having authorization enabled. So there are tools and techniques that take advantage of that, then pivot and scan and try to hit other kubelets, then run a Monero miner, use IRC to communicate, and so on. Those are some interesting things that are happening, but that's why I say the first step is: what are my components? Is SSH on the nodes, is the kubelet on the nodes, is the API server on the control plane nodes exposed externally? Because that's where that attack stops from the outside in. That doesn't mean it can't happen from the inside to the inside, but if I'm mass scanning the internet and I see port 10250, it's probably going to be a kubelet, and you can ping it really quickly just to see if it has authorization disabled. It's high confidence and low effort from that point.

Interesting. So we went into the recon phase as well when you were talking about what your view looks like from the pod out and what else you can do. There are so many services: pick the control plane, there are so many components over there; pick a node, there are so many components over there. What are some of the common low-hanging fruits that people should be looking for?

Yeah, absolutely. That component configuration I talked about, meaning the flags on the API server, the flags on the kubelet, and those types of things, is step one. Step two is probably RBAC. I know of someone, I'm not going to name them, who has automated scans using mass scanners that trigger on API servers that answer on /api/v1/secrets, meaning there's a misconfiguration of RBAC. If you see that, you know RBAC isn't enabled correctly or somebody purposely opened that door, and that means you're basically cluster admin in that cluster. So RBAC
misconfiguration is the second thing: just reduce the number of stars in the RBAC policies. Then it's admission control, network policy, and then I would say secrets management, with admission control being the biggest thing that prevents a developer from becoming your lead SRE in two commands. You want them to stay in their namespace, or you want them just to work on their workloads. You don't want them to be root on all your nodes just because you allowed them to run a hostPath root pod or something like that. That's where admission control comes in.

Actually, that's a good point, because it just reminded me that in the first few conversations we had, we went into API services as low-hanging fruit, and we even spoke about container security as low-hanging fruit as well, depending on where you're downloading containers from. RBAC is something we haven't really spoken about, because we spoke about network policy and pod security policy. I'm curious about the whole RBAC piece: when you say RBAC, is that RBAC for the node, or is it for the cluster itself? Would you mind unpacking that a bit?

Yeah. The Kubernetes authorization mechanism is role-based access control, applied at the API server: can I get pods, can I list pods, can I create pods? Verb, resource. When you're creating cluster role bindings, or role bindings, which are namespace specific, you're granting specific API actions to a user, and if you're giving create pod or view secrets, you're giving very powerful verbs and a lot of access by default. For create pods specifically, you also need admission control to back that up: yes, you can create a pod, but only one of this shape. It's not a very privileged one that escapes to the node; it just does the web server, or the caching tier, or whatever it's
supposed to do. But view secrets is the one that is often hidden in stars, meaning a Helm chart, and I'm not picking on Helm charts specifically. Often what happens is somebody installs a Helm chart without auditing what it does, and it installs cluster role bindings to make the thing work. The operator, the GitOps operator, the thing that's deploying things on your behalf, will typically be cluster admin, so it'll have a cluster role binding of verb star, resource star, and you're just giving it full access to everything inside the cluster. Where I see that overlap, and this is a real use case from one of our clients: they had a GitOps operator, something that's responsible for deploying, and it has to have its privileges, no doubt. It was in a separate namespace. They had given view secrets because they were using Sealed Secrets in this, we'll call it, front-end web namespace. But then there was a performance problem and they wanted to break the operator out, so they installed multiple instances, one in each namespace, each handling its own piece. What happened then is that the two crossed over: the developers had view secrets, and the GitOps operator was installed in the same namespace, which installs a cluster-admin-bound service account, and service account tokens are stored as secrets inside that namespace. So they basically gave the developers access to a cluster admin token. It was a complete oversight; it was, "Oh my goodness, had we only realized there was that crossover." That's what I mean by that misconfiguration. It's really easy to install multiple operators and think, "They're viewing secrets in just that namespace, it's totally fine." Yes, on the surface, but you have to look at what else is inside that namespace to make sure there isn't a secret that contains a cluster-admin-bound service account token. That was the one step for the developers to
become cluster admin. Then you have other tiers, because now that they're cluster admin, they're basically root on the node. When you think of cluster admin, think of root on the underlying node, because you can bypass policies, you can take off pod security policy, you can disable admission control, and you can clean logs as well. Once you're cluster admin, you're able to access all the credentials that are attached to all the nodes. If you are able to run workloads on, or get access to, the API server or the control plane nodes, and this isn't going to happen in managed clusters, or hopefully it shouldn't, the cloud metadata there typically carries higher IAM privileges in your cloud environment: things like creating VMs, attaching disks, even having full control of some aspects of IAM for that account. That's where it gets very, very interesting. So you have a two-step misconfiguration here: the ability for a user to see secrets makes them cluster admin, which gives them ownership of all the nodes, which gives them ownership of all the credentials attached to the nodes. That's how you walk and escape out of the cluster in a lot of cases. That's an example of why I think of the pod as the starting point. Either it's compromised through a web app, as in a front-facing web app gets compromised and the attacker has a shell; or it's a developer who is kubectl exec-ing and feeling malicious today, they woke up and chose violence; or it's a malicious dependency that comes in with your container, sits there, and is now running. All of those start from "I'm an unprivileged container or pod inside the cluster", and by that I mean it's just sitting there: what can it do? Does it have access to tokens? Does it have access to the network? Where is it going? That's why that scenario is so common
in a lot of the Kubernetes security talks. You hear, "Let's assume that the attacker has a shell in the pod," because that's the start of so many of the scenarios.

Ah, right. So in a way it's probably your Achilles' heel. It is something that you need in order to function, but it's also the beginning of attacks. Sorry, were you going to say something?

No, I said yes. I said I agree.

Perfect. I was going to ask about managed and unmanaged clusters. Maybe you can start with the difference between the two, and how the low-hanging fruit would differ if, to your point, it's a managed cluster in a cloud environment, say GCP or AWS.

Yeah. With a managed cluster, run by, say, a cloud provider, they typically draw the dividing line so that they don't give you access to the underlying nodes of the control plane, the control plane being the things that run the API server, the controller manager, and the scheduler. That's because of what happens if you can get access to etcd. This is not very common, but etcd is the database. Think of the analogy of a web application that talks to a MySQL database. If you could just go into the database and say the record is XYZ, the web server will trust it. So you can make yourself admin if you just set admin equals true in a SQL statement, and the next time you access the web app it says, "Yeah, you're admin." Similar thing here: etcd is the backing store, so you never want to let anybody get to etcd, because it stores the secrets and all the configuration that would give you the ability to become a cluster admin. So they take those keys away from you. They don't let you have access to those, for very, very good reasons. They do give you access to the nodes, because that's typically where most workloads might need something: you need privileged access to a GPU, or a container storage interface driver that's doing something cool with, you
know, Fibre Channel or whatever. You need a little bit more control, so you can give them root on the nodes, but not the control plane nodes, just the worker nodes where the workloads run. If you are running your own Kubernetes, if you're running something like kops, and I'm not picking on kops, then because you're managing that control plane, you have to be very careful not to allow workloads to run on the control plane nodes. You have to worry about the API server; you have to protect it with rate limiting, from denial-of-service attacks, all those things. If you expose that API server, you own the responsibility for the most important pieces of your cluster. So if you're saying, "I don't really need that much control over the control plane, I'm happy to give that management up to AWS or GCP," by all means do that. It's worth the hundred-some-odd bucks a month for a cluster, because the SREs are handling that for you, they're patching it, they're taking a lot of extra precautions that you don't have to. In a managed cluster, then, your focus is on node security and on making sure that workloads that shouldn't interact with each other don't, without also roping in all the stuff that you'd need to do to protect the API server, etcd, and so on.

Interesting. So how about we scale this? You know how cloud is all about "let's scale everything", let's make it 20,000 deployments in a few seconds. Say we talk about a multiple-node scenario. I'm thinking in my head: it's a managed cluster, I've got a pod definition, the templates I've defined for the deployment, which only has one node, but I've got multiple nodes. How do these attack scenarios scale? What are some of the examples you've seen in a large Kubernetes deployment in a managed cluster context?

Well, scale is both vertical and horizontal. It's how big your pod is, like
how much CPU and other resources, and then how many there are. In general, scale can go really big really quickly. I've seen somebody go from a 10-replica deployment to a thousand-replica deployment, and I've seen the node autoscaling kick in and the cluster eventually topple under its own weight because it wasn't configured correctly. But it can do that with the right help: if you're setting requests and limits correctly, saying "my pod uses one CPU and one gig of RAM", and you're correct when you say that, Kubernetes will handle that for you very gracefully. What's interesting from an attacker's perspective is that the privilege you get by being cluster admin means you can be extremely efficient at getting access to all those nodes, because the kubelet and the API server are fundamentally in sync: "I'll run this command for you," whether that's in a container, a new container, or a command on the host. So if you get cluster admin, you have access to all of that through a wonderfully documented API with multiple client libraries that lets you get access to everything in the cluster. kubectl is your best friend, but you can write any code that you want. If you have the permissions, you can be extremely efficient at moving around, being a part of every single node, and seeing what's on every single node. I think that's the fun aspect of it too: if you get those permissions, you use the control plane as an administrator would, number one because then you blend in, and number two because the most efficient way to harvest all the secrets and all the data from all the nodes is to use its own orchestration against itself.

It makes me smile, because I'm just thinking: it's interesting that the cloud provider takes away access to the APIs and probably most of your control plane, but if you're still cluster admin, you're scaling
your privilege across the nodes; you could just have a crypto miner on all of them. But let's talk about multiple clusters then, in terms of lateral movement and everything. What are some scenarios over there, and how would you, I guess, abuse them, for want of a better word, in a managed cluster?

Yes. So I'm a big proponent of considering the cluster... like, the multi-tenancy debate: there's really not a lot of accessible or easily used hard multi-tenancy, where untrusted code is okay running inside your cluster. There has to be some level of trust. Typically soft multi-tenancy means this team in my organization and this other team in my organization, all of whom I have legal control over if they do something nefarious. That is typically what we're talking about by multi-tenancy. But there's still blast radius, and with what you just described, this whole scaling thing, I don't necessarily want one compromise to give that attacker cluster admin and root on everything in my entire organization. Just for fault domains, for regional outages, or more likely availability zone outages in a cloud provider, you might want multiple clusters anyway, and they might be different shapes too. I might have a GPU workload over here, I might have a batch jobs workload over here, I might have my front end. But some things should never touch: if this is touching the back-end data lake with customer data, don't share that with the cluster that is front-end accessible, that runs my .com website or my API that I let anybody on the internet use. You want those separate. So I see using clusters as the blast radius for this. Think of the similar mindset of why I would put VMs in different VPCs: these are my dev instances, these are my prod instances. Draw a dotted line around that VPC and say these things go together; those should probably be in a shared cluster, maybe, maybe not, but they definitely should not be
in the same cluster as the dev things over here, and definitely should not be in the same one as my data lake thing over here. That's how I look at it. So separating that blast radius with separate clusters is the best way to go. In GCP specifically, I always give the advice of one GKE cluster per project, because there are too many opportunities for IAM crossover. Compute Admin actually gives you access to all the nodes underlying GKE, because they're just GCE instances. Those are the types of things you don't really think of. The same goes for shared logging patterns, where all the logs from all the clusters go to the same logging destination. Those are typically things that you might not want shared across teams, so you'd want to separate those by project. In AWS it would be different VPCs and different CloudWatch destinations, and in GCP it's different projects.

Oh, that's good advice, actually. Thinking about all the cloud providers, which by default collect footprints of everything that you're doing, I'm imagining a scenario: I took your advice, I started looking at how to become a cluster admin, I became a cluster admin, and I'm trying to deploy across the board. What's my get-out-of-jail card? How do I delete my footprints, or the breadcrumbs that are left behind? Are there things I should be looking out for in Kubernetes clusters that record me doing things, and how would I remove that? What should my thinking be there?

So, you're an attacker and you're leaving breadcrumbs, or you're a defender and you're looking for breadcrumbs?

Yep.

If you're an attacker running in managed clusters, then typically, if things are configured correctly, in other words the audit logs from the API server and the logs from the nodes are being shipped, most times that's done for you. It might not
be perfect, it might not have enough granularity, but you should be able to quickly enable that. That is one of the key differences in a lot of environments: an attacker can only delete what's on the nodes, and they might not be able to delete what's in the control plane logs, because that has already been shipped off. So as for "how do I clean up after myself", it's really hoping that the organization is not looking at those historical logs after the fact, that they're just live troubleshooting, don't see anything, and so don't know what's going on. But the record has already been logged off-cluster, saying exactly what happened: this was the manifest, at this time, from this service account, and so on. So I look at it like this: there are only so many things you can do to delete the breadcrumbs. You can just not show up in a kubectl get pods, or get whatever resource they're looking for. But if you're auditing the right things... In EKS, for example, there are six, and you have to enable them explicitly. You want all of these, and you want those shipped off. GKE will do it, but you actually need to add additional ones to get all of them. It can be verbose and it can cost extra money, and that's why they say these aren't on by default. But that's your literal record of what's happening inside your cluster. It's another API with all these features; you want every request, and every chance you get to see what's happening, shipped off the cluster for you automatically. Then the breadcrumbs are hard to delete, because they're already gone, they're already stored somewhere safely.

And to your point, if those six options were not selected, then it's probably not being logged anyway, so you've already minimized your breadcrumbs to begin with.

Yeah. I'm not picking on EKS specifically, but in a default situation very little is logged of what's going on inside the
cluster so there's a lot of chance to be silent and stealthy uh but if you turn those on it's going there unless you start getting access to go delete those logs from where they're stored like in s3 or in cloudwatch or uh those types of things like you have to go another step further to go delete those logs from the record oh i love what magnus said as well no log no [Laughter] another few layers that i've always considered from an attacker's perspective and keen to know your thoughts on this as well with the ssrf and api metadata that exists in cloud service providers as well and all these services that are being producted like the supply chain of kubernetes as well like a lot of people deployment would let's say oh i'm going to use a ci cd pipeline to create my container for a container image and i'm also going to create one for create a ci cd pipeline for a kubernetes cluster as well we had mark manning uh earlier in the month and he was talking about a cluster where 3000 developers were logging into one cluster like and i'm like oh i imagine that in a in a cloud context and i'm going oh my god this is insane but what are some of the other moving parts in a managed cluster that people can also look at for either from a recon perspective or uh possible exploits i guess uh wow that's a that's a lot of questions there's a lot there um to your point about what mark manning said i love mark um just in general i'm a big fan of not allowing and this is part of like a maturity journey if you can get to a point where your developers only your like five or you know small number 10 sre cluster admins whatever infrastructure platform teams those are the folks that are using kubectl in a break glass scenario like oops something's really sideways let's go debug this if that's the only time you're using kubectl that is an optimum state if the developers are literally interfacing with a code repository that says i want to bump the 
version i want to you know check in code once it gets approved it gets put in a testing branch and the automation takes it from there and they don't have kubectl that takes a ton of complexity out of the problem that takes away a ton of attack surface because rbac while extremely granular is incredibly hard to do temporally i know there's some solutions out there but it's like you either give them create it's hard to do like a i need to break the glass for this one time type of access that you're like you give them create pod you give them view secrets and that's that the rbac is sitting there it's there are some identity proxies that can do some fun things there but that's a lot of complexity uh just for this but if you can get the developers not working inside the cluster not caring about that level of abstraction you're winning because then you can change things out from under them uh in a good way like you can upgrade your clusters you can move things around and they don't have to care and that takes away a lot of attack surface of you know giving them giving them that access what that means though is that you have to give them the tools the feedback loops the observability to feel just as confident about that deploy if it goes sideways or if it's working or is healthy as you would if you got it from a kubectl command it's it's like a cheat it's like oh i can see it with a kubectl that's what we that's why we need it you're like no you need to see that that deployment is healthy you don't necessarily need to run a specific kubectl get pods so if you can get them out of the cluster you reduce the all the just the complexity of i can't imagine 3000 developers in rbac one massive group and it's probably like cluster admin boom there we're done that's right everyone just yeah because it's hard to manage right so yeah and i wonder if that's a debate as well like a lot of security folks maybe are talking to sres or kube cluster 
admins whoever you're going to call them where everyone is probably talking about and for context kubectl is kind of like the ssh of the world uh for virtual machine world for people who may be listening in a lot of people do ask for kubectl because that's what all the guides online talk about that's what everyone talks about and suddenly security people become the bad people because you don't want me to have kubectl like uh well that's why there's a ci cd pipeline because that's why you go through there and that's how you test it that's why you have the dev environment test environment so um i i i hope people have healthy debates about this and how they land on this but that also is a good segue into probably talking about more on the defending side of things as well then maybe we'll start with something simple and say let's say we are a startup and we're thinking about okay we're going to start with kubernetes but the first question is is kubernetes right for everyone no that is not it is it is a i love it because well i love it because of a lot of reasons but once you get it it's hard not to see that pattern everywhere and want to apply it everywhere if you don't if you don't see those patterns done poorly first you might not understand why kubernetes was written the way it was right you're you're declaring state and then letting it do its thing it's not you're imperatively like you're acting on things the commands must be in certain order etc like and then it works it's make this pod i don't i don't have to know anything more about the implementation details store the secret store this config map like that that is a clean way and then once you step back from it you're like well gee it handles load balancing it does this it does this it it starts uh making standard patterns that apply to all these problems but you might not be big enough or have those scale patterns that require it you might just be like i just need two vms in an auto scaling group i just want at least one up 
and i wanted to talk to this rds instance perfectly reasonable three-tier stack like do that thing if that is the simplest thing that achieves your goals great but if you have 50 of those and they start to drift a little bit or they start to have you know these 10 get upgraded but these take a little bit you start going well gee we're having to run across all 50 of those things and do the same things over wouldn't it be great if we had a standard vm that does this and we'd have a standard way we'd set up our databases and you're like gee it'd be really cool to have that in like a little snippet of yaml and we could just say go make the thing that's where oh oh kubernetes there it is that's why you like see that pattern and it comes to that so maybe at a startup your focus is on getting traction getting customers getting users whatever technology it is to to do the thing it's not let's build it on kubernetes and then they will come that's no that's not going to happen so i would argue you will know when you're at that problem space when you start seeing those patterns of doing things poorly across multiple sets of infrastructure that are like man we do everything kind of the same but not really it'd be really great if we standardize this that's when it might make sense for a container orchestration system interesting and i'm thinking about all the large enterprise that may be listening into this and going well it's too late for us the cancer like the horse has already left the barn yeah so for lack of a better world so for people who already have kubernetes clusters in their environment uh thinking about them for a second over here maybe is this right for them maybe because they're at your point they're at that scale they've been doing some container orchestration for some time yeah so so kubernetes has a funny way of bringing out organizational problems whether you like it or not it covers so many areas it covers networking it covers load balancing it covers vm 
patching it covers container build pipelines it covers security it covers auditing all these things are collapsing into a shared infrastructure so if you have pain it's going to be where you're weakest and it's going to bring it right to the forefront you're going to have what do we do with these logs i don't know where do we ship these well we never really shipped it to one central place with a standard way none of our not all of our applications log the same way in structured json well it's gonna bring that right to the surface because you're like how do i debug this thing well this app sends it like in sysloggy format and this one's oh now we have to solve that now you're bringing all this technical debt that you should have solved or should be in the process of solving before you get to the point of kubernetes but if you're already on kubernetes it forces you to be good or at least a decent level in a lot of areas that maybe you weren't really ready for and it then changes things on top of it so the security team is we always like to pick on like uh what is kubernetes uh i i guess we have some of those in our infrastructure how do we pen test it how do we red team it how do we defend it i don't know like they're they're having to re-learn a threat model of a completely different way of organizing uh infrastructure and uh you know how do you defend that how do you attack it how do you do incident response on it all those problems surface because maybe you weren't really good at doing that on standard vms or bare metal and now you're doing it and you're doing it in something that has moving target pods come and go uh nodes come and go load balancers load balance to nodes and then they don't load balance to other nodes and what happened when i don't know we never really kept good logs like it it just it's a it's a trickle-down effect of if you're not doing all the right things you're going to feel that pain so people perceive a lot of complexity with that because it's 
relearning and rebuilding muscles that maybe you never had or you did differently or did poorly but if you do them correctly if you follow the happy path and you do well i'm just going to send all the logs to cloudwatch i'm going to send all the logs to stackdriver and i'm going to install falco and i'm going to do this i'm going to install this admission control if you start doing the things right you start going oh i get it i see it i'm i'm seeing why this is and i'm going with that happy path you'll be you'll be better off for it and i think your organization will need to mature in all those buckets like all those categories cannot you can't have a bunch of immature buckets and then a bunch of mature buckets you have to bring them up otherwise you're going to feel that pain oh i love it and i think to your point about in all of that amazing gems you dropped in the one thing that stood out for me is that things that used to be individual roles in a threat model now they combined in many cases yes yes like what what is like the role of legal in your software bill of materials like are they looking for gpl or lgpl well we ship so darn fast that what they looked at was six months old right like that kind of thing like those types of folks will want to be like well how do we re rewire that process so that we know that we're not shipping software that we're not licensing just as an example versus you know in vulnerable stuff versus you know stuff from uh other packages that are old or out of date like all those problems surface because you're shipping so quickly and you're leveraging automation you need to wrap all those supporting processes and think through them as if it was running we're deploying 10 times a day how painful would it be for the legal team to see this well maybe we should just in the cicd pipeline go oh we'll ship off the bill of materials to them and if they ever want to look it's in the storage bucket you know what i mean and then just show them 
here this is how how you would you would look and check if you wanted to check because they're they're not really going like they're not going to have a lot of freeform queries for example so you sort of see that pattern ask what they're using the data for and go hey can i just send it to you in this place and think through all those types of integration points um like that's that's why it's like forcing everybody to be that quick where processes might take days or months and those don't catch up that's where you see that that that you know not meshing of the gears and the abrupt conversations that are like you're not doing it you're doing it completely differently you're not shipping the logs and you're like well we're running this quickly we're doing this thing because it adds business value this is why we're doing this we can ship features our customers are buying this is why we're doing it you all need to get on board that's a little bit awkward as a conversation to say it that way but basically that's what the business needs to do it needs to mature all the supporting processes around it a bit awesome and i love it i feel like i can talk to you for hours about this but i'm going to switch gears and switch to some of the questions that are coming in and thank you for the patience real everyone as well vineeth and magna especially i've got a question from vinita over here which tools would you recommend for both attack and defense kubectl jq curl uh that's dead serious by the way uh yeah i was not smiling at all yeah that's actually that's true without recommending any vendor out there yes um i'm a big fan of uh so like there's a couple things like if i'm attacking there's there's kube-bench for cis stuff there's kube-hunter for penetration light penetration testing stuff there's peirates um there's a couple others and i'm blanking on them but there's some some that are more attack or red team-esque there's one that just came out that was mapped to the 
mitre framework that was kind of neat to see um but then there's like from the malicious activity detection like this is one of those things that doesn't ship with kubernetes it's not part of its concern is is what is running in that container good or bad you know from the start or sometimes perspective from a runtime perspective exactly i'm a big fan of the falco project like that is uh you know you can get a ton of value very quickly for a very inexpensive price um you know just the price of being a member of a good community that's what i would argue the cost of oss is it's not free it's being a part of that community and helping contributing back but like that gives you a ton of visibility into nefarious things that shouldn't be happening um and you're you'll want to add that to your your your defensive if you're running something else uh you know a paid or a vendor version that's you know that's fine too it's just something that's telling you that what's happening in this container is suspicious you should take a look um is is huge i mean that that i don't say that's it but like i i tend to look at kubernetes as a part of the whole it doesn't exist on its own it lives on some metal somewhere it lives in some cloud somewhere has extremely tight integrations in a lot of cases so it's not just kubernetes just remember it's everything it's the vms that's the kernels it's the the components it's the containers it's what's in the container all those layers of the stack typically require different tools or different perspectives so it's an inventory problem first so using standard cloud inventory tooling standard posture management tooling to understand where all the things are and then what's inside kubernetes is building on top of that awesome great answer as well and actually just on runtime magna had a question which was interesting for me besides ebpf what other technologies do you think are important for protecting the runtime i mean that's that's the new 
hotness i shouldn't say new because it's been around for a while but i would argue that in the last two or three years it's seen a huge focus there's number of companies that are focusing solely on being really really good at this uh and i i sort of ebpf is like running kernel level code but sort of in a sandbox that's oversimplifying a little bit but letting you plug in observability and security things of what's going on without having to write a kernel module is enabling that iteration to happen much more quickly and much more predictably across kernel versions and architectures i see i see you know runtime is a combination of what's happening in the container uh how you're you're capturing what's happening if something goes sideways there in combination with the logs that are being emitted from all the various log sources from cloud apis from the api server from the nodes themselves like that all goes into what is happening from a security perspective so you kind of need to have all of that to be able to paint the complete picture that's awesome and i've i've got a few questions from magnus so appreciate the patience as well uh what's next for kubernetes security where should we focus on our efforts next you know i think it's tempting to say service mesh and identity and all that good stuff from a risk perspective it's still the hygiene it's still the basics i would argue as we get better with all those basics like i said like exposing api servers no we're not doing that we're restricting layer four uh we're configuring and tuning rbac it's not completely uh you know wide open and stars everywhere and then we're implementing network policy and admission control and we're doing basic things we need to do that and do that well before we worry about sort of going up a level of maturity that said the one piece that is getting a lot of talk is software supply chain that's because it's amplified in a containerized environment it's literally like just empty nodes 
like ready to run random software let me know and i will run it for you so you have to think of like well where's all that coming from that's getting that's sort of like pushing the risk to that part that's the hardest part which is the opsec of all the humans and all the ci pipelines and all the build processes that make up all the dependencies of all the things that we build on top of and put inside of our containers so like our focus should be like defenders should be do the hygiene do the basics do all those things really really really well and then vendors and community focus on how we can make that uh supply chain visible and alertable that isn't completely alert fatigue you know like you scan a container for a vulnerability you get 50 back and you can only patch three of them you're like cool thanks i can patch those three but i still have a glaring red dashboard that says you know 47 unpatched vulnerabilities like that process needs a lot of empathy and a lot of love to make that effective but it also needs a lot more detail and a lot more sophistication to be able to like make that process uh you know get that risk balance for the for the cost in there i i love the answer and the wall of red is real for even though both you and i wearing red t-shirts the wall of red is definitely real in a monitoring world yeah uh i was gonna say um i it it kind of begs another question to taking a leap from there for a security person to secure this properly it's clearly a skills shortage as well right it's not that a lot of us came from a non-kubernetes background and a lot of us picked up cloud now this new new thing has come out like oh great i've got to learn containers like oh i've started learning containers like oh great now i have to learn kubernetes as well what else is next uh and i think it's also coming from magno's question uh which i'll get i'd love for you to get into but what are your thoughts on the skills uh conversation for kubernetes i'll 
tell you what like kubernetes has built upon years decades of abstractions linux you have namespaces you have bundled sets of processes with bundled dependencies you have this thing called a docker container now an oci spec container right and then you have something that orchestrates them so if you're learning and you start up here and you just assume uh yeah i'll just work on that stuff that's below it later you're probably not going to be able to from a security perspective you're not going to properly develop your threat model you're not going to know why you might know what's happening but you might not know why it's like that or why it's configured this way or why that behavior is unless you sort of understand linux primitives understand cgroups and namespaces and understand linux networking and all of those things build upon it so if i were to say like where would you start i would start at the start which is not kubernetes not docker not it would be all the way down just like what is root what is a user what's isolation what is the basic uh primitives of of linux before i'd start building on top of that and and as a kubernetes security like focused person the way i got to like and love and fall in love with kubernetes is because i built with it i had to operate it it wasn't like i was just somebody else's cluster that i'm coming along to secure i don't feel like you get enough empathy to know why things are that way we ran a capture the flag exercise in 10 regions 10 one cluster per region aws in 2016 on kubernetes 1.3 with calico alpha right we got really up close and personal this is before rbac this is like service account tokens were mounted in every pod that were cluster admin by default we had to work around that we had to protect the metadata api with iptables rules manually like with the daemon set like we had to think of all those things to make a ctf environment it's just like wordpress containers inside of a namespace but like 
we need to think of all that multi-tenancy problem space and build it to understand it so like i argue it's really hard to secure something if you haven't even at least deployed a couple apps and try to scale it a couple replicas and like if i change the image and like oh what what if i can do that load balancer thing oh the load balancer thing works and then once you get it working then start poking at it then start breaking it because then you'll go okay now i know the happy path this is what they're trying to do and i know why they're doing it this way and then i can go well what if i did it the not correct way or what if i swept its legs out from under itself what would it do how does it behave and i think that's where you have to be curious and be a builder a little bit uh to be able to say i can attack or i can defend this because i think it's really hard if you're it's like your day job is to just be a defender and you've never looked or tried to deploy one of the apps that's running inside the cluster that's very arm's length it feels like you want to you want to understand the workflow of the developers that are running stuff inside your cluster to be able to understand how to threat model it interesting and that leads me to magnus question what was your approach for learning it and what would you have told your past self about how to learn kubernetes yeah it was a it was a team decision uh it was down it was this is 2016. 
it was between mesosphere uh and kubernetes and docker swarm and we kicked the tires on all of them and the only thing that really worked the way that we wanted to is and we spent nine months building this ctf platform to be able to run these events right and so it was an earlier time i i it's hard to be like yeah it was easier it was simpler times back then but it really really was there wasn't as much stuff going on you had pods you had services you had config maps and secrets and deployments pet sets not stateful sets like you had daemon sets that were like that those are your primitives so it felt like you could understand all of those things i argue going back to the roots is where i would start i wouldn't start at service mesh i would start at pod service deployment config map secret i would i would go through the cka the certified kubernetes administrator um you know syllabus and really dive in as a builder before and it's a prerequisite before the cks by the way this the certified kubernetes security specialist you have to be a cka first for good reason because you can't secure what you don't know how to administer just get working get running it's very very hard to do that so that's what i would do if i if i was like my job i started tomorrow and i had to be a kubernetes security person i would go go do the cka and i know that's a lot like i would go to that track whatever gaps i had to be able to get to that point that's where i'd go first and then i'd probably look at the syllabus for the cks and if i feel like sitting for it so it's sitting for it awesome so i guess learning kubernetes the hard way no pun intended there yeah shout out to kelsey i mean that yeah that repo has lived longer than i thought he would maintain it but early early on it was incredibly useful here's why i think kubernetes the hard way gets a kind of a weird naming uh rap to it because it's actually kubernetes the manual way but it is quite possibly one of the simpler ways 
to do it manually by that i mean set up these three nodes download these binaries make this yaml here or make this thing here and start the service okay move to the next node do the same thing for the kubelet on both of those nodes you now have a kubernetes cluster that is quite it's yes it's the tedious way but it's actually pretty straightforward it's like download a binary put a config in and run it that is enlightening for a system administrator a linux sys admin they can grok that i can guarantee you they're like oh that makes perfect sense to me i know what to do now it's just levels on top of that that i have to learn and i think that's the building block that a lot of people would really benefit from i definitely recommend that as well so i'll and i'll add that to the show notes as well so uh thanks for that question magnus seems like we've been talking about technical things for a while and i want i want people to know about your side as well and what is brad outside of all this and i've got three fun questions for you okay not too many just three and the first one is what do you spend most time on when you're not working on kubernetes or technology or band fantastic i'm gonna throw that in there uh professionally or personally you can pick whichever one you want uh trying to be a good husband and a good dad i think i think it's easy especially in the work from home and the pandemic to be like just working that extra hour because there's something on top of mind i i really really try to like cut that time off at the end of the day close the laptop and leave the office and and be present because you know you're you're not going to be here life is short and i try to focus on that as much as i can i will admit though to losing sleep sitting up at night like wife goes to sleep i'm like good night honey and i'll just be like i wonder how many megabytes per second you could send through a validating webhook like stupid things come into my mind 
i'll write them down i'll go to sleep and then i'll think about them the next day and i'll try to but like that's i i strive for that balance it doesn't always happen but that's that's really where you got to be at like getting ahead is success changes for you as you age i think i like that's how i've everybody around me who i've like grown up with has also had some sort of revelation at some point where it's like define your happiness and then go for that because it's not necessarily kubernetes security all day that's a great job it's a great thing but that's not you that's just something that you do and and help others with but happiness and success uh define that yourself and go for it but typically that should be for me that's focused on on family and friends that's a great answer um and probably a good one for the next one as well then what is something that you're proud of but it's not on your social media that i'm proud of that is not on my social media i would say i mean aside from aside from the the family and the kids like that is by far i say aside from that like after a deep distant third uh is the friends in the community that is the cloud native the kubernetes community specifically like i'm really proud of this community i'm proud to be a part of it and i just hope that it continues on even if it morphs to another technology the people that i've got to know in the past four or five years it's like there's ride or die crew in there shout out to sig-honk like pop magnet like all these people that like i keep seeing and keep interacting with it's almost always overwhelmingly positive and helpful and i just i just want to give back as much as they have given that's awesome a great answer as well and i can definitely vouch for at least most of the people that i'm talking to in cloud native space definitely seems to be a great bunch to hang out with in person as well so hopefully i can do that one day with them yes i would love to oh last question what is 
your favorite cuisine or restaurant that you can share there's a great uh let's say there's a great thai place near me that we just love and they deliver in like 22 minutes to the door it's unreal uh oh really yeah it's a it's a guilty pleasure no question i do enjoy mexican food anywhere like i i will just i'll go mexican anytime anywhere uh but there's uh there's a chain of restaurants here that's called great american restaurants it's like a sweetwater and coastal flats always a quality meal so if you ever come to virginia uh northern virginia you know i'll take you out to one of those places and get a really great burger for for cheap uh i'm looking forward to that one awesome uh well that's pretty much what we have time for and before we kind of drop off the live stream i wanted to kind of um i guess let you share a piece about where you normally hang out for people kind of reach out to you ask any questions like what are your socials that way you normally hang out yeah i mean my dms are open uh unless i get to come you know some some riff-raff but everybody who's who's hit my twitter it's at bradgeesaman on twitter i'm also uh bgeesaman on github although that's not really a social uh but i'm also brad geesaman on linkedin there's there's only one other brad geesaman that i'm aware of that's on linkedin so i'm hard to i'm easy to find because i use my name everywhere so shout out with uh you know reach out with dms or questions uh i'll get back to you as soon as i can awesome and uh thank you so much for coming on board and i had a great time as i was saying we had i had a lot more questions and i didn't go through all of them but this is totally worth it and i hope uh to have you once again at least to talk about more things that we can be doing in kubernetes security but i will uh look forward to stay talking to you again soon brad but thanks so much for this and for everyone else who's tuned in thank you so much for joining us as always every weekend and 
next weekend we are switching another topic and moving on to bug bounty and google cloud security we'll get to know a bit more as we kind of go through this in the week so that next month is focus on that and if you want to know more about this and a lot more on the topic that we talk about feel free to subscribe and follow on whatever platform you're watching this on and uh thank you brad and thank you everyone else coming in thank you
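One of the hygiene points from the conversation, rbac that is "wide open and stars everywhere", can be sketched as a small check. This is only an illustrative sketch, not a tool mentioned in the episode: it assumes a Role or ClusterRole manifest has already been loaded into a plain Python dict, and the function name `find_wildcard_rules` and the sample role names are made up for the example.

```python
from typing import Dict, List

WILDCARD = "*"


def find_wildcard_rules(role: Dict) -> List[str]:
    """List every rule field in a Role/ClusterRole dict that contains '*'.

    Hypothetical helper for illustration; it only inspects the manifest
    it is given and does not talk to a cluster.
    """
    findings = []
    name = role.get("metadata", {}).get("name", "<unnamed>")
    for index, rule in enumerate(role.get("rules") or []):
        # apiGroups, resources and verbs are the standard rule fields
        for field in ("apiGroups", "resources", "verbs"):
            if WILDCARD in (rule.get(field) or []):
                findings.append(f"{name}: rule {index} has wildcard in {field}")
    return findings


# Made-up manifests: one wide open, one scoped to a single resource.
wide_open = {
    "kind": "ClusterRole",
    "metadata": {"name": "everyone-admin"},
    "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
}
scoped = {
    "kind": "Role",
    "metadata": {"name": "deployer"},
    "rules": [
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list", "patch"]}
    ],
}

for finding in find_wildcard_rules(wide_open):
    print(finding)
```

A check like this only says where the stars are; deciding whether a given wildcard is acceptable still depends on who is bound to the role.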
Info
Channel: Cloud Security Podcast
Views: 161
Rating: 5 out of 5
Keywords: Attacking & Defending Managed Kubernetes Clusters, brad geesaman, cloud security podcast, kubernetes security attacking and defending kubernetes, attacking kubernetes, kubernetes security, hacking kubernetes, hacking kubernetes api
Id: xD5nyk-WaXk
Length: 62min 55sec (3775 seconds)
Published: Sun May 30 2021