Connecting Kubernetes Clusters Across Cloud Providers - Thomas Graf, Covalent

Captions
All right, good morning. My name is Thomas Graf, I'm the co-founder and CTO of Covalent, the company behind Cilium, the Cilium project. I'm here to talk about how you can connect multiple Kubernetes clusters together across cloud providers. Who am I? I've been a kernel developer for a long time; I spent about 10 years at Red Hat, mostly focusing on networking, security, and BPF, so I've been living low-level in the kernel for a long while. I started the Cilium project and founded a company about one and a half years ago to drive Cilium forward.

The goal of this session is to run services across cloud providers. Specifically, what we will do is set up a Kubernetes cluster on GCP and deploy services, set up a second cluster on EKS and deploy the same service, establish service load balancing between the cloud providers, between the clusters, and we will discuss and show how you can set up encryption between the clusters. That's the goal of the session. One potential use case (we'll talk about others) is that if a back end in one of the clusters fails, traffic automatically load balances to the back ends of cluster two. That's the HA use case; we'll discuss others. So that's the goal of the session; if you're not interested in this, you can leave the room again.

What tools do we need? We need Kubernetes of course; I think everybody is familiar with Kubernetes, I hope, but we will repeat a little bit of the basics that we need. We will need infrastructure APIs, and I'm trying to be as generic as possible: we need some form of VPC with routing, a virtual network context; all cloud providers support this, and your on-prem infrastructure supports this. And we need an IPSec-compatible VPN solution. All cloud providers call this a VPN gateway of some sort, and typically all cloud providers support IKEv1, some support IKEv2. That's what we need from an infrastructure perspective; if your cloud provider or your on-prem infrastructure provides this, you can use this solution, you can set this up.

The last piece we need is Cilium. What is Cilium? Cilium is an open source project, Apache licensed, a little bit over two years old. It's based on BPF, a new and exciting technology which we'll talk about in a little bit, on the next slide. We provide several things. First of all, we are a CNI plugin: we provide networking, we provide connectivity between pods. We implement Kubernetes services, so we're replacing kube-proxy; we're about 95% of the way to completely ripping out kube-proxy. BPF is an alternative, a better alternative, to iptables, which is the default implementation of kube-proxy, and we're using the power of BPF, for example, to do multi-cluster routing. We also provide network security: the ability to define which pods, which services can talk to each other. We use an identity-based mechanism to do so. We are DNS aware, so we don't only allow you to define which pods can talk to each other, we also allow you to define what DNS requests a pod can make. We allow you to specify a policy that a pod can talk to this DNS name, and Cilium will automatically track the IPs returned by the DNS server; the pod can then only talk to those specific IP addresses, which, in simple terms, allows you to establish egress policy to services that are running outside of the cluster.
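To make the DNS-aware policy concrete, here is a minimal sketch of what such a rule can look like as a CiliumNetworkPolicy. The pod label, the FQDN, and the kube-dns selector are illustrative assumptions, not taken from the talk; the `toFQDNs` and L7 DNS rule syntax follows Cilium's documented CRD format.

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-fqdn          # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: frontend                   # assumed pod label
  egress:
  # Allow DNS lookups via kube-dns so Cilium can observe the answers
  # and learn which IPs belong to the allowed name.
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # Only traffic to IPs that the DNS server returned for this name is allowed.
  - toFQDNs:
    - matchName: "api.example.com"    # assumed external service
```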
We are also API and data-protocol aware; that's where Envoy comes in, and I'll talk a little bit about that as well. So we don't only understand services talking to each other, we also understand the HTTP REST API calls that pods are making; we understand Kafka, Cassandra, memcached. You can define security policies that not only allow services to talk to each other, but, for example, say: this pod can talk to my Cassandra, but it can only do a SELECT on that table, and everything else will be denied. So we allow you to do API-aware security. In 1.4 we will also ship transparent encryption: you basically turn this on and all pod-to-pod communication inside your cluster will be encrypted. A future release will then allow using this technology and integrating it with Istio and Envoy to enforce mTLS using kernel acceleration. The last piece: we have an initial Envoy integration and we do sidecar acceleration, so as you deploy something like Istio, we can accelerate the communication between your pods or applications and the sidecar proxy. And we have a very exciting new feature coming up called transparent SSL visibility: we can provide Envoy and Istio with visibility into data even if your application is using SSL encryption. We will not have enough time to discuss all of this; I will focus on the multi-cluster part, but since a lot of people are probably hearing about Cilium for the first time, I wanted to give a quick overview.

What is BPF? That's the official, or semi-official, logo. BPF is a highly efficient sandboxed virtual machine in the Linux kernel; that's the accurate, academic definition. What it really means is that it makes the Linux kernel programmable at native execution speed, and that's really, really powerful. This is driving a lot of what's happening inside the Linux kernel right now. The technology is jointly maintained by an engineer from us and an engineer from Facebook, and then we have many other collaborators across the board; we'll talk a bit about that. Who is using BPF right now? You will get a better view if I show you the list of contributors, but a very prominent one is Facebook. Facebook has just presented their latest developments in how they leverage BPF, and they have actually been using BPF for all production traffic publicly exposed on the internet: every packet that went into a Facebook data center in the last one and a half years went through a BPF program. There are many, many other use cases. Google is using this for traffic optimization and QoS and has just started implementing network security with it; Red Hat has upstream developers working on using BPF to replace iptables. I think many of you may have heard Brendan Gregg's talks on BPF for profiling and tracing; if not, go check them out, it's fantastic, cool stuff. There's a large investment into BPF, into this new kernel technology. Who contributed to BPF? You can see a lot of companies here: a total of 190 contributors have contributed to the kernel portion of BPF over the last two years. That's an intro to Cilium and BPF and gives you a background on where we come from.

So now we're going to dive into how we can use BPF and Cilium to connect multiple Kubernetes clusters together. I want to level-set a bit; we'll go over this quickly, but I want to make sure that we all have the same understanding of the difference between deployments and pods. A deployment is your desired state: if you have two services running, you will have a deployment for the front end and a deployment for the back end. Pods are a collection or group of containers. As you scale deployments, as you define how many replicas you want for a particular deployment, that number of pods will be scheduled.
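To make the deployment-to-pods relationship concrete, here is a minimal Deployment sketch using the rebel-base name from the demo; the image and labels are placeholders I'm assuming, not the actual demo manifest.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rebel-base
spec:
  replicas: 2                  # desired state: two pods
  selector:
    matchLabels:
      name: rebel-base
  template:                    # pod template used for every replica
    metadata:
      labels:
        name: rebel-base
    spec:
      containers:
      - name: rebel-base
        image: nginx:alpine    # placeholder image
        ports:
        - containerPort: 80
```

Scaling a manifest like this (for example with `kubectl scale deployment rebel-base --replicas=0`) is what the demo later uses to fail traffic over to the other cluster.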
So a pod is what gets created when you scale your deployments. If you do kubectl get deployments, you will see the desired state, the currently running pods, and so on: deployments and pods. As you deploy these pods, every pod gets a unique IP address, and they're just living in your Kubernetes cluster. So let's say you have three front ends and four back ends: how does the front end talk to the back end? Does it need to know the specific pod IP of a back end? No. This is where the service comes in, a Kubernetes service. You define a Kubernetes service, it gets a so-called cluster IP or service IP, and as the front end talks to this IP address, Kubernetes will automatically load balance to a healthy back end, to a healthy pod. If you do a DNS lookup, kube-dns will respond with the cluster IP. There are some special cases that I'm ignoring here, headless services and so on, but in simple terms this is how Kubernetes services work: relatively simple and straightforward.

How does Kubernetes know whether a pod, a back end, is actually still healthy, whether it's still alive? Kubernetes allows us to define a liveness probe, and Kubernetes will then probe that back-end pod every n seconds and ask: are you still alive? Are you still alive? Are you still alive? If not, it will remove it from the list of pods that the service load balances to, and this is represented as the so-called endpoints. As you define a Kubernetes service, the Kubernetes API server will automatically also create an Endpoints object, and that Endpoints resource or object will contain the pod IPs of the healthy back ends, of the healthy pods. This is all you need to know about Kubernetes in terms of multi-cluster: we're not introducing anything on top, we're using this exact construct to do multi-cluster. If you're an advanced user, you can manually maintain Endpoints and put arbitrary IP addresses in them, and kube-proxy and other solutions will automatically load balance to those IPs as well, so this construct is not necessarily limited to pods.

So what we're going to see is global services. It's a standard Kubernetes service with one addition, an annotation with which we mark a service as global. For those that don't know, an annotation is a mechanism that allows third-party plugins or users to add metadata to a standard Kubernetes object, and plugins or users can then interpret that annotation and do something with it. So we can annotate a standard object: with this annotation we're marking the back-end service as global, and Cilium sees this and automatically does load balancing across clusters.
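Concretely, a global service is just a normal Service plus the annotation. The sketch below assumes Cilium's `io.cilium/global-service` annotation key and the demo's rebel-base name; the selector and port values are made up for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    # Cilium merges the healthy endpoints of every connected cluster
    # that defines a service with the same name and this annotation.
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  selector:
    name: rebel-base
  ports:
  - port: 80
    targetPort: 80
```

An identically named service annotated this way exists in each cluster; kube-dns still hands out only the local cluster IP, and Cilium does the cross-cluster load balancing behind it.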
Sounds good? Let's do a demo, and I really hope the Wi-Fi is working, because I will need a connection to my cloud provider. I like to do theme-based demos; some of you may have seen a demo from me before, and this is going to be a new demo. It looks similar, but it's a new demo. A long time ago, in a container cluster far, far away... It is a period of civil war. The Empire has adopted Kubernetes and continuous delivery. Despite this, rebel spaceships, striking from a hidden cluster, have won their first victory against the evil Galactic Empire. During the battle, while the Jedi were busy studying the Book of Istio CRDs, the Empire managed to capture Leia and bring her to the Death Star. So we're here in the movie, in this scene: "No! Alderaan is peaceful, we have no weapons, you can't possibly..." "Then name the system. Where is the rebel base?" She looks at Alderaan for a moment, then, resigned: "Dantooine. They're on Dantooine." So that's the setting for our demo.

I'm switching tabs here, and you will see a split terminal: the upper half is cluster one, running on GKE; the lower half is cluster two, running on EKS. GKE is actually in Europe and EKS is in the US, so we're trying to make this as difficult as possible. Let's check whether we actually have connectivity to my cloud provider. Yes, OK, it's a bit laggy. So I have three nodes running in GKE and two nodes running in EKS. I have a bunch of pods running here: Cilium, rebel-base, and x-wings; and in my EKS cluster I have two rebel bases. I also have a Kubernetes service, rebel-base, so let's look at this. By the way, k is just my short alias for kubectl. We have a rebel-base service and it has a cluster IP, and you can see that this cluster IP is different from the pod IPs; these are the pod IPs. Then we also have the Endpoints structure that we discussed before. I can look at this: we have a rebel-base Endpoints object, and you can see that it contains the two pod IPs of the two rebel-base pods. So Kubernetes is actually only aware of the local pods; Kubernetes does not know about the other cluster at all. As you saw when I did kubectl get nodes, I did not see any additional nodes. Our global service routing does not actually connect the Kubernetes control planes together; the Kubernetes clusters still have exactly the same scope.

So I can go into an x-wing. I will do a kubectl exec and just launch a bash, so I'm now piloting an x-wing, which is pretty cool. An x-wing can curl the rebel base, and the rebel base will respond and say: I'm in galaxy Alderaan and I'm on GKE. We can also curl the service name, and let's do this a couple of times... all of a sudden the response is different, and somebody from Dantooine in EKS actually responded. So how did that work? Cilium transparently did load balancing: it synchronized the services of cluster two, in EKS, made them available in the first cluster, and then does standard load balancing based on Kubernetes health-check information. So if a pod dies or becomes unhealthy in cluster two, it will be removed. Let's try that. Let's do a for loop (I'm really bad at typing); let's do one more so it doesn't just exit. We'll do this a bunch of times and curl rebel-base, and we see that sometimes it returns GKE and sometimes it returns EKS.

So what happens if we scale our deployments? kubectl scale lets me change the number of replicas, so let's scale our rebel base; let's scale it down to zero. Now it's only GKE. This is very quick: we scaled it down, and now it's always going to the local cluster. I can scale it back up; you can see that Kubernetes is now creating the containers, and as soon as the containers are up and the health check is responding, we see EKS coming back in. Nice, so we have load balancing.

Now what if I want to do policy? I still need to do security in all of this, so let's look at how security works. I have a YAML file here that defines, in this case, a CiliumNetworkPolicy, which is our CRD-based policy, but this works with standard NetworkPolicy as well. Look at this policy rule: it has a name, allow-cross-cluster, and a description, "allow x-wing to rebel-base within cluster 1". The description is actually wrong; it should say "to rebel-base in eks-1". The policy matches on all pods with the name x-wing, if that pod is in cluster 1 (which is the GKE cluster), and then allows egress to endpoints, to pods, which match the labels eks-1 (the cluster name) and rebel-base.
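Reconstructed from that description, the rule looks roughly like this. I'm assuming the `io.cilium.k8s.policy.cluster` label that Cilium's cluster-mesh documentation uses for the cluster name, and the `name` pod labels from the demo; the cluster names are guesses based on what the speaker says.

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
spec:
  description: "Allow x-wing in cluster1 to contact rebel-base in eks-1"
  endpointSelector:
    matchLabels:
      name: x-wing
      io.cilium.k8s.policy.cluster: cluster1   # only applies to x-wing pods in the GKE cluster
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base
        io.cilium.k8s.policy.cluster: eks-1    # only rebel-base endpoints in the EKS cluster
```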
So let's go over here and actually apply this policy; now I have a Cilium network policy loaded. I will need the IP address here, so we can go back into our x-wing, and I should be able to... let's see. This is the IP of the pod in the other cluster, which is what we allowed, so that still works. What if I try a curl to the IP of the local pod? That will not work. So everything that Cilium policy and NetworkPolicy allow you to express will work in this model, and you can define policy to either include the cluster name, in which case you can define which clusters can talk to each other, or you can define a global policy and say: I want all my front ends to talk to all my back ends across all clusters. It's a flexible model, and you saw that we did not encode any IP addresses at all. The reason this works is that Cilium does identity-based enforcement, not IP-based enforcement, which does not require cluster two to be aware of the IP addresses in cluster one and vice versa. That's a benefit of the identity-based enforcement model.

So let's talk a little bit about design principles. The first, and this is I think 80% of it, is simplicity. We wanted the usability of this to be as simple as possible. Right now this is an annotation, but I think there will be interest in the community to standardize this in some form. We also wanted to avoid the need for a networking degree, so you don't need to understand subnets or routing; we'll talk a little bit about what you do need to understand in terms of setting up VPCs and so on, but that's pretty simple. Also very important: it must be absolutely simple to troubleshoot and debug. You don't want to be in a position where this is not working and you have to wade through thousands of iptables rules or figure out with tcpdump what's going on, so a design goal was also to make this simple. Then security, obviously: as we cross cloud providers, encryption is elemental; we want security policy spanning multiple clusters, which we saw in the demo; and we want compatibility with mutual TLS, to run Istio or other service meshes. We want resiliency: if something is wrong in cluster one, that should never have an impact on cluster two. We do not want to run something globally that you have to manage and that all clusters connect to, so we preserve the failure domains, we preserve availability zones, we make sure that clusters can live on their own, and Cilium does the connectivity at the networking layer. Efficiency: we want this to be as efficient as possible, so we do not want to introduce additional termination proxies; we did not want to go through egress proxies, ingress proxies, and so on. We want to use standard networking principles, which gives us native performance.

So we saw one annotation in the demo, global-service true or false. We support a couple of additional ones: you can define whether a service should expose its endpoints to other clusters, so you can say, I have a back-end service, but I don't want to expose my own pods to other clusters; I just want to use the endpoints of other clusters. Coming soon will be service affinity: you want to be able to define a policy that says go to the local cluster first, and if there is no healthy back end, then go to the remote cluster; or always prefer a back end on the local node, then go to the local cluster, then go to the remote cluster, and so on. That's not implemented yet; it's coming soon, and if you want more, talk to us on Slack.
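As a sketch of the endpoint-sharing control just described, assuming the `io.cilium/shared-service` annotation key (check the Cilium documentation for your version):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    io.cilium/global-service: "true"
    # Consume endpoints from other clusters, but do not expose this
    # cluster's own rebel-base pods to them.
    io.cilium/shared-service: "false"
spec:
  selector:
    name: rebel-base
  ports:
  - port: 80
```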
Control plane: actually, let me show you this, it's always better. Cilium is running in a namespace called cilium, and you will see that we're running Cilium itself, the agent, as a DaemonSet on all nodes; then we have an operator which synchronizes Kubernetes services; and etcd, which is actually managed by Cilium itself. We have a cilium-etcd-operator, which manages the etcd operator, and that operator manages the etcd cluster. So Cilium comes bundled in one namespace, deploys itself, and synchronizes the Kubernetes services that you have annotated into etcd. We have a service called cilium-etcd-external; it is a type LoadBalancer service with an annotation to make it VPC-internal, so it is not exposed on the internet, it is exposed inside your VPC, and it is protected by TLS. The operator automatically creates the secrets, and we have scripts to extract the secret and import it into the other cluster to establish control-plane connectivity. So this is not just free and open; you need to share: in order for clusters to be connected, cluster one needs to share its secrets with cluster two so that cluster two can connect. That's our security model. All agents in the other cluster then connect to that remote etcd and watch for service changes, so whenever something changes, a new service is added or the list of endpoints changes, the etcd key will change and the agents in the other clusters will receive a notification. It's important that all agents do this connectivity themselves; it means we don't have a single point of failure. If there is a problem with one Cilium agent on one node in one cluster, this will not have an impact on other nodes; we want to be as distributed as possible. I included an example here of how you can use the internal load balancer type on GKE; I think all cloud providers support this. I haven't checked all of them, but at least Amazon and Microsoft support this as well.

So let's look at the VPC side, the setup on the infrastructure side. It's pretty simple: you create two VPCs, and you can create as many clusters in those VPCs as you want, but you need one VPC on each side. The CIDR block, the IP range assigned to the VPC, must be unique, so you cannot just use the same IP range on both sides; that's requirement number one. We could support overlapping IPs, but I think for simplicity you actually want unique IP addresses anyway. You then establish a VPN gateway, which is a single click or a single API call; you then set up VPN tunnels (you want at least two, you want redundancy), and then you set up routes to route VPC one to VPC two across the VPN gateway. You can do static routing, you can run BGP, you can do whatever you want; follow your cloud provider's guide on how to do this. All of the cloud providers have simple guides to follow, and everybody supports IPSec, so you have compatibility across cloud providers or even on-prem as well. You then set up Kubernetes, and you can set it up in any way you want: automated, self-managed, managed, whatever. I don't think we even need a specific latest version; I think even 1.9 would work fine. You deploy Cilium; Cilium will deploy the DaemonSet, set up etcd, and manage it. You will then extract the secret, which grants access to that etcd, and deploy it into the other cluster, and that's it. At that point you're ready to go: you can deploy pods and deployments, annotate them, and you will have global services.
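The GKE internal load balancer mentioned above is driven by a Service annotation. Below is a rough sketch of what exposing the Cilium etcd inside the VPC can look like; the namespace, selector labels, and ports are assumptions on my part, while `cloud.google.com/load-balancer-type: "Internal"` is the GKE annotation that keeps the load balancer off the public internet.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cilium-etcd-external
  namespace: cilium                    # assumed; use wherever your Cilium etcd pods run
  annotations:
    # GKE: provision an internal (VPC-only) load balancer instead of a public one.
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: etcd                          # assumed labels on the etcd pods
    etcd_cluster: cilium-etcd
  ports:
  - name: etcd-client
    port: 2379
    targetPort: 2379
```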
Use cases: I already mentioned one, high availability, running services across cloud providers or across clusters, and if a back end fails in cluster one, it automatically fails over to cluster two. But there are other use cases. One is shared services. You're probably running common services; I'm using Vault as an example, but Prometheus would be another one, DNS would be another one: services that are needed by your other services. By running Vault or something similar in one cluster, you can make it accessible from other clusters, and then you can basically add more and more clusters as you go without having to deploy Vault in every cluster. So you can bundle your shared services in one cluster and manage them there. This may also have security benefits, because whoever is managing or operating Vault does not actually have access to the other clusters, so you can do separation of concerns and limit access. I think this one is very interesting and very practical: split stateful and stateless. You will have a bunch of stateless apps, and if you limit a cluster to stateless only, that cluster can be thrown away and recreated very, very simply. You will care much less about your stateless services, but for the stateful stuff, Cassandra and other databases, you will need to put a lot more thought into when you upgrade that cluster, when you move services, and so on. So splitting this will definitely simplify your operations.

Another one: CVEs. We all heard about the Kubernetes bug that was fixed quickly, the bug in the API server; basically all versions of Kubernetes were affected. A lot of people thought: OK, we actually have to upgrade our Kubernetes cluster, we can't get away with just running an older version, so everybody was like, time to abandon the cluster. How does this help? Let's look at a simple example. You're running Kubernetes 1.9, which is actually affected by the CVE, and you're running a front end which is stateless and a data store. What if you could evacuate your stateless services to a newer cluster, version 1.13, and use global services so the front end still talks to the data store in the old cluster? You can just move this over; as we saw in the demo, the handover is basically seamless. Then you can restrict access to your old 1.9 cluster, you actually have time to upgrade that cluster or do whatever you need to do to fix it, and your developers can work on the newer cluster version in the meantime.

Istio integration: we get this question asked a lot, so what is the difference, what do Istio, Cilium, and Kubernetes each do? I will be simplifying this a lot, but Istio and Kubernetes are basically control planes: they do service management, identity provider integration, telemetry collection, and a lot of other things. The control plane knows what to do: it knows what services should run, what deployments should run, and so on. Cilium and Envoy are the data plane, actually handling the packets and the layer 7 requests, doing routing, load balancing, security policy enforcement, and so on. The data plane knows how to do it.
Both Istio and Kubernetes right now have a notion of service definition and management, so we have Istio services and we have Kubernetes services. We chose to implement Kubernetes services first, for simplicity, but we can implement Istio services as well; that's not exclusive at all. So Cilium is completely compatible with Istio and enhances it.

To summarize: Cilium is based on BPF technology and comes in the form of a CNI plugin. We implement Kubernetes services, primarily replacing kube-proxy there, which brings advantages such as global services, better scalability, and so on. We implement network security, identity-based, which is very important for multi-cluster routing, DNS and server-IP aware, HTTP, REST, gRPC, and so on, and data-protocol aware: Kafka, Cassandra, memcached, MySQL. We just added Go extensions to Envoy, so we can extend Envoy with Go, and we're using this to add more and more data protocols; it literally took us a week to add Cassandra support, so it's becoming very, very simple to make Envoy aware of additional data protocols, which lets us do finer-grained security, provide better visibility, and so on. And then, super interesting, we do sidecar acceleration. If an application is talking to a sidecar proxy, to Envoy, locally, it's typically doing TCP, and TCP was invented, or built, for lossy network environments; if you're on a single node it's always lossless. So what Cilium will do is see the app talking to a sidecar socket, take TCP out of the path, and just copy the data between the sockets; it's about three to four times faster. The other element is SSL visibility. As your services talk to, let's say, a cloud provider data store service, database services, storage solutions, all of that communication is SSL encrypted. Via a technology called kTLS, kernel TLS, we can split handshake and encryption, defer the encryption until after Envoy, and provide Envoy, Istio, and other service meshes with layer 7 visibility into communication that is leaving the cluster and that is always encrypted.

With that, I'm closing. Thank you very much for coming. If you want to learn more, we have a Slack channel which is very active; all of this is open source, that's our GitHub repo; we have good docs, and you can follow us on Twitter to stay up to date. If you want to hear more about Cilium, there's another KubeCon talk in this room at 1:45 which will focus more on the security aspects. Do we have time for questions? All right, we have time for questions. Yes, and I will repeat the questions.

So the question is: do we sync anything other than services right now, such as secrets? The answer is no, we don't. We would love to have a conversation about what makes sense; we'd definitely be careful about just randomly syncing secrets, but it's an interesting concept and we'd love to chat and follow up.

OK, the question is how this relates to Federation. You can use Federation; this does not depend on Federation at all. We are definitely interested in evolving together with Federation, but we did not want to take on all of the complexity that Federation brings, so right now it's completely separate but compatible.

OK, so the question is whether we can make load-balancing decisions based on other signals, such as health checks, latency, and so on. Right now we support weighted round robin, so you can say that certain back ends should receive more traffic. We do not have latency tracking yet, but in the end it's all implementable; BPF gives us the flexibility to do all of that, but right now it does not exist.
Yes, the question is whether Envoy can be replaced. We don't even want to; I think Envoy is great. Cilium has Envoy filters on the metadata, listener, network, and HTTP level; we heavily use Envoy and we accelerate Envoy, so I think Envoy and Cilium are a great combination together. We can, for example, do faster L4 load balancing, so if you don't want to use Envoy just for L4, I think Cilium load balancing is a good fit. If you're talking about offloading HTTP, maybe, but there's no plan right now to fully do that. So I think Envoy plus Cilium is a great combination to run.

Yes, the question is: with kube-proxy you have iptables rules, you can manage those, you can look at them, you can use tcpdump. We provide tooling that is very similar, but we believe it's simpler because it's not rule-based, it's intent-based. If you look at, for example, the policy, you will see the security identities that are allowed; you're not seeing IP addresses. You can still look at that, and you can still use tcpdump. We think we actually provide better tooling; I don't have the time to demo all of that, but have a look, run it, and give us feedback in case something is missing.

Yes, so the question is: what is the performance penalty of the encryption? It's using standard IPSec, and there is clearly a performance penalty if you are encrypting. IPSec has the advantage that it uses accelerated CPU instructions if available, and modern CPUs do expose those, so the performance will be the same as if you used IPSec elsewhere; it will make use of hardware if possible. I don't want to quote a specific number because it will really depend on what hardware you're running and whether you are running VMs or bare metal and so on.

Yes: what about the case where you want to, for example with stateful applications spanning clusters, speak to a specific pod rather than a random pod behind a service? Can you repeat it? I'm not 100 percent sure I understood it. Sure: if you wanted to span an application across clusters but needed to address a specific pod, such as when you run stateful applications, so you don't want to go to a random pod behind a service but to an exact pod, how would you do that here? OK, good question; I think everybody heard the question. First of all, Cilium is split: there is pod IP routing, so you can talk to the pod IP in the other cluster directly and it will just work. The global service load balancing is split from the pod IP forwarding, so you don't have to go through the service IP at all; you can talk directly to the pod in the other cluster, but then you need some form of discovery that tells you what that IP is. Yes, thank you.

Yes, the question is: can you use Cilium with another CNI and just use the global services? Yes and no. We are just about to publish our initial integration with flannel, to run on top of flannel and kubenet, where we basically leverage the forwarding decision of flannel or kubenet and you add policy or load balancing or whatever you want with Cilium. We already run in combination with kube-router, so you can run Cilium in a mode where the routing is handled by kube-router; you can also run your own routing daemon, a BGP daemon, and just tell Cilium the desired routes, and it will route according to the Linux routing table. So yes, we want to split it; right now it only works with kube-router, and flannel and kubenet will be next.
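One standard Kubernetes way to get at individual pod IPs instead of a load-balanced cluster IP is a headless service; this is not something shown in the talk, just a hedged illustration. Note that kube-dns only returns the local cluster's pods, so for the cross-cluster case you would still need your own discovery on top of the pod IP routing the answer describes.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: datastore-headless      # illustrative name
spec:
  clusterIP: None               # headless: DNS returns the individual pod IPs
  selector:
    app: datastore              # assumed pod label
  ports:
  - port: 9042
```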
Yes, the question is how the DNS resolution works. It's using standard kube-dns: kube-dns will return the service IP, and Cilium will do the load balancing from the service IP to the endpoints, which means that Kubernetes in your cluster does not need to know about the pod IPs in the other cluster. kube-dns just returns the service IP, the cluster IP.

Yes, go ahead. So I actually had three clusters connected; I just wanted to keep the demo a bit simple. You can connect as many clusters as you want. Do you need a VPC? We don't actually care whether you have a VPC or not; you need to be able to route traffic from nodes in cluster one to cluster two. The VM, or whatever is running your node in cluster one, needs to be able to reach the node in cluster two. Does that make sense? Yes, that part was a bit cloud-provider specific, but as long as you satisfy node-to-node connectivity, it will work.

OK, yes: since you're doing some fairly deep packet magic here under the hood, have you run across workloads or add-ons that get confused by this, that this doesn't work well with, or where you need specific workarounds, or is this completely transparent to everything you've seen? This is completely transparent. This is at the IP level, so it will work for UDP, TCP, SCTP, whatever, and it works for IPv4 and IPv6. I think there are some specific applications, say Prometheus, that actually look at the Endpoints resource of Kubernetes and talk directly to the pod IPs; in that case it would not see the pod IPs of the other cluster. Just to be clear: the networking level is completely transparent, but not everything is actually using the Kubernetes service IP in a standard way. The other thing I was wondering about specifically is anything that tries to verify the IP of what it's talking to and compare it with where it is; have you run across any workloads like that that get confused by this? And just to be clear: when you say it's transparent, it's transparent as long as both ends are running Cilium, right? Well, you can totally define a service, define an Endpoints structure, and put arbitrary IPs in it, and the load balancing will work; but if you want the automatic service synchronization, then you need Cilium in both clusters. OK, we're out of time. Thank you, thank you so much.
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 6,360
Rating: 4.9148936 out of 5
Id: U34lQ8KbQow
Length: 40min 55sec (2455 seconds)
Published: Sat Dec 15 2018