Why Kubernetes Is Inappropriate for Platforms, and How to Make It Better

Captions
Who's here for the Lego talk? There will be plenty of Lego in this one, and by coincidence there's a Lego talk in the next room. We are not from Lego, but we like Lego, so it's everywhere in the talk.

All right, before we start, a short introduction of who is who here on stage. Hey, I'm MJ. I work at Cast AI, we do cost optimization, and I'm interested in control planes; this is why I'm here. Sebastian, one of the founders of Kubermatic, and all we do is around control planes. More or less the same for me: Stefan, I'm at Upbound, doing control planes. You probably know me from CRDs; I was very involved there, so if you use them, that's probably partly my code.

Our talk today is "Why Kubernetes Is Inappropriate for Platforms", which is of course a big claim, "and How to Make It Better". This talk is about ideas. It's not about selling a project or a product or anything like that. It's about not just accepting Kube, but thinking about how we would have to change Kube to get something better for platforms. That's our motivation today. And we were thinking: the last days were all about AI and ML, so let's build an AI and ML platform here. Keep in mind, we like Kube, but we also want to show you what could potentially be improved, or what we dislike.

So let's start with this experiment. If you want to build a platform, the first thing we need is to give our developers some way to create objects. For that, in Kubernetes we have CRDs. We can create an object called Model, and different teams and different users can create it. So it's there, perfect. Next, as we want to provide our platform to different teams, we need some separation. We have this in Kubernetes too, called namespaces: we can give each team its own namespace, they can deploy this object, and it's completely separated. We can also separate it by RBAC, so we have this in Kubernetes too.
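The two building blocks described so far, a custom object type and per-team namespaces guarded by access rules, can be sketched in a few lines of Python. This is an illustrative in-memory model, not Kubernetes code: the `Model` fields, the `Platform` class, and the team and namespace names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """Hypothetical custom object; the talk only names the kind 'Model'."""
    namespace: str
    name: str
    base_model: str  # assumed field, for illustration only


@dataclass
class Platform:
    # team -> namespaces it may use (a stand-in for RBAC bindings)
    grants: dict = field(default_factory=dict)
    # (namespace, name) -> object, like a namespaced API resource
    objects: dict = field(default_factory=dict)

    def _check(self, team: str, namespace: str) -> None:
        if namespace not in self.grants.get(team, set()):
            raise PermissionError(f"{team} has no access to {namespace}")

    def create(self, team: str, obj: Model) -> None:
        self._check(team, obj.namespace)
        self.objects[(obj.namespace, obj.name)] = obj

    def list_models(self, team: str, namespace: str) -> list:
        self._check(team, namespace)
        return [o for (ns, _), o in self.objects.items() if ns == namespace]


platform = Platform(grants={"team-a": {"team-a"}, "team-b": {"team-b"}})
platform.create("team-a", Model("team-a", "chat", "llama-2"))
```

Each team sees only the objects in the namespaces it was granted, which is the separation the namespace-plus-RBAC model provides inside a single cluster.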
And of course, what we want is a uniform API. Potentially we need even more objects, and they should always follow similar principles, and Kubernetes gives this to us. If in the future we want to add another AI service or a database service, it should not be a completely different way to consume and operate that service. So this part we have, perfect.

Let's go further. The next step: we now have these objects, so what do we do with them? In Kubernetes we have the reconciler pattern, the control loop, to interact with them, and we build operators on top of this that perform actions: they either deploy things on Kubernetes or, if you know Cluster API or Crossplane, they can even talk to other APIs. There are tools for this. The most popular one for building this is controller-runtime, but there are many more, for example Kubebuilder, or Metacontroller if you don't want to write so much code and want to keep it simpler, or client-go if you want to go really deep. And you don't need to use only Go for it; there are many other options, you can use Rust, Python, Java. So there's a big ecosystem of tooling to work with this.

Cool, so multi-tenancy. In Kubernetes, we all know that multi-tenancy is implemented with namespaces, but there is a question mark: is this really multi-tenancy? When your tenants are split by namespaces and you are a developer on that internal developer platform, you interact with the platform via clients and SDKs, and it's explicit: your developers usually have access to one namespace or to all of them, and the SDK has to be specifically typed. But that's not the challenge; you can work with that, and your teams can create wrappers around it. The real challenge happens when you start interacting with CRDs within the clusters. As soon as you install cluster-wide services like cert-manager, Crossplane, or Argo CD, and you as a platform owner need to upgrade those components, now you have to
synchronize with each and every development team to make sure they are okay with the upgrade and okay with the versions. If you want to introduce clashing CRDs, you have to go out and basically get consensus from your platform to do that. In the end, this makes the platform owner a very unhappy person, because usually this persona has to deal with those things.

So, a short status check. It looks okay; we live in that world. But there are cracks already visible in this model if you want to build platforms. We have known this pain for several years, we live with it, and we can continue like that. But this picture here is actually not the picture of today, right? People have more clusters. Single region, multi-tenancy via namespaces: we had that seven years ago, and it didn't look different. A single source of truth in one cluster. The real challenge today is multi-region and multi-cloud; everybody wants to do that, and you need more isolation than namespaces give you. There are ways to get that, essentially more clusters of some kind, and we'll talk about that in a second.

More clusters means a sprawl of clusters, and this brings complexity. You have to think about how to share data: the volumes may just live in one cluster, and you cannot talk between services via the Kube API. Keeping config consistent needs tooling, and applying policy gets more complicated. So there are many complexities. We represent them here in this picture as bridges: we have to reconnect those clusters. There are logical connections, maybe even network connections, but the clusters are connected; they are not living on their own.

To start with, you have to create the clusters. There is a giant ecosystem already just for creating clusters, and you will have maybe one or two of those tools in use, maybe other tools. You create more clusters, more tools are available, and if you have more
clusters, you have to tame them. So there's another class of tools to do everything I showed previously. Config and compliance: Argo CD and other GitOps tools, and Crossplane, help you with policies and with application deployment. Cluster federation helps you to a degree with compute, with federating deployments across clusters. And there are cluster managers, like the Open Cluster Manager project, for example. You see there's a big set of tools, and they are all very scoped; each has its use case. And you all know the cartoon there at the bottom: isn't there something which can unify them? Of course this is the immediate idea that engineers have.

But those tools are not just technology. The real problem for platforms is that every tool dictates a view on personas, and that view on personas may not match what you actually want to build. Maybe they were developed three, four, or five years ago, when platforms were not yet a thing, so they were not built in a way that is really compatible with platforms.

So personas, basically people, are the actual challenge in this area. The usual personas, and we'll talk about them later, are platform owner, service provider, and user, so we focus on those three. What makes it complex: Kube was built basically with one persona in mind. There was this ops person deploying applications on a cluster, and this admin can basically do everything. These new personas have very limited, partial responsibility. So you have to find a model for authorization, using RBAC maybe, or other tools like Kyverno for policy, and you have to implement exactly the responsibility they should have and not more, because otherwise it's a security problem.

It gets even more interesting when you think about third parties. You want to use a third-party tool, like a service that is installed on that big setup, and this big setup is your setup. If you ask your neighbor how their platform
looks, it will be completely different, right? Everybody is proud to have solved the platform problem in some way by a clever use of tools, but it's very diverse. And the tooling we have in the ecosystem, like a Helm chart, doesn't make sense in this world. It cannot talk about multi-cluster; it's just not made for that. So one question you could ask: what actually is a package for a service that is able to span clusters? How do you install it? How does it know about your clusters?

And you get those other bridges, those other streets there: many ways to reconnect the clusters. But this territory between the clusters is basically undefined. If you look at the bubble above: every multi-cluster tool builds its own bridges, and there is no common language. So complexity explodes. This thing is hard to support; it can break down easily just through new requirements, for example, because you picked tools that implement exactly the requirements at one point in time. Tomorrow there's a new requirement, and those tools can't handle it, because they're not integrated; they just do what you wanted at the time. It's hard to integrate into them, like installing something new, and supporting the same personas is hard. In general, I think it's not a good experience. You can build a platform to a degree which kind of works, but it's not a good experience for any of those roles.

And the reality check again: Kube was not built for that. We are trying to build platforms with a technology which was built for containers. That was the purpose all the time, and platforms are something we came up with in the last year or two, and we try to make use of Kube for that. But remember, we like Kube. The ecosystem is of course very big, with lots of companies and projects around it, so we want to keep that. But the real question is: we want to keep
the ecosystem, but if we built a Kube today for platforms, it wouldn't look like Kubernetes. It would be similar to a certain degree, but it wouldn't be Kubernetes, for sure.

So, a last summary of that. We know where to go; we have a rough idea of what a platform could do. We did something similar before, but this time the ambitions are different: we want to build something which can lift an ecosystem up to platforms. Personas change, obviously. And Kelsey once mentioned that Kube is not meant as an end game. It gives a pattern; it builds basically a platform for container workloads, but it's for container workloads. The big question is what the next step is if we leave the container orchestration area and build platforms for other things.

Coming back to Lego: we are on the left side, right? We have small building blocks built for single-cluster environments, and everybody can build a car or something. But again, if you ask your neighbor about the car they built with the same tools, it will be very different, and the two don't integrate. It's great for creativity, and everybody is proud of their car, but we have to get to the right side, a grown-up approach where we build towns. Towns are an environment that every component lives in, where I can have an off-the-shelf component which knows: this is a platform, there are clusters, there are APIs, and so on. This is the vision, basically. We should get back to a place where we can collaborate, with a common language for talking about platforms, where you can build a service which just deploys onto this big platform: worldwide, multi-region, multi-cloud, and so on.

One of the challenges we see, and Stefan mentioned it already: nowadays we have many more personas in the game. When we started with Kubernetes, it was typically a developer who was responsible for everything. But now we have at least three different kinds of people involved, but there
could be even more. It starts with the platform owner. The platform owner really holds the key to everything: they connect everything together and provide the generic platform that the users can consume. But the platform is nothing without services, so then we have the service provider. These are dedicated teams who actually provide the services; in our case it's the AI or ML team providing this as a service to the developers, and also operating it. All the complexity, like how to upgrade and how to maintain it, is handled by the service provider, because they are the experts on the tool. And then we have the users, who want to use this. They are developers, or data scientists in the AI/ML space, or application owners building higher-level services on top. So if you're talking about platforms, you have at least these three personas, but there could be even more.

Let's look in more detail at the platform owner. The platform owner's main focus is really enablement. They give you a well-defined, flexible platform to make consuming the services as easy as possible for the user. They also want to abstract the complexity: as a user, you don't want to deal with how to deploy this AI/ML service on a cluster. You are probably not a Kubernetes expert; you want to consume the service. So the goal for the platform owner is: how can I provide this generic platform that the service provider can then use to build higher-level services on top? Another part is that it needs to scale horizontally: across different providers, potentially across different regions, adding new services, adding new users. This is not a platform only for AI; in our case, we later want to add databases or storage as a service. So the platform needs to be built in a way that it really scales, that adding other services works,
and that the other services can easily be consumed in a similar way, without reinventing the wheel for each of the services. It should be easy to add new services with similar patterns, not with every service having its own API and its own way to provision. As a developer or as a user, you really want one way to do this, so that you can also build higher-level tooling on top, which keeps it homogeneous: one way to consume it, one way to provide it. As a user, you don't care whether you're consuming our AI service or, later, a database service. Additionally, it should be easy to integrate into your tool stack. You want the kind of simplicity that, and we are all here at KubeCon, Kubernetes brought for containers. Kubernetes really enabled this whole ecosystem, and I think we need something similar for the platform: a standard way for everyone to deploy and add services.

So we really need to build it for the user. They should not be forced by the platform team in any opinionated way. If you have multiple services, they should use whatever is best for them, and not what we on the platform team think is best, because that is where the value comes from. It's primarily about giving them the services, and if your services are good, they will consume them. If it's easy to consume, easy to upgrade, and easy to add new services, they will definitely start consuming it, and potentially even get into the role where they later add their own application as a service to the platform, so that others can use their services in turn, and you stack it up service by service.

Next one: service providers. Imagine you are developers and you want to build some tooling. When I talk about tooling, I mean, I don't know, a policy engine
or something like that. In Kube you have controllers, obviously. You can run controllers in Kubernetes, but they are basically limited to the cluster scope. They cannot go outside. I mean, they can, but tooling like controller-runtime is just not built for that; there is not even multi-cluster support in controller-runtime nowadays. So what we want, and this is of course a vision, is to have this rail track for the service provider behind the scenes of the users. Users are in those small plates there, the car plates, and we are on the track now, building the service via controllers.

To do that, we need some system. Again the example: Kube does it for a cluster, where you can get requests from users and build something in a consistent way. Whether it's one region or multi-region, you need awareness of the APIs and the tooling for this platform use case. If you have that, you are efficient, and you can safely operate the service in the setup. So what we need is a set of tools which work not only for a single region but also for this bigger setup. You see those cranes there on the track: those are the controllers, and you need a way to deploy them without even knowing that there are seven clusters in this region and 25 in that region. Today there is just no way to make them aware of that, especially not with standard tools. You cannot deploy cert-manager globally; there's no way to do that.

The vision is that we need some system which can talk about those global services as a product: a product you build on your own, a product you sell, or something you buy from some vendor. Think about buying something from a vendor: you hire a consultancy because they have a very useful turnkey solution for some use case, and you want to deploy that. I mean, you have a big
platform, there's lots of data inside, maybe even lots of secret data, and Kubernetes cannot really protect that. It's pretty limited what Kube itself can do, but the system must of course allow secure third-party services.

If you go into details here and zoom in a bit: what are the primitives? We will not show them here; we will see some of them in the demo later on. But the question of course is whether the CRDs we saw in the beginning are actually the right thing for this setup, or whether we need something else. Of course we can ship CRDs around, we can synchronize them and install tons of Helm charts, but is this the right API? Maybe not. And RBAC, same question: if I'm a service provider, I only want to see what I have to see. Otherwise there's a risk that I leak data, or that some exploit in my application does something bad to the platform. So I want to see only what I need, like the claims of users, and nothing else. Anything I don't need for my service, I don't want to see. With RBAC, this is hard.

All right, this all sounds very abstract. There's always compute, of course. Compute is central, obviously; in Kube it's the center of the world. In a platform there must be compute, but it's just another service. So think about APIs which provide compute. This can be a Kubernetes API or anything else. Maybe you have seen that when the cloud movement was starting, the term utility computing was the thing, and basically that's what Argo CD, Flux, or other tools like Crossplane are modeled around. They're running somewhere, but Argo CD or Flux can deploy things elsewhere; it's always somewhere else, on some compute service. And there are different kinds and flavors of compute services. I guess many will use vcluster, for isolation reasons for example. It's just an API to create the vcluster and get it into the system in a consistent way. It doesn't matter: if you run
an application, you don't have to know it's a vcluster, right? It's a detail, something the application team could just do behind the scenes. You want compute which is Kube-compatible. Of course VMs and similar things also exist, and federation; there are many more variants. The core idea here is that Kube is attached to the platform. We will see "attached" very visually later on when MJ shows the demo. Compute is no longer the center of what we are talking about. Compute is a service on top of a platform, not the center of the world.

Cool, so let's look a bit at the user journey. We talked about two personas, but in the end, users are the ones who will use the platform. As a user, what I want is spaces to do my job, to work in and interact with, and in this concept we call them workspaces. They should be distributed, and as a user I should decide where I want my workloads and my jobs to run, whether that's an on-premises data center or a cloud. It's distributed and isolated, but at the same time all logically connected. The last thing I want to do is keep jumping between different kubeconfigs and different access modes. It should be seamless and in my control: I should choose which API to consume, and where.

You might argue that this picture is not one which abstracts the complexity, but as a user this is what I want to see: only the things I interact with. There are a few bluish blocks which represent the attached compute, and a single API endpoint. I can navigate from one workspace to another, so there is isolation and at the same time connectivity. And if I'm jumping from application to application, from region to region, from cloud to cloud, I should not care about implementation details under the hood. That's the responsibility of the platform owner and the service provider teams.

And what we describe here is a vision. This is not Kube, right? Kube cannot do that. Just by adding more clusters to our platform, I will not get this user experience, this experience of how to use
a platform, how to consume APIs, and so on. So it looks like we need new primitives: primitives which are not just about containers, but which are made for a world that is inherently multi-tenant, multi-region, and multi-cloud. Kube was never built for that. And again, the big goal is to regain a state where we speak the same language, where we can innovate together. What we all like about Kube is the API, and we want to have this kind of API.

And there is something there already. We have kcp, Kubernetes-like control planes. It's now a CNCF Sandbox project, and it's really a framework to bootstrap Kubernetes-like SaaS or platforms, and to build higher-level services on top of it. But there's even more work being done, also in upstream Kubernetes, to make the API server more generic, so that you have truly generic control planes which don't carry all the details you need for containers. For a platform you potentially don't need containers; you want to have services. In the real compute cluster, though, you later need containers again.

And I want to underline the word framework. This is not a product. It's a project, but it's not something you usually just install. It's a framework to bootstrap experiences, to build an experience for a platform. So what MJ will show in a second is an example: an example platform experience built with kcp. kcp is a library, if you want, for doing those things. And it doesn't have to be a platform; it can also be a product. There are companies building Kube-compatible products which are not about containers on that basis, and the core component that is shared is kcp.

Okay, so demo time. Thanks for sitting through all our visionary talk; let's see this in action. In this case we have a platform deployed across two locations globally. So we have two regions: one is named root, and it's based in Europe, because KubeCon is in France, in Europe, and we have a second region, named beta,
and it's in the US. I will be acting as all three personas in the demo, so I will try to represent them as best I can. Let's see how it goes.

First persona: I'm the platform owner. The service team came to me and said, "Hey, we are the ML team, the AI team, and we want to start providing a model training API to the customers on your platform." And I said, "Sure, I can do that." So, just to show: I have two locations, root and beta, with a single API represented here, and this is the current view of my workspaces. [inaudible] These are platform system workspaces, nothing to do with ML or AI.

So, MJ, this is one kcp, right? And it's already running worldwide in two regions? It's a single instance, spanned out across the locations, and I will show later how the users interact with it and choose where they want their workloads to run.

The first thing I need to do is bootstrap my ML team's configuration into the cluster. That's a custom tool you wrote, right? Yes, it's custom code, and nothing stops this from being self-service, so teams could come in and onboard themselves too. In this case the platform owner wants to have more control, so he says, "Okay, I'm going to onboard you." But this tool is basically aware of the clusters, right, or the regions? So it can use APIs for that.

Let's show the view of what we have now, and I'm nervous typing. The "ws", of course, stands for workspace? Yes. And I can see the new workspace that appeared, ml, for the ML team, and a sub-workspace, training. So I created this playground for the ML service team to go and provide the service. What does this look like if I go inside now? APIExport. So it bootstrapped an API: it created this API-as-a-service object, and it's called training. It's already bound to both locations. This is enough to enable the service team, the ML team, to start serving behind these APIs and create their own services globally. So this is important, right? They can write
controllers, and they are just aware of the topology. So at this point I'm shifting personas to be the service provider team. Let's get some layers off. I'm switching kubeconfig, and just to show where I am on the service side: I have this structure created for me as a user. My team runs everything in Kubernetes controllers. We like Kubernetes controllers, so we're going to serve those APIs using the standard Kubernetes controller reconcile pattern. So I need to deploy the controller to reconcile the APIs globally. Let's jump into compute prod.

And all those commands look very much like a file system, right? Like change directory or something. I went into the cluster where my controller will be running. So you have a cluster in kcp now, right? To borrow the Linux terminology, I mounted a compute cluster in my workspace as auxiliary compute. Compute became part of my ecosystem, because I don't want to be jumping between kubeconfigs; it's the same experience for everybody. And it looks like the Lego talk just finished. I'm deploying the controller, so let's see: the ML controller is running. So the service team, the ML team that provides the APIs, did their job.

So now I'm a user. As a user, I care about only one thing at this point: I need to run my ML jobs and get results, and they need to run in the US and in Europe because of data rules and everything. So let's switch to the user role. Same workspaces; let's create two workspaces. If you notice, I have two location selectors, one named root and another one named beta, and the type is ML training. So I'm saying, "Hey platform, give me a workspace to work in which is ML-enabled." If I now do ws tree, I see the two new ones appeared. And I have this CRD prepared, which I took from the documentation, where it says this is how you do these things. I need to train some chat application on Llama 2 with some parameters, the same CRD both times, and I need to train it in both
locations because of the different data: a French-language chat and a US English one. So I go into the Europe workspace, a simple command, just get in, and I create a CRD. And I see it got accepted by the location based in Europe, because I just instantiated it in the workspace. I didn't provide the location anywhere else, apart from saying, "Give me this workspace in that location." This means the platform itself knew that the workspace needs to land in the Europe location and do the job there. Let's do the same now for the USA, just to show that it's not a [inaudible] demo. I'm creating the same model, same thing, and I can see it got accepted by the US location. From a Kubernetes standpoint, this looks and feels like a Kubernetes experience: all the CRDs are there, the API is there.

One last thing I want to show is that the model API was provided to my workspace in the form of bindings. This means that when the service team created the export, you can bind it one-to-many, teams just interact with the API, and the service team does the heavy lifting. And you don't see anything about the controllers, right? They're just invisible. It's the service team's responsibility to handle those things. And if we look here now: the model in the US completed, running job done. So that's how it could look if somebody built an ML platform as a service.

So, very quickly: kcp is a Sandbox project, we talked about that. It's based on Kubernetes source code and on CRDs, so everything here is a Kubernetes API. All the tooling, controller-runtime, the kubectl you saw in the presentation, just works. Argo CD just works. And the primitives we are building here are inherently multi-region and multi-cloud, so it's more than what Kubernetes offers. Workspaces are at the center. Workspaces are basically the unit of the user experience, the thing you work in. Everything is in one place: you saw the switching between the workspaces and the hierarchy, and there's one endpoint behind that; of
course the endpoint can be HA, in different regions or with different cloud providers, but logically it's one endpoint. And it scales: we saw two shards here, so the thing was already running in two shards, but you can have 100 shards in theory. And the APIExport and the APIBinding: this is based on CRDs, but it's more than CRDs, because we need different primitives for API management. You saw that already.

We are here after the talk too, so talk to us; we want to get your input. We also have stickers with us. You can find us on the Kubernetes Slack in kcp-dev. Of course it's a Sandbox project, so the code is open source: go to GitHub, kcp-dev/kcp. Follow us on X, at kcp or our dedicated handles. And if you have questions later, come to our booths, the Upbound booth, the Kubermatic booth, or the Cast AI booth; there you can find us. What we need and want is your feedback: feedback on the talk and how you liked it, but also in general, what your thoughts are, what you are doing with platforms, and where you would like to use this, so that kcp and this whole idea can evolve. Thanks, everyone. Thank you.
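The export-and-binding flow from the demo (a service team exports an API once, many user workspaces bind it, and each workspace's location decides where an object is accepted) can be sketched as a small in-memory model. This is a hedged illustration, not kcp's actual API or types: the class names, fields, and workspace paths below are assumptions, and only the location names root and beta and the export name training come from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class APIExport:
    """Hypothetical stand-in for an exported API offered by a service team."""
    name: str
    kinds: frozenset


@dataclass
class Workspace:
    """Hypothetical user workspace pinned to one location."""
    path: str
    location: str
    bindings: list = field(default_factory=list)

    def bind(self, export: APIExport) -> None:
        # One export can be bound by many workspaces (one-to-many).
        self.bindings.append(export)

    def create(self, kind: str, name: str) -> str:
        # Objects are only accepted for kinds that were bound into this workspace.
        if not any(kind in export.kinds for export in self.bindings):
            raise LookupError(f"{kind} is not bound in {self.path}")
        # The workspace's location, not the user, decides where the work lands.
        return f"{kind}/{name} accepted by location {self.location}"


training = APIExport("training", frozenset({"Model"}))
europe = Workspace("users:europe", location="root")
usa = Workspace("users:usa", location="beta")
for ws in (europe, usa):
    ws.bind(training)

results = [europe.create("Model", "chat"), usa.create("Model", "chat")]
```

The user only names a workspace and a kind; the controllers that serve the export stay invisible behind the binding, which mirrors the split of responsibilities between user and service provider described in the talk.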
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 4,267
Id: 7op_r9R0fCo
Length: 35min 25sec (2125 seconds)
Published: Fri Mar 22 2024