Building a Storage Cluster with Kubernetes [I] - Bassam Tabbara, Quantum Corp.

Video Statistics and Information

Captions
All right, hi everyone, my name is Bassam Tabbara, I'm the CTO at Quantum, where we started the Rook project, which I'm excited to tell you about today as we walk through storage and Kubernetes. So who's here to hear about storage? OK, I thought so; otherwise you're not in the right place. What I thought would be interesting is to actually walk through it. You hear the terms storage and persistent storage, and people think about storage in different ways, so maybe we can start with a review of storage in Kubernetes and then lead up to where I think we should be going, or where we are going, with storage. Can everyone hear me OK in the back? Okay. So here's my fancy drawing of a Kubernetes cluster: a master and three nodes. When Kubernetes first started there was no persistent storage, nothing outside of the cluster, and I don't remember exactly which version that was, but essentially if you wanted to store data that survived a pod moving between nodes, you were kind of out of luck. All storage was on the actual node, and that was all that was available. You heard the big announcement today around 1.6 with dynamic provisioning: Kubernetes can now manage volumes that are provided by external storage clusters. Think about what that means. Just like managing the network, or managing pods, replication, and all those things, Kubernetes can now manage the creation of volumes and the mapping, or attaching, of volumes to pods. But Kubernetes does not manage the other side: the actual storage cluster providing these volumes. So if you're in a public cloud environment like AWS or Google Cloud, that's typically not a problem, because you have persistent disks, or EBS, or whatever your cloud provider's storage is. With dynamic provisioning, Kubernetes can mount an EBS volume, attach it to your instance, and make it available
as part of the pod. The same goes for a Google persistent disk. If you're not running in a public cloud environment, you're kind of out of luck: you essentially have to run a storage cluster that can provide volumes to Kubernetes somehow, and the way you do that today is as an external thing, not part of the Kubernetes environment. If you're running a software-defined stack such as Ceph or Gluster, you essentially have to run a full deployment of Ceph and/or Gluster, and you do that the traditional way: you typically have storage admins managing it, and it is completely external to Kubernetes. Or you use traditional hardware, whether that's NetApp filers or EMC boxes or whatever your enterprise has for storage. You can certainly use those, but it's the same thing: they're completely outside this environment, and they require some admin or ticket system or something to provision storage. Making sense so far? I would love an interactive session, so feel free to ask questions. So what if we brought the storage cluster into the Kubernetes environment, instead of having it be an external thing that's managed separately? Just like we've done with networking, or monitoring, or load balancing, what if we actually ran the storage cluster itself and had it be managed by Kubernetes? If we did that, then to create the storage cluster I'd just use kubectl: I'd start up a new storage cluster the same way I would a load balancer or Prometheus. And what if that storage cluster could provide volumes through dynamic provisioning to other pods running in the same cluster? We think that's what we have to do to solve the storage problem in Kubernetes. We think it's not only going to be the default way of running storage in Kubernetes, but it's going to unlock
scenarios around portability across clouds, and the ability to reduce both the management burden and the surface area of what is running in Kubernetes. Now, that doesn't mean you have to run in a converged mode like that: you could, if you wanted, create another Kubernetes cluster just for storage and scale your storage independently from your compute and your applications. But if your organization is settling on Kubernetes as the single way of managing and running applications, there's no reason why you can't run your storage cluster, even a dedicated one, the same way you run your applications. Make sense? So what I thought I would do at this point is switch over and talk about one of the storage systems that now runs this way. It's a project called Rook. Rook is open source file, block, and object storage that runs as part of your cloud-native environment: it runs as an app in Kubernetes and is completely integrated. I'll demo it in a moment and we can walk through the details of how we're implementing it. Before that, I'll say I think there are going to be a number of systems like this coming up. I don't know of another one at this point that runs this way, but I think the trend is a good one, and we're going to see more storage running in this fashion. The reason is that if you think about software-defined storage, it is itself a distributed application: it's made of many different parts, and it needs to be scaled according to certain dimensions and metrics. So it looks like any other application, and Kubernetes is designed to deal with distributed applications, which makes it a really good fit for software-defined storage. So what is Rook? Rook is battle-tested software-defined storage, and I say battle-tested because we base Rook on a version of Ceph, an embedded Ceph, that we make
part of the Rook binary. Ceph is awesome: it has ten years of production deployments behind it and runs some of the world's largest storage clusters. So you can think of Rook as taking Ceph and making it cloud native, making it so that it runs really nicely in environments like Kubernetes and integrates very deeply with Kubernetes, as we'll see. Rook itself is open source under an Apache 2.0 license, and it's designed so that people like you here, and I'm hoping you can all help make Rook better, can participate in actually driving storage into Kubernetes. Okay, that's pretty much all the slides; let's do a demo and look at YAML files. I know you're all excited to see more YAML files. All right, why don't we start with the first YAML file. Can you read this? Is it readable? Okay, I can make it larger for the folks in the front here, the popcorn section. All right, let's see. The first thing we're going to do is teach Kubernetes how to manage storage clusters by extending Kubernetes, and the way you can extend Kubernetes today is by creating an operator, a new controller. Part of what Rook does is provide a controller that's designed to manage storage in the cluster. We've designed this, again, to be deeply integrated with Kubernetes, so we use third-party resources to describe every aspect of the storage cluster; I'll walk through that. First I'm going to create a deployment that starts the Rook operator, and again, you can think of the Rook operator as an extension of Kubernetes that helps it deal with storage concepts: pools, storage clusters, file systems, block stores, object stores, et cetera. It's a really simple deployment, just one replica right now. The Rook operator is a tiny binary that just runs and watches for changes in the environment, or instructions from the
admin, about running storage clusters. So we'll start with that; it's quite simple. I have a window here that's watching the cluster and showing what pods and resources are in it, and a window down here where I'll run commands. And that's pretty much it: the Rook operator is now running. The first thing the Rook operator creates is a set of third-party resources, which are, again, a way to extend the Kubernetes API. The ones we create are for storage clusters, for pools, and a number of others. [Audience question] I don't know, I think I've been creating and stopping it a few times; maybe it's the first creation. Good question, there's no magic, this is a live demo. Once the operator is running, and you only have to do this once per cluster, from this point onwards you can create new clusters, create pools, and do all sorts of things with clusters, and we'll do that. This is, again, a one-time operation. Now let's look at the YAML for a cluster. This is a fairly simple description. You'll see the kind here is a new kind: it's a Rook cluster. A Rook cluster is a third-party resource; by default Kubernetes does not understand Rook clusters, but we just defined them. [Audience: which company do you work for?] All right, ask me that again when the demo fails. So now that we have the operator running and all the third-party resources declared, we're going to create a cluster, and we do that by creating a YAML file of the Rook cluster kind. This is a declarative statement: it says I would like a new Rook cluster to be deployed on this Kubernetes cluster. You can also set the namespace you'd like the cluster deployed in, and a bunch of metadata parameters that define the cluster. Right now it's very simple: which
version of the Rook image to use (we're just using latest), whether to use devices or data directories, and a bunch of device filters. You will see a lot more here over time, but this is essentially the minimal cluster. So I'll go ahead and create this cluster, and what happens is that an instance of the cluster resource is created. The operator is watching these third-party resources; it sees that somebody wants a new cluster, along with the parameters for that cluster. So the operator wakes up and says: I'm going to go deploy a new cluster. What the operator does is look at the size of the Kubernetes cluster, the number of nodes, and the storage nodes, and it uses a number of heuristics to figure out the right size of storage cluster to deploy. It starts by creating the set of monitors, bootstrapping them, creating a quorum, waiting for them to come up, and setting up the networking and everything else around the cluster. And as I said, this is when the screen should update, there we go. You'll see that it created three monitors. Monitor is a Ceph term; if you're not familiar with Ceph, Ceph has its own Paxos-based consensus that it uses for its maps and for consistency in the cluster. So this Rook operator created three monitors in a three-node cluster, it created three storage nodes, one for each device on each of the nodes, and it set those up, and it created a Rook API server that's also running as part of the cluster. If we were to grow the Kubernetes cluster, the operator will notice that the node count has changed and will grow the storage cluster. If we were to shrink the cluster, the operator will also react: it'll reduce the cluster size, and it will move and rebalance the data as needed. Now, a lot of this machinery is done by Ceph, but the Rook operator is essentially what makes Ceph work in a Kubernetes environment; you can think about it that way.
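For reference, the operator deployment and cluster definition described above might have looked roughly like the following. The API group, kinds, and field names are approximations of Rook's alpha-era third-party resources, and the image tag is illustrative; any given Rook release may differ:

```yaml
# Deployment that runs the Rook operator (one replica, as in the demo).
apiVersion: apps/v1beta1          # Deployment group/version in the Kubernetes 1.6 era
kind: Deployment
metadata:
  name: rook-operator
  namespace: rook
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-operator
    spec:
      containers:
      - name: rook-operator
        image: rook/rook:latest   # illustrative image reference
---
# Third-party resource instance declaring the desired storage cluster.
apiVersion: rook.io/v1alpha1      # TPR group/version is an assumption
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  versionTag: latest              # which Rook image version to use
  storage:
    useAllNodes: true
    useAllDevices: true           # dedicated cluster: consume every device found
```

The operator watches for instances of the `Cluster` kind and reconciles them into monitors, storage daemons, and the Rook API server, as described above.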
Question? [Is the operator on the data path?] The operator is not on the data path; the data path is identical to Ceph's. That was an explicit design point for us: like I said, it's ten years of production experience and an awesome code base, and we did not want to be on the data path. The other thing Rook does is abstract a lot of the complexities of Ceph. Not that those aren't important concepts, but we think we can expose simpler concepts that sit on top of them. CRUSH maps, which you just mentioned, are a good example: super powerful constructs. What we're doing is, at the TPR level, you can specify location and topology for storage nodes, and that results in CRUSH maps being created and managed for you by the API. Any other questions? [Audience question about device types] Yes, SSD, NVMe, there can be different layouts on each of the storage nodes. Rook is in alpha, so I'm sure there's a lot of room for improvement in terms of what kinds of layouts are possible. [Audience question about device selection] Yes, the basic policy is this: you'll see I set "use all devices" to true, which is a flag that says I have a dedicated storage cluster, and all the physical storage devices should be turned into a Rook cluster. We will enumerate all your devices, figure out the layout, and use them for storage. You can turn that off, and then you'd have to tell us which devices to use on each of the storage nodes, or you can use just directories, and we'll use the directory. As you can imagine, there are a lot of different policies people want to specify here, and there's a whole proposal for that. [Audience question: what about the data when a node fails?] Ceph is really good at all of that. Ceph has internal mechanisms for detecting when a storage node goes away, both temporarily
and permanently, and triggering the rebalancing and re-replication of data, so we take advantage of all of that. At the Rook level, we'll also recognize when a new node shows up in the Kubernetes cluster, add it to Ceph, and manage that part of it. [Audience question about removing nodes] Not right now. I mean, if you have a storage cluster and you shoot all the nodes, you are shooting your data in the head, that's right. The way these things are typically set up, replication is not at a node level, it's at a logical level, so removing a node might trigger some data to move, but it's not a loss, and it's not necessarily affecting a single pool; it might affect different pools. But if you're asking, if I want to decommission this set of nodes, how do I know which order to do them in, there isn't any support for that right now. Okay, let's move on a little. So I've created a cluster at this point; I've deployed a Rook cluster that's running, with an API server and everything else in these pods. It's running in a namespace called rook, so I essentially have a dedicated namespace for my storage cluster. Let's actually do something with it. What I'm going to do next, and I'll show you this, has a little bit of a hack in it until we get our volume plugin. Now that I have a physical cluster running, I'm going to do two things. First I'm going to create a pool, and a pool is a logical construct that says: I want this kind of replication policy, or maybe erasure coding, and I can select devices, whether SSD or NVMe,
et cetera. I can use this pool for certain applications or certain workloads. It maps very closely to the concept of a storage class in Kubernetes, so the approach we're taking is that a storage class can reference a pool, but you create the pool directly using a third-party resource. So I'm going to do that here: the top part is of kind Rook pool, and that creates a new pool. It's set to replicated, and right now just one replica, so not very durable. The bottom section shows a storage class; this is an existing concept in the Kubernetes API, and we'll point the storage class at the Rook pool. So let's do that. If you're not familiar with storage classes, they're a way for admins to define a policy for how volumes get created: from which pools, which disks; in, say, an AWS environment it could specify the kind of EBS volumes you want mounted. Once you create a storage class, application developers can use it to create volumes and volume claims, and we'll walk through that. So at this point I've created a pool and I have a storage class; let's go ahead and create an application. I'm going to use WordPress. WordPress requires persistent storage itself, for its content, but it also uses MySQL, and MySQL requires persistent volumes to store its data reliably. Let me show you what that looks like. Quickly: the top part creates a persistent volume claim for the MySQL database that will be used by WordPress, and you'll see there's a storage class set to rook-block, then a set of labels, a size of 20 gigabytes, and the access mode ReadWriteOnce. This claim matches up to the storage class, the storage class matches up to the pool, and the pool is connected to the cluster. That's essentially the chain of how we know
which devices end up storing the data that sits underneath this pool. It's also how we know the replication policy, or whether we're using erasure coding, and all of that is defined along that path, but in different forms and in different places, so there's a nice separation of concerns between an admin and an application developer. [Audience question] Lucas? I think so, yeah, I think it's supported; it's supported by that volume plugin and by others. Okay, I'll skip the rest; let me just show you the usage of this in case you've not looked at it before. Here is the actual MySQL database, the pod for it. You'll see at the bottom a volume which uses the claim. At the time the pod is launched or scheduled, Kubernetes will go and create a new volume, and it'll do so against the storage class, the pools, and everything else; then when the pod is scheduled on the machine and running under Docker, it will mount that volume on that path. All right, let's do that, and you'll see the bottom two lines show that two volumes have been created, one for MySQL, the other for WordPress, 20 gigabytes each. After those are created, the WordPress application is running, and so is the MySQL database supporting it. All of this was done automatically: all the volumes were created, everything. Let's start it up and see it running. There's WordPress; we're going to install WordPress, and here we go. A couple of things I want to mention: I only used kubectl to provision this whole thing. I didn't use any third-party tools or any other CLI; I truly just used Kubernetes to bootstrap an entire storage cluster and then use it. Again, that's a testament both to the extensibility of Kubernetes and to the fact that you should think of storage as just another app that runs as part of your cluster. [Audience question] Can somebody press the button again? Kill the pod? I
planted him, yeah, to restart it on another node, yes. [Audience question about failover speed] You know, it's a very good question. I'll tell you that some of the early users of Rook do exactly the same thing on AWS, and there the numbers are up to an hour: detaching a volume from an instance and attaching it to another instance could take up to an hour. So if you're running something like Prometheus on top of that storage, your beautiful charts have big gaps in them and you don't know why, and the reason is actually just how long it takes to move the storage. The other reason people look at this in these environments is that, at least in Amazon, an EBS volume lives within one availability zone; you can't detach it from one zone and attach it in another, you have to snapshot and move it. By running a software-defined stack you can create a cluster that spans multiple zones, and a pod can fail over across zones without needing to do anything. So again, I was hinting at this earlier when I mentioned Amazon; we can talk through those questions afterwards. [Audience question about the volume plugin] Yeah, good question. Right now we're using the standard Ceph RBD plugin that's part of Kubernetes; that big statement here with all these exports and magic hacks in it is to work around a set of issues with the RBD plugin. We're creating a Rook volume plugin, which we hope will be part of Kubernetes 1.7, that makes this whole experience a lot better. The other thing I'll say is that Rook, just like Ceph, is not just block storage: you can run a shared file system here as well, so you can go through a path that's not just RBD but CephFS, that's the other path. And for object, you just use whatever your favorite S3 or Swift client is. That's right, yeah. The way it would look going forward is that there will be a TPR
for object storage: you create the TPR and then Rook will manage the object store, essentially an S3-compatible endpoint running in the cluster. Yep, another question. [Audience question about network traffic] Very good question. In general, for managing resources in Kubernetes there are a lot of facilities for memory and CPU, with labels and selectors and policies you can set. Network traffic is a very different story: typically with CNI you get one interface on a container, and that's all you get. You can play games with things like SR-IOV and others at the host level, you can play games with VLAN tagging to color traffic, and you can mount multiple networks in a container even with CNI, so there are ways to do it. Right now we're taking the simple case. It is not the fastest case; the fastest would be one where you color the traffic and prioritize it in one of the ways I just mentioned. Question? [Audience question] Yes, that was one of the pictures I showed earlier, yep, that's pretty much it. I was going to show one more slide and then wrap up; I'm happy to take more questions. We have a bunch of material in the readme that walks you through all of this, and I'll put this exact demo on there as well. So, to wrap up, here's what the storage operator is and what it's doing. Essentially you define desired state, this is what I want to happen in my cluster, through third-party resources, and those extensions to Kubernetes work just like a deployment or a service or all those concepts that are already in Kubernetes; we've now added a cluster (a storage cluster), a pool, an object store, and those are concepts that now extend Kubernetes. The operator tries to achieve this desired state, the will of the admin, by watching both what's happening in the network and in the cluster, and watching the declared desired state, and
it sits essentially in a loop, just like other controllers, trying to make actual state approximate desired state. That's its role, and it knows a lot about the storage cluster itself, so it does this in a way that understands it's dealing with stateful data that needs rebalancing as nodes come and go, and all those specifics. The last thing I'll say is that the operator is not on the data path: it can be offline for minutes, and while not everything will run, when it comes back up everything is normal, and it can run as an HA pair of deployments if we need it to. Finally, if you want more info, go there. Lucas here is doing a talk tomorrow showing kubeadm and Rook on Raspberry Pi; do not miss that talk, it should be really cool. It's tomorrow at 14:00.
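The pool → storage class → claim chain walked through in the demo can be sketched end to end. The Pool kind, API group, and provisioner name below are approximations of Rook's alpha API; the StorageClass and PersistentVolumeClaim are standard Kubernetes objects, with the rook-block class name, 20Gi size, and ReadWriteOnce mode taken from the demo:

```yaml
# Logical pool with a replication policy (size 1, as in the demo: not durable).
apiVersion: rook.io/v1alpha1      # TPR group/version is an assumption
kind: Pool
metadata:
  name: replicapool
  namespace: rook
spec:
  replicated:
    size: 1
---
# Storage class the admin defines; it points at the Rook pool.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-block
provisioner: rook.io/block        # illustrative provisioner name
parameters:
  pool: replicapool
---
# Claim the application developer makes; it references only the class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: rook-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

This split is what gives the separation of concerns described in the talk: the admin owns the pool and storage class, the developer only names the class in a claim.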
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 9,057
Rating: 4.9407406 out of 5
Keywords: CloudNativeCon + KubeCon Europe, CloudNativeCon + KubeCon, CloudNativeCon + KubeCon Europe 2017, KubeCon Europe, CloudNativeCon Europe, CloudNativeCon 2017, KubeCon, CloudNativeCon Europe 2017, KubeCon Europe 2017, CloudNativeCon, CloudNativeCon + KubeCon 2017, KubeCon 2017
Id: 6p0GKjrYzg4
Length: 37min 10sec (2230 seconds)
Published: Mon Apr 10 2017