Kubernetes Storage 101 - Jan Šafránek, Red Hat & David Zhu, Google

Captions
All right, we're going to go ahead and get started. I'm Torin, I'm from Styra, I work on Open Policy Agent, but today I'm introducing David Zhu from Google and Jan Šafránek from Red Hat, and they're going to talk about Kubernetes Storage 101 and break down some really important concepts for you. So go ahead.

All right, hi everyone, thank you for coming to our Storage 101 talk. Let's just get started. First of all, Kubernetes: since this is the second day of the conference, you already know what Kubernetes is, but basically it's a container orchestrator, though we like to think of it as a pod orchestrator, because it actually works on pods, which are collections of containers. And as we know, containers are generally stateless: they're cleared on exit, unless persistent storage is used. So that's what we're going to be talking about today.

Here is an example of a pod, and this is actually a stateless pod. It's just got a MySQL image, and notice that there are no storage-related concepts in here. What happens is that on exit, the entire database is cleared. Pods are quite ephemeral: they can go down for any reason. They can go down because of preemption, when another pod with a higher priority comes in, or they can go down just because the node went down. So this MySQL database is not going to be very resilient, and that's why we introduced some storage concepts in Kubernetes.

We have three main objects on this slide: the pod I just introduced, and the two main storage objects that we're going to be talking about. One is the PersistentVolumeClaim: this is the application's request for storage, so the user will put this on the system to request storage. And the other is the PersistentVolume: this is the admin object for storage, so this is how the admin will create a pointer to actual physical storage, and it will contain all the gory details of the underlying storage.
These two objects bind together, and don't worry, we'll talk a lot more about them in detail later; I just wanted to introduce them now.

You might be asking: if I want storage, I just want a volume, right? A disk that connects to my container. Why do I need two objects, a PersistentVolumeClaim and a PersistentVolume? Well, by separating these concerns, the concerns of the user on one side and the concerns of the admin on the other, we get both application portability and a bit more security.

Let me talk a little more about the portability side. Since the PersistentVolumeClaim only contains very basic information that the user workload requests of its storage (we'll talk more about what's inside it later), you can make these requests on any type of cluster, on any type of underlying infrastructure. The PersistentVolumes that the admins create, however, are not portable: they're different per cluster, based on your underlying infrastructure. When you create these persistent volumes, they contain all of the details about the underlying storage. For example, if you're using NFS storage underneath, this would contain things like your credentials and the IP address of where to find the NFS server.

The security concern is that you don't want your users to be able to create these PersistentVolume objects that reference underlying storage, because generally you don't want users to be able to point to whatever underlying storage they want. In the case of NFS, you don't want them to put in some random IP address and mount whatever they find.

A third object that we'll be talking about today is the StorageClass. Jan will go into more detail about this later, but what you need to know now is that it's used for dynamic provisioning, and it's a collection of persistent volumes.
If we take a bunch of persistent volumes and group them by class, they all go under one StorageClass object, and we'll also use this to provision things dynamically.

All right, in a little more detail, I'm going to be role-playing as the workload, or the user, in this case. I'm trying to put this MySQL pod on my Kubernetes cluster, and I want it to actually persist data. When the pod dies, because it's ephemeral, I don't want the data to go away, so I'm going to persist it using a PersistentVolumeClaim. We can see here that in this container we have some volume mounts: we've specified the data volume and we want it to go into this path inside the container. We've also defined, in the same YAML file, what this volume actually is, and we've defined it as a pointer to the PersistentVolumeClaim that we're going to reference.

So what does this PersistentVolumeClaim that we're referencing look like? Here's the name that we're referencing, and as I told you before, it's actually very simple: it only contains the most basic information that we need for our storage. As the person putting this workload onto the cluster, we don't care how our storage is given to us or what underlying volume we have. All I really care about is the size, and things like what type of filesystem, whether I want block mode, or the access mode. There are many such parameters, but basically they're just the ones the workload cares about. So for example, in this one I want one gigabyte of storage and I want it in read-write mode, and I don't really care about the rest.

To create this claim on the cluster, you just do a kubectl create -f, which takes the file, with the claim YAML that we just showed earlier. We can then get the PVC, and this gives us some more information about the PVC on our system.
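The pod-plus-claim pattern described above can be sketched roughly like this; the names (`mysql-pvc`, `data`) and the mount path are illustrative, not taken from the slides:

```yaml
# Sketch only: names, image tag, and sizes are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce        # the "read-write mode" from the talk
  resources:
    requests:
      storage: 1Gi         # one gigabyte of storage
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # path inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-pvc          # pointer to the claim above
```

Creating both with `kubectl create -f <file>` is enough for the pod to find its storage once the claim is bound.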
We can also create our pod, and we'll see that since the PVC has been created, it has something to reference, and the pod will start running.

A quick debugging tip for PVCs: if you just get the PVC, sometimes you'll see that it's Pending, and you won't know why, because get doesn't actually say anything about why the PVC is pending. It's really important to describe your PVCs; describe all of your API objects. When something goes wrong, you should think kubectl describe. When we describe this PVC, we get some events at the very bottom, and this lets us get at what the actual issue is. The actual issue here is that our storage class is not found. So: kubectl describe.

And now, Jan. The PersistentVolume is the first object that is for administrators; I play the administrator in this talk. The PersistentVolume is just a bunch of metadata and a pointer to the real storage. Kubernetes supports, I think, 15 different volume backends right now, and all of them are hard-coded in Kubernetes, so if you want to introduce a new backend, you need to go into the Kubernetes code and write it there. The community has declared a stop on new in-tree volume plugins, and instead we provide two extension mechanisms to connect Kubernetes to a different storage backend that's not on this list: the old Flex volumes and the new fancy CSI, the Container Storage Interface. So you can connect Kubernetes to a storage backend that Kubernetes doesn't know anything about.

If you look at the details of the metadata, Kubernetes needs to know the capacity of the volume, how the volume can be accessed, and which storage class it belongs to. This example is some cheap storage on my old NFS server running somewhere in my infrastructure, so I created a storage class called "cheap". And the last bit that is sort of mandatory is what should happen when this persistent volume is no longer needed: when Kubernetes is done with the volume, it should do something, and we will cover that in detail later.
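A persistent volume like the cheap NFS example Jan describes might look roughly like this; the server address, path, and size are made up for illustration:

```yaml
# Sketch only: server, path, and size are made up.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cheap-nfs-pv
spec:
  capacity:
    storage: 100Gi                        # how big the volume is
  accessModes:
    - ReadWriteMany                       # how it can be accessed
  storageClassName: cheap                 # which class it belongs to
  persistentVolumeReclaimPolicy: Retain   # what happens when released
  nfs:                                    # the gory backend details
    server: 192.168.1.100
    path: /exports/data
```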
So this persistent volume is just a pointer to some storage, perhaps outside of Kubernetes.

The last object, also created by administrators, is the StorageClass. It's basically a collection of persistent volumes that share some common characteristics, like speed (I can have a fast storage class and a slow storage class), or price (an expensive storage class and a cheap storage class), or, I don't know, replication characteristics (a safe storage class with three replicas of the data, or an unsafe storage class with just one replica). It also contains parameters for dynamic provisioning, which is used when there is no available persistent volume but Kubernetes needs one: it reaches out to some storage backend and provisions the volume automatically. It needs to know whom it should talk to; that's the provisioner parameter, and you can find the exact values you can put there either in the Kubernetes documentation or in the documentation of your storage backend, if you use CSI or an external provisioner. The provisioner usually needs some parameters with which it will provision a persistent volume of this given class. This class is called "fast", and these parameters will provision the fastest volume available on AWS. The parameters are different for each storage backend, which means this object is not very portable: if I take this YAML from Amazon, I can't take it to GCE; it will not run there. I would need to provide a different provisioner and different parameters.

Each cluster, I strongly recommend, should have one default storage class. The default storage class is a class with this magic annotation, and what happens is that when a user like David creates a persistent volume claim without referencing any specific storage class, he gets the default one. So in this case, if David creates a claim without any storage class, he will get the first one.

The persistent volume can be created in two ways. The first one is dynamic provisioning.
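A sketch of such a class, assuming the AWS EBS in-tree provisioner; the `io1` parameter and the default-class annotation (the "magic annotation" mentioned above) are shown for illustration:

```yaml
# Sketch only: parameters are backend-specific and illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
  annotations:
    # the "magic annotation" that makes this the default class
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs    # whom Kubernetes should talk to
parameters:
  type: io1                           # a fast EBS volume type
```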
You should use dynamic provisioning in your clusters: the administrator just creates a storage class, and when a user creates a persistent volume claim, it either references a concrete storage class or no storage class and gets the default one. Kubernetes then goes and creates a volume on the storage backend, and a PersistentVolume object in Kubernetes pointing to it. Kubernetes then binds the persistent volume and the persistent volume claim together, and nobody can use the persistent volume except that user, through the claim; nobody else can reach the data, at least from Kubernetes. And the user can finally run pods using the persistent volume claim.

The other option we support is manual provisioning, where I, as administrator, can create a PersistentVolume object in Kubernetes pointing to some storage outside of Kubernetes, for some legacy, pre-populated data that I want to use in Kubernetes. The user creates exactly the same persistent volume claim as before, referencing some storage class, and Kubernetes will find the existing volume and bind it to the claim. An important note here is that Kubernetes always looks for existing volumes first, and only if there is no existing volume that matches the claim does it go and dynamically provision one. That means that if this is a volume with pre-populated data, I should probably create a special storage class, so somebody doesn't accidentally create a persistent volume claim requesting a basically empty volume and accidentally bind to the volume with the data. This is really for brownfield scenarios where you want to use some old data in Kubernetes; for everything else, use dynamic provisioning and your life will be much easier.

The user can then create a pod. You can delete a pod, or something can delete the pod, and the data is still retained; the user can create as many pods as they want and still use the data. That's fine. And when the user doesn't need the data anymore, they delete the persistent volume claim.
That's the signal to Kubernetes: "I don't need the data, do something about it." What happens next is encoded in the persistentVolumeReclaimPolicy field of the persistent volume I showed you before. We support three policies.

We support Recycle, only on NFS, where Kubernetes basically deletes all the data from the volume and puts it back into the Available state, so it's available to a different claim. It's kind of clunky, it doesn't work well with different NFS servers and so on, so it's deprecated. It may work for you, but expect trouble.

The more useful policy is Delete, where Kubernetes just goes to the storage backend, deletes the volume there, deletes the PersistentVolume object in Kubernetes, and the volume is gone. Or you can use the policy Retain, where Kubernetes just puts the persistent volume aside: nobody can bind to it, because there can be, I don't know, sensitive data on the volume, but the data is still there, and as administrator I can reach the data or bind it manually to a different persistent volume claim. If David comes and asks nicely, I can give him the data.

In all three cases, the data is lost to the user: the user can't access the data. They can either come to me and ask nicely, or the data is deleted. So be very careful when you delete PVCs: you will lose your data. And finally, as administrator, I can delete persistent volumes manually. That can be useful in some cases, but again, be very careful; you are dealing with somebody's data, and you don't want to make users angry. An important note here: if you delete a PersistentVolume object in Kubernetes, it doesn't delete the data in the storage backend; you must do that as a separate step.

So now that you know all these Kubernetes objects, it's time to show you how to use them in production. We've learned about the pod, the PVC, and the PV, but actually we don't recommend using bare pods in production environments, and that's because pods are not really for end-user production. They're ephemeral.
I alluded to this earlier: pods can be kicked off for many different reasons, and when your pod gets kicked off its machine, you don't have access to your workload or your data anymore. If you're running a service on that pod, you can no longer hit it when your pod is gone.

So how do we fix that? We have some higher-level objects in Kubernetes that can help with these issues. We have the Deployment, which basically runs many replicas of your pod template: you give it a pod and it makes many of them. When your pod gets deleted for any reason, like when someone comes in and preempts your pod from a node, the Deployment will go and recreate that pod for you. So if you want three replicas, you're always, eventually, going to get three replicas, and we can scale this up and down to have more or fewer.

But one of the big caveats with the Deployment, for storage especially, is that all of the pods in your Deployment share the same persistent volume claim. This is what it ends up looking like: you have your Deployment, which spawns multiple pods, but they all reference the same PersistentVolumeClaim. As we know from the previous slides, that means they will all reference one persistent volume, which means we get one underlying storage. And you might already have some clue as to why this might be a problem: all three of these pods are writing to and reading from the same volume. Unless your workload is specifically designed to do this, it's very likely that they're going to overwrite each other's data, and this generally causes applications to misbehave in different ways. We've seen crashes, and there are a lot of cases where, if your volume is ReadWriteOnce, your pods will not actually come up on a second or third node, because you can only attach the volume to one node. So you can have a lot of issues with such a setup.
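The shared-claim situation described above can be sketched like this; names are illustrative, and the point is that every replica of the template mounts the very same claim:

```yaml
# Sketch only: all three replicas mount the same claim.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-pvc    # one PVC shared by every replica
```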
With a Deployment like this using stateful volumes, there is probably a case where you do want this behavior, but you have to be aware that this is exactly what's happening.

If that doesn't work for you, we have another type of high-level object for you to use, called a StatefulSet. It does a very similar thing, in that it creates many replicas of your pods; you can scale it up and down, and when a pod dies, it puts a new one in. But the main difference, for storage specifically, is that each pod now gets its own persistent volume claim. The way we do that is that in the object you actually specify a persistent volume claim template, and each pod gets, kind of, a stamp out of that template. This is what it ends up looking like in a three-pod case: we have three pods and three PVCs, each one connected to a pod. So in this case, each pod gets its own volume, and this is generally a better starting place for what you want.

But unfortunately, Kubernetes doesn't really handle your application's needs beyond that. When you're using a StatefulSet, it's very important to keep in mind that you still need to come up with a mechanism for these pods to be aware of each other and be in the same member set, so that they can do leader election or run in some sort of active-passive or active-active mode. Kubernetes does not do this application-specific synchronization between your pods. You can use a StatefulSet as a building block for your workloads, but in the end you still need to determine how these pods interact with each other, who gets to serve data, and who gets to write data; this all needs to be determined by the application.

Okay, and now Jan will talk a little more about some storage features in Kubernetes. Kubernetes has accumulated a lot of features; this is not a complete list.
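A minimal sketch of a StatefulSet with a claim template; names are illustrative, and each replica gets its own stamped-out PVC (`data-db-0`, `data-db-1`, and so on):

```yaml
# Sketch only: each replica gets its own claim from the template.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: mysql:5.7
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data            # pods get PVCs named data-db-0, data-db-1, ...
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```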
I'm going to present just a very high-level overview of what you can use; you can mix and match all these features together and build your stateful applications.

The first feature is topology-aware scheduling. Until now, we sort of expected that if you have a PersistentVolume object in Kubernetes, the volume is available to all nodes in the cluster. If you run a cluster across multiple availability zones in a cloud, that's usually not the case: a persistent volume is then available only to a subset of nodes. Kubernetes can know about that, using topology-aware scheduling, and schedule pods to a node where the data is accessible. We also introduced something we call delayed provisioning: when a user creates a persistent volume claim, it's not provisioned immediately, because it could be provisioned in the wrong zone. We wait until the user creates a pod, and then the scheduler looks at both the pod's requirements (it needs a GPU, it needs eight cores, it needs five megabytes of memory) and the claim's (it needs, I don't know, ten gigabytes of storage), and schedules both the pod and the persistent volume claim to a node that has enough storage and enough CPU power and GPUs. So in reality, you can create a persistent volume claim and find that it's Pending, and the reason for Pending is that it is "waiting for the first consumer"; only after you create a pod (or a StatefulSet, not just a pod) that uses the claim does the claim get bound. So don't panic if your claims are Pending; this may be the default mode in your cluster, especially if you run across multiple availability zones. There is a talk later today; Michelle is covering this in more detail.

Another feature we have is local volumes. If you have some spare hard drives or SSDs attached to your nodes, you can use them as persistent volumes, and you get super extra speed because you are using local storage.
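The delayed provisioning behavior described above is selected per storage class via `volumeBindingMode`; a minimal sketch, with the GCE PD provisioner chosen only as an example:

```yaml
# Sketch only: provisioner chosen as an example.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware
provisioner: kubernetes.io/gce-pd
volumeBindingMode: WaitForFirstConsumer   # delay binding until a pod is scheduled
```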
But the trade-off is that if the hard drive breaks, or the node dies, your data is lost. Maybe that's what you need, but for generic storage, be aware that this could be dangerous. It's useful mainly for caches and that kind of stuff, where you don't really mind if the data gets lost.

Until now, when you used a volume in a pod, Kubernetes created a filesystem on it and mounted it inside the pod. There are some workloads that don't like that: they want extra speed, or they want to handle the block device by themselves. Now you can use raw block devices: you get a raw block device inside your pod and you can do whatever you want with it; Kubernetes won't touch any data on the volume.

You can also resize your volumes, though we support only expansion: you can increase the size, you can't shrink it. What we currently have as a beta feature is offline resize. That means Kubernetes resizes the volumes, but you must stop your pods first: if you have a StatefulSet, you scale it down, resize your volumes, scale it up, and the new pods will have bigger volumes. We are working on online resize, which means you won't need to stop the pods at all; we have it as alpha, so you can try it and report bugs.

Inline volumes: initially this was not on the slides, because it's rather dangerous, but you don't need to use persistent volume claims and persistent volumes at all. You can reference a volume directly in a pod; we call them inline volumes, because they are inline in the pod. This pod runs a test container, and the test container gets this Amazon volume mounted. This makes the pod completely non-portable: you can't take the pod and run it somewhere else, because the volume is not there. But maybe that's what you want: you need to just test something, or run a job; it is usable.

What's more usable as inline volumes are ephemeral volumes. Maybe you've already met them in Kubernetes: ConfigMaps, Secrets, and the Downward API, where you can get configuration, or secrets like usernames and passwords or SSH keys, as files inside your pods.
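Coming back to the raw block feature mentioned above: a sketch of a block-mode claim and a pod consuming it as a device rather than a mounted filesystem (names and the device path are illustrative):

```yaml
# Sketch only: names and device path are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block                  # ask for a raw device, not a filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: raw-user
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:                 # volumeDevices instead of volumeMounts
        - name: data
          devicePath: /dev/xvda      # where the device appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: raw-pvc
```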
You can create an API object called ConfigMap or Secret, which is basically a key-value store, and for each key you get a file on a filesystem inside your pod, with the value saved as the data in the file. This is useful for distributing usernames and passwords; I personally use it to distribute SSH keys. These volumes are not persistent: you basically can't write to them, because they get overwritten by the values from the API server.

And the last feature is the Container Storage Interface. So yes, we also have some new features to talk about, some of which Jan touched on as well. The Container Storage Interface is an industry standard that enables storage vendors to write a storage plugin once and deploy it anywhere: a lot of different container orchestrators have implemented the Container Storage Interface, and they all interface with the same drivers. Kubernetes has implemented support for CSI as well, and we have a growing number of drivers in the ecosystem; I believe it's over 30 now that we're aware of that you can use in Kubernetes. From the user's perspective, there's actually no change whatsoever: you still use the pods and PVCs that we've been talking about, as usual. This does add some extra work for the cluster admin, but not only extra work; it adds a lot of benefits as well. The benefit of the Container Storage Interface is that it's an extensible plugin framework for your storage drivers. These plugins are no longer in the system itself, so if they crash or something, you're not going to bring down all of Kubernetes with them. The vendors are able to push out more frequent updates to their drivers, which you as a cluster admin can pull in much quicker, as well as many other benefits that come with having a plugin interface instead of what we had before, which was all the plugins being built into Kubernetes.
Another feature that we're actively working on, currently alpha, is volume snapshots. This is part of the Container Storage Interface: basically, it lets you take a snapshot of a PVC at a point in time, and we can then provision a new volume based on that snapshot and use it later, in a different pod or the same pod.

We have one final feature to talk about, which is CSI migration. This is also still in alpha, but it's a push to deprecate the in-tree plugins. The API is going to stay: if you're using in-tree plugins today, I want to make it very clear that you can still continue to use them in the future. What we're doing is removing all the backend implementation and shimming it to the CSI drivers. As I mentioned about the Container Storage Interface before: 30-plus drivers, more frequent updates, better security guarantees. We're shimming the backend to that, so you can still use the same in-tree plugin interface.

Okay, so to summarize, the main takeaway you should have from this talk is this image. There are two main parts of the storage system in Kubernetes: we split it into administrator concerns and user workload concerns. We have the PersistentVolumeClaim for the user's request, and the PersistentVolume, for the administrator, with all of the gory details of your underlying storage. And we also have the StorageClass, which groups persistent volumes together and which we can use for dynamic provisioning. Those are the main concepts. We also have those higher-level workloads, Deployments and StatefulSets, and all of those features you can look up documentation for online.

That's not all: if you're interested in storage, we have a ton of talks about storage at KubeCon. About half of these have actually already finished, but if you're interested in contributing, or interested in more about what we're working on currently, I would recommend coming to the intro and deep dive sessions for the Kubernetes Storage SIG.
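As a footnote to the volume snapshot feature mentioned earlier: at the time of this talk the API was alpha, and a snapshot plus a restore from it looked roughly like this (names and the snapshot class are hypothetical, and the alpha schema may differ in later releases):

```yaml
# Sketch only: alpha API as of this talk; names are hypothetical.
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: mysql-snap
spec:
  snapshotClassName: csi-snapclass   # hypothetical snapshot class
  source:
    name: mysql-pvc                  # the PVC to snapshot
    kind: PersistentVolumeClaim
---
# Provision a new volume from the snapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  dataSource:
    name: mysql-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```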
And of course, if you're interested in any of these other specific storage-related tutorials or talks, take a picture of this slide, look them up, and you'll find them on the schedule. You can reach out to us: we're the Kubernetes Storage SIG, we have bi-weekly meetings, we're on Slack, and we have a mailing list. These slides are online on the schedule; you can download them and click on these links. Thank you so much for coming and thank you for listening. Again, my name is David and this is Jan, and we're just going to open up the floor for any questions now.

Hi, with things like snapshots, which might or might not work: as a user of the cluster, can I find out whether snapshots are going to work on the particular cluster I'm deploying to?

So the question is: as a user, is there any way to know whether snapshots will work for the storage that I'm using? It's not anywhere; it's not in the storage class, it's not anywhere. You just try, and it will fail. It should be in the documentation provided by your cluster administrator; your cluster admin should provide that documentation. And any bugs you want to file, any issues you have, you can file them on kubernetes/kubernetes and tag sig-storage in them.

All right, any other questions? I saw a hand go up.

So, can we write from a pod to a local volume, and then copy it to cloud storage? I mean, when we create a persistent volume on local storage and write something, and eventually copy it to cloud storage, my question is: can we mount the copied volume as a new persistent volume in the cloud, in any pod?

I'm not sure I understood the question; let me try to rephrase. Were you asking: if a pod writes to a local volume, and we then copy that data to a cloud volume, can we then mount that cloud volume into a different pod, or back into the same pod? Into any pod? Okay, so I think we can just break this down into two separate steps.
The pod writes to the local volume: you just use all of the Kubernetes concepts we've covered to mount and use your local volume. Copying the data to the cloud, I think, is a separate operation that you would do outside of Kubernetes. And then mounting that cloud volume into a pod again is just using the same basic concepts: create the PV for that cloud volume, create a PVC that references it, and then consume that in your pod. Does that answer it? All right, any other questions?

Hi, I would like to ask about a particular situation we have. We have NFS storage that we want to present to a Kubernetes cluster, but it's very particular: every directory needs to be accessed only by specific namespaces. Basically, we want a namespace to be able to mount one and only one directory within this storage. As far as we know, persistent volumes cannot be bound to namespaces, and we cannot create a storage class, because for hostPath it's not supported. For now, the only way we can control this is using pod security policies and restricting the prefix of the path for each user that is connected to each namespace, but this is a very complex solution for a fairly simple problem. So is there an alternative way to make sure that a particular persistent volume is mounted only by a specific namespace?

There is no easy solution. Persistent volumes are sort of global, for everybody; we don't restrict them by namespace. What we do restrict is storage classes: maybe we didn't say it during the talk, but you can restrict a storage class with a quota and give access to a storage class to only one namespace. That's possible, but then in your case you would need to create a storage class for each namespace. I don't know how many namespaces you're going to have; if it's a few, it's probably feasible, and if it's thousands, maybe you need some operator that will pre-create them for you, or something like that.

All right, we're out of time for questions now. Thank you so much for coming, and have a great rest of the day. [Applause]
Info
Channel: CNCF [Cloud Native Computing Foundation]
Id: _qfSzrPn9Cs
Length: 37min 25sec (2245 seconds)
Published: Thu May 23 2019