Deep Dive: Rook - Travis Nielsen, Red Hat & Alexander Trost, Cloudibility

Captions
Welcome to the Rook deep-dive session. I'm Travis Nielsen, one of the maintainers on the Rook project. I've been with the project since it was originally created a couple of years ago; we open sourced it at KubeCon Seattle when it was a little bit smaller, and it's amazing to see how much the project has grown. It's great to be a part of it. Earlier this year I joined the Red Hat team and got to work with the Ceph team.

So what is Rook? To get an idea of who's in the audience: who was able to make it to the intro to Rook a couple of days ago? A lot of you, that's good — and Jared, who gave it, is here too. Who has actually run Rook before, in a test or other cluster? All right, we've got a lot of users, good to see all of you here. And who's running it in production? Okay, we do have a good collection of production or near-production users. Perfect — those are the stories I love to hear: that it's working for you, that it's production ready. Earlier this year — well, in September — we graduated to be a CNCF incubation project, thanks to all of you in the community and to knowing that it's working in production.

So what is Rook? As I've talked to people in the booth the last few days I've refined how I talk about it, and the one-liner I came up with is: it's open source, it's a storage control plane, and it's for Kubernetes. By "storage control plane," what that means is it provides operators and CRDs to orchestrate, configure, and manage storage backends. We'll get into what that means, but basically: we don't provide the implementation for the storage; we bring the storage platforms to Kubernetes in a Kubernetes-native way.

I'll go briefly over this, which you've already heard in the intro session. An operator, like you heard in the keynote this morning, applies desired state to your cluster: you say "here's my desired state" and the operator goes and makes it happen. The way you express that is with CRDs. CRDs are standard Kubernetes YAML; they look like any other Kubernetes type, but Rook defines CRDs that are specific to the storage providers. In addition to that, Rook really is more than operators and CRDs: it's a framework. The framework is still in its youth, but we're evolving it to make it even easier for storage providers to join the Kubernetes-native landscape. There are a lot of very specific needs for each storage technology — among them automated deployment, bootstrapping, configuration, upgrading, and migration — and every storage platform needs to do these if you're going to trust it with your data.

So what are the storage providers in Rook today? We have two categories. First are those that provide the data platform itself — block, object, and file storage: Ceph, Minio, NFS, and EdgeFS fall into that category. Then there are the databases, CockroachDB and Cassandra, which might build on the other data platforms. If you want to learn more about storage providers and adding them to Rook, Jared has a presentation this afternoon at 4:30 — come join that; he has new code running that I haven't even seen yet. And right after this session we have a meet-the-maintainers at the main CNCF area in the booth hall.

Okay, so for the deep dive, what I really wanted to spend the time on is Ceph. Ceph was the first storage provider that Rook set out to orchestrate; only recently have we been adding the others and creating this framework. Who's familiar with Ceph, a little bit or more? A good portion of the crowd — great, I don't need to explain the details. Ceph has a number of daemons that have to be orchestrated, and they need to be always running so that your data is available and safe.

The general orchestration approach we took for Rook is the operator. The operator is very familiar with what Ceph needs to do: Ceph needs to create keys, generate the crush map, do things with placement groups, and so on. If you've run Rook but not Ceph, hopefully you haven't had to hear about any of those terms, because they're difficult to manage at the end of the day — Rook takes care of them for you. The operator, while talking to Ceph, creates Kubernetes resources like deployments to manage the lifecycle of each Ceph daemon, doing things like creating init containers to generate the ceph.conf and then running the Ceph daemons directly with the version of Ceph that you want to run — I'll talk more about that versioning soon. The last thing the operator does is manage the health of the system. If your Ceph mons go down, Kubernetes doesn't really know whether they're really down; it only knows if the pod stops. So the operator does extra work to ask "hey, are your Ceph mons in quorum?", and if they're not, it'll take action to start up a new deployment and get a new mon running. So that's the approach: configure your storage in a very storage-provider-specific way and keep it healthy.

So what does a CRD look like? In our 0.9 release, which we had just a few days ago, we declared the Ceph CRDs as v1, and we're excited about that. It means users were trusting Rook in production even while it was in beta, and it means the CRDs will always be backward compatible and upgradeable; we believe you can do what you need to with them. We'll keep adding new features — they're not locked in place — but they will always be backward compatible.

Let's look at a few of these properties, starting with the dataDirHostPath on the right. Some of the Ceph daemons require persistent storage: if your node reboots, or just goes down momentarily and comes back, you need to make sure that data is persisted. The host path is where we store basic configuration and persist the metadata. It can hold data as well, but the data is generally held on your raw block devices mounted on your local nodes. The Ceph version I'll skip over, since I've got another slide for that. There are settings for the dashboard — do you want the Ceph dashboard enabled or not — and for the network: what kind of network are you running in, do you need host network access, or are you running on a virtual network with the Kubernetes CNI?

The versioning: in the latest release, one of the big changes we made is that Rook is no longer tied to a specific version of Ceph. Before this release, Rook was tied to Ceph 12.2, which was Luminous. Now you can say "I want to run Rook v0.9, but I want to run Ceph Luminous, or Mimic, or Nautilus" — the version of Ceph is independent of the version of Rook. This is very important, because people care a lot about their data plane: they need to know it's safe, and the admins need to decide when to upgrade it. That's the purpose of the separation: you can control when to upgrade the data plane, independently from Rook, and the support lifetime for a specific version of Ceph then depends on the Ceph project, not on the Rook project. The images you can run for Ceph are just in the Ceph Docker Hub repo, and you can go to the link there and see what tags are available.
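A CephCluster CRD with the properties discussed above might look like the following. This is a sketch in the spirit of the v0.9 examples from the talk; the image tag, host path, and specific field values are illustrative, not taken from the speaker's demo:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13.2.2      # data-plane version, independent of Rook's version
    allowUnsupported: false       # must be true to run pre-release Nautilus
  dataDirHostPath: /var/lib/rook  # where mon/OSD metadata is persisted on each node
  mon:
    count: 3                      # odd number, tolerant of one failure
    allowMultiplePerNode: false   # in production, keep mons on separate nodes
  dashboard:
    enabled: true
  network:
    hostNetwork: false            # use the Kubernetes CNI network
  storage:
    useAllNodes: true
    useAllDevices: true           # consume every raw device Rook discovers
```

Changing `cephVersion.image` here is what drives the Luminous-to-Mimic upgrade shown later in the demo.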
So what are the differences? The operator actually behaves differently between each of these Ceph versions. Luminous and Mimic are the main examples right now, and over time these differences could grow. With Luminous, the operator starts the Ceph dashboard on plain HTTP; it's a read-only dashboard and doesn't need anything special. In Mimic, the dashboard is configurable and more feature-rich, and it needs to run over HTTPS, so Rook will generate a self-signed cert on your behalf, generate an admin password, and store it somewhere safe for you, so that you have a working dashboard right when you launch. Nautilus is still pre-release — it's planned to be released maybe in the next couple of months — but Rook will allow you to run it; you just need the flag saying allowUnsupported: true. Then we'll do things like enable the new orchestrator manager modules, which have a plug-in model and work with the dashboard — anyway, we don't have time to go into that.

In 0.9, the upgrades of Rook are mostly automated now; it's a lot easier than 0.8. Who has upgraded a Rook release in the past? A few of you, probably. We had a big long guide with a lot of manual steps; those are much shorter now — there are really just a couple of manual steps — and I'll get into a demo of what upgrades look like now. It's really about the operator managing that complexity for you, because if you have to manage anything manually, it's bound to fail, and then your data is at risk.

So what does it look like to upgrade Rook? With the command at the bottom of the screen, you tell Rook what version you want to run; it's really just the image of the Rook operator that's running. You say kubectl set image on the Kubernetes deployment with the new image version, so rook/ceph:v0.9 is what you would use to upgrade from v0.8 currently. To be clear: this updates the Rook operator and the related orchestration pieces, but it does not update the data path. After the upgrade, you'll still be running Luminous just like you were before.

Okay, so the Ceph upgrade — what does that look like? The version is specified in the CRD, and you tell it the image you want to run, like we already saw; I'll go into a demo of this now. In the future, as the complexity of the upgrade increases — or maybe it won't — the operator will manage whatever steps need to happen if the data plane needs to be upgraded or data migrated. We don't currently have any need for a data migration between these versions of Ceph, but it's possible in the future, depending on what changes are made.

All right, so what I want to do with the demo now is, first of all, configure Luminous, then update from Luminous to Mimic, and we'll watch how the operator deals with that. Let's see if I can find my console — hopefully that's big enough to see. I shouldn't be doing window management during a demo. To do the demo efficiently, I thought I'd better keep using the same aliases I always use, so let me show you those real quick so you're not totally confused about what I'm doing: k for kubectl, kn to run it in the rook namespace, and a command to look at the operator log. That's what I'm running. So let's start the operator first. I have my v0.9 manifests here, so I'm going to create the operator — and before I hit go on that, I wanted to show down here what resources are running: no pods are running yet for the operator. After I hit go, it's going to create a namespace for the operator and then start things up.
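Before the demo continues: the operator upgrade command described a moment ago can be sketched as below. The namespace and deployment names follow the v0.9 defaults and the image tag is an assumption; the command is echoed rather than executed, since actually running it requires a live cluster:

```shell
#!/bin/sh
# Compose the Rook operator upgrade command (v0.8 -> v0.9).
# This updates only the orchestration plane; Ceph itself keeps running
# Luminous until the CephCluster CRD's cephVersion image is changed.
NAMESPACE="rook-ceph-system"        # operator namespace (v0.9 default, assumed)
DEPLOYMENT="rook-ceph-operator"     # operator deployment name (assumed)
NEW_IMAGE="rook/ceph:v0.9.0"        # target operator image (tag is an assumption)

echo "kubectl -n ${NAMESPACE} set image deploy/${DEPLOYMENT} ${DEPLOYMENT}=${NEW_IMAGE}"
```

In a real cluster you would run the kubectl command directly instead of echoing it; the operator pod then restarts on the new image and reconciles the cluster.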
The operator deployment starts, and we should see a pod any moment here. After the operator itself starts, it starts up a couple of other daemons that need to run on each node: the agent, which manages mounting the storage for your PVCs with our Flexvolume driver (in the future that will be moving to CSI), and the discovery pod, which discovers what storage devices are available on each node in the system, so the operator can reconcile what it's going to configure on each node. Okay, the operator is running — nothing new here from the intro.

So let me create the cluster, and I'm going to show the other namespace; we have two namespaces, one for the operator and one for the cluster. This starts up the Ceph cluster. The first thing it does is detect which version of Ceph you want — it should have detected Luminous — and it starts creating Ceph keys and storing them in Kubernetes secrets. Any moment now we should see the first mon pod — there we go. Each mon goes through the init stage, running the init containers in the pod definition that Rook generated, and the mons start one at a time, making sure each is in quorum before going to the next one. After the mons, it starts the Ceph manager, which is where the dashboard and the Prometheus modules run, and then it gets to the OSDs.

With the OSDs, we get this prepare pod you see at the bottom. It goes out to the node and says, "okay, you told me to configure these devices on this node." For my configuration, I just told it to consume all devices, whatever it finds. It found three devices — sda, sdb, sdc, I think — and it tells Ceph to configure each of those devices with OSDs. After the OSDs are configured, it goes back to the operator and says, "okay, I'm done, now go ahead and start a pod for each one of these devices," and now we see those being created, so that each OSD can run independently: if one pod fails, it doesn't affect the other OSDs running on the same node.

All right, we have the basic cluster up and running, and to show we're on Luminous, I want to bring up the dashboard. To expose the dashboard on my Mac host, I've just got to open up a NodePort into the minikube VM. Since Luminous serves it on HTTP, I create a dashboard service on HTTP, get the port it's on and the minikube IP, and go to my browser. So here's the IP, and the port the dashboard is running on is 31374 — whoops, that's from a previous run-through — okay, same IP, same port, 31374. Here is the Ceph dashboard running on Luminous: it's read-only, we can't do a lot with it, but it's running.

So now let's update to Mimic. I don't like this dashboard; I want the latest and greatest dashboard. In the rook namespace, I'm going to edit the CephCluster CRD called rook-ceph. It opens up in this little editor, and we see the image currently specified. I'm going to change this to v13 — just to make it simple, I'll use the top-level tag. As soon as I save this CRD, the operator is going to wake up and say, "oh, you had Luminous, now you want Mimic; I'm going to go through and update each of your daemons, plus any additional configuration that's needed for that version" — for example, running the Mimic dashboard as a secure site. So let me save this, and we'll see what starts happening at the bottom with those pods. Immediately we see a pod that detects the version — it's already done, it detected Mimic — and now it's going through the mons. The mons are always first to get updated, one mon at a time, to Mimic.

While that's going, let me look at what's happened with a mon. Describing the mon I picked, if I look at the image its pod is running, it's now running v13. The mons have all been restarted now, and the operator is confirming, "okay, do I have quorum?" It looks like it does, because it continued with the manager pod, which restarted; what it's doing now is configuring the dashboard with the self-signed certificate, and after that it will proceed to the OSDs and restart them on Mimic. The OSDs are restarted one by one to reduce any disruption from restarting services.

Okay, so now, to make sure we have Mimic running, I want to look at the dashboard. It's on HTTPS now, so I have a separate service I need to create so my laptop can reach it in the browser: dashboard HTTPS. Let me run kubectl get service — now there's an HTTPS service. If I try the old port, the dashboard isn't running there anymore; I need to look at this secure port. If I refresh, I can't get the dashboard on HTTP anymore — it's just not there — so let me put HTTPS with the new port. It's a self-signed certificate, so we get this big red warning: "don't connect to this site." Well, I'm going to anyway, for the demo, thank you very much. And here's the Mimic dashboard. Rook generated an admin password and put it in a Kubernetes secret, and I need to go get that secret so I can log in. Getting a secret out isn't exactly something that's easy to type, so let me copy it from the dashboard help topic in our documentation: get the secret with the dashboard password and decode it from base64. It's a big long ugly command, and we get the password out of it. If I go back and log in, now we have the Mimic dashboard. It worked.

So now the cluster is upgraded: we have three monitors running, all the OSDs are running, and you can go and do things like configure pools, block devices, and file systems from this UI. With each release of Ceph, this dashboard gets more feature-rich.

Okay, a couple more things I want to make sure I show. We have the toolbox — I'll create the toolbox pod, which is just a convenient place to run Ceph commands. I'm going to connect to it and show you how simple it is to look at the Ceph status. We're inside the toolbox now, which knows how to talk to Ceph, so I say, "hey, Ceph, give me the status." We already saw in the dashboard that we had the mons and everything running — and I've got an error I need to investigate after this, but the cluster is all running. Great.

So what happens with persistence on the host? Remember how we put the persisted configuration in /var/lib/rook. Let me SSH into my minikube VM and show you what's inside the host path. If I go into /var/lib/rook, I see a directory for each of the mons. I only have one node here, which isn't a terribly interesting cluster, so they're all on the same node; in any production environment, the mons would need to be on different nodes, and the OSDs on whatever devices you have. What's in the mon directory is not terribly interesting, but you see there's a data directory which persists the important mon data. Same thing for the OSDs — these are a little more interesting, since the OSDs are what's holding your data behind the scenes. I've configured BlueStore on top of the device, so it created partitions for the data: there's a block entry with a symlink over to the partition for the data, the BlueStore database, the BlueStore write-ahead log, and other basic config. So that, in a nutshell, is where your data is stored.
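The "big long ugly command" used above to fetch the dashboard password follows a standard Kubernetes pattern: read the secret's base64-encoded field and decode it. A sketch — the secret and field names here are assumptions for illustration, and the encoded value is a stand-in, since the real password is generated per cluster:

```shell
#!/bin/sh
# In a live cluster, the first step would be something like:
#   kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
#     -o jsonpath='{.data.password}'
# which returns the base64-encoded password. Here we use a stand-in
# value so the decoding step itself can be shown.
ENCODED="c3VwZXJzZWNyZXQ="            # stand-in for the secret's data field
echo "${ENCODED}" | base64 --decode   # prints the plain-text password
```

The decoded string is what you paste into the dashboard's login form as the admin password.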
All right, let me go back to the slides and move on. There are a lot of options in the CRD, and we're running out of time, but for mons: at the end of the day, you need an odd number of them, and you need to control whether they're on separate nodes. Usually three mons is the default — that's tolerant of a single failure; place them on unique nodes, and a majority is always necessary for quorum. We've already talked about their basic orchestration and how we need to maintain quorum.

One thing I want to do is show what happens if a mon dies and how the operator recovers it. If I go back over here, I'm just going to delete one of the mon deployments: kubectl delete deploy rook-ceph-mon-a. You're going to see that pod go away now, and at this point the mon is going to be out of quorum. The operator does its health checks periodically — I turned the check interval down for the demo — so within the next 15 to 30 seconds we should see the operator noticing the mon is gone and bringing up a new one. If I look at the operator log, we see it: yep, the mon is not in quorum, and it's going to wait a little bit — is this just a network glitch, or something it needs to recover from? By default, the operator will wait five minutes before it starts up a new one, but we should see another one starting really soon. And here it is: it decided yep, it's dead, and it starts up mon d. Now we should see quorum again, and if I go back to the dashboard, I should be able to see the new mons. Okay, it's still starting up, so the dashboard isn't aware of it yet; it should refresh itself momentarily — let me come back to this in just a minute to make sure it succeeded. But the new mon came up, and the old one was taken out of the quorum.

So, OSDs. There are a lot of ways to configure your storage devices. How do I tell Rook to go consume my
devices in my heterogeneous environment? Everybody has very specific needs for how they run it, but overall, what Rook does is schedule a job on each node and then create a deployment for each OSD. When new nodes and devices are added to the cluster, Rook can add those to the cluster as well, although currently it usually requires an operator restart to pick them up.

There are different modes for selecting OSDs. The first mode, which I was using, is automatic selection: on the right here, I just said use all nodes, use all devices — go find everything and configure it for me. That's easy, but I might not want Rook to do everything, so the next option, at the other end of the spectrum, is to be as prescriptive as you want to be: don't use all devices or nodes — set those defaults — and then specify by name which nodes and which devices you want to configure, and Rook will only configure OSDs on those devices. That could get a little tedious, though; specifying all of those in a big cluster would be very difficult. So the next mode is filtering: with the device filter you can use a regular expression, so it'll say, on all nodes, use any device matching the filter.

Those modes aren't very Kubernetes-native, so the Kubernetes-native way to do a similar thing is to use node labels. You can set a label on a node — in this example we've got role=storage-node — and then in the CRD you can use the placement section, which says: for OSDs, only place OSDs on nodes that have this label. It's a bit of a big YAML snippet, but it allows you to place the OSD services, and the same node affinity can be used for all the daemons — the mons, OSDs, RGW, and all those. That's the more Kubernetes-native approach.
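The selection modes just described might look like the following fragments of the CephCluster spec. This is a sketch: the node names, device names, and label are illustrative, and the exact field layout follows the v0.9-era examples rather than the speaker's slides:

```yaml
# Mode 1: automatic -- consume everything Rook discovers.
storage:
  useAllNodes: true
  useAllDevices: true
---
# Mode 2: prescriptive -- name nodes and devices explicitly.
storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
  - name: node1
    devices:
    - name: sdb
    - name: sdc
---
# Mode 3: filtering -- a regular expression over device names.
storage:
  useAllNodes: true
  useAllDevices: false
  deviceFilter: "^sd."
---
# Mode 4: Kubernetes-native -- node labels plus placement.
# First label the node: kubectl label node node1 role=storage-node
placement:
  osd:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values: ["storage-node"]
```

The same `placement` structure can be repeated for the other daemons (mons, mgr, RGW) to pin them to labeled nodes.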
The next option is sort of a performance optimization. Some people have high-performance devices — SSDs or NVMes, maybe just one or two of those on a node — and then a whole bunch of spinning disks. The way you configure that is to say: my metadata device is this nvme01, and then pick up all the other devices, like sd*, and put the data on those. This gives me a higher-performance cluster with Ceph: put the BlueStore write-ahead log and database on one high-performance device, and put the data somewhere else on the spinning disks. And one more option: if you really have high-performance clusters, you've got all NVMe devices — which would be pretty nice, I want one of those someday — NVMes are so fast that you can actually put multiple OSDs on the same NVMe device. There's a new setting, osdsPerDevice, to put, say, five on the same one, so you can have your high-performance data path.

Okay, another new feature we've got is RBD mirroring. It's been around for a while in Ceph, but 0.9 exposes it. Let me go back to the demo — where'd it go — okay, here we go. If I edit the YAML again — edit the CephCluster — by default we don't configure RBD mirroring, but here it is in here. RBD mirroring: how many workers do you want? Three, I heard three — okay, it's done. So momentarily we should see some new RBD mirroring pods show up on the bottom half. To actually go mirror anything — RBD mirroring is about replicating your data to another cluster — it does require more configuration in Ceph at the CLI level: you'd use the toolbox and tell Ceph which images you want to mirror. All right, any moment — I'm waiting for them to show up at the bottom. We can look at the operator log to see what it's doing — it's thinking for a little bit — but the operator always
stays in a reconciliation loop, so it will come back and make sure it happens; sometimes it just takes a minute or two.

All right, I think we're getting close to out of time. RGW: this is what the CRD looks like. You say "create me an object store, use replication or erasure coding to make that happen," and set that in the spec — same sort of thing: there's a CRD, you create it, and the operator makes it happen. I wish we had more time; the agent I'll just skip over. How to get involved: go to our GitHub — we've now got a Slack with an active community, and we'd love to hear your questions, make sure your issues are getting fixed, and we're always welcoming new contributions. Community meetings are every other Tuesday; I look forward to having you on those calls. And I think that's it. [Applause]
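The object store CRD mentioned at the end might look like the following. This is a sketch based on the talk's description (replication for metadata, erasure coding for data); the store name, pool sizes, and chunk counts are illustrative:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3          # three copies of bucket/index metadata
  dataPool:
    erasureCoded:
      dataChunks: 2    # object data striped across two data chunks
      codingChunks: 1  # plus one coding chunk for recovery
  gateway:
    port: 80           # RGW (S3-compatible) service port
    instances: 1       # number of RGW pods the operator runs
```

As with the cluster CRD, you create this object and the operator configures the pools and starts the RGW daemons to match it.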
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 3,947
Rating: 4.8730159 out of 5
Id: Mb7oiXQb1ZE
Length: 35min 50sec (2150 seconds)
Published: Sat Dec 15 2018