Tuesday Tech Tip - Intro to Ceph Clustering Part 4 - Self Balancing and Self Healing

Captions

[Music]

Hi there, welcome to 45 Drives Tech Tips. My name is Doug Milburn. I am co-founder of 45 Drives and a lover of Ceph clustering, and the good folks here at 45 Drives Tech Tips have once again allowed me to get in front of the camera for episode 4 of our introduction to Ceph.

Okay, so right off the bat: we talked about how a cluster strategy diverges from a single-server strategy, and how you get things like high availability and an infinite path forward in capacity and performance. We talked about the minimums to start a cluster. And we talked about the fact that in Ceph we have basically a layer of software; it's software-defined storage. This stuff's all stored as objects, but that doesn't matter; that's completely transparent. The Ceph software, through something called the CRUSH algorithm, manages all your servers. In this diagram, these are my servers; think of them as storage bins. No matter how many storage bins I get, they're completely managed by the Ceph software. There's a beautiful, incredibly easy-to-use dashboard that you administer this thing through. So on the hardware side of it, just add servers, add hard drives, go into your dashboard, plug them in. You don't worry about what's in them; it's totally managed by Ceph.

Okay, we go up above that and we say, now I need to do something useful for my clients. At the end of the day it's about providing storage to clients. How do we do that? We create something in software called pools. Pools are defined by parameters that drive the CRUSH algorithm. One is the redundancy method and the numbers related to that, which is either replication or erasure coding, and the other is failure domain. We're limiting our conversation at this point to a failure domain of server. Why? Because of the vast majority of our clientele. They go from people who might have 10 or 20 terabytes on the low end and are really into clustering for high availability, up to clientele with many
many tens of petabytes who are moving data in at many, many gigabytes per second. So we go right across that whole range, but the real interest is usually redundancy at the server level. Now, if you're a mega organization, you go up to rack-level or data-center-level redundancy, but for most of our people it's server level. Why? Server level gets you into basic high availability. It gets you the ability for one or more of your servers to go down and for you not to skip a beat, not have a downtime event. It also frees you up as IT staff to do maintenance whenever you want, in the middle of the day. No more coming in at nights and weekends and asking people to close files and all that kind of stuff.

Okay, so that's where we are, and we talked about these pools. What I want to talk about is some of the magic things about Ceph, about how it manages stuff. But one of the first things I want to point out is minimum cluster requirements. We sell a lot of minimum clusters. Why do people set up a cluster? Because once you set up a cluster, you are on your way: you're now on a track where you never have to worry about running out of space. You've got a continuous expansion plan, and you'll never have service interruptions, never, if you manage it right.

For that minimum configuration, Ceph needs three machines, and typically all the software that drives it is co-located in those three machines. So you end up with a cluster with no single point of failure in it. And if you're really interested, just put in redundant networking and do a little bit of bonding of multiple network connections, and you have just huge redundancy: failure of a switch, failure of a server, you're completely insulated against it. This minimal configuration does all that.

But let's look at what happens in a minimal configuration. We start writing into Ceph, and I've got two pools set up here that I used in the last video. I had pool number one,
which is 2-rep. Data comes in there; one chunk is my data, the other chunk is a replica of my data, so it's a two-chunk system. They give me all kinds of nice colored markers, so let me at the board. So in pool one, my data comes in, and chunk number one is my data and chunk number two is an identical replica of that. It gets stored. Let's say my cluster's empty, and Ceph goes, okay, let me pick the server that is the least full on a percentage basis. It could be any one of these, so it goes into number one. Once it's in number one, with failure domain server, the replica has got to get put somewhere else. It'll go to the server, again, with the most available space, but it can't go on the same one, otherwise I'll lose my redundancy. So that chunk will get put in over here.

Now let's talk about the other one here. This is erasure coding, two plus one. Any data that comes in from clients to that pool to be saved, the Ceph system says: take the data and break it up. So if it's got a piece of data, it breaks it up into two chunks. Then it creates one other chunk, which is my parity chunk. These need to get stored, again based on the CRUSH rules. The rule for this says: now that you have three chunks, go find servers to put them in. Chunk number one is going to go down here, because that's got the most space. Number two is going to go down here; it doesn't matter which. And then my parity chunk is going to go over there. So again I'm stored, I'm safe, and everything's good.

Just note, if I'm relying on a three-chunk system, there's no flexibility in where they go. Those three things have to be put in servers, there are only three servers, and there are three chunks to go in, so the servers will fill up equally. So if you're going to set up a cluster and only run a three-chunk pool on it, you'd want to buy servers that are the same size. Why? Because otherwise you're limited: you're going to run out of ability to store stuff under
those rules the minute your smallest server fills up. That's a really key point to understand.

But here's the really cool thing: this all changes the minute I add another server. I've drawn my servers as sort of storage bins, so let the width of each one represent its capacity. Let's say I had an old legacy server that I want to start off my cluster with, and let's say I'm using pool 2 only, the three-chunk pool. Let me just get rid of my pool number 1; I'm not going to use that right now. If I just had three servers, it's going to fill up 1, 2, 3; the next one's going to come in 1, 2, 3; the next one's going to come in 1, 2, 3. It's kind of dull, it's kind of boring. It works, and I've got high availability, but I don't have any flexibility. I'm constrained: I really need to put in symmetric-size servers. That'd be a rotten path to have to grow on, right? It'd be highly constrained. What if I want to switch to bigger servers for better cost later? And what if I had smaller servers, legacy servers, that I want to plug into my cluster? That is a thing you can do, and it's worthwhile: any decent legacy server that's smaller, you can join to your cluster. But you can't do that for your first three if you have three chunks, because I need symmetry. It's not that you can't; it's just that you waste the extra capacity.

The minute I put on another server, that server can be any size. Now if I want to store again, where's it going to go? The first chunk is going to go in this guy, because he's got the least in it. Where are the other ones going to go? Well, those ones are all the same, so maybe I get chunk two in there and maybe I get my parity over there. When I want to store again, my first chunk is going to go in there, and it's going to spread the rest out; it's got different ways to distribute that data, so it can use an asymmetrical server. And of course, if I had a great big server there, I probably won't be
able to fill it up all the way, but that's okay. It gets more flexible; the more servers I put on there, the better. And oh yeah, if I'm putting in data from pool number one, which is a two-chunk pool, that's got even more ways it can be put in there. So the more variety I have in my cluster, the better it works, which is pretty cool. But that's the minimum configuration thing.

From here I'm going to talk about two other really cool things about Ceph. Let's talk about self-balancing. Let me go to my representation over here, and I'm going to say I've started to fill these things up in a serious way. I'm writing and reading and writing, and you can get an imbalance from that. I'm going to represent the amount of data in these things as a water level; think of them as tanks with water in them. There's an engine in Ceph that's able to look at how much data is on your servers and to do it in an intelligent way, knowing what the data is and what the CRUSH rules are for that data. And it's simple: self-balancing is just like I connected a water pipe through there. It'll drop the data level in the ones that are full and shuffle data over to the ones that are less full, until it gets as close to balance as it can achieve. You may get in a situation where you're constrained and you can't get perfect balance, but it'll get the very best it can. It'll balance that all out and you'll come to an average level.

And you have control over that. There's a flag for allowing the system to move data. If you turn it on, it will allow data to move from one server to another, and you can do it at a server level. It's used, for example, if I'm doing maintenance on a server: I shut data movement off, I go do my maintenance, and then I flip data movement back on. One
interesting thing: if I wanted to add a new server to my cluster, say I put a great big whopping server on there, the first thing I do is shut off data movement until I've got it in there. The minute I flip it on, data is going to rush from those ones over to that one to try to balance everything up. But I actually have control: I have a switch with which I can control my maximum rate of data movement, because you don't want that going full speed; it'll use up all your CPU bandwidth. So you throttle that back to something reasonable when you're way out of balance. As you come back into balance, you can turn it up a little more and let it go a little harder at self-balancing.

And realize, I'm talking at a very, very conceptual level. These are oversimplifications, and there's actually much, much more complexity when you have to do it. But just to understand it, if you want to think about Ceph, about adopting Ceph and doing an initial system architecture, and about where you want to go, it's good to know about these things.

The last thing I want to talk about in this video is self-healing. I'm going to go back to a very simple representation of my data, and I'll go with this cluster that now has five servers in it. Let's say it's really simple. I have my pool one data, which is 2-rep, so this is two-chunk data up here, and I'm going to represent it in pink. So I've got two-chunk data that went in here: there and there is file number one, file number two happened to get put here, and file number three in pink lands here. I could put in as many of those as I wanted. My green pool is a three-chunk system, so 1A goes here, 1B goes here, and 1 parity goes there. That's my first file in that system. Let me go to my second one; I store something else. So for my second
file, the first part of it will go here, because that's the least used. They're all looking pretty balanced, so the next chunk of that just goes here, and then my parity chunk. I'll call the parity chunks 2P and 1P. So you get what I'm talking about here, and that's all I need to do.

Let's say disaster strikes. There is, you know, sunspot activity, and zap, rays come from outer space and a cosmic ray hits the CPU of this server. So this server is suddenly struck dead. Ouch. Normally, if you were on a single-server strategy, you'd be going, "I'm down, I've got to go see what I can recover, and I've got to do it on an emergency basis." Now, your Ceph system notifies you. You go in the dashboard and set up notifications, and it gives you a text that says one of my servers went down. And you go, "One of my servers went down, yeah, but I've set self-healing on."

So what's self-healing? Well, this thing can go back and rebuild my data on other servers. So what did I lose? On there I have file two in this pink pool. There's a chunk of file two on that, or a replica of it, but I have the original over there. Or rather one of the replicas; there's no such thing as an original, they're both just replicas. I've got one replica there. I know the other was stored there, and I know that server's down and inaccessible. Easy problem: I just look at which is my next most available server. It doesn't matter, I've got three of them. So I take file two, that other chunk of it (let me stay true to my color scheme here), and I reproduce it. I don't have to move it over there; I just reproduce it. Why? Because I've got a copy over there, so I'll make one over there. I'm now healed, and I'm faithful to my CRUSH rules for that pool. So that guy's looked after.

I've got number three there. There's one piece of it over there, good. Let me create another piece. Where am I going to create it? I'll look at that guy; that's my least filled. So I'll
take that, put it over there, and I'll have chunk three over there.

And I had a piece of file number two from my green pool on there. So what do I have left for file number two? I have 2 parity here, and I have chunk one of it. I can recreate chunk two by using my parity math, so I do that. Where am I going to recreate it? Which is the least full? Right there. Is it going to go there? No. Why not? Because chunk one is over there, and I can't put it in the same server. So that one's out. Where else? I have 2 parity there, so that one's out. So I've got to pick the best out of these two. No problem: chunk two of file number two goes in there. That's it, I am healed.

So, conceptually and oversimplified, that is how a cluster can self-heal, and that is how a cluster self-balances. And there are controls over everything.

So, given these things, what do you need for a minimum cluster? I need three machines, because I need to have three copies of the Ceph software running. I can't do any less than that. Now, I can have non-replicated data; that's a one-chunk system. I can have a two-chunk system with 2-rep, and I can get to a three-chunk system; the simplest one of those is EC 2+1. If I go to a three-chunk system on a three-server system, a lot of my flexibility disappears. It can't self-heal: there's no place to put the recreated data. Sorry, you can actually recreate it, but you can't do it in a way that maintains your failure domain. So you're constrained there. And oh yeah, things have to be symmetrical, otherwise you've got orphaned capacity. However, the minute you start putting extra servers on, this all changes, and it particularly changes for the good as you put larger servers on there as you add on, although there's still value in adding a smaller legacy server on top of that.

So anyway, I really think Ceph is just such a cool
system. That whole idea: storage bins contain objects. What are objects? They're chunks of data, but you don't care about them. Why? They're managed by the Ceph software. What you care about is these pools. You create them, you set up the rules, you set up the failure domain level, you set up the shares, and it's dead easy, done through the dashboard GUI. And you do them as per your requirements. Why would I do that? I don't know, I might have archival data where I want really high storage efficiency. Good: once my cluster gets up there, I can create an erasure-coded pool. Once I get a large number of servers, I can do an EC with a big first number on it, and it becomes very storage efficient. There's not a lot of redundancy on it, but it's pretty storage efficient. And at the same time I could have 3-rep data, because I've got mission-critical data that I can never lose. I can do those all inside the same system, and never again be worried about taking a specific server up or down, or what data is on it and who's using it. That is gone; that is a thing of the past. So it's just absolutely liberating. Software-defined storage running on good, solid, enterprise-level equipment is wonderful.

And at 45 Drives, that's what we do. We provide those solutions. We will sell you the hardware, we will preconfigure the software for you and preinstall it, and we will take you through every part of the process, including working out at a client level how to share everything, and we'll get it all up and running with you. You can buy any part that you want from us; you don't have to buy full packages or anything like that. There are no license fees, and it's a pretty good model to look at if you want to go this way. Or you can do it yourself, and we're there for you anytime you need to call us.

Anyway, thanks for watching this final part of our intro to Ceph series. If you have any questions, please leave them below; we'll be monitoring that. And
give us a call or an email if you want to talk about your storage needs. Thank you very much.
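
The placement and self-healing walkthrough in the captions can be sketched in a few lines of Python. This is a toy model of the simplified whiteboard rules only (pick the least-full server on a percentage basis, and never put two chunks of the same object on one server); it is not the CRUSH algorithm, which the speaker notes is considerably more complex in practice. The names Server, place, and heal are hypothetical, every chunk is assumed to take one unit of capacity, and the sketch tracks only where chunks live, not how a replica is copied or how a missing erasure-coded chunk is recomputed from parity.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A 'storage bin' from the whiteboard (hypothetical model, not a Ceph type)."""
    name: str
    capacity: int                               # assumed: one chunk uses one unit
    chunks: set = field(default_factory=set)    # (object_id, chunk_label) pairs

    def fill(self) -> float:
        # Fullness on a percentage basis, as in the talk.
        return len(self.chunks) / self.capacity

    def holds(self, object_id: str) -> bool:
        return any(obj == object_id for obj, _ in self.chunks)


def place(servers, object_id, labels):
    """Put each chunk on the least-full server holding no chunk of this object."""
    for label in labels:
        candidates = [
            s for s in servers
            if not s.holds(object_id) and len(s.chunks) < s.capacity
        ]
        if not candidates:
            # Failure domain = server: one chunk per server, or we refuse.
            raise RuntimeError("no server left that keeps chunks on separate servers")
        target = min(candidates, key=Server.fill)
        target.chunks.add((object_id, label))


def heal(servers, failed):
    """Re-create every chunk the failed server held on the surviving servers."""
    survivors = [s for s in servers if s is not failed]
    for object_id, label in sorted(failed.chunks):
        place(survivors, object_id, [label])
    failed.chunks.clear()
    return survivors


if __name__ == "__main__":
    cluster = [Server(name, 10) for name in "ABCDE"]
    place(cluster, "pink-file-2", ["replica-1", "replica-2"])    # 2-rep pool
    place(cluster, "green-file-2", ["chunk-1", "chunk-2", "P"])  # EC 2+1 pool
    for server in heal(cluster, cluster[1]):
        print(server.name, sorted(server.chunks))
```

Running the same heal step on a three-server cluster that holds a three-chunk object raises the RuntimeError, which mirrors the point made in the video: the data could be recreated, but not in a way that keeps one chunk per server.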

Info

Channel: 45Drives

Views: 2,888

Keywords: storage server, storage nas, nas storage, server storage, data storage, 45 drives, storinator, stornado, ceph, ceph storage, storage cluster, clustering storage

Id: jBWVcJYNjeA

Length: 19min 22sec (1162 seconds)

Published: Tue Jun 16 2020