An Overview of GlusterFS Architecture Part 3 - Replicated Storage Cluster

Captions
Okay, so for simplicity's sake, let's forget our 480 terabytes and let's talk about replication.

Cool. This is synchronous replication: things are always replicated, so the instant a server dies you're going to be able to survive.

So then it's taking us into the world of high availability.

You got it.

Okay, well, if I'm doing that, I'm going to lose capacity, right?

You got it. That's inherent; you can't get away from that. If you want true high availability you need two copies of something, because the second copy needs to be sitting there lying in wait to be failed over to if it's going to be instantaneous. So right away, if we're going to have a highly available cluster, the bare minimum is that we've got to make two copies of the data, and we're going to have half of our effective storage.

Now, I can replicate at higher levels, right?

Yeah, you can go to three, and I think you could do four with GlusterFS. As you can tell, I've never gone higher than that.

But nobody ever wants four replicas. That's hugely expensive. It would be ultra available...

It would be ultra available, yes, but very big. And with the reliability of conventional gear, you're just not going to lose that many.

You got it. Okay, so let's go through an example here. Here's what I'm going to do, Brett: I'm going to set this up just to be simple. I hope everybody listening doesn't mind contrived use cases; they're just intended to be illustrative. Because we're going to lose half our capacity, what I want to do, Brett, is set you back here from our beautiful diagram, with my beautiful handwriting and Brett's beautiful handwriting. I'm going to erase this. We set this up this way: we said 500, which turned into 480. Now I'm going to go to 240. Why am I doing that? Because otherwise we'd have to draw six boxes there, or four if we made them bigger than that.

Yeah, so we just use the same
hardware.

Yeah. Let's say the boss came along and said, "I don't need as much storage, but I need high availability. Create me a high-availability thing." It's a replicated Gluster cluster from nodes A, B, and C, with 30 hard drives in each of A, B, and C. So, more RAID arrays?

Yeah, we're using the same RAID configuration as we did in the other one. Although we're replicating with Gluster, it's still the same storage underneath as the other one. Almost every Gluster cluster is still based on good old RAID arrays and all the stuff that you know if you're a sysadmin who uses Linux.

Right.

And that's what makes Gluster so simple: it's another layer on something that most people are already familiar with.

Okay, beautiful. So we've got this. Now how are you going to set this up to be a replicated cluster that'll survive when that disaster happens?

Gotcha. First of all, I'm just going to erase these 15s, because we know what RAID arrays we built underneath: we built our two 15s and we have our LVM pool, and now we're ready to create our bricks. In the other case we made one big brick, one big brick, one big brick, and we just tied them all together. But now we're going to replicate, and you might be able to see right away that if I have one brick, one brick, one brick, this one could replicate to that one, but what's C doing? So to have an odd number of servers replicated in Gluster, we're going to have to have an even number of bricks on each server.

Okay, an odd number of servers requires an even number of bricks, such that everything can replicate to something else.

You got it.

Beautiful. Can you draw my bricks in red in there?

So we've got the RAID volume on each. Cool. I'm just going to get rid of this little black line and put it back in in red. Oh, a nice sinusoid there.

It's good. All right, cool.

So the idea of what I just illustrated is this: we'll call these B1 and B2, B1 and B2, B1 and B2. Brick 1 and brick 2 on each server. So that'll be brick
1 on server A, brick 2 on A.

Yep, gotcha. Okay, so how does my replication work?

Okay, so we're going to build a replica 2, so we're making two copies of the data. We like to build these things so you can expand them one server at a time. That's a very economical way to do it: we have big servers, so people don't always want to buy two more big servers every time. So we're going to limit this to building a Gluster cluster such that I can just add one more server, keep growing it that way, and keep it replicated.

What I'll do in this scenario is start with this one brick. This is where data would live, and when I create the volume I'll tell it that its pair is brick 2 on B. (Jeez, I'm reaching pretty far here; I'm going to draw it over here.) So A brick 1: its replica pair is B brick 2. Then I'm going to follow the same idea again: I'm going to take B's brick 1 and make its replica pair over on node C, brick 2. So B brick 1 replicates over to C brick 2.

You might be able to see where my pattern's going here. I'm going to take brick 1 on C and just kind of pretend that it wraps around like that. So C brick 1: its replica pair is node A brick 2.

So each one of these replica pairs is an exact mirror of your data. Let's follow your example from last time and say we lost node C.

Yeah, node C went down.

That means I'm missing node C's brick 1 and I'm missing node C's brick 2. C's brick 2 has its exact pair over on B, and B is still up. C's brick 1: its pair is A's brick 2, still up. As far as your users are concerned, that data is still accessible.

So you, sorry, me, I'm the sysadmin here, and I'm the guy with the gun pointed at his head, and I don't hear that loud noise out in the cubicle area, and the bosses are all happy; they don't
know what's going on. And I'm just asking: is my hardware down? Did I have a power supply failure? And I go look: oh, somebody knocked the power cord out.

So in the first example we did, he'd be at your door yelling at you, "What's going on?" When that same scenario happens with this, he doesn't even notice, and you've already fixed the problem before anyone did.

I'm now critical, because I've lost my redundancy.

You're right, so act now. But you don't have anyone knocking on your door.

If you don't act fast, maybe you will.

Yeah, but we're talking enterprise-grade reliability and enterprise-grade hard drives; it's unlikely for many of my RAIDs to go down. And remember too, the RAID is still underneath. So what the Gluster layer is doing is not so much providing data protection; it's more providing data availability. The RAID is still keeping the data safe. We still could rebuild from the other one if we wanted to. If we needed a whole brand-new server, say this thing literally burnt to a crisp, that's fine: put in another server and Gluster will heal itself back over.

If I had the same server and it just went down because somebody knocked the power cord out, it's going to have to think a little bit when it comes back up. My RAIDs are still alive and most of the data is there with integrity. What does it take to get it back?

Okay, so that's Gluster's heal daemon, which runs in the background at all times on the Gluster volume. It senses that there are newer files here than there are here (there's a third member watching; I'm going to get to that in a second) and it just fixes whatever's broken here.

So I just mentioned a third member. You might have noticed that earlier I said, let's pick three servers; that'll help when we get to here. A problem that we inherently have with replicating things instantaneously is called split brain. Split
brain is when, for example, say we only had two servers available at all times. If one blipped and came back up, and this one now had a file that was altered three seconds ago and this one had a copy altered two seconds ago, the Gluster volume doesn't know who's right, so it pauses, because that's safe. Then you as the administrator have to go in and choose which one is the right file, and then, bump, the split brain is healed and you're good to go.

Gotcha.

So to avoid that altogether, you have a third member watching at all times.

Okay, and that's beyond just the data sets: it's keeping watch to see what was written?

Yes, yep. And that holds true once we go on and talk about Ceph; it still does the same thing of avoiding split brain. Whenever you're replicating anything you run the risk of split brain, and there are numerous ways to try to avoid it. This is how we get around it with Gluster.

Okay, three servers, replica two. Very good. So we didn't talk about expandability on our simple distributed cluster, but let's just make a comment on it now. I mean, you just take another server?

It really is easy. You'd have to make your RAID array and all that, but again, that's easy: we've got our scripting, and you can call our support team up and we'll help you do that. Or you can run the Gluster command to add a brick. It would literally be two commands, because you have to peer your new member, as in let these guys know they have a new friend, and then you say, okay, here's your new brick, and then it has a new cubbyhole to start filling up.

So let's get into the whole digital cubbyhole. Okay, so let's get into the next area. We've just set up the cluster, and we've gone through a server going down. Let's get that X off of there: C is back, and you're a hero. You told the boss after work what happened and how you saved them, and you're now back to your
redundancy. The bosses are happy; he just gave you a raise.

Whoo, love it. Okay, great. So now the boss comes by and says, "I'm sorry, but we've run out of space. We just acquired a new company and we need more data storage. Can you buy another server and put another server on?" So the purchase order was okayed, and all of a sudden I have server D. We've got our fourth server in there, and we still have our replication, as in C's B1 is still tied to A's B2, but we want this in here.

Okay. So how do we expand? In the other one it's very easy: you just say, here, you've got another brick. But we've got a little more work here, because we've got pairs, and who is this paired to? Every pair's taken. So what we do is set this fourth server up the same way we'd set up all the other servers: we build our RAID, we make our two bricks on it, and we'll call them brick 1 and brick 2.

And I'm really seeing right now, Brett, as you talk about this, why you want your bricks to be the same size.

Right. Having bricks the same size is easiest. Well, actually, for a replicated volume it has to happen. If they're going to be in pairs, how can you replicate if one brick of a pair is smaller than the other? It's got to happen. The nice part about using ZFS or LVM underneath is that, since we're making almost virtual volumes for the bricks, we've got some leniency on size.

Good, I see what you mean.

Yeah, so even if for some reason someone made a mistake and you get stuck with 10-terabyte drives in the next one, you can just make the bricks virtually; you can make it work.

Right. Probably that'll never happen.

So anyway, let's expand this guy. We don't have to touch any of those; we have to get that in here. I'm good in this replication, I'm good in this replication, but this guy is pointing to here.

Yeah, and that's no good.

No, it needs to go. So this is what needs to happen: we need to destroy a replica pair. So we're killing this guy,
we're killing that guy. All right. This one has all the data; it was replicated to here. We tell Gluster those two are not a replica pair anymore. So this brick's gone, that replica is broken, and all the data is still here. What I need to do is make a new brick with this guy and tie it to this one. What that will do is take all the data that's on here and sync it to this new empty spot, and you've got a new pair.

Straightforward. The downside being that you've got some network traffic for a while.

You got it. We didn't put sizes on the bricks we just made, but you can imagine planning an expansion on this. If your brick is 30, this isn't too bad, but say your brick represents 200 terabytes: that's 200 terabytes of data you're going to have to move. So expanding with GlusterFS, well, it can happen, but sometimes it's almost better just to build what you're going to be expanding to, rather than starting too small.

Yes. It's not the end of the world to expand, but you have to understand that there's a data transfer that has to happen. Figure out your network size and your network speed, figure out the amount of data you have to move, and you can figure out how long that's going to take to sync up, right?

You got it.

Beautiful.

And I'm not done yet, either. What I just did is take the data that was on here and replicate it here. So as far as this is concerned, I have the same capacity and the same number of replica pairs as I had before, but now I have an empty brick and an old brick. I can just kill this old brick, and that's what we usually do: we just get rid of it, which means we just ignore the data on it. But we make sure this sync finishes first, because it's always nice to have a little bit of a backup, right?

It's not the end of the world to have a second copy of your data; you feel a little safer, because screw-ups and
such happen.

Exactly. You know you're not supposed to swear. Just kidding.

Anyway, we'd mark that old brick as empty, make a new dataset, whatever, and then we just add a new brick in, just like we did the first time, except this one is empty, so just pairing it up involves no data transfer time or anything like that. This and this are your new capacity.

Yes, Brett. So step number one is to move your replica from there over to there, keeping that old brick as a backup. When that's done you're safe, and actually, the moment you finish that, you have three copies of your data, right? So you're extra safe. At that point we mark that one as empty and pair the two up.

Exactly. And like I was saying, with distributed it's pretty easy, nothing too scary there. This can be a tedious process, and any time you have to move a lot of data, people want to be very sure that they're going to do the right thing, because, like you said, stuff happens. We've got some scripting tools that do this, but we also like to do this part manually, because, as you'd know, when you're moving or deleting data automatically, sometimes you just want to be sure of yourself. So that's what we do: we go in, we break that pair manually, and we move it. The 45Drives support team really wants to help you expand, especially with replicated volumes, just to ensure that everything goes smoothly.

Okay. And I'm observing one other thing: when you do your expansion, you have one brick's worth of data to move no matter how many servers you add, right? So if you're going to add servers, you might as well add several at once. It's a very doable thing to add one, but rather than adding one at a time, and then buying another the next month, do them in chunks, because you're saving yourself data moves.

That's a great point. Expand bigger, less often, rather than smaller, more often.

Completely.
Yeah, and most of the time it's not especially daunting or ultra high-risk. It's just work.

Exactly, it's just work. And you've got to transfer some data, so your network's going to fire up a little bit.

Yeah, exactly. Cool, good. So that's replicated clusters.

[Music]
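
As an illustration of the layout described in the captions, the chained replica pairing (brick 1 on each server mirrored to brick 2 on the next server, wrapping around) and the sync-time estimate can be sketched in Python. This is a minimal model under stated assumptions, not Gluster tooling: the function names `chained_pairs`, `survives`, and `sync_hours` are hypothetical, and the 10 Gbit/s link speed is an assumed figure that does not come from the video.

```python
from typing import List, Tuple

Brick = Tuple[str, int]  # (server name, brick number)


def chained_pairs(servers: List[str]) -> List[Tuple[Brick, Brick]]:
    """Pair brick 1 on each server with brick 2 on the next server,
    wrapping around from the last server back to the first."""
    n = len(servers)
    return [((servers[i], 1), (servers[(i + 1) % n], 2)) for i in range(n)]


def survives(pairs: List[Tuple[Brick, Brick]], failed: str) -> bool:
    """True if every replica pair still has a brick on a live server."""
    return all(a[0] != failed or b[0] != failed for a, b in pairs)


def sync_hours(brick_tb: float, link_gbit_s: float) -> float:
    """Rough time to copy one full brick over the network.
    Ignores protocol overhead and disk throughput (a simplification)."""
    bits = brick_tb * 1e12 * 8
    return bits / (link_gbit_s * 1e9) / 3600


pairs = chained_pairs(["A", "B", "C"])
for (s1, b1), (s2, b2) in pairs:
    print(f"{s1}:brick{b1} <-> {s2}:brick{b2}")

# Any single node can fail and every pair keeps one live copy.
print(all(survives(pairs, s) for s in ["A", "B", "C"]))

# 200 TB brick over an assumed 10 Gbit/s link.
print(round(sync_hours(200, 10), 1))
```

In this model, three servers with two bricks each give three pairs, and losing any one server leaves one live brick in every pair. At the assumed 10 Gbit/s, moving the 200-terabyte brick mentioned in the video would take roughly 44 hours, which is the kind of estimate the speakers suggest making before an expansion.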
Info
Channel: 45Drives
Views: 1,724
Rating: 5 out of 5
Keywords: storage, clustering, storage clustering, big data, 45 drives, storinator, glusterfs, distributed storage cluster, replicated storage cluster, geo replication, cluster volume, raid volume
Id: YQXXUfbDC5U
Length: 17min 39sec (1059 seconds)
Published: Tue Jan 15 2019