An Overview of GlusterFS Architecture Part 4 - Geo-replication

Captions

And I guess the same thing goes for three replicas, so we don't have to talk about that.

Yep, yep. Beautiful.

Okay, last thing. Let's start here, Brett. Last section of this. I have my replicated cluster, which I've expanded, which basically has the equivalent of 15, 30, 45, 60 hard drives, minus the capacity that I've lost on it: the two per 15 that I've lost for my RAID. At two per 15, I've lost eight drives out of that. I'm torturing myself doing the math in my head, but I've got 60 minus 8, so 52 drives, and then I have 90% of that, so it's got about 46 drives' worth of capacity.

Okay, along comes the boss again, and the boss says, "You know what, this is extremely valuable data. What happens if a meteorite strikes? There's a meteor shower coming up."

And you get bigger problems in August, right?

I know, but that's what you want to save. That's the boss. No, no, wait, don't put down meteor shower risks.

Right, okay. So there's this meteor shower coming up, and he says, "We've got to make this safe. We've got to put this somewhere else." Geo-replication, asynchronous replication: tell me about asynchronous replication. I said 45 hard drives, I'll round up to 50, so I have about 400 terabytes of capacity, 400 terabytes usable. Check. And I want to geo-replicate it. What all has to happen for that?

Okay, so the quick, high-level, easy thing: you need a Gluster volume of equal or bigger size on the other end, and you're done.

Good.

So now I'll go a bit further into it. This is synchronous replication, meaning when you make a write, it's committed to the brick and its brick pair, and until both say "I got it," it doesn't tell the client that its write is done.

Cool.

So that's fine over low-latency links. With 10 gigabit between all of these, we're laughing, easy peasy. But if you do that same synchronous replication over a long distance, you can quickly see that if you have to wait for both
pairs, but there's a good 30-millisecond delay in between them, you're going to be waiting a long time to commit a write. So, boom, that's where geo-replication comes in. It's asynchronous, in that it's built to send data across high-latency links like an internet connection or a WAN connection. The idea is that it sends incremental changes, as it finds them, to the receiving volume. So in this case you don't even have to build an exact copy of the volume. It just has to be a Gluster volume, and it has to be as big or bigger. So we could take that same distributed cluster that we built in example one and literally just tie them together like this, because we don't care about high availability over here. Maybe it's a backup.

Yeah, you don't need to buy twice as much.

Don't care. So we have, what, I think we had 410, 480 here, plus our fudge factors.

Yeah. And the best part of geo-replication is how easy it is to set up. Most of the time, if you need to replicate something like this, you've got to get rsync, you've got to write your own little bash scripts, and then run your tests and hope it all works. With geo-replication, you make sure the two sides can communicate with each other across an SSH connection, start your session, and then just watch. It's really cool, actually: you'll go in on one computer, you'll write some files, and then 5 to 10 to 30 seconds later they'll just magically appear across here.

Yeah, it is really cool.

And that's kind of it.

I know, that's it. So, let me throw one thing in here on geo-replication. If we were going to replicate 400 terabytes, we'd probably want to bring these machines into the same building and put a high-speed link between them, like 10 gigabit or multiple 10 gigabits, to get the initial sync.

Yeah, a good seed is definitely useful there, because you raise a good point. If we're going to talk about sending something over an
internet link, and say you're sending it from LA to somewhere in the middle of nowhere, America, and your latency, or not even your latency, your bandwidth is 10 megabits or something like that, it takes a long time: months, or sometimes years, when you get up into the hundreds of terabytes or the petabytes.

Right.

So what you can do is bring it in, build this cluster, load your data onto it, and start the geo-replication. It'll move the data over relatively fast across your 10 gigabit link. Then you just pause the session, bring the machines over to the new spot, and turn the session back on, and it'll sync up anything that changed during that time. It'll start a history crawl, and then it will understand all the things that it's missed. It'll start its initial crawl, and then eventually you'll see it popping the files in. It's a set-it-and-forget-it type of thing. You can watch the status of your geo-replication. You would do this from your master nodes, from your main cluster, and it'll tell you the health, whether something went wrong, and so on.

And I take it you can get alerts?

It's all built in, along with the alerts and the logging. And again, I love plugging our support team, but at any point, if you've seen something funny, there's that too.

So geo-replication, you've said over and over to me, is just one of the joys of Gluster, right?

Yeah, it's the joy. Geo-replication is so easy and so reliable.

Yeah.

And there are other cool features, like you can cascade your geo-replications. So where we said we've got site A and we're going to this safe spot, but that's not safe enough and you want another copy, you do this again for a third site, and you literally cascade it. You work off this one actively, and then it sends to here, and then it sends on to the next one. You don't even have to cascade: you can have two geo-replication
sessions going to different nodes. So you have some flexibility there to make it work within your needs.

Yep. One thing I've found fascinating in working out storage bandwidth: when you work in what we work in, where hundreds of terabytes and petabytes is where we start, you start to realize that the highest bandwidth we can have is FedEx carrying hard drives.

Yeah, exactly. It's funny, you hear that and you go, "Oh yeah, he's right, that is the fastest way data can actually move." You can get up into a huge number of gigabits when you put a batch of 10-terabyte hard drives in a FedEx package and they zip from New York to Los Angeles overnight.

Yeah, that's amazing.

And honestly, this is kind of an aside, but we were saying how ZFS is so simple. Another one of the amazing things about ZFS is that if you have a big ZFS volume, you can pull all those drives out, send those drives away, and then just randomly put them all back into the same system, and it'll know.

ZFS is just awesome, isn't it?

Awesome. So instead of "I've got to send a RAID, I've got to make sure I get all the slots right," you just send a box of drives and say, "Make sure you put them all back in the same system," and you're good to go. That's all you need.

Yeah, pretty nice. Okay, I've got one other topic I'd really like to touch on, and maybe this will fit in here. We have a number of clients who have built really cool systems where they have two locations, and they put a cluster in each location, and their clusters are split up and they geo-replicate. They use half of their cluster, and the other half of the cluster stores... let me draw this. May I step up to the board again, Brett?

Sure.

This is just really cool, and it may be useful to some people. I have a storage cluster in my New York location, and then I have my Los Angeles location. They both need storage. Okay, and
I create a cluster. I'm using that box as an entire cluster, and I split the cluster in half.

Mm-hmm.

So this is active in New York, and this is active.

And when you say active, this is a replicated cluster, just as we talked about in the last example?

Yeah, it could be, or it could be distributed. It doesn't matter.

Okay. And this is the disaster-recovery, geo-replication thing. Split in half. So I've got all my people out there, and when I say the cluster is active, it means it's sharing storage through my network to all my clients out there. So all my employees in each of those locations, sitting at their PCs on the network, are using the active cluster. That's where the whole storage is happening.

Yep.

And they're going, "Yeah, but we need disaster recovery, we need backup." With geo-replication on the same cluster, we can replicate LA over to New York and New York over to LA. And then if disaster strikes, and the meteorite strikes Manhattan, just one little server room (I still have a problem, because it was a little meteorite that just hit the server room), then we could do a failover to there. You could be operating over a slow connection, but I still have my data.

Yeah, exactly.

And I still have my data. And then at that point you could send the drives over, you could push the data, you could send the servers. There are a couple of different ways to recover.

Yeah. We have people operating on that, and it works really beautifully. Just that flexibility of Gluster to set up: you can split things up and you can configure. It's a really beautiful system, again within the realm of what it does. And really, the biggest exception on it is the number of...

I know, right. The metadata performance.

Well, the metadata performance is sometimes lacking if you fill up directories too much. I'm just going to say, arbitrarily, that's the 1% of
situations, where you break that rule. In the other 99% of situations, where you've got a reasonable number, it's the spot to be.

Exactly. And just remember, when you expand on your replication, you've got to move some data around.

That's right, the data has to move around. And again, it's not in scope here, but if we want to talk about Ceph, because you love Ceph so much: we're going to do a video on Ceph, again just talking about the architecture of a Ceph cluster. What you're going to notice about Ceph is that the expandability is a lot more fluid, a lot more "wing it," if you know what I mean. On the fly.

Yep. It's a bit more of a learning curve to understand the new concepts, but once you get it up and running and you understand it, you'll go, "This might be the easiest storage system I've ever administered."

Yeah, once you've got it figured out.

Once you've got it figured out, yeah.

Cool. But Gluster is a great solution, and if disaster recovery is part of what you need, geo-replication makes it a winner for you.

Yeah, 100%.

So, like Forrest Gump says, that's all I have to say about that, Brett.

Yeah. And if the people watching at home have any questions, comments, anything at all: tweet us, Instagram, YouTube, comment below, call us, email us.

I know, we're old-fashioned: email.

Yeah, we do old-fashioned email. We can even do mail by post. We still receive those.

Yeah, we're flexible. If you want to put it on a floppy drive and send me a message that way, I'll still read it.

Right on. Hey, thank you.

Thanks, guys.

[Music]
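
The seed-locally-then-relocate workflow described in the captions (connect over SSH, start the session, watch the status, pause, move the machines, resume) can be sketched as a dry run that only prints the commands involved. This is a rough illustration, not the presenters' procedure: the volume and host names are hypothetical, prerequisites such as SSH key setup are left out, and the command syntax should be checked against the documentation for the Gluster version in use.

```python
import shlex

# Hypothetical names, for illustration only; none of them appear in the video.
MASTER_VOL = "mastervol"
SLAVE_HOST = "backup.example.com"
SLAVE_VOL = "backupvol"


def georep_command(action, *extra):
    """Build one `gluster volume geo-replication` command as an argument list."""
    return [
        "gluster", "volume", "geo-replication",
        MASTER_VOL, f"{SLAVE_HOST}::{SLAVE_VOL}",
        action, *extra,
    ]


def seed_and_relocate_plan():
    """Order of operations for seeding on a fast local link, then moving the target."""
    return [
        georep_command("create", "push-pem"),  # set up the session over SSH
        georep_command("start"),               # initial sync over the fast local link
        georep_command("status"),              # check health from a master node
        georep_command("pause"),               # before the machines are moved
        georep_command("resume"),              # at the new site; catches up on changes
        georep_command("status"),              # confirm the session is healthy again
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in seed_and_relocate_plan():
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; on a real master node each line would be run by hand, with the status output checked between steps.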

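The capacity and bandwidth figures mentioned in the captions can be checked with a little arithmetic. The sketch below uses decimal units and ignores protocol overhead; the 400 TB, 10 megabit, 10 gigabit, and 10 TB drive figures come from the conversation, while the 40-drive box and the 24-hour transit time are assumptions made only for illustration.

```python
def usable_drive_equivalents(total_drives, group_size, parity_per_group, usable_fraction):
    """Drives' worth of capacity left after RAID parity and a usable-fraction allowance."""
    groups = total_drives // group_size
    data_drives = total_drives - groups * parity_per_group
    return data_drives * usable_fraction


def transfer_days(data_terabytes, link_megabits_per_s):
    """Days needed to move the data at the full link rate (no overhead)."""
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_megabits_per_s * 1e6)
    return seconds / 86_400


def shipment_gigabits_per_s(drive_count, drive_terabytes, transit_hours):
    """Effective bandwidth of a box of full drives in transit."""
    bits = drive_count * drive_terabytes * 1e12 * 8
    return bits / (transit_hours * 3600) / 1e9


if __name__ == "__main__":
    # 60 drives, two parity drives per group of 15, 90% of the remainder.
    print(f"Capacity: {usable_drive_equivalents(60, 15, 2, 0.9):.1f} drives' worth")

    data_tb = 400  # usable capacity quoted in the captions
    print(f"10 Mbit/s link: {transfer_days(data_tb, 10) / 365:.1f} years")
    print(f"10 Gbit/s link: {transfer_days(data_tb, 10_000):.1f} days")

    # Assumed shipment: 40 drives of 10 TB each, 24 hours door to door.
    print(f"Box of drives: {shipment_gigabits_per_s(40, 10, 24):.0f} Gbit/s")
```

Under these assumptions, 400 TB takes roughly ten years at 10 Mbit/s and under four days at 10 Gbit/s, which is why seeding over a local link (or shipping drives) comes up in the discussion.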
Info

Channel: 45Drives

Views: 1,457

Rating: 5 out of 5

Keywords: storage, clustering, storage clustering, big data, 45 drives, storinator, glusterfs, distributed storage cluster, replicated storage cluster, geo replication, cluster volume, raid volume

Id: quYDVroKIHI

Length: 13min 1sec (781 seconds)

Published: Tue Jan 15 2019