A Conversation About Storage Clustering: Gluster VS Ceph (PART 1)

Video Statistics and Information

Reddit Comments

LizardFS for the win!

👍︎︎ 1 👤︎︎ u/r3dk0w 📅︎︎ Oct 13 2018 🗫︎ replies

Captions

Brett: Hey guys, Brett Kelly here, R&D engineer at 45 Drives. I'm here today introducing part one of a three-part series about storage clustering with our hardware. In this part in particular we discuss the whys: why cluster, and why is it beneficial to you, for three main reasons: scalability, high availability, and performance. We introduce the two clustering software packages that we specialize in and work with in-house here, Ceph and GlusterFS. More specifically, we'll be speaking about CephFS and GlusterFS and how they compare and contrast. Anyway, let's get into it.

Doug: So here we are at 45 Drives, in our YouTube studio; we call it Studio 45. I'm here with Brett Kelly. Brett is our lead engineer and head of product development at 45 Drives, and I'm Doug Milburn, a co-founder of the company. Anyway, welcome, Brett.

Brett: Thanks, Doug.

Doug: Maybe I'll welcome myself, because really you should be welcoming me, right?

Brett: I guess so, yeah.

Doug: We both work here, so how does that work? We both work here, but you spend more time in front of the camera, so...

Brett: OK, well, welcome to my spot.

Doug: I like that. Thank you, Brett. OK, now that we've got all that out of the way: we came here today to talk about clustering. Clustering is where we take the leap from having single storage servers plugged into our network. When you need more storage, you put in another storage server, but they're not joined together; you've got to know where to go, which namespace to go to, to find your files. That's great, and it can take you a certain distance in your storage architecture, but then you want to take the next leap, and storage clustering is where we tie them all together with software.

Brett: You got it.

Doug: We're going to talk about two software packages, and we're going to limit this to open source. There are great proprietary options out there, which we work with all the time at 45 Drives, but we want to stay with the two biggies in open-source clustering, those being your favorite...

Brett: That's CephFS. Stole your thunder there.

Doug: Beautiful. Steal my thunder anytime you want, Brett. And the other one being GlusterFS. So, Brett, why would somebody wish to cluster? Why not just use discrete servers and, when you need to scale, just add another server to your network?

Brett: There are three main reasons: scalability, scaling performance, but mainly high availability, as in they need their service to be up 99.99999 percent of the time. That's hard to do with discrete servers, because you need manual intervention by people and a lot of work. It's easier to just have your clustered storage do it for you.

Doug: OK, let's go into this. Scalability: if I want to grow a cluster, in very general terms, what do I have to do to scale it out?

Brett: Add another storage server or, if the storage servers you have are only half filled, add more drives. And of course there's a little bit of software administration. I'm not saying you just plug it in like a magic trick, or I wouldn't have a job, right? But it's pretty minimal.

Doug: Very minimal?

Brett: Very, very minimal.

Doug: OK, let's talk about scaling performance. What do you mean by scaling performance?

Brett: That's a good question, because a lot of people think of performance as, "Oh, more servers, my throughput goes up." It's not so much the throughput of a single client transfer. If you imagine you're sitting at your desk and you pull a file down, you're not going to see a big difference whether you have a single server or a cluster. What you will see is that as your business grows, or if your business is exporting to end users, the more users you have, the more parallel performance gains you'll see. So when I talk about scaling performance in a cluster: as your business grows, your performance grows with you; as your user base grows, your performance grows with your cluster.

Doug: OK, so the load distributes over multiple machines.

Brett: Exactly, that's exactly it.

Doug: So, a great reason for using a cluster. Now explain to me, again in very general terms, availability. What does that mean in a cluster? Is a cluster more available than a single server?

Brett: OK, we'll keep this simple. You have two servers, both exporting a single namespace, and one of them fails. Typically, if you're running a single server, heads pop up over the desks and everyone goes, "Oh, I can't do anything." But with a highly available setup, you can imagine these two servers: one goes down and the second one picks up where it left off. So if you lose a server, your performance in a case like that might fall a little bit, but you do not lose access.

Doug: And you mentioned a two-server example, but as it scales out to more, it's only going to get better.

Brett: I did that for simplicity's sake. Yeah, it scales bigger and bigger, and typically with clusters, the bigger you make them, the better they get.

Doug: OK, Brett, so somebody decides they want a cluster. You can get good storage clustering in many environments, including proprietary ones, but we very specifically want to talk about open-source clustering, and we're going to limit it to CephFS and GlusterFS. Why those two platforms? Why should they be of particular interest?

Brett: Mainly because of their adoption in the field, both by users and by developers. Both have professional backing from Red Hat. For example, Facebook uses a lot of Gluster in its infrastructure, and CERN, running some of the most advanced physics experiments we have as humanity, stores a lot of its data on a Ceph cluster. They're well adopted on both the user and developer side, with plenty of documentation around them.

Doug: So these open-source projects have critical mass.

Brett: That's a great way to put it. With a lot of open source, some people think, "I don't know, what if the developer just decides to stop working on it?" That's not the case here. A lot of businesses have invested a lot of money in these two technologies. They're not going away; they're only going to get better.

Doug: Right on. And I guess the other thing to add is proven performance. Are these two mature platforms, ready for use now, with no waiting for anything to come around?

Brett: That's a great question, because with open source there's always a little bit of, "It does this, but not quite yet." Yes, they are fully production ready. Like I mentioned, a couple of very big players in the world use them. Of course there are always a few fringe features that we're still waiting to get added in, but you'll see that in every software package. To answer your question directly: yes, the performance is there if you know how to use it. These are fairly complex IT systems, and you have to size them correctly for your workload, but you have production stability, and performance-wise they get the job done.

Doug: So when do I make the leap to clustering? I guess a sensible threshold would be when you get beyond the size of what you can keep on a single server. If you're going to go beyond that, or you see yourself growing, it's time to consider clustering.

Brett: You got it. I'd say if you see yourself growing, if your data needs are greater than what you can fit in one of our XL60s, one of the densest servers you'll find, or if you need that always-up high availability like I mentioned, that's when you should really start thinking about clustering.

Doug: One thing that comes up: say I'm an IT professional running an IT infrastructure in an organization that is running off single servers right now. Should I be intimidated by the thought of clustering?

Brett: No, not anymore. Ten years ago or so, yes, it was complex. With Linux you were always playing that game of, "Is it supported by the big guys?" But with where software-defined storage and open-source, Linux-based technology are now, it's a lot more open of a market. Sorry to put it in that horrible way; I do engineering, not English. But it's a lot less scary. There's a lot more interplay between open-source tools and proprietary ones, like Windows Active Directory; merging those in is painless now. To answer your question: GlusterFS has a very easy learning curve, because it works in a very traditional way. It's just bridging a file system across a bunch of servers. So if you're familiar with traditional storage servers and file systems, you won't be that afraid of it; the concepts make sense. Now, Ceph is a bit scarier at first look because it's a whole new beast: it's objects underneath instead of files. But once you stop, think about it, and understand the concepts, you'll go, "Hmm, well, this is actually kind of simpler." That's why some deem Ceph the future of storage: once we grow to these huge limits where we run up against the end of what file systems can do (we'll talk a bit later about why file systems start to slow down when they get too big), object storage is just the best for that. So once you get over the slightly steeper learning curve with Ceph, it's not that scary a technology either.

Doug: It's interesting; that evokes memories for me. I developed a lot of software in the past, and I bridged the world of structured programming and the early days of object-oriented programming. It was the same thing in software development: objects were a little intimidating when they were brand-new concepts, but after a while you have it.

Brett: Yeah, you got it, and I see that happening here as well. Of course there will always be a need for file systems; that's why CephFS exists, because it's a file system on top of the objects. They'll never go away; you need file systems. But it's definitely a great way to deal with a lot of data.

Doug: On availability: going by your comments, the more servers the better. So in a smaller-volume cluster, you might be better off using more, smaller machines, if availability and performance scale out?

Brett: That's a great question. It's something I answer with my sales team and customers a lot: do I go with the fewest servers possible, as dense as possible, or the scenario you described? And I'm going to give you the answer they always hate hearing from me: it's case by case. But you're right: if performance is paramount and you care about nothing else, then yes, scale out, get more parallel, only half fill your servers, and fill the rest as you need. If you have cold archival data that you need to keep but you don't care about performance, as long as it's there and safe, then go dense: get three servers, as dense as you like. High availability is the same idea. You have to rank what you need, performance, scalability, or availability: what's most important to me? That will help size your node count. Hopefully that was a more concise answer than what I sometimes give them.

Doug: And your group does this on a regular basis: you help customers architect...

Brett: Yeah.

Doug: ...right from the architecture point of view. What do you want? Is it performance, is it availability, is it density, rack density, or whatever? They've got all those choices.

Brett: You got it.

Doug: One comment I wanted to make, something we should touch on about the general architecture of clusters. When we talk about high and low performance with Storinators, especially the mid- and high-powered Storinators, with today's large drives and large RAID arrays reading in parallel, we're talking very, very high speed anyway. It's your connectivity that tends to be the limit for most of our customers.

Brett: Yeah, you got it. The network will almost always be your bottleneck in that case. But whenever you talk about performance, everyone always focuses on the throughput number, and the throughput number, while very important, needs to be considered together with latency, or I/O operations per second. If you have a pipe that can give you data at three gigabytes a second, but it takes three minutes to find the file and start giving it back to you, that's not going to feel like three gigabytes a second. So it's about finding the balance point between latency, the I/O performance of everything, and your pipe in and out, with each clustering software.

Doug: Exactly. And that's where the nuts and bolts and mechanics of how the clustering software does its job come into play.

Brett: All right, so that was part 1 of 3. I hope you enjoyed it. We talked about the whys: why clustering, and especially clustering with our hardware, can really benefit your storage requirements. So why don't you continue on and join us in part 2, where we really dive into the nitty-gritty of the differences between CephFS and GlusterFS and how to choose which one is right for your environment.
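
The throughput-versus-latency point made near the end of the captions (a 3 GB/s pipe that takes three minutes to start delivering a file) can be put into rough numbers. The sketch below is only an illustration, not anything shown in the video: the function name `effective_throughput` and the 30 GB file size are assumptions chosen for the arithmetic, and the model ignores everything except a fixed start-up delay plus a constant transfer rate.

```python
def effective_throughput(file_size_gb, link_gb_per_s, startup_latency_s):
    """Average GB/s the client experiences, counting the wait before data starts flowing.

    Simplified model (an assumption): total time = start-up latency + size / link rate.
    """
    transfer_time_s = file_size_gb / link_gb_per_s
    return file_size_gb / (startup_latency_s + transfer_time_s)


# Hypothetical 30 GB file over the 3 GB/s pipe from the conversation,
# with the three-minute (180 s) delay before the transfer begins.
rate = effective_throughput(file_size_gb=30.0, link_gb_per_s=3.0, startup_latency_s=180.0)
print(f"{rate:.3f} GB/s")  # prints 0.158 GB/s, far below the 3 GB/s headline figure
```

Under these assumed numbers the transfer itself takes 10 seconds but the client waits 190 seconds in total, which is why the headline throughput alone says little about how the storage feels in use.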

Info

Channel: 45Drives

Views: 23,050

Rating: 4.9197326 out of 5

Keywords: storage server, storage clustering, storage server clustering, gluster, glusterfs, ceph, cephfs, 45 drives, storinator

Id: 4XrU-zLaaqk

Length: 13min 24sec (804 seconds)

Published: Tue Oct 09 2018