Demystifying Gluster - GlusterFS For SysAdmins

Video Statistics and Information

Captions
Okay, greetings and salutations. Welcome to the afternoon portion of the Gluster community workshop. Dustin Black is going to speak in a minute, but before we do, I just wanted to post the URL to the survey for this workshop. Please give us your feedback at that URL and we'll use it to make a better workshop in the future. Next on the lineup is Dustin Black, who is going to give "GlusterFS for SysAdmins." I'm going to connect his computer. Can somebody shut that back door, please?

All right, testing. Is it going over the speaker? Okay. Thanks to everybody for joining us this afternoon after lunch; hopefully you're not too calorie-ridden to start snoozing off while I'm talking. I'll try to keep the pace up. My name is Dustin Black. I'm a Red Hat Certified Architect and a Technical Account Manager for Red Hat. I'm a systems guy, not a coder; I want to make that evident right up front. There are a lot of deep-down questions I'm not going to have the best answers to, but luckily we've got some great resources here if your questions go beyond what my expertise might be.

Just a quick bit about what I do for Red Hat: I work for Red Hat support as a semi-dedicated resource for a set of customers, many of them in financial services and pharmaceuticals. I tend to them as personally as possible, get to know their environments, and try to provide a high level of service in as personal a way as I can. We manage things such as multi-vendor support issues, and one of the biggest things our customers like to take advantage of is having me act as a customer advocate into Red Hat. I've actually got some customers who use the TAM service specifically for that, because they want high-touch access to engineering as much as possible.

So what we're going to do this afternoon is take a look at GlusterFS from the sysadmin perspective. You're going to see a lot of repetition in the beginning of information that has already been discussed, but I hope that by the end we get to more tangible outcomes, so that you can walk away from this and, this evening, if you want to go sit in your hotel room and be a geek, you can spin up some VMs, install a Gluster system, and start feeling like you're actually part of the community and confident using it.

We'll start right off with the technology overview. So what is GlusterFS? We've talked a little about this. We've got a POSIX-compliant filesystem with no central metadata server, which gets rid of a lot of the bottlenecks and problems that come with central metadata servers, as AB and John Mark talked about earlier. We treat this as a network-attached storage system, so you're going to be able to access it over standard network protocols via a couple of different client methods. One of the big benefits here is being able to scale the system out with heterogeneous commodity hardware. That's really the building block for Gluster: how do I take all of the storage that's already there, in a simple format, and expand out and use it as simply as possible, without a lot of administrative overhead?
You'll see in the administration of this how simple the scaling out really is. We're able to take all of that hardware, all of the resources from those systems, and essentially aggregate them, so in the end all of that storage, processing power, and memory is pooled into one storage system.

So what is Red Hat Storage? I'm just going to touch on this a little bit so you have an understanding of what we're trying to do with the GlusterFS system to turn it into a supportable piece of software. What we've done with Red Hat Storage is take the GlusterFS code, similar to what we do with all of our upstream products — you know, Fedora going to RHEL. We've taken the pieces of it that we feel are most prepared for an enterprise environment, tested and certified those with hardware configurations and with our partner vendors, and created something we believe is a supportable environment for GlusterFS. We package and distribute that as Red Hat Storage. It uses an entitlement process very similar to Red Hat Enterprise Linux, and you end up getting the world-class support that Red Hat is able to offer with its Linux environments.

One of the trade-offs when you're going from a community environment to an enterprise environment is that if you want the support, we're going to end up limiting the way you use it; that's just a natural thing. When we build this, it is GlusterFS running on a RHEL system plus XFS. We include XFS, which is generally an add-on entitlement on RHEL 6, as part of the Red Hat Storage system. We use a subscription model just like all of our Red Hat Enterprise Linux entitlements, and there are a couple of different flavors: the main one you're going to see is the Storage Software Appliance, for deployment in the data center, and there's also the Virtual Storage Appliance, for public cloud deployments.

So comparing GlusterFS to traditional storage solutions, where does this really fit in? Where am I going to use it? If you look at your basic NAS solutions, your NFS servers, you're going to be limited in scalability and redundancy. The other scale-out systems that have been talked about typically rely on a metadata server or a distributed set of metadata servers, and in either of those environments you're going to be limited in total scale and capacity; you're never going to get linear scaling. Then you can go all the way to SAN, and of course SAN is great, right? SAN is expandable, it's fast, but it's also expensive, and you've got to pay costly administrators and network engineers to handle the thing. It's not simple to deploy and it's not inexpensive to deploy. Gluster fits in where that hole is: we made something with NAS performance and NAS simplicity, but that can scale like a SAN environment. What we're able to do with Gluster, on commodity hardware, is get linear scaling, with little overhead and high redundancy, and make it as simple and inexpensive to deploy as possible.

The technology stack: let's take a look at how this is built up, starting with a little bit of terminology. Some of these terms you've probably heard passed around, but we want to put in some definitions real quick so that everybody has a good understanding.
A brick is what we refer to as a filesystem mount point that Gluster is going to use as a place to store its data. That term gets tossed around a lot with Gluster, but it's really that simple: it's just a filesystem that we've told Gluster to use. A translator is a piece of logic that sits between that storage and what's being presented in the global namespace to the user. With the brick being the storage component — John Mark used the analogy of Lego bricks — I can snap those building blocks together and scale up the amount of storage capacity I need. The translator gives me a similar capability with logic: it lets me say, okay, I want to do different things with that data, and I can stack up the pieces of logic I want and create the functionality. You'll see that in the examples I give in a little while. A volume is essentially bricks that have been passed through translators: we have the back-end storage, we have all the logic that sits on top of it, and the thing that gets presented to the user at the end is the volume. And a node is simply a Gluster server that is sharing those bricks.

Foundation components: building this thing from the bottom up, what do you need to start with? In your private cloud environment, in your data center, you're just looking at commodity hardware, your x86_64 servers. If it's a Red Hat Storage server, it does have to be on the Red Hat hardware compatibility list; in the current release that's a limited set of servers and storage components. If you're going to be using the community version, you can deploy on essentially any x86_64 hardware. In the cloud, what's realistic right now and what's available is deploying to Amazon Web Services. Obviously, as that landscape changes, I think we'll see some other options, but that's what we have at the moment for a public cloud.

So we have this piece of hardware; let's build the components up on top of it. We need some underlying storage. This can be just direct-attached storage, JBOD disks — storage like you know it: I'm going to attach simple storage to a server and make it available. On top of that, we do want hardware RAID. A lot of people wonder why they need RAID if the redundancy pieces are built into Gluster. Well, you really want to handle a failure at the simplest point possible. If you have a disk failure, do you really want to be managing it at the Gluster layer, or at the operating system? You probably don't; the easiest way to handle a disk failure is with your RAID environment. So it's highly recommended, and for Red Hat Storage it's required: Red Hat Storage requires RAID 6. Logical volume management is also not technically required — you can deploy a brick without LVM — but it's highly recommended, and again required for Red Hat Storage. Then on top of the logical volume you're obviously going to deploy a filesystem of some kind, some filesystem that has extended attribute support.
Extended attribute support is really the primary requirement, because we're going to be storing metadata information in those extended attributes. So you look at XFS, ext3 or ext4; Btrfs I throw up there as an example — it is possible — but keep in mind that for Red Hat Storage the requirement is XFS.

Okay, data access. I've got this data on the server; what are my options for getting to it? We talked about some of these options, and they're growing over time, but these are the main methods you're going to see on the client side. The Gluster Native Client, which is the FUSE (Filesystem in Userspace) client, is typically going to be the best choice for most workloads. There are going to be cases where NFS is preferred, or one of the other methods, depending on what your architecture requires. FUSE gives us the capability of doing filesystem functions in user space without having to cross over into the kernel. It's the native piece of Gluster, and most of the time it's the preferred access method. We also have built-in NFS, as NFSv3, as was mentioned earlier; the Gluster services are actually going to be running an NFSv3 server by default. Now, if you want to do Samba, that's a little bit different of a deployment — it's not built into Gluster the way NFS is. For CIFS you actually use the Gluster Native Client to mount the share locally on the system and then re-export that local mount with Samba, so you have a separately configured Samba environment to share those back out. And then there's UFO, unified file and object, which has also been discussed and which I believe will be covered in more detail later. I'm not going to go too deep into it, but through some proxy logic that's in place, we have the ability to access your data simultaneously both as a file and as an object, which is a really nice benefit of Gluster.

If you take a look at a Gluster node, the primary services you're going to see running are these. You have the glusterd service, the elastic volume management daemon. This is the main service that handles the Gluster communications; you'll see it running once on every node, on every export server, and you interface with it through the gluster command-line tool, which is what we're going to show some examples of in a little while. The glusterfsd daemon runs once for each brick, so if you share multiple bricks from a node, you'll see multiple instances of glusterfsd running; this process is also managed by glusterd. Then you have the glusterfs process, which serves a couple of functions on the system: one of its primary functions is as the NFS server, so you'll see it running on the server side, and you'll also see it running on the client side handling the FUSE communications. As far as the user tools you'll interact with, mount.glusterfs is the FUSE native mount tool, and the gluster command is your management interface for working with the glusterd daemon.
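For reference, mounting a volume with each of those access methods looks roughly like the sketch below. The hostname server1, the volume name myvol, and the mount points are placeholders rather than values from the talk, and your environment may need additional NFS or Samba options:

    # Gluster Native Client (FUSE)
    mount -t glusterfs server1:/myvol /mnt/gluster

    # Built-in NFSv3 server (the NFS version may need to be forced to 3, as noted later in the talk)
    mount -t nfs -o vers=3 server1:/myvol /mnt/nfs

    # CIFS: re-export a local FUSE mount through a separately configured Samba share.
    # Minimal smb.conf fragment for that re-export:
    #   [gluster-share]
    #       path = /mnt/gluster
    #       read only = no

After the FUSE mount is made, the client talks to all of the nodes directly, so the server named in the mount command mainly matters at mount time, when it hands over the volume information.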
I like this graphic because it shows how all of these pieces go together. If you walk through the stack, you'll see we built this up from the bottom: we have our storage on the bottom, with the hardware RAID layer on top of that, then logical volume management, and you can see how all the components we talked about stack on top of each other to, in the end, provide the services you're getting from Gluster. I'll have all of these slides published, so you'll be able to look at them in more detail, but I like this graphic for understanding how those components are stacked together.

Scaling the system: this is really the most important thing about Gluster — its ability to scale — and it's also so simple that I have two slides here just to give you the basic concept. Scaling up means adding disks to a single node, and it's as simple as walking through the components we discussed: I have a server, I add a disk to it, I put a logical volume and a filesystem on that, I tell Gluster that the filesystem is a brick, and now I have additional storage. I can do that over and over again on a server. But the real power is in scaling out: we take the process we used to build up one system and just replicate it over and over. You do this again and again, and you always get linear scaling capacity, because you're not having to deal with a central metadata server. We also talked about this being a heterogeneous process: if you want the next server you deploy to be a larger one, that's something you can do, and you can also take a server away. We have this flexibility without having to interrupt the end user to do this kind of scaling. There's nothing limiting the way you scale, but there are certainly use cases that will shape how you choose your heterogeneous environment: as you grow a cluster over time, you may be adding newer, more modern, larger hardware, and that's most likely going to be the reason things aren't homogeneous.

So what does it look like under the hood? The elastic hash algorithm is the core piece of logic that has Gluster doing what it does best: distributing the information between the nodes. Again, there's no central metadata server, so no bottleneck there. What we do is hash the location of a file based on, essentially, its path and file name. What I like to compare this to, for anybody who isn't familiar with it, is running md5sum on a file: if I have a file and I run md5sum on it, I get some value of a specified length, and if I take that same file and run md5sum on another server, I get the exact same value, which lets me validate that it's the same file. You can imagine Gluster doing the same thing: every time a file is accessed in the system, it calculates a hash and uses that hash to understand where in the distributed namespace the file is located.
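The md5sum analogy can be seen directly from the shell; the file path here is just an illustration, not from the talk:

    # On server1:
    md5sum /shared/data/report.pdf
    # On server2, against an identical copy of the same file:
    md5sum /shared/data/report.pdf
    # Both commands print the same digest. Gluster's elastic hash works the same way,
    # except the input is the file's path and name, so every node independently
    # computes the same hash and therefore agrees on where the file lives.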
Because every server calculates the same hash, every server always knows the location of the file, and it's able to use that. This is the primary thing that keeps us from having to use a central metadata server. The elastic part of this is that we're not hard-coding that hash to a specific storage location; we're coding that hash value to a virtual volume, and that volume can then be assigned to multiple physical locations, multiple bricks. This is what gives us our flexibility and the elasticity.

Taking a look at translators: this is just to show you the pluggable and stackable capability we get with the logic of the translators. There is always a stack of translators on the server side and on the client side, and every transaction runs through that stack. This is a simple example showing a fairly default scenario, where the logic passes through the client side, the communication, and the server side, and you can see a typical stack of translators for that. If you got into a more complicated scenario, you might see this branching off so that there's replication involved, or some of the other pieces. I recall a good example of one translator that's commonly run on the client side but that you can also run on the server side for a different bit of functionality. So you can think of these things as modular; you can get pretty creative in how you put them together, and also in developing your own translators, which obviously we would highly encourage.

So now we're getting to the meat of this: the basics of what we can do with Gluster. A distributed volume is the starting point; it's what Gluster does by default if you don't tell it to do anything else. A distributed volume is Gluster spreading the files essentially evenly across the bricks that are available — you can roughly think of this as file-level RAID 0. Granted, "evenly" is a relative term, because it depends on what kinds of files you're writing, different sizes and types, but by default it's going to do its best to spread them out. You can see in this example that as files are written, one file goes to the first brick and the next goes to the other brick. Keep in mind that we're looking at things at the brick level: in this case we have one brick per server, but this can get a lot more complicated. The important thing to understand is that the pieces that make up the distributed volume are the bricks themselves.

Looking at another basic scenario: instead of distributing, we're going to tell Gluster to replicate. We give it a command that says: when you build this volume, you've got these two bricks, but instead of distributing among them, copy everything to both places. You can think of this roughly as file-level RAID 1, and you can see that every file that gets written is written to both bricks. In this case it's just a simple two-replica scenario.
If you put these two ideas together: in this case I now have four bricks involved, and I've told Gluster to make two replicas. What Gluster does by default, without me telling it anything else, is say: you've given me four bricks and told me to make two replicas, so I'm going to distribute between the two replica sets. I've now layered these functions together: each file that's written is replicated between two bricks, and when the next file is written, it goes to the other set of replicated bricks.

Now let's look at a little bit different scenario: geo-replication. What we were just doing was replication within a Gluster environment — multiple bricks on multiple servers, replicating our data inside the data center. Geo-replication gives us the ability to handle remote replication for disaster recovery. Instead of the synchronous replication you get within your data center, this is asynchronous replication across a network environment. Right now this is a master-slave model — and as of 3.3 it's still only master-slave — so it's essentially one-direction replication. You can get a little bit fancy with it by replicating out in a cascading model, like the bottom graphic shows. Once it's set up, it's a continuous and incremental process running in the background, essentially using rsync to keep the data synchronized between the remote environments. One important thing to point out is that time should be synchronized on all of the master nodes; otherwise this can get really out of hand and messy. That's the basics: we take our data and asynchronously push it out to a remote site. If you need to recover that data, you're going through a manual process as it stands right now. You do have it sitting nice and clean at your remote site, but there is currently no way for Gluster to replicate it back — there's no two-way replication yet; that is something that's being planned. This slide just compares the two types of replication: a replicated volume is mirroring your data within the cluster, in the data center, and geo-replication is for geographic distribution across the network.

So we've done distribution and replication, and you saw how these are layered functionalities, but there are more layers you can use with Gluster. One of the other options, which has a relatively limited use case, is striped volumes, and it's worth seeing how it stacks up. You can think of this as similar to RAID 0. I have two bricks defined, and I've said these two bricks are part of a stripe, so now when a file is written, the individual file is actually broken up into chunks and written, again as evenly as possible, across the multiple bricks. What you end up seeing at the filesystem level is that on each server, if you looked at the local brick, you would see what appears to be the whole file, but if you analyzed the actual contents you would see that the file on one brick contains part of the data and the file on the next brick contains another part — depending, of course, on how many stripes you have.
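As a rough sketch of what those basic volume types look like from the command line (the server names, brick paths, and volume names here are placeholders; the actual demo commands come up shortly):

    # Two bricks, two replicas: a purely replicated volume (roughly file-level RAID 1)
    gluster volume create repvol replica 2 server1:/brick1 server2:/brick1

    # Four bricks, two replicas: Gluster distributes across the two replica pairs
    gluster volume create distrep replica 2 server1:/brick1 server2:/brick1 \
        server3:/brick1 server4:/brick1

    # Two bricks in a stripe: each file is chunked across both bricks (roughly RAID 0)
    gluster volume create stripevol stripe 2 server1:/brick1 server2:/brick1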
We can take that striped volume and do what we did with the replicated volume and stack distribution on top of it. Once again I've given it four bricks and said I want a stripe of two, so what it does is say: you've given me four bricks and told me to use two of them, so I'm just going to do that twice and then distribute between them. You see in this scenario that I have a striped volume between two bricks and a striped volume between another two bricks; the first file is written to the first striped volume, and when the second file is written, the distribution logic tells it to write to the other set of striped bricks. This is the graphic that has a little bit of an issue, though — in this case you're striping across multiple bricks on the same server — and we'll come back to a similar graphic in a little while; it's a good learning example, because one of these graphics was actually drawn kind of funny.

Now look at a striped replicated volume. Again we're layering up the functionality, but we're not doing any distribution in this case; we're striping and replicating. We have multiple striped volumes, and every time a file is written, it is striped and replicated: there's a set of stripes over here and a set of stripes over there, and every file is written to both sets of stripes. We've essentially told Gluster: here are four bricks, do two stripes and two replicas, and because of that there's nothing left over to distribute to, so it does no distribution.

Now we can get into the pretty complicated scenario of stacking all of these pieces up, and we have a distributed striped replicated volume. The capability of doing this only exists as of GlusterFS 3.3; if you go back to 3.2, it will not allow you to create these three layers of functionality. What's going on here is that you now have two striped-replicated sets, so when a file is written it gets both striped and replicated, and when the next file is written, the distribution piece comes into play and writes it to the other striped-replicated set. That's the graphic we're going to come back to in a little bit.

First, let's take a bit more of a look at data access. We talked about the Gluster Native Client: FUSE is the kernel module that allows a filesystem to be created in user space, and the big advantage is that we don't have to get into the kernel code — we stay safely outside of that and still get all the filesystem functionality we need. When you use the native client to mount a GlusterFS volume, you can specify any of the nodes: I can say I want to access this volume on that particular node in the cluster, and when that communication gets started, the client essentially says, okay, now I know where all of this data is, and it communicates directly with the individual nodes that are storing the data rather than always going through the node you mounted to. That's one of the advantages of the native client: it fetches the volume file from the mount server and then communicates directly with all nodes. This is recommended for high-concurrency environments where you need good write performance.

NFS, as I mentioned before, is built in. [Audience exchange, partially inaudible] Right — it's based on whichever one responds first, which is most likely going to be the local one.
Okay. One thing that I discovered in my own testing — I think we pointed out when I brought this up before that it's actually a bit of a bug, Rick, do you remember? — is that with the NFS client you actually have to specify mounting with vers=3, because otherwise it basically comes back with an error. I wanted to point this out because if you do get home and start messing around with this in your own environment, you may find that when you run the NFS mount command you're greeted with an error in response. So just keep in mind that you do need to tell it to mount with vers=3. The standard automounter is supported, and again, like the native client, you can mount to any node. But with the NFS client, once you've mounted to a particular node, all of your communication to the bricks is funneled through that one node. That's one of the detriments of using the NFS protocol instead of the FUSE native client; on the other hand, you can get better performance for lots of small files because of the NFS caching that's built in. And again, accessing your data with CIFS, as I described before, is a little bit more of a manual configuration that you do outside of Gluster: you use the FUSE client to mount locally and then re-share that out with a Samba configuration of your own choosing.

Okay, I'm not doing a live demo here, but what I have in this next set of slides is some screenshots that show what the Gluster command line looks like for a lot of the things we've talked about. The first thing we do is prepare a brick. This is pretty straightforward; I'm including it so we have a complete set of examples from beginning to end. All we've done here is create a logical volume and a filesystem on it. I do highlight that we set the inode size to 512; this is important so that there's enough room for the metadata information to be stored — otherwise additional blocks have to be used to store it. We create a directory, we mount the filesystem there, and we set it up in fstab so it always mounts. That's all there is to a brick — we've just created a filesystem.

Now that we've added that filesystem, let's start building our cluster. The first thing we do is probe the peers: from the Gluster command line we run peer probe and give it a hostname — just a hostname that's resolvable on this machine — and then you can see peer status. One thing that's kind of interesting when you look at the peer status command is that it does not show the local machine that you're running the command on. The output in this case shows server2 and server3, and we can infer that I'm actually running this command on server1, which is also part of the cluster; it just doesn't show up in the peer status output. I always thought that was a little bit of a problem, because it's certainly confusing, but it's good to know that in a future release — 3.4, I believe — that changes: if you have a later release, it will actually show all of the hosts, including the local machine.
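For reference, the brick-preparation and peering steps on those slides look roughly like this sketch; the volume group, logical volume name, size, mount point, and hostnames are placeholders:

    # Carve out a logical volume and put XFS on it with a 512-byte inode size,
    # so the extended-attribute metadata fits inside the inode
    lvcreate -L 100G -n brick2 vg_bricks
    mkfs.xfs -i size=512 /dev/vg_bricks/brick2

    # Mount it persistently; this mount point is what Gluster will treat as a brick
    mkdir -p /brick2
    echo '/dev/vg_bricks/brick2  /brick2  xfs  defaults  0 0' >> /etc/fstab
    mount /brick2

    # From one node, add the other servers to the trusted pool and check the result
    gluster peer probe server2
    gluster peer probe server3
    gluster peer status    # in this release, the local node is not listed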
So, doing the basic thing that Gluster does, let's create a distributed volume. You'll see this is as simple as: I'm going to create a volume and give it a name. I run the volume create command, then the name, which is arbitrary, and then I tell it which bricks to use. In this case I've given it server2:/brick2 and server3:/brick3 — server2 is the resolvable hostname, and everything after the colon is an actual filesystem path as it would look on the local machine. So I'm using /brick2 on server2 as a brick and /brick3 on server3 as a brick, and because I didn't tell it to do anything else, I've now created a distributed volume. When you look at the volume info output, I see the volume name, the type is Distribute, it shows the number of bricks, the transport type is TCP — that's the default if you don't specify anything else — and then it shows me the layout of the bricks: brick 1 is server2:/brick2, brick 2 is server3:/brick3. Now, this isn't fully ready to go yet: Gluster recognizes that you have those bricks and you've put them into a volume, but if you want to make use of it, you do need to run the volume start command, which you see at the bottom of that example.
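A sketch of that simple distributed-volume workflow, with placeholder names:

    gluster volume create testvol server2:/brick2 server3:/brick3
    gluster volume info testvol    # Type: Distribute, Transport-type: tcp
    gluster volume start testvol   # the volume must be started before clients can mount it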
Because that was so simple, we're going to jump from that scenario to putting all the layers in place, and come back to what we looked at before: the distributed striped replicated volume. It was this graphic we saw earlier, and this set of commands shows how to build that environment. We run volume create, give it a name, and tell it replica 2 and stripe 2 — so two stripes and two replicas — but we've now given it eight bricks to use. Two by two is only four, so with the leftover set of bricks it does the same thing with the other four and distributes between two configurations that look exactly the same.

Now, you'll notice when I pass this command that the green-highlighted bit is complaining at me, and that's because whoever created this graphic didn't take into account that this is not an optimal scenario. What we're doing in this case is actually replicating between bricks on the same node, and that has to do with the order in which we passed the bricks: it sets up the replication based on that order. You'll notice I've got export1 and export2 from server1 listed first, so when it sets up the replication, the first thing it does is replicate between the first two bricks — and the first two bricks are on the same node. Gluster warns in this case: this is not optimal, why would you want to do this? But it will let me go ahead and continue, and I can build out the environment. Looking at the pieces of the command that set this up: the test volume name I gave it is the distributed volume, the top level of the logic we're looking at; the next piece is the replica 2, and if you look at the way the graphic is laid out, you can see it's replicating between bricks on the same node, then striping between the replicas on the two different nodes, and then the distributed volume is doing all of that essentially twice.

So let's look at how we fix this, and it's as simple as rearranging the order of the bricks. We run the same command, but this time — and I've highlighted this for clarity — I've given the bricks in pairs that sit on different servers: the first two bricks are on two different servers, the second pair are on two different servers, and so on. I now end up with an environment where all of my replication is across bricks on different servers, which is a much more optimal scenario. If I run volume info on the test volume now, I get all the information I need: the volume name, and you'll see it shows Distributed-Striped-Replicate, because I've given it two replicas, two stripes, and eight total bricks. It shows that it's created but not started, and it does a little bit of the math for you there for reference: 2 x 2 x 2 = 8 bricks. The transport type is TCP, which I actually did specify explicitly in this one just so you can see it in the command, and it shows the brick layout — the order of the layout is the same order I put them in on the command line, and the replication is in those pairs.
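A sketch of the brick-ordering point being made here: the replica pairs are formed from the bricks in the order they are listed, so the same eight bricks can produce either a same-server or a cross-server replication layout. The hostnames and export paths are placeholders:

    # Suboptimal: the first two bricks are on the same server, so Gluster warns
    # that replicas will land on the same node
    gluster volume create testvol replica 2 stripe 2 transport tcp \
        server1:/exp1 server1:/exp2 server2:/exp1 server2:/exp2 \
        server3:/exp1 server3:/exp2 server4:/exp1 server4:/exp2

    # Better: list the bricks in cross-server pairs so each replica set spans two nodes
    gluster volume create testvol replica 2 stripe 2 transport tcp \
        server1:/exp1 server2:/exp1 server1:/exp2 server2:/exp2 \
        server3:/exp1 server4:/exp1 server3:/exp2 server4:/exp2

    gluster volume info testvol    # Type: Distributed-Striped-Replicate, 2 x 2 x 2 = 8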
So now we have an existing volume and we want to play around with the bricks that are in it. A simple scenario: let's add a brick. Volume add-brick — you'll see all of these commands are very straightforward — add a brick to this volume, and then I give it the new brick to add. The next set shows removing a brick: I run volume remove-brick, tell it the volume and which brick I want to remove, and you'll notice at the end of this one I have to say start, because when I'm removing a brick I inherently need to move data around to free up the space on it. So I tell it to start, and then I can run a status command, which gives me the output you see there: it shows that it's moving the data and that it's in progress. Once the status comes back and says complete, I can run the commit command, which officially removes the brick from the volume and frees it up for whatever you need to do with it.

In the last little set down there we see the volume rebalance command. When I've added a brick to a volume, I now need to essentially reshape the volume so that the brick is available for use, and there are a couple of different things I can do. The first one I'm showing is running volume rebalance with the volume name, giving it fix-layout, and telling it to start. In this case all it's doing is changing the layout of the bricks in that volume and making the new brick available for writing data; it is not moving any existing data around. So even though it's part of the rebalance command, it's not actually rebalancing any existing data. When that process completes, what you've essentially got is new space for files to be written to, and because that space is free and available, the elastic hash algorithm will use it for new files as they're written.

If I run the rebalance command without fix-layout, we're actually doing a full-blown rebalance. This is a pretty heavy process, and you want to be careful about the scenarios in which you use it: if you have lots and lots of data, this is going to require some heavy lifting by the system. By giving it a new brick and telling it to do a full rebalance, the hash algorithm is now run against every existing file on the system, and it makes a new decision about where those files need to go — redistributing them as if that brick had been there in the first place. So you're going to end up with a lot of disk activity and network activity, inherently, just by moving that data onto the new bricks. [Audience question, partially inaudible, roughly: if I extend my gross available space by 10%, do I get 10% back?]

Let's look here at migrating data, or replacing a brick. In this case, instead of just adding a new brick or taking one out, let's say for whatever reason I want to replace a brick that already exists. Again the command is straightforward: volume replace-brick, which volume, which brick I want to replace, what the new brick is, and start. This is essentially doing the process of removing and adding a brick all in one simple, swift move. Again, I have to start it — I can't just run the command the way I can when I initially add a brick. After the start, the data gets moved around; when it's finished, you get a migration-complete message when you run the status, and once the status shows migration complete, you can run the commit command and the old brick is removed from service.
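A sketch of those brick-management operations against a hypothetical volume named testvol; the hostnames and brick paths are placeholders:

    # Grow the volume with a new brick, then make the layout aware of it
    gluster volume add-brick testvol server4:/brick4
    gluster volume rebalance testvol fix-layout start   # new files can now land on the new brick
    gluster volume rebalance testvol start              # full rebalance: also migrates existing data (heavy)
    gluster volume rebalance testvol status

    # Shrink the volume: data is drained off the brick before the removal is committed
    gluster volume remove-brick testvol server4:/brick4 start
    gluster volume remove-brick testvol server4:/brick4 status
    gluster volume remove-brick testvol server4:/brick4 commit

    # Swap one brick for another in a single operation
    gluster volume replace-brick testvol server3:/brick3 server5:/brick5 start
    gluster volume replace-brick testvol server3:/brick3 server5:/brick5 status   # wait for "migration complete"
    gluster volume replace-brick testvol server3:/brick3 server5:/brick5 commit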
There are a lot of commands you can use to get into the deeper levels of Gluster system administration. I've thrown a few up here that are more common examples of things you might run into. This is stuff I certainly recommend you dig into when you set up a test environment — see what options are available and check the documentation — but I wanted you to see the format of some of these commands. If you want to set authorization for how the data on your Gluster volumes is accessed, the volume set command is the core command for applying these attributes: I'm saying volume set, the name of the volume, then auth.allow, and you can see I can pass something like an IP address with a wildcard; in the same way, auth.reject says don't allow anything from, say, the 10-dot address space. Pretty straightforward stuff.

There are NFS-specific options in there as well. Keep in mind that with the built-in NFS server I'm not doing something like editing an /etc/exports file; that isn't there for me to edit — Gluster is managing it for me. So a lot of what you might want to do as far as NFS-specific configuration is also available through the volume set command. You'll see here we did a volume set on that volume: if I want to set my NFS access to read-only I can do that here, or if I want to disable NFS altogether — because maybe I'm only going to be using the FUSE native client, or that in combination with Samba — I can turn it off entirely. You can also dig into a lot of other options. I've got a couple of examples there: setting read-only for everything — in the previous command I showed setting read-only for only the NFS protocol, but if I want to set an entire volume as read-only I can do that here — and there are also a lot of performance tweaks you can do; in this case I show doubling the cache size, giving it a value that's double the default.

For digging into the performance of your volumes, there are a couple of tools that give you some insight. The first is the volume top command. This gives you a lot of interesting output, and there are a lot of different options you can pass to get different kinds of detail. In this case I'm just showing an example where I said: give me the volume top of this volume, I want to see the read values at the brick level for this brick, and the last option I passed is a list count of 3 — show me the top three read counts on this brick. You can see it shows the read count for some individual files. You can run that as read against directories or files, or as write against those, and there are a lot of extra options to dig into. There's also an interesting little performance tool that's part of the top command as well, but you need to be careful with those performance analysis options, because they actually run a dd process in the background, so they can impact your system performance when you run them — though they're very good for benchmarking.

The volume top command is good for point-in-time information, but if you want trending information, you can collect that with the volume profile tool. Volume profile, the volume, and I tell it to start — that's really all there is to getting it going — and now it's collecting data, and it will continue to collect data until I tell it to stop. At any given time while it's running, I can run volume profile on the volume and pass the info command to it, and you see a truncated example of the kind of output you get. It shows me read and write information at different block sizes; the ellipsis I put in there cuts out the other lines of output that wouldn't fit neatly on the screen, but you can see the columns showing block ranges — 1 byte and up, 32, 64 — and if it continued down you would see higher and higher block ranges and more information on those. You also get the latency output at the bottom, and finally it shows you the duration it's been running and the total bytes read and written during that time. [Audience question about the performance impact of profiling.] It's essentially logging, so realistically it should be very lightweight; you should be able to run it and collect information on a production system without much impact. It is storing that data somewhere — I'm not quite sure what it does in the background as far as managing and rotating that data — but it is run per volume when you run that command.
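A sketch of the tuning and monitoring commands being described; the volume name, addresses, brick path, and the cache-size value are placeholders, so check the documentation for the defaults and the full option list:

    # Access control
    gluster volume set testvol auth.allow 192.168.1.*
    gluster volume set testvol auth.reject 10.*

    # NFS-specific behaviour is also set here; there is no /etc/exports to edit
    gluster volume set testvol nfs.volume-access read-only
    gluster volume set testvol nfs.disable on

    # Whole-volume read-only, and a performance tweak
    gluster volume set testvol features.read-only on
    gluster volume set testvol performance.cache-size 256MB

    # Point-in-time view: top three files by read count on one brick
    gluster volume top testvol read brick server1:/brick1 list-cnt 3

    # Trending data: start collecting, query it, and stop when done
    gluster volume profile testvol start
    gluster volume profile testvol info
    gluster volume profile testvol stop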
Okay, so a quick look at the commands involved in setting up geo-replication, just so we have a little understanding of that. Again, you see very simple, straightforward commands: volume geo-replication, I tell it what volume, and then I pass it what host I'm going to replicate to. I show a couple of examples here so you can see the different ways you can pass this command. In the first case I've given it a remote hostname, a colon, and a volume name; this tells Gluster to interpret it as: the remote machine is also running a Gluster system, and that is a remote volume name. I tell it to start, it tells me it started successfully, and when I run the status on it you can see it shows the master — the master is the name of the volume I told it to replicate — and then you see the slave, and it puts that little gluster:// protocol tag on there to show you that it's replicating to a remote Gluster volume; it shows the host and the volume and gives me a status of OK.

If your scenario calls for it, you can perform geo-replication to simply a remote SSH share instead of a remote Gluster volume. I still have trouble understanding why you would do this, but I imagine there's a scenario where you would. Just to show how that would be set up: you do actually need to set up SSH keys for Gluster — the secret PEM has to be in place — and then you see in this case I've passed it a username at the remote host, and when I give the directory structure, I've given it a colon and a slash, which indicates that it's actually a directory path and not a volume name, so it interprets that shorthand appropriately. You can also see in the output of the status command that it shows the master volume and shows the slave as a remote SSH share. So I could send this directly to a remote filesystem — although, granted, if you're using Gluster to store loads and loads of data and then replicating to a remote SSH filesystem, what is the back-end storage in that case? That's why the scenario seems kind of strange to me, but it's there.

In this case I show another instance of running volume info, with the output we've seen before truncated, just to show you that once I've set up geo-replication, the volume info command shows, under the reconfigured options, that geo-replication is on — that little bit of output gets added in there.
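A sketch of the geo-replication commands being described, in roughly the syntax of the release being discussed; the hostnames, volume names, and remote directory are placeholders:

    # Replicate a local volume to a volume on a remote Gluster system
    gluster volume geo-replication myvol remotehost:remotevol start
    gluster volume geo-replication myvol remotehost:remotevol status

    # Or replicate to a plain directory on a remote host over SSH
    # (requires the SSH key / secret PEM to be set up for Gluster first)
    gluster volume geo-replication myvol root@remotehost:/data/remote_dir start
    gluster volume geo-replication myvol root@remotehost:/data/remote_dir status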
So we'll close this out with a little bit of information on some use cases. The common solutions we've talked about — media content distribution, backup and archive, disaster recovery, large-scale file servers — are all pretty straightforward use cases where Gluster really excels: lots of data being accessed by lots of clients, where I need to scale out at a linear pace, inexpensively. Another interesting one is HPC. I like that scenario because it's a little different from some of the other use cases: the Gluster back-end storage scales in a similar way to how you scale your HPC environment, using commodity hardware in a similar fashion. It never made any sense to me that if I have an HPC system that's outputting tons and tons of data, and I've built that HPC system with wonderful, inexpensive commodity hardware and commodity software, then for all the data it's dumping I either have to put it on some skimpy system that's hard to access and limited in capacity, or on a really expensive back-end SAN. I think this is a really cool place for Gluster to excel as well, because it can scale along with the HPC system. Another thing I think we're going to see more of in the future is infrastructure-as-a-service environments using Gluster as a storage back end. We also talked a bit before about Hadoop — I think John Mark covered this fairly well. It's certainly an interesting scenario to be able to use a POSIX-compliant filesystem as your storage back end for Hadoop without having to do anything significant to Hadoop to get it there: you essentially get a seamless replacement for HDFS, with the added benefit of being able to access your data as files, rather than having to transfer it into HDFS via a manual process as you normally would.

Here are a few customer scenarios, just to show where Gluster has been used in real life; a couple of these are pretty high-profile. CIC, electronic signatures: these guys do a lot of processing of electronic signatures, as their name indicates. They need to do this stuff fast, they need to scale quickly, and they need to do it in the cloud. What they were running into was that the performance and scalability of their storage environment in the cloud was really limited, and their SLAs are demanding. With Gluster on Amazon Web Services they were able to set their standard for those SLAs and move far beyond it — they actually got performance metrics well beyond what they were even shooting for. They were also able to accelerate their cloud migration: once they put Gluster into the plan, their move to the cloud went a lot more quickly and smoothly.

This is the big one that's always fun to tout, because most of you are probably running it on a device of some sort: Pandora, a very well-known environment. You can imagine that a lot of the music you're streaming from Pandora is being hosted on a GlusterFS back end, which I think is a really significant use case that we can all relate to. This data is probably out of date at this point, but at one point it was 1.2 petabytes of audio served per week and 13 million files. We talked a little before about scenarios where you may or may not want to rebalance, and hot content is one of those places where a rebalance is probably not something you want to do. As Pandora needs to expand their storage infrastructure, they can add additional Gluster nodes, additional bricks of storage, and by not rebalancing, the newer, more modern, faster hardware is what's free and available — so the elastic hash algorithm is going to take advantage of that for the new, hotter music files coming in and being accessed. You want that to sit there, and you really kind of want your older content, which is accessed less and less, to stay on the older, slower storage that you can migrate off at some point later.
[Audience question about using Pandora's back-end storage in conjunction with content distribution.] Right — part of the answer there is that we're limited by the capabilities of the current release of Gluster, but future GlusterFS releases and two-way replication are going to increase those capabilities for sure.

Brightcove is another interesting scenario. They're doing something pretty similar to what you see with Pandora, but with files that are significantly larger and a very significant pace of growth. These guys are storing videos instead of music, and as times change and they need to store and distribute different types of data — you can imagine, as you get into HD formats — the amount of storage they need and the amount of information being streamed is increasing significantly. We show here a petabyte of total capacity; that's probably outdated, and the project was to grow it pretty significantly. But similar to Pandora, they're able to use this to scale very quickly and inexpensively, and to administer it in the long run with a small set of administrators — I think it even says one administrator to manage day-to-day operations, which is pretty significant for a storage system holding a petabyte of data. They get high reliability and a path to multi-site at some point in the future, again probably taking advantage of the replication features that are coming.

And then getting to that HPC scenario we talked about: Pattern Energy is a company that's actually using this. They have an interesting need to track weather patterns, which they do for a couple of different reasons in their business, and they have an HPC system to do it — again generating lots of data and scaling pretty rapidly. How do I take that data and put it on a back end that scales with my HPC system? They were able to use Gluster for that and get a lot of benefit out of it for their business. The solar and wind patterns help them identify not only patterns for energy consumption, so they can route energy in the ways that will be most efficient, but also things like deciding where to put a wind farm in the future: if I can understand the wind patterns, I have a better idea of where my windmills will be placed most efficiently. So there's a lot of good information coming out of that.

And that's all I've got for you today. Thank you to everybody. I've got some resources up here — again, I put up the URL to the survey for feedback, and I appreciate everybody providing that — and I've got my contact information up there as well, along with some useful links, and you can catch us on the social sites too. Any questions?

[Audience question, partially inaudible: from a performance standpoint, is there an aspect to brick size?] I'm not sure how to best answer that, but there's nothing in Gluster that limits the size of the individual bricks, so I can put different brick sizes into the environment; how you do that is, of course, going to be up to your individual scenario, which takes all of that into account.
Yeah, I think that's a little related to the question that came up earlier about a replicated volume: how does the client know which of the replicas to access? The answer is basically whichever one responds first. We have that ability because the client is aware of the multiple locations for the data; it essentially asks all of them, and wherever it gets the response back first, that's the one it's going to communicate with.
Info
Channel: tomek S
Views: 44,572
Rating: 4.7959185 out of 5
Keywords: Linux, RedHat, GlusterFS, filesystem
Id: HkBndZOcEA0
Length: 72min 24sec (4344 seconds)
Published: Sun Feb 03 2013