More about ZFS - datasets and zvols!

Good morning, everybody, and welcome back to NextDoorNetAdmin. Last time we were talking about ZFS, I went into a good bit of detail on vdevs and pools: an explanation of how the physical side of ZFS works, how you set up your disk array (or pool, as ZFS calls it) into a cohesive whole. Today I'm going to continue on that thought, but instead of looking at the physical side we're going to take a look at the logical side of things, and we're going to jump right in and start talking about datasets.

When you create a section of your pool to store data in, there are a couple of different ways you can do it: you can generally do it as a dataset or as a zvol. We'll talk about the difference between those. When you create a dataset, it's similar in concept to creating a partition. It's a set space on the disk where you can set parameters like a quota, which sets a maximum amount of data that the dataset can contain. You can set a reservation, which sets a minimum amount of space that the dataset will hold, so that you know you've always got a minimum amount of capacity there. I'll explain a little more about that later, because reservations are not a concept that is frequently used, in my mind, and it'll become more important when we get to zvols.

You can set compression on a dataset, and in fact it's encouraged to do so. The default type of compression, LZ4, is very fast and reasonably good, so you can actually store more data on your pool than you otherwise would be able to, with no observable performance cost for the most part. So it's a good thing to have.

You can enable deduplication, so that if you write the same file over and over in different places in the file system, it'll only store one copy of that data. However, deduplication has a very large cost in RAM and in processing time associated with it, so it's recommended not to enable it unless you've specifically built a system for that purpose.

You can enable
encryption. In more recent versions of OpenZFS you can enable encryption on the dataset directly, without relying on some sort of full-disk encryption scheme underneath it. But you have to set up encryption at the time of dataset creation; you can't go back and add it later. It's either on from the start or not on at all. That's just how it is.

So there are a lot of options you can configure that make a dataset something like a partition, and yet in concept it looks and behaves a lot like a folder. It mounts into a particular section of your file system, and from there you can even create sub-datasets that appear as other folders underneath the first dataset, even though each is a completely different dataset and can have completely different options associated with it.

You can do a lot of other interesting things with your datasets, like choose where they appear. They don't all have to be under the same directory, even if they're all under the same root dataset. Here's an example. On a TrueNAS Core system it's very common to have a pool called tank: a tank full of water, essentially a tank full of storage space. Your datasets will typically be created under that, so you'll have tank/files, tank/videos, tank/whatever. But they don't have to be mounted there. You shouldn't do this on a TrueNAS system, because the operating system isn't set up to recognize it, but if you were running a different system, let's say one where you were in full control of everything you were doing, there's no problem with deciding, "You know what, I don't want videos under /tank, I want videos under /home/myuser." So you've got a ZFS dataset storing files that should be in your home directory, but it's storing them in the dataset that's in the pool called tank. You can do that. You can choose to put it on the same level, you can choose to move it around; you can basically put it wherever you want.

Now, another good thing that you
can do with your datasets is take a snapshot of them. In ZFS, each block is individually checksummed and the checksum is stored in the block above it, all the way up to the one block at the top, the superblock. Because of the way that works, you can take a snapshot showing the entire file system all in one place.

The other feature that makes this possible, which I have not mentioned so far, is called copy-on-write, or CoW. A file system that is copy-on-write does not overwrite files directly; it creates a copy. When you take a version of a file and decide to update it, the update is saved as a separate copy. The old block that had the old file data is still there, and its checksum is still saved in the previous copy of its parent, and so on, all the way up to the previous superblock. The new copy will have a new checksum, the new checksum will be stored with the new copy of its parent, and you keep going that way.

So in order to save a snapshot of the file system, all you really need to do is grab the superblock, the one at the top, and its checksums will reference all the other blocks in their previous or current state. Then, as you go forward, new versions will be created, and a new superblock will be created linking to all of the other blocks in its version of the file system.

So really the snapshot only takes up as much space as the changes. If you want to be technical about it, the current version of the file system only takes up as much space as the changes that have been made since the last snapshot, but people look at the live file system as having all the data. It's six of one, half a dozen of the other: you can say the live system only has 300 megabytes when the whole thing is a terabyte, or you can say the snapshot only uses a unique 300 megabytes and the live system has the full terabyte of live, current data. Either way you want to look at it, it means
almost the same thing, even if technically it might be a little different.

These snapshots are very quick to create, almost instantaneous, because all you have to do is mark this version of the superblock as not going anywhere. It's associated with a snapshot, so the next time you want to overwrite something, the new superblock becomes part of the live file system. I'm mixing up my words like crazy today; isn't that fun?

All of this is a dataset. Cool. Within a dataset your file system is ZFS, sure, but you can then expose it to other machines using other software. For example, if you use Samba, you can expose it using SMB, which is the Windows file-sharing protocol, and Windows will be able to log on and use it. You can expose it using NFS, the Network File System, which is much the same thing as SMB in intent except that it was developed for the *nix variants: Unix, Linux, etc. So there are lots of different ways you can deal with this. If you want to run an FTP daemon, go ahead and make it available over FTP. Make it available over SFTP and SSH if you want. Make it available over rsync. The choices are yours; it's just a file system.

As a file system, it can also hold other things. For example, Windows file share permissions under SMB tend to go way beyond the standard *nix permissions of read, write, and execute. Windows has many more permissions, and you can set them up for different groups, different users, etc. So you can have extended ACLs in the file system, where you can store all of these additional Windows permissions and rights, and they're then available to you over SMB. That's cool.

But this is all for datasets, like I said. How about zvols? A zvol is a block device, pure and simple. That means ZFS doesn't know what files are on it; it doesn't even know what file system it is. It is just a section of the pool that has been set aside as a single block device. You hand that block
device over to another machine and say, "Here it is, over iSCSI, and it's a disk drive. Do with it as you wish." If you give it to Windows, Windows might format it as NTFS. If you give it to Apple, Apple might format it as APFS. If you give it to a Linux system, it might format it as XFS or ext4 or any of the other file systems that Linux can use. ZFS doesn't have to know anything about it; it's just a block device.

But, crucially, it still benefits from ZFS checksums. For every block of data, ZFS doesn't have to know what's in it: whether it's a file of some text format, or video, or pictures, or whatever. All it needs to know is, "I have this sequence of data, and this is the checksum for it." So it can still do the standard data-corruption checks and resilvering, and provide all of the protection and resiliency that ZFS offers in datasets.

You can have an encrypted zvol. Same thing: ZFS doesn't need to know what the data is. All it needs to know is, "I have this stream of data; let's encrypt it." OK, cool, we can do that.

Encryption... excuse me, not encryption, compression: you can do that too. It's not as simple, because unless you have a string of zeros, you don't necessarily know where one file ends and another begins, so you have to just compress the whole thing. (I nearly said encryption again; I've got encryption on the brain today, apparently.) You can just compress the whole thing, and LZ4 will still do a reasonably good job of it at very minimal performance impact, so it's still a good idea to enable it.

But when it comes to snapshots, now we're talking about something a little different, a little more involved. It depends on how you've set up your zvol, your block device. I mentioned a reservation as an attribute that you can specify. If we go back to datasets for a minute, every dataset appears to have the full free amount of disk
space available within the entire pool, because theoretically you could add that much data to that dataset. So your free disk space is the total free disk space of the entire pool, and as you add files to one dataset you will see the amount of free disk space decrease on the other datasets, because they're essentially using the same free disk space.

You can set a quota to say, "This is the maximum amount of data you can put in there," and that dataset will see that quota. Everything else will continue to see the full size of the free disk space, because they're able to use the full free disk space if they want.

A reservation works in the opposite fashion. A reservation says, "I'm going to take 5 gigabytes of space; that is the minimum space assigned to this particular dataset, and nobody else can encroach on it." Everybody else can use all the free disk space available up to the last 5 gigabytes; the last 5 gigabytes have to go to this dataset over here. So all of your other datasets would see their free space decrease, because you've got a reservation for this one dataset, and only that dataset would see the full total of free disk space available to it. It's a little weird.

But in a zvol, by default, every block device that you create has a set size. It has to have a set size, because you're presenting it as a fixed block device, which means that you are in essence creating a reservation and a quota that are the same number. You've got a maximum amount that it can use, but you've also got a minimum amount of space that is reserved for this block device. Say you present a block device to a Windows machine (every operating system will do this, but we'll go with Windows as the example). You give it a block device and say, "Hi, here's a hard drive. It has nothing on it; it's yours to do with as you wish." If you get halfway through it and suddenly your disk drive says, oh yes, I'm still 50% free, but I can't
hold any more data, that does weird things. That usually causes the disk volume to crash badly, with lost data. Lost data is not a good thing. So to prevent that, you have to have your reservation set up to make sure that you've got the amount of disk space available that you've promised.

Promising space that you can't deliver is called over-provisioning. You can do it if you want to, but by default ZFS won't. If you create a sparse volume, one that is over-provisioned, you say, "Oh yes, you have a full terabyte," when actually you've only got 200 gigabytes, and you figure that if it starts to get close to that, you'll just add more disks so that eventually you can fulfill the full terabyte. You can run like that if you want. It's not generally recommended, because those of us with a more conservative mindset, technically speaking, will ask: what if something goes crazy and starts writing data out? You may not have time to fulfill the provisioning, and you might crash the machine that's trying to use the volume, which is a bad thing. But it's still there for you if you want to do it.

And here's where it affects snapshots. If you take a snapshot of a zvol, you have one version of an entire block device, 15 terabytes let's say, and as soon as you have any new data you're creating a new 15-terabyte block device, complete with the reservation, complete with the quota. Again, ZFS doesn't know what data is in there; all it knows is that this is a block device. So every snapshot doesn't just use the changed data; the snapshot uses the full size of the zvol, for every snapshot. Ouch, if you have not planned for this. Ouch, especially when you're getting into large sets of data (which is what I should say, to distinguish it from a dataset).

If you created your zvol sparse, so that it doesn't have that reservation set, if you're living life on the edge, then yes, you can create a snapshot that will also be sparse, because rather than taking a
snapshot of the entire block device, ZFS is only taking a snapshot of the data that has been written to that block device. So you can do that. But again, you can't set a snapshot to be sparse when the full live version is not sparse. The setting is the same for the snapshot and the live source, whether that's a dataset or a zvol. So those are some of the trade-offs.

These kinds of data stores work better in different circumstances. If I'm using a general file share to store user files, a dataset is absolutely the right thing to use; use that all day, every day. If you want to do something like hosting a hard drive for a virtual machine, that is better off in a zvol, because there's more compatibility for that. You can create an iSCSI volume, put the virtual machine's hard drive on it, give it to your virtual machine, and call it a day. But then you have to take additional backup measures, because snapshots will be a lot more costly and you won't be able to keep as many of them as you would on a dataset. So it's something to think about, as is everything technical. Anytime you're dealing with this stuff there's always stuff to think about, and there's lots of it, lots of little bits and pieces. But that's where you get people like me; that's our job.

That is probably more than enough detail on that for now, so thank you very much, everybody, for watching. If you have any comments, thoughts, or questions, feel free to leave them down in the comments. Until next time, I am your NextDoorNetAdmin, and we'll see you next time.
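
Editor's sketch: the quota and reservation behaviour described in the transcript can be illustrated with a small toy model. This is not ZFS code and does not call any ZFS tool; the `Pool` and `Dataset` classes, their fields, and the GiB figures are all hypothetical names and numbers chosen for illustration. It assumes only what the talk states: datasets share the pool's free space, a quota caps what one dataset can hold, and a reservation removes space from what every other dataset can see.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a ZFS dataset; all sizes are in GiB."""
    name: str
    used: int = 0
    quota: Optional[int] = None   # maximum the dataset may hold (None = no limit)
    reservation: int = 0          # minimum space guaranteed to the dataset


@dataclass
class Pool:
    """Toy pool that tracks space accounting only."""
    size: int
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def add(self, ds: Dataset) -> Dataset:
        self.datasets[ds.name] = ds
        return ds

    def available(self, name: str) -> int:
        """Free space as seen from inside one dataset."""
        ds = self.datasets[name]
        # Every other dataset claims whichever is larger: what it has
        # already written, or what has been reserved for it.
        claimed_by_others = sum(
            max(d.used, d.reservation)
            for d in self.datasets.values()
            if d is not ds
        )
        free = self.size - claimed_by_others - ds.used
        if ds.quota is not None:
            # A quota caps this dataset only; other datasets are unaffected.
            free = min(free, ds.quota - ds.used)
        return max(free, 0)

    def write(self, name: str, amount: int) -> None:
        """Add data to a dataset, refusing writes that do not fit."""
        if amount > self.available(name):
            raise OSError(f"no space left for {name}")
        self.datasets[name].used += amount


if __name__ == "__main__":
    tank = Pool(size=100)
    tank.add(Dataset("tank/files"))
    tank.add(Dataset("tank/videos", quota=20))
    tank.add(Dataset("tank/vm", reservation=5))

    for name in tank.datasets:
        print(name, tank.available(name))   # 95, 20 and 100 respectively

    tank.write("tank/files", 10)
    for name in tank.datasets:
        print(name, tank.available(name))   # 85, 20 and 90 respectively
```

In this model a non-sparse zvol would correspond to a `Dataset` whose `quota` and `reservation` are the same number, which is how the talk describes it; a sparse zvol would keep the quota but set the reservation to zero.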
Info
Channel: NextDoorNetAdmin
Views: 99
Id: HmwSnZHI_4I
Length: 21min 22sec (1282 seconds)
Published: Mon Jul 01 2024