ZFS Deduplication in TrueNAS

Captions
Are you using your network attached storage as a game library? Are virtual machine drives consuming all of your storage, despite being filled with nearly identical information? Have you been frustrated by finding the same operating system ISO in seven different directories? You're not alone, and there may be help. Ask your doctor if ZFS deduplication is right for you. Side effects may include additional CPU overhead, night terrors, a compulsive desire to purchase more system memory, bandwidth envy, checking account withdrawal, kernel panic attacks, or erectile hyperfunction. You don't have to live with swollen storage. Start using ZFS deduplication, and get ready to live again.

Today's video is brought to you by Manscaped and the Performance Package 4.0, which includes everything you need to keep your yard looking its best. Having the right tool for the job is paramount, as you're not going to get the results you want with the wrong equipment. The Lawn Mower 4.0 is IP67 rated, so you can look your best whether it's rain or shine, plus with its SkinSafe technology you won't end up tilling instead of just trimming. You'll also get the Weed Whacker for ear and nose hair, the Crop Preserver to keep your tomatoes dry, and the Crop Reviver to keep them cool. Go to manscaped.com, Craft Computing, to get twenty percent off, free international shipping, and two additional free gifts. That's manscaped.com, Craft Computing, and remember: your balls will thank you.

Welcome back to Craft Computing, everyone. As always, I'm Jeff. In my last video I covered how to set up an iSCSI share inside TrueNAS and connect to it from your Windows PC; if you're interested in how to set that up, there's a link down in the video description. Today I'm going to be going over a ZFS feature that, in my opinion, gets an incredibly bum rap because of misinformation in the community and hearsay that gets repeated in every forum post where the topic comes up, and that is ZFS deduplication.

Deduplication is a concept for storage that has been around since the dawn of file sharing. The basic idea is exactly what it sounds like: rather than keeping ten duplicates of the same file on a NAS in different locations, deduplication keeps a single copy and simply references it from the other locations. Instead of ten copies, we only need to keep one, saving on storage space and actually improving performance when implemented properly. What's more, if you have ten versions of the same file with only minor differences between them, deduplication helps save space there as well, keeping the original plus the modified bits without needing to save an entirely new file. All of this happens in the background of your file server, and it's even hard to see how well it's working inside TrueNAS, or any other OS for that matter.

When asking about dedup configurations, you'll often hear these common phrases repeated in forums: "It's RAM and CPU intensive, so you don't want to use it." "It slows down performance on everything." "Why would you need that? Just buy more storage." "It really doesn't save much space, and it makes everything else slower, so no one really uses it." "My cousin in Trinidad implemented ZFS deduplication, and his testicles became swollen and his fiancée left him." "A friend of mine used deduplication once, and it replaced all of his MP3s with MMMBop by Hanson." "If you want to save space, just compress your files with gzip -9." "No thanks, I already ate." "If you're only keeping one copy, doesn't that defeat the purpose of a RAID?" "Deduplication is just one more thing looking at my data, and I don't trust it."
Now, these oft-repeated phrases came from somewhere, so let's go over what deduplication actually is, how it works, when it makes sense to use it, and what you'll need to be successful with it.

The first thing we should cover is: should you use ZFS deduplication? Like all questions about storage servers, the answer is "it depends." If your storage pool is made up of data that is largely unique, you're not only going to increase your hardware requirements, but also potentially hurt your overall performance. Think of something like a Plex library, where your storage consists of thousands of unique files with no chance of there being any duplicates. When data is written to a dedup-enabled volume, ZFS stores a checksum of that data in a table to compare against everything else in the pool. This checksum table can be stored in system memory, on dedicated storage disks, or on the ZFS data disks themselves. Each unique storage block receives its own checksum entry, and each checksum entry is compared against the rest of the table during new writes. Because every unique storage block takes up table space, the more unique blocks you have, the more processing power is required to check incoming data against the data you already have stored. But the more duplicated data you have, the faster those blocks can be accessed and the less memory address space is required for comparisons.

One thing I often hear repeated is "ZFS already requires a ton of memory, and ZFS deduplication requires even more, so no one ever uses it." Now, while it's true that ZFS will use practically all the memory you give it, there isn't a massive hit for enabling deduplication on a server either, needing only an additional one gigabyte of RAM per one terabyte of logical storage. That means the 3 TB zvol I set up in the last video would only need an additional three gigabytes of memory from the server.

Since deduplication works best with a high volume of duplicate files, it's an ideal system for storing virtual machine drives, game libraries, mail servers, and any other scenario where data is likely to be the same between two different sources. Surely you don't think all 20 gigabytes of files in a Windows 10 install are unique to that system, right? Wouldn't it be great if your four Linux VMs only required the install space of one? Or, for last week's Steam library over iSCSI, you could install Microsoft Flight Simulator onto every PC in your house while only requiring the storage space for a single copy.

So what is required to set all this up? I recommend creating a brand new dataset, or better yet a brand new storage pool, as deduplication saves data to disk a little bit differently than standard ZFS. Hardware-wise, you'll need one gigabyte of RAM per one terabyte of logical storage, in addition to the base system requirements for TrueNAS. I'd also recommend picking up a pair of write-endurant SSDs to store your deduplication tables on. Just like system memory, ZFS table storage on the SSDs only requires around one gigabyte per one terabyte of logical storage, so a large SSD is definitely not necessary. Something like a 32 GB Intel Optane module, a Fusion-io card, or a Sun accelerator card like the one I took a look at earlier this year would be perfect options for this project. For today's video I'm going to be using two drives off of a Sun F80 800-gigabyte accelerator card as the dedup table storage. It is massively overkill for this job, but it's also one of the most write-resilient SSDs that I own, and because there are four drives on a single card, I have the added bonus of having a couple of hot spares already installed in the server in the event of a failure.
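As an aside, if you want a rough idea of how much deduplication would actually help your existing data before rebuilding anything, OpenZFS can simulate a dedup table against a pool you already have. This is only a sketch, not something shown in the video: the pool name "tank" is a placeholder, and the roughly 320 bytes per entry figure is the commonly quoted in-memory size of a DDT entry.

    # Read-only simulation: builds a throwaway dedup table for the pool and
    # prints a block histogram plus an estimated dedup ratio at the bottom.
    zdb -S tank

    # Rough RAM estimate for a real DDT: multiply the number of unique
    # (allocated) blocks reported in that histogram by roughly 320 bytes each.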
To get started, I'm going to be creating a new data pool inside my TrueNAS server using four Seagate ES.2 3 TB SAS drives. I'll also be adding a dedup table vdev using the Sun F80 SSDs. Now again, you could rely on system memory for deduplication, and in some cases it may actually be faster, but every time you reboot your server, TrueNAS will need to copy that table from local storage into memory again, and I'd just as soon use a couple of SSDs in a mirror to store the table information permanently.

To start out, open up your TrueNAS web interface, head over to the Storage tab, and click on Pools. Again, I'm going to be creating a new pool, so I'll click on the Add button in the top right of the window. Give your new pool a name; in my case I'm calling it ZFSDupe. For this pool, like I said, I'm going to be using four 3 TB SAS drives: just click the checkmark next to each drive you want to use and click the arrow to the right to add them to your data vdev. Before you create the new pool, though, we need to add our SSDs for the dedup table storage. At the top of the window, click on the Add Vdev drop-down menu, scroll down, and click on Dedup. Since the Sun F80 is actually four 200-gigabyte SAS SSDs on a single card, it shows up that way under my disk selection; I'm going to choose drives 0 and 1 and add them to the dedup vdev. For the data vdev I'm going to select RAIDZ, so four drives with single-disk parity, and for dedup we'll use a mirror to keep things up and running. If everything looks good, go ahead and click on Create. (For anyone who prefers the command line, a rough shell equivalent of this layout is sketched at the end of this section.)

Back in the storage pool screen, you should see your newly created pool. Deduplication is not enabled by default, even with a dedup vdev added to your storage pool. Deduplication is an inheritable feature, so if you want it enabled on all datasets and zvols inside your pool, go ahead and enable it in the storage pool settings. Alternatively, you can select individual datasets or zvols and enable it on them independently. Now, I've already done videos on setting up Windows file shares as well as iSCSI drives, so I'm not going to go through that process right now; if you need specific tutorials for those, make sure to check out my TrueNAS playlist down in the video description. But essentially what I've done here is create a single dataset with a Windows SMB file share attached, which is about as basic a setup as you can get.
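For reference, here is roughly what the same layout looks like from the shell instead of the web UI. This is only a sketch: the device names da0 through da5 and the dataset name ZFSDupe/isos are placeholders for illustration, not the actual disks or datasets used in this video.

    # Four data disks in RAIDZ1, plus two SSDs mirrored as a dedup-table vdev
    zpool create ZFSDupe raidz da0 da1 da2 da3 dedup mirror da4 da5

    # Deduplication is off by default; enable it pool-wide (children inherit it)...
    zfs set dedup=on ZFSDupe

    # ...or only on a specific dataset or zvol, then confirm what inherited it
    zfs set dedup=on ZFSDupe/isos
    zfs get -r dedup ZFSDupe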
On the SMB file share, I've copied over some Windows ISO files, around 35 gigabytes' worth. Now, despite these all being Windows installs, there aren't likely to be any duplicate blocks inside these files, so they're going to take up around that full 35 gigabytes of space. But what if I accidentally made another directory with the same files somewhere else on the file share? Oops. Now I have two sets consuming nearly 70 gigabytes, at least according to Windows, since it sees two sets of ISO files in two different locations. I also copied over the game files for DOOM from my Steam library, just for good measure, and wouldn't you know it, I accidentally copied those over twice as well. My bad. Now, if I'd only moved over a single copy each of the ISOs and the DOOM files, I'd be looking at a total file size of around 96.9 gigabytes, but Windows reports this as double that, at 193 gigabytes. Jumping back over to TrueNAS, if we take a look at the ZFS dataset, you can see it actually agrees with Windows: there's 193 gigabytes of space being used. Wasn't this supposed to prevent this exact thing from happening, and in real time?

Yes it is, and yes it did. Both Windows and TrueNAS see files in both directories, and those files are real files that take up real space. The trick is that deduplication isn't really visible to the file system itself, since it happens down at the raw storage block level. But there's a really simple way to see whether deduplication is working, and to see how much space is actually being used on your file server. Open up a new shell session inside TrueNAS and type in "zpool list". It should return all the pools you have configured, along with some stats for each of them. We can see the ZFSDupe pool we set up earlier shows around 11 terabytes; that's physical storage, and it includes the space that will be consumed by parity inside the RAIDZ we set up, as well as the SSDs we added for table storage. But the column we're interested in is the DEDUP column, where zpool lists the ratio of space consumed versus space saved with deduplication. In that ZFSDupe pool I created a file share, and inside that file share I have two identical sets of ISOs as well as two identical installs of DOOM. The ratio is 2.00x, meaning that Windows and ZFS are both displaying two times the actual used space. If I divide the 208,256,706,560 bytes reported for the shared directory by the ratio of two, what I'm left with is 104,128,353,280 bytes, or around 96.9 gigabytes, which is about the storage space of a single copy of the ISO folder plus the DOOM installation folder. Isn't math cool?
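If you only care about that one number, you can also ask zpool for just the relevant columns. This is a sketch using the pool name from this video; the byte math in the comment is the same calculation as above.

    # Show capacity and the deduplication ratio for the pool
    zpool list -o name,size,alloc,free,dedupratio ZFSDupe

    # Logical (apparent) usage divided by the dedup ratio gives real usage:
    #   208,256,706,560 bytes / 2.00 = 104,128,353,280 bytes (~96.9 GiB)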
Now, of course, the purpose of deduplication isn't just to throw a hundred copies of every Windows ISO you have into a folder and call it a day. It's meant to cut down on bloat from multiple clients saving identical data to what is essentially the same exact place. Going back to my last video about hosting your Steam library on an iSCSI share: let's say you have three PCs in the house, and all of them want Red Dead Redemption 2 installed. If you were to just use iSCSI drives without deduplication, each PC would need to download and install its own copy onto the NAS, consuming around 357 gigabytes' worth of space in total. But if you do have deduplication enabled, those same three copies would only consume around 120 gigabytes in total, all completely invisibly to your PCs. In theory, you could also set up replication tasks between those iSCSI drives, keeping everyone's game library up to date whenever one of the clients is updated, or better yet, installing a game to every PC while only downloading it onto one. For a home with multiple gaming PCs, it really could be a game changer, both for ease of use and for bandwidth, as you would only have to download a game a single time and play it on all of your PCs.

But all of those savings in storage are moot if the forum dwellers are to be believed, so is there a performance hit for enabling deduplication on ZFS? Looking only at write speeds, I transferred two different sets of files to two different storage pools: first, 35 gigabytes of ISOs to represent sequential writes, and second, the DOOM game files for a more random file assortment. Both storage pools were set up identically with four 3 TB SAS drives, the only difference being deduplication enabled on the second pool, with a pair of SSDs for the table as I mentioned during the setup process. Not only did I not see a performance hit on the deduplication pool, it was actually faster by a pretty good margin. Sequential writes ran at 177 megabytes per second versus 164 megabytes per second on the non-dedup pool, or around eight percent faster. Random writes were predictably slower, but still quicker on the dedup pool at 173 megabytes per second versus 155, a full 12 percent faster.

But that doesn't quite tell the full story, as we were writing data to each pool for the first time. What about data that already exists in a storage pool? If deduplication is actually comparing blocks on the fly, will it be faster or slower than writing data for the first time around? As it turns out, deduplication kind of works as advertised, with sequential writes cruising at a blistering 404 megabytes per second and random writes not far behind at 396. That's roughly 2.3 times the first-write speed when the data already exists in the storage pool, which means in the right scenario, deduplication will not only save you storage space, you might actually see a performance increase to go along with it. While the documentation around deduplication says as much, I'm not sure I actually expected to see this result bear out. Gotta say, I'm pretty impressed.

Like I mentioned at the beginning of this video, there are three different methods for storing the ZFS deduplication table: you can save it locally on your ZFS data disks, keep it in memory, or use dedicated SSDs like I did here. Looking at the results from my testing with SSDs, I can't imagine getting better results from either of the other two methods, which might be where some of the negative assumptions come from, so keep that in mind when designing your own deduplication infrastructure.
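If you go the dedicated-SSD route and want to confirm that the table is actually landing on those drives, and see how large it has grown, a couple of read-only commands will show you. Again, this is just a sketch using this video's pool name, not output from the video itself.

    # Per-vdev I/O statistics, refreshed every 5 seconds; the drives listed under
    # the 'dedup' class should show write activity while new data is copied in.
    zpool iostat -v ZFSDupe 5

    # Dedup table statistics: unique vs. duplicated entries, plus the on-disk
    # and in-core sizes of the table itself.
    zpool status -D ZFSDupe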
So why have I been looking so hard at highly duplicated data lately? Well, I'm working on some of the finishing touches to my cloud gaming server, and one of the last keys to that setup is getting data and game installs out to all of my virtual machines at the same time, so stay tuned for that one in the next month or so. In the meantime, if you liked this video, make sure to hit that thumbs-up button and subscribe to Craft Computing if you haven't done so already. Follow me on Twitter, @CraftComputing, to keep up with daily shenanigans like this, and if you like the content you see on this channel and want to help support what I do, head on over to craftcomputing.store and pick yourself up a pint glass. That's going to do it for me in this one, guys. Thank you all so much for watching, and as always, I will see you in the next video. Cheers, guys.

The beer for today is from Terrapin Beer Company in Athens, Georgia. It is, and I quote, the Vanilla Chai Latte Wake-n-Bake Coffee Oatmeal Imperial Stout, 2020 Reserve. So not only should it be a pretty good stout, it's also in the running for longest beer name of all time. "Vanilla, lactose sugar, and chai spice flavor magnificently combine with Jittery Joe's Wake-n-Bake coffee to create this robust and savory oatmeal imperial stout. The balance between these satisfying flavors helps to produce a beer that makes those cold winter nights easy to enjoy, sip after sip." Cheers. Right as a leaf blower decides to start walking by my house... screw it, we're just gonna roll with it. 9.4 percent, so a little bit of a heavy hitter, this breakfast-leaning stout. Oh man, I can smell chai all the way from here. Boy, there's chai, there's cinnamon, maybe a little nutmeg in there. Man, if this is what every pumpkin spice latte came out like, I would totally order one every single time I go to Starbucks. Oh, and the smoothness does not stop at the nose. Wow, that is rich, creamy, latte-like as far as consistency goes; it's got this steamed, frothy, marshmallow mouthfeel to it. Whatever they were going for with the Vanilla Chai Latte Wake-n-Bake Coffee Oatmeal Imperial Stout, I think they nailed it. Like I said, latte-like on the texture, but it's still got this... "meaty" is not the right word, but this thick, chewy back end to it that you usually get in some very big oatmeal stouts like this. That's really good. Who wrote this? All right, I did. That is a stupid, stupid, stupid paragraph.
Info
Channel: Craft Computing
Views: 34,491
Id: KjjSJJLKS_s
Length: 17min 55sec (1075 seconds)
Published: Fri Oct 22 2021