Using dRAID in ZFS for Faster Rebuilds on Large Arrays

Captions
One issue facing RAID arrays is long rebuild times: the amount of time it takes to rebuild a failed drive from the parity data stored on the other disks. A rebuild happens whenever a hot spare takes over for a failed drive or a user puts in a replacement drive. The problem is exacerbated by today's ultra-large drives, like this 18TB drive I have here. Modern drives have been growing in capacity much faster than in speed, which means RAID rebuild times just keep getting longer. Some arrays can take multiple days to rebuild, which leaves them vulnerable to additional drive failures and data loss during that window, and potentially reduced performance as well.

One thing that can improve rebuild times if you're using ZFS is dRAID. dRAID came out in OpenZFS 2.1 a couple of years ago now, which means it should be available in most products that use ZFS, like TrueNAS or Proxmox. In comparison to traditional RAID Z, dRAID integrates the hot spare into the rest of the array. In ZFS, if you add a hot spare, that drive just sits there waiting for another drive in the array to fail, and only then does it jump in as a replacement. With dRAID, the spare capacity is instead distributed across every drive in the array: you still have about the same amount of empty space, it's just spread across all of the disks rather than concentrated on one idle drive. This means data can be rebuilt much faster, typically in around 60% of the time of a traditional RAID Z setup or less in my testing; I have more details later in this video.

dRAID also differs from RAID Z in a couple of other ways. Most notably, it has a fixed stripe width. In RAID Z, if you write a small amount of data that doesn't need the full width of the array, ZFS only uses the drives it needs, maybe two data drives and one parity drive, even if eight data drives are available. In dRAID, a small write still only needs two drives of data and one of parity, but the rest of the stripe gets padded out with zeros, which makes it less efficient for lots of small files compared to a traditional RAID Z setup. One way to mitigate this is a special vdev, which stores metadata and all small files up to a size you can set in ZFS; the system then handles those small files better, because they live on a dedicated drive, typically an SSD, along with the metadata (a sketch of this follows below).

The main advantage of dRAID is the much faster rebuild time. It also integrates your spare drive into the rest of the array, which can lead to better I/O in some circumstances since the I/O is spread across more disks, and it allows a little more flexibility when laying out disks. In a traditional RAID Z setup, the vdev widths have to add up: if you wanted two four-drive RAID Z1 vdevs, you'd need a multiple of four drives. With dRAID you can split things differently, so you could take what is roughly a four-wide RAID Z1 stripe and spread it across 11 drives, with one drive's worth of spare capacity distributed among them.
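As a rough sketch of the special vdev mitigation just described (these commands aren't shown in the video; the pool name tank, the NVMe device names, and the 64K cutoff are placeholder assumptions):

    # Add a mirrored special vdev (typically SSDs) to hold the pool's metadata
    zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

    # Also store file blocks up to 64K on the special vdev, not just metadata
    zfs set special_small_blocks=64K tank

Sizing the cutoff is a trade-off: the larger the special_small_blocks value, the more small-file traffic lands on the SSDs instead of being padded out across the dRAID stripes.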
dRAID does have some disadvantages compared to a more traditional RAID Z setup. As I mentioned earlier, it struggles with lots of small files, and a special vdev should be added if you want to use dRAID with a small-file-heavy workload. dRAID also requires an extended rebuild if you want to replace a drive that has already failed over to the distributed spare, whereas with RAID Z, where the hot spare is essentially unused, a replacement spare is ready near instantly. dRAID is also typically less flexible than RAID Z when it comes to expansion or adding drives, although ZFS is not the best at this in general anyway. In a pool with multiple RAID Z vdevs, you can expand a RAID Z vdev later on once that feature lands in the mainline releases, you can replace all the drives in a single RAID Z vdev with larger ones to get more space, and you can add another RAID Z vdev later on to grow the pool. Doing any of this with dRAID really isn't great, so dRAID is best for configurations that won't change.

Now let's take a look at performance and rebuild times. All of my tests were done with 20 2-terabyte hard drives on a dual-socket LGA 1366 server. For the OS I was using Debian 12 Bookworm with ZFS 2.1.13, fully up to date as of November 2023. I used fio for performance testing, creating a 10-terabyte file and running the tests against that file. To test rebuild speeds, I'd pull one of the drives out of the array and see how long ZFS reported it would take to rebuild the whole array.

Let's first look at rebuild speeds, the most important feature of dRAID. Here I compared a few different RAID Z configurations against their closest dRAID equivalents, and dRAID did much better than RAID Z on rebuild times. The worst case, my massive RAID Z3, rebuilt in roughly 60% of the time in its dRAID equivalent, while the best case, six very narrow RAID Z1s, rebuilt in about 25% of the time of the equivalent RAID Z1 setup, meaning it rebuilt four times faster with dRAID. This shows that dRAID's distributed spare space, along with features like its large sequential stripes, significantly speeds up the resilver process in ZFS, so the array spends much less time degraded than the equivalent RAID Z setup. I also made a chart of all the different dRAID, RAID Z, and striped mirror configurations I tested and their rebuild times. It shows that narrower vdevs rebuild faster than one giant RAID Z3, for example, if your goal is the fastest possible rebuild. In my testing, dRAID always beat the equivalent RAID Z setup on rebuild time, so if you want the fastest possible rebuilds, use dRAID instead of RAID Z and also minimize the width of your stripes: prefer more narrow stripes spread across all of the disks instead of something like a massive 15-disk RAID Z.

Now let's talk about disk I/O performance of these different dRAID configurations. For I/O testing I used fio and ran a random read, random write, sequential read, and sequential write test on each disk configuration.
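For reference, here is a rough sketch of the kind of fio run and rebuild check described above. The pool name tank, the test file path, and the exact fio parameters are assumptions, not the author's actual test script:

    # Sequential write test against a single large file on the pool
    # (repeat with --rw=read, --rw=randread, and --rw=randwrite for the other tests)
    fio --name=seqwrite --filename=/tank/fio-testfile --size=10T \
        --rw=write --bs=1M --ioengine=libaio --group_reporting

    # After pulling a drive, zpool status reports the resilver progress
    # and ZFS's estimated time to completion
    zpool status tank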
The ZFS documentation claims that dRAID and RAID Z should have roughly the same performance, and my testing found they were generally fairly close, but not typically identical. When there was effectively only one wide vdev, for example my 20 drives set up as two spares plus an 18-drive RAID Z3, dRAID was typically a little faster. My guess is that's because the drives registered as spares in dRAID also hold data and help with performance, whereas in a RAID Z array the hot spares sit empty and don't contribute at all. But when I had multiple RAID Z vdevs striped together into one large pool, performance was a little better with the RAID Z configurations than with their closest dRAID equivalents. My guess is that striping across multiple vdevs performs a bit better than one large dRAID vdev.

I also tried an uneven dRAID configuration to see if that would affect things: three data drives and one parity drive for a four-wide stripe, but nine drives in total. That's not an even configuration, since 4 doesn't divide evenly into 9, but dRAID still performed decently, so I wouldn't be too worried if the numbers don't add up perfectly in dRAID. I also tried a setup with only 10 drives instead of 20; when I used spare drives, dRAID still had significantly faster rebuild times than the equivalent RAID Z setup, but dRAID doesn't make much sense in small arrays, because you're unlikely to have a hot spare in a very small array anyway.

If you've set up a large ZFS pool with multiple RAID Z vdevs before, you had a configuration with the top-level pool, then a RAID Z vdev with several drives in it, then another RAID Z vdev with its drives, and so on. dRAID is configured a little differently: instead of listing a RAID Z vdev and its drives, then the next RAID Z vdev and its drives, you give a single dRAID specification followed by all of the drives together. Rather than choosing how many drives go into each RAID Z vdev, the layout is described by the numbers in the dRAID name. The first number is the dRAID level, the number of parity drives per stripe; it can be 1, 2, or 3, and writing just dRAID is equivalent to dRAID1. The next number is the number of data drives per stripe, which you can think of as the RAID Z equivalent: for example, a RAID Z2 vdev with four data drives would have two parity drives and four data drives, so the dRAID equivalent would be dRAID2 with 4d, meaning four data drives and two parity drives per stripe. The next number is the total number of spares in the whole array, in this case two spares. The last number is a check: the c stands for children and is the total number of drives, so if I write 20c, ZFS makes sure I actually supplied 20 drives after the specification.
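As a concrete illustration of that naming scheme, here is a minimal sketch of a create command with the components written in the order described above (the d, s, and c suffixes identify each field). The pool name tank and the /dev/sd[a-t] device names are placeholders:

    # dRAID2, 8 data drives per stripe, 1 distributed spare, 20 children total
    zpool create tank draid2:8d:1s:20c /dev/sd[a-t]

    # The new vdev and its distributed spare show up in the pool status
    zpool status tank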
If the count is wrong, it'll tell you that you said you wanted that many children but you don't actually have that many drives, so go back and check the number. Now let's do one more example to hopefully make these numbers make a little more sense. I'm going to use my 20-drive array, and I think one spare drive makes a good amount of sense with 20 mechanical hard drives. That means I want 1s in the dRAID configuration and 20c for the total number of children, or drives, in this dRAID. Then I need to choose the dRAID level and the number of data drives per stripe, which comes down to how you want to balance performance, resiliency, and the other trade-offs. I think something like dRAID2 makes sense, which allows two drives to fail at any given time and still has one spare that will quickly step in if needed, and I'll go with eight data drives. So I type dRAID2, 8 data drives, then the 1 spare I decided on earlier, then 20c, press enter, and the array is created and appears on my system.

Looking at the zpool status of the array I just created, I can see my 20 drives in the dRAID configuration, along with the layout I specified: dRAID2, 8 data drives, 20 children, and 1 spare across all of the drives. Now I'll pull out my phone as a calculator to work out how much free space I should have and see if that lines up with what zfs list shows. The first thing to note is that almost all space reported by your OS is in tebibytes rather than terabytes, so a 2-terabyte hard drive is approximately 1.8 tebibytes. Next, the one spare drive isn't usable capacity, so I multiply 1.8 TiB by 19; that gives the total amount of parity and data that will be stored on the array, with the spare space already removed. Then I need to work out how much of that is actual data. Note that the stripe width is 2 plus 8, so 10 drives, and 20 drives minus the 1 spare is 19, which 10 doesn't divide evenly, so I can't do the super simple per-group math. Instead I use the ratio of data to total stripe width: 8 data drives out of 10 (2 parity plus 8 data), so I multiply by 0.8. That gives me about 27.36 tebibytes that I believe should be available. Running zfs list on the system, I get 27.5. So if you want to estimate the usable space of a dRAID array, take your total number of drives, subtract the spares, multiply by the per-drive capacity, and then multiply by the data-to-stripe-width ratio; a worked version of this estimate appears below.

I also want to point out that if you're using something like Proxmox, you can set up dRAID there as well. The interface is a little different, but you can choose dRAID2 as your RAID level, set the data devs, which would be 8 for my previous configuration, and set the spares. The total number of drives, or children, is set simply by selecting which drives you want to use.
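Here is that space estimate worked through as a quick command-line check. The pool name tank is a placeholder, and 1.8 TiB per 2 TB drive is the same approximation used above:

    # usable ~= (children - spares) x per-drive TiB x data / (data + parity)
    echo "(20 - 1) * 1.8 * (8 / (8 + 2))" | bc -l    # prints about 27.36

    # Compare against what ZFS actually reports for the pool
    zfs list tank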
Now I want to talk about where dRAID makes a good amount of sense and where it might not be the best option. It's always hard to make firm recommendations in videos like this, because everyone has a different level of risk tolerance, a different sense of how likely a drive is to fail, and a different idea of how bad it would be in their environment to have a degraded array due to a couple of drives failing, but here are some of my thoughts. I think dRAID makes a lot of sense if you have a large storage server, don't plan to replace any of the drives in it or add more drives later, and would otherwise have a hot spare. It makes the most sense to me in something like one of those 60-drive top-loading enclosures: lots of drives, likely filled with large hard drives if you're buying new these days, and no room to add more since all 60 bays are full. Faster rebuilds matter a lot with 60 drives, where the chance of one failing is relatively high. dRAID doesn't make sense to me in something like a small NAS with only four drives. dRAID doesn't really make a difference if you're not going to have a hot spare, and a hot spare in a four-bay enclosure doesn't make much sense to me either. In a small enclosure I think it makes a lot more sense to just add another parity drive, for example going from RAID Z1 to RAID Z2, or Z2 to Z3, instead of using dRAID with a hot spare.

Thanks for watching this introduction to ZFS dRAID. Let me know if you have a large drive enclosure and whether you think ZFS dRAID would make sense for your setup. It's cool to see new features like dRAID being added to ZFS, and I'd love to hear your thoughts in the comments below. Thanks for watching.
Info
Channel: ElectronicsWizardry
Views: 2,907
Id: lmVcUFe48bc
Length: 14min 11sec (851 seconds)
Published: Fri Dec 01 2023