RAID: Obsolete? New Tech BTRFS/ZFS and "traditional" RAID
Video Statistics and Information
Channel: Level1Enterprise
Views: 312,436
Keywords: RAID (Invention), Btrfs (Software), ZFS (Software), Tech
Id: yAuEgepZG_8
Length: 32min 56sec (1976 seconds)
Published: Thu Mar 19 2015
There are some things here that need a little touching on. He mentions that Google relies on their software to correct errors on their hardware, and that they use many cheap SATA drives to store the volume of data they need. While the part about the cheap SATA drives may be true, I have a very lengthy case study in PDF form (which I can post if I can find it) on Google's handling of large drives: RAID rebuilds simply take too long, and arrays suffer further failures during the rebuild. Their solution is replication, not correction. They're not rebuilding/correcting; because their arrays are so large, with such large disks, they no longer rely on rebuilding at all. They simply take a bad array offline and repopulate it in the background from replicas. It's a very wide and flat scheme.
I read a blog post that made a good point, even if it was a little tongue-in-cheek: run large arrays in RAID0. You get the benefit of speed, and you won't be living on the assumption that you can rebuild the array, so your backups and replication will be PERFECT. An edgy idea, but not bad overall: when it comes to data preservation, ZFS/butterfs/etc. are all great, but nothing beats replication to many redundant arrays.
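For illustration, a minimal sketch of what replication-first protection looks like with ZFS send/receive; the pool "tank", the dataset, the snapshot names, and the host "backuphost" are placeholders, not anything from the video or the comments:

    # Take an atomic snapshot of the source dataset.
    zfs snapshot tank/data@2015-03-19

    # Full replication of the snapshot to a second pool over SSH.
    zfs send tank/data@2015-03-19 | ssh backuphost zfs receive backup/data

    # Later: send only the blocks that changed since the last snapshot.
    zfs snapshot tank/data@2015-03-20
    zfs send -i tank/data@2015-03-19 tank/data@2015-03-20 \
        | ssh backuphost zfs receive backup/data

With copies on several independent arrays, losing one array means re-replicating from a replica, not rebuilding in place.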
+1 for the Tek Syndicate guys
Very interesting, I still like mdraid as a low-end option though.
Those zones at the end of the video didn't seem clickable; I was interested in buying him a beer.
There are some glaring problems with this presentation.
First, a NAS is "network attached storage". This is very different from a SAN (storage area network). In the former, the storage device is typically given a single IP address and a filesystem is exported over the network (NFS, CIFS, etc.). The latter is a dedicated network of multiple disks or storage devices, typically exported over iSCSI or Fibre Channel, where the storage is accessed as raw blocks. While the end goal may be the same, they are very different, and I feel he didn't explain this well at all.
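To make the distinction concrete, a minimal sketch; the paths, subnet, device names, and IQN below are hypothetical examples, not from the video:

    # NAS: export a filesystem over the network (an /etc/exports line for NFS).
    # Clients mount it and see files; the server owns the filesystem.
    /srv/storage  192.168.1.0/24(rw,sync,no_subtree_check)

    # SAN: export a raw block device over iSCSI (targetcli on Linux).
    # The client sees a disk and formats it with its own filesystem.
    targetcli /backstores/block create name=lun0 dev=/dev/sdb
    targetcli /iscsi create iqn.2015-03.com.example:target1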
Second, he treats software RAID as a second-class citizen in storage without addressing any of the hardware controller issues. Even worse, he brushes off Linux software RAID as something no enterprise serious about data integrity would even consider.
Given the abundance of CPU, bus, and RAM resources these days, software RAID will usually outperform hardware RAID controllers. Linux software RAID also supports TRIM if the underlying disks are SSDs, which is virtually unheard of in hardware. Linux software RAID supports external metadata formats as well, allowing the use of "fake RAID" (firmware RAID) disks.
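As a concrete sketch (device names are placeholders, and the IMSM line assumes an Intel firmware-RAID platform):

    # Mirror two SSDs with Linux MD RAID.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

    # TRIM issued by the filesystem passes through the md layer to the SSDs.
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt && fstrim -v /mnt

    # External "fake RAID" metadata (Intel IMSM) starts with a container;
    # the actual volume is then created inside it.
    mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 /dev/sdc /dev/sdd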
Hardware RAID may have a battery backup, but unless you're willing to front a good deal of cash, the controllers are slower and prone to failure.
Thirdly, ZFS and BTRFS do not need battery-backed controllers to keep the filesystem in a consistent state. Both are atomic in nature, meaning you get all of the write or none of the write. In the event of a power failure, your filesystem will still be consistent, just old. With persistent external SSD write caches, which both ZFS and BTRFS support, the transaction will be present on the next boot and flushed when the pool is available. There is no need for battery-backed controllers with ZFS or BTRFS.
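A minimal sketch of the ZFS side of this (pool and device names are placeholders):

    # Add a mirrored SSD log device (SLOG) so synchronous writes survive
    # power loss; the intent log is replayed on the next pool import.
    zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

    # Verify the log vdev is attached and healthy.
    zpool status tank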
Fourthly, ZFS does not need tons of RAM unless you're using deduplication or a large L2ARC filled to capacity. ZFS gets by fine on 2 GB RAM installs on laptops or workstations. Yes, it will use all the RAM you allow it, but this isn't a bad thing, and it's tunable.
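For example, on ZFS on Linux the ARC ceiling is a module parameter; the 1 GiB figure below is just an illustration:

    # Cap the ARC at 1 GiB across reboots.
    echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf

    # Or change it at runtime, no reboot needed.
    echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max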
Fifthly, ZFS and BTRFS are true software RAID managers. Yes, they're filesystems, but they employ legitimate software RAID. The RAID1, RAID5, and RAID6 in BTRFS and Linux MDRAID are standard RAID levels. In ZFS, RAIDZ1, RAIDZ2, and RAIDZ3 are certainly nonstandard, but it is legitimate RAID taking advantage of all the disks in the array. Nothing is being fudged, in any case.
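A quick sketch of what those levels look like in practice (disk names are placeholders, and the zpool lines are alternatives, not a sequence):

    # ZFS mirror, the RAID1 analogue.
    zpool create tank mirror /dev/sda /dev/sdb

    # RAIDZ2: double parity across four disks, roughly comparable to RAID6.
    zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

    # BTRFS: mirror both data and metadata across two disks.
    mkfs.btrfs -d raid1 -m raid1 /dev/sde /dev/sdf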
Finally, the large advantage of software RAID over hardware is the ability to migrate disks from one system to another. ZFS, BTRFS, and Linux MDRAID make this virtually painless. Not so with hardware RAID controllers: with proprietary controllers, you are locked in to that vendor when migrating disks. This is partly the reason SANs are so expensive. You have to have a controlling head that will talk to the disk shelves, and the disks cannot be migrated from one vendor to another without data loss or multiple copies.
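Migration really is this short, since the metadata lives on the disks themselves (pool and device names are placeholders):

    # ZFS: cleanly detach the pool, move the disks, re-import on the new box.
    zpool export tank        # on the old system
    zpool import tank        # on the new system; scans attached disks

    # Linux MD: scan attached disks and assemble any arrays found.
    mdadm --assemble --scan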
2 stars.
Isn't this just some dude reading me Wikipedia?
Seth Rogen speaking RAID, awesome
I'm pretty sure this guy is either oversimplifying or flat-out wrong in many cases. I think he's really overplaying the occurrence of data corruption from the "write hole" - this is an issue with single-disk desktops too, yet you can, 99 times out of 100, pull the power cord from a Windows box, plug it back in, and boot OK. IME, and yes, this is just my experience, MD RAID very rarely if ever kills an array because you pull or lose power and then power it back on.
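For what it's worth, a hedged sketch of how you'd check an MD array after an unclean shutdown (the device name is a placeholder):

    # Inspect array state after a crash or power pull.
    cat /proc/mdstat            # arrays, members, and any resync in progress
    mdadm --detail /dev/md0     # per-array state and failed/spare counts

    # A write-intent bitmap limits the post-crash resync to recently
    # written regions instead of the whole array.
    mdadm --grow /dev/md0 --bitmap=internal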
That said, I do like ZFS and use it at home, and again, I can run it fine on a home system with 2 GB of RAM, and for videos at least I can pull power whenever and not lose any noticeable data. The system is pretty old, though I did just upgrade the disks. When I rebuild the whole thing in the future, I will likely go ECC, add more RAM, etc.