How SYNOLOGY protects your DATA - BTRFS and data integrity

Captions
All right, how's it going, y'all? Today we're going to be going over how safe your data is on a Synology NAS from a data integrity perspective. All that means is: when you put data on it, how likely are you to be able to pull that exact same data back off? Hard drives, like all other mechanical components in the world, are not perfect. Sometimes they suffer from what's called bit rot, where a one flips to a zero or a zero to a one, seemingly at random; funny enough, this can sometimes be caused by solar flares. Drives can also just die. Either way, you want to make sure that even if something like that happens, your files don't end up corrupt.

Files are binary: every single bit in a file is either a one or a zero, and that's it. Change a single value and you can destroy an entire file. In something like a zip file, flipping a one to a zero or a zero to a one can cause the entire file to be corrupted, and that's honestly unacceptable. So data integrity has to be perfect. We want to be sure that whatever data we put on this thing will be read back identically: every one and every zero, out of billions of them, has to be in exactly the same place, otherwise our files are at risk. To summarize, Synology and Btrfs are actually very good at this.

Before I get started, I have to give a huge shout-out to this blog right here, which I'll link in the description below. Delon Durst has done absolutely awesome work testing all of this out: he really brutally attacked his NAS, and we got to see exactly how well Btrfs holds up. At the end of the day, what he found is that it's pretty darn reliable. And before we jump right in, if you like these tutorials, like and subscribe; we're trying to hit 1,000 subscribers. Today we're
going to be talking about how this thing actually protects your data in multiple ways. But before we completely dive in, I need to preface this by saying you still should back up your key files. Anything that the NAS is the sole source of truth for, and that is critical to you, should be backed up in some way, shape, or form. All this stuff is really helpful for protecting your data, but it's not perfect. If a kid knocks the NAS off a shelf and it shatters and breaks all the hard drives, all the checksums in the world don't help you when every drive is broken at the exact same time. So anything critical, like your family photos, needs to be backed up automatically to the cloud or another drive. It's absolutely critical to do, and no amount of checksumming in the world should make you feel like you don't need to back up your files.

Now, there are some files that are okay not to back up. That is always a personal decision, and it comes down to money versus how much the files are worth to you. To give a really crude estimate, and honestly an overestimate, expect that every year there's about a 1% chance that your NAS completely destroys itself. Who knows: maybe it gets stolen, maybe it's in a fire, maybe somebody knocks it off, maybe it just freaks out, and all the checksums in the world don't protect you. That's where you have to decide how valuable those files are to you. Is 1% probably overkill? Yes. I've only ever seen catastrophic, irrecoverable data loss on NASes in cases where somebody essentially got hacked and didn't have proper protections like snapshots set up. Except for that case, I've never actually seen a Synology have irrecoverable data loss. I've had volumes go into read-only mode and I've had other issues, but we've always been able to get the files off, even when somebody accidentally pulls a read-write cache out of their
NAS while it's powered off and then turns it back on; Synology support is actually able to come in and repair a lot of that. So I've never had full catastrophic data loss from the NAS itself, only ever from external factors. That's one data point, and you still need to be backing up your data, but to summarize this video: these things are pretty good at protecting your data.

All right, so now let's actually dive in and go over what you should do and what settings you should have to really protect your data for the long term, so that in 10 years those files will be exactly the same as they are today. I'm going to pull up this DS923+. Funny enough, I ripped a hard drive out of it this morning for another video, so it's currently doing a data scrub, which is very on theme.

The first thing we're going to talk about is the single most critical one, and that is the volume's file system. Btrfs is vastly safer for data integrity than ext4, full stop. Btrfs is what's called a copy-on-write file system, which lets files be updated without getting corrupted when things go bad. Whenever you save or update a file, instead of overwriting the old data in place, Btrfs writes the change to a different part of the disk, and only once that write has succeeded does it update the file system's B-tree to point to the new location. This means that if your system crashes halfway through saving a file, nothing bad has happened: the file is still exactly as it was before the save, because the B-tree has not been updated yet. A file system like ext4 is not copy-on-write; instead it overwrites the data on disk with the new version. If your volume crashes mid-write, you're in trouble, because now your file is halfway through being overwritten, and it
is going to be corrupted. Both Btrfs and ZFS are copy-on-write file systems, and they are very valuable for that. They also give you the ability to take what are called snapshots. So the key thing is having a Btrfs volume, because it has so many features that help you protect your data.

We also want to use a redundant RAID. You can pick anything here: RAID 5, SHR-1, RAID 1; pretty much anything other than RAID 0 will give you additional protection. If you search for Btrfs, you may have seen something online saying that Btrfs RAID 5 has what's called a write hole and therefore is not stable and should not be used. That's completely true: you should not use Btrfs's own RAID 5, because depending on when a power loss happens, you can end up with corrupt data. But Synology's secret sauce is that they use Btrfs as the file system, not Btrfs RAID 5. They use tried-and-true mdadm from Linux for their RAID, and that is true for RAID 5, SHR-1, and all the rest, so you do not have those same issues. If you see those articles and worry about using RAID 5 on your NAS, don't: they are talking about a different type of RAID 5, Btrfs's built-in RAID 5, whereas Synology uses Btrfs with a different RAID layer underneath it. It's a bit confusing, but just know it's not something you need to worry about.

That redundant RAID protects us in a few different cases. First, it gives the system extra information it can use to fix files when corruption occurs. Hard drives will become corrupt; hard drives will flip ones to zeros and zeros to ones. It's called an uncorrectable error, and there are a zillion ways it can happen, but it is an inevitability and should be treated as such. A redundant RAID really protects you here, because when that happens there's additional information, including the Btrfs
checksums, that lets it go back and figure out what that data should have been. It's very powerful and really resilient, so even if corruption does occur (and it will), you'll find that the NAS just fixes it without even telling you.

That's also why it's really important to set up a scrubbing schedule. Every single time you read a file, the NAS also reads its checksum and makes sure it matches. A checksum is essentially a special piece of math that lets the NAS confirm that all the data from all the different drives lines up: everybody was telling the truth, nobody was lying, and nobody was wrong. The checksum tells the NAS which data was corrupted, and combined with the RAID parity it can work out what that data should have been. So every time you read a file, it's checking behind the scenes, going "yep, yep, yep," making sure the file hasn't been corrupted on disk. If it detects corruption, the NAS automatically hands you the correct data and then fixes itself. It'll effectively say, "Hey, hard drive 3, that bit should actually have been a zero," give you the data with the proper value, and repair the disk without telling you.

Now, that's great for files that are read often. But say we've got family photos we haven't opened in 15 years. We need to make sure those still get read and that corruption doesn't build up, because the RAID can handle one error per section (stripe) of a file, but depending on the RAID level it might not be able to handle two errors in the same section. That's where data scrubbing comes in, and it's critical to set up: go in here and click schedule data scrubbing. These are the settings you should use: enable data scrubbing, run it every 3 months, and, if you like, set it to run only outside business hours or other times you're using the NAS.
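The read-time checking and repair described above, plus the scrub that sweeps through data nobody reads, can be sketched with a toy single-stripe RAID 5 model. This is an illustrative simplification, not Synology's or Btrfs's actual code. It assumes three "drives" holding two data chunks plus one XOR parity chunk (similar to the three-drive SHR-1 setup tested later in the video), keeps a checksum per data chunk outside the drives the way file-system metadata would, and uses Python's `zlib.crc32` as a stand-in for the CRC-32C that Btrfs uses by default. The names `ToyRaid5`, `read_chunk`, `scrub`, and `ChecksumMismatch` are hypothetical.

```python
import zlib
from functools import reduce
from typing import List, Tuple


class ChecksumMismatch(Exception):
    """Raised when a chunk is corrupt and parity cannot produce a copy that passes its checksum."""


def xor_bytes(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


class ToyRaid5:
    """Hypothetical single-stripe RAID 5: one data chunk per drive plus one XOR parity drive."""

    def __init__(self, data_chunks: List[bytes]) -> None:
        self.drives = [bytearray(c) for c in data_chunks]
        self.drives.append(bytearray(xor_bytes(*data_chunks)))  # last drive holds parity
        # Checksums live outside the drives here, the way file-system metadata would.
        self.checksums = [zlib.crc32(c) for c in data_chunks]

    def _rebuild(self, i: int) -> bytes:
        # In RAID 5, any one chunk equals the XOR of all the other chunks in the stripe.
        return xor_bytes(*(bytes(d) for j, d in enumerate(self.drives) if j != i))

    def read_chunk(self, i: int) -> bytes:
        data = bytes(self.drives[i])
        if zlib.crc32(data) == self.checksums[i]:
            return data  # normal case: checksum matches, nothing to do
        repaired = self._rebuild(i)
        if zlib.crc32(repaired) != self.checksums[i]:
            raise ChecksumMismatch(f"chunk {i} cannot be repaired")
        self.drives[i][:] = repaired  # self-heal: write the good copy back to the drive
        return repaired

    def scrub(self) -> Tuple[List[int], List[int]]:
        """Read every data chunk; return (repaired, unrecoverable) indices.

        This sketch checks data chunks only, not the parity chunk itself.
        """
        repaired, unrecoverable = [], []
        for i in range(len(self.checksums)):
            before = bytes(self.drives[i])
            try:
                self.read_chunk(i)
            except ChecksumMismatch:
                unrecoverable.append(i)
                continue
            if bytes(self.drives[i]) != before:
                repaired.append(i)
        return repaired, unrecoverable


array = ToyRaid5([b"\xff" * 8, b"\x0f" * 8])
array.drives[0][3] = 0xFE  # flip one bit on one drive, like the FF -> FE test
print(array.read_chunk(0) == b"\xff" * 8)      # True: correct data is returned
print(bytes(array.drives[0]) == b"\xff" * 8)  # True: and the drive itself is repaired
```

The split of roles is the point: the checksum only detects which copy is wrong, the parity supplies the replacement, and the rebuilt chunk must pass the same checksum before it is written back. With two bad chunks in one stripe, no rebuild passes that check, so the sketch raises an error instead of returning bad data.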
Data scrubbing is essentially the NAS reading every single file it has and checking it for errors. It's also done whenever there's an unexpected power loss or any other issue that could cause data corruption; the NAS will kick one off on its own then. Whenever you see data scrubbing running up here, that's the NAS doing a consistency check on itself, finding errors and fixing them before they pile up past the point where they can be fixed. That's why you really want to make sure you've enabled data scrubbing.

The next thing we want to do to protect our data as much as possible is go into Control Panel and, any time we're creating a new shared folder, turn on the advanced checksums. Unfortunately, you can only do this when the folder is created, but this right here is what you want: check "Enable data checksum for advanced data integrity." It does have a small performance impact (I've never been able to measure it), but what it does is add checksums for the file data itself at the file-system level, not just at the disk level, which gives the NAS the ability to detect and recover from more errors. Note there are a few cases where it isn't the best idea, mainly workloads that need ultra-high IOPS, such as virtual machines, and Surveillance Station, which is just writing constantly. You don't really need it there, because if a single frame of video gets slightly corrupted it's not the end of the world. Apparently Active Backup for Business with Synology bare-metal restore is another exception. Otherwise, I would always recommend checking that box, because it will help protect your data. Is this something where you have to blow away your old folders and redo them? Probably not, but whenever you're setting something up, I would really recommend having it on.

All right, those are the key settings to help protect your data. I'm going to throw in a couple of extra ones here, and then we're going
to talk about how your NAS is protecting your data under the hood. First, I would really recommend turning on notifications. Notifications let you know if something's gone wrong, like a disk failing. When stuff happens you get an email, so it's not six months later with your NAS slowly dying; instead you can figure out what's gone wrong and fix it before it becomes a real issue. The other thing, which I've absolutely beaten to death, is snapshots. Snapshots don't really help with data integrity, but they mean that if external programs delete or corrupt files, you can undo it. I've talked about snapshots a zillion times on this channel, so I'll just say they protect you from other programs screwing up your files. And finally, remember: back up your NAS.

All right, with all that out of the way, let's talk about people who have really tried to break the NAS and seen what corruption can do. We're going to look at this man's blog, which I've read a zillion times, because he's done a ton of testing to see how much he can throw at the NAS and whether it keeps surviving. To summarize, it did an incredibly good job. I'll leave a link to the article below. Essentially, he creates a bunch of files with hashes, so we know exactly whether each file is what it says it is, which makes finding corruption really easy. He's using SHR-1 with three drives, which is essentially RAID 5 under the hood. He adds the files to the NAS, then shuts the NAS off, plugs the drives into his computer, and on a single drive flips a one to a zero, essentially changing an FF in hex to an FE. Then he reads the file off the NAS, and lo and behold, the NAS doesn't even tell him
that something went wrong; it just fixes it. Not only does it fix the file that was returned to him, it also fixes the corrupt bit on the disk at the same time. So the NAS is constantly doing this in the background, just fixing things without any trouble. Then he does the same test but wipes out 32 kilobytes, breaking every one of those bytes in one massive swoop, and again it's able to fix it. That's because there are a lot of pieces of data that allow the NAS to figure out what that data should have been. There's really, really cool math out there that lets you figure out how data was corrupted and fix it; it's used in radars, it's used on the internet, it's used in so many things. If you're someone who's really into math and going down rabbit holes, check out Barker codes; they're really cool for detecting corruption and for not being easily tricked. I'll just leave it at that. A lot of very intelligent people have written a lot of great math proofs that are used in these things, and it's the same reason that when you're watching this video, even if your phone blips and drops a few packets, the problem is detected and you still get to see my lovely face without corruption.

So the NAS is able to handle that file corruption, but let's also see what happens if data is broken beyond repair. Next, he corrupted the same file on two separate drives. He was using SHR-1, which can only survive one drive failure, and yes, the file got corrupted. But the good news is that it did not corrupt any other files, and the NAS did not let the corrupted file through; instead it reported an error. What you may see (and I've seen this log before, very rarely) is a checksum mismatch. If you ever see a checksum mismatch, it's because a file has gotten corrupted one way or another; the NAS does not allow the file to be read, and
that means the file has been irreversibly corrupted. Through all of that, the RAID still survived and all the other files were okay, which shows the damage stays isolated. The last and most critical things he tested are what happens if you rip a drive out while the array is rebuilding, and similar scenarios, as well as what happens if something gets corrupted while the drives are writing. To summarize, it handles these really well. Unexpected power losses are very dangerous, especially because these hard drives have their own little caches to make them faster, and those caches are not power-loss protected. That means a drive will say "yep, it's written" when in fact the data has only been written to its cache, which might not survive a power loss. That is very dangerous, and in all of those cases the NAS honestly survived very well. I'd really recommend reading the article linked in the description, because he has done in-depth testing that I don't have the time to do, and he did a great job. It showed there's no real easy way to break this, even with him pulling out drives and manually corrupting them himself, which is a great feeling to have. The other very valuable thing is that even when destructive corruption did occur, as in corrupting two drives' worth of data when the NAS can only handle one, the rest of the file system kept operating. It did not take the entire volume down and wipe out everything because of an isolated incident, which is really important.

This guy's test was a phenomenal synthetic test that covered all the worst cases, but now I want to compare it with anecdotal, real-world evidence of things I've seen happen. Remember, people call me because things go wrong, so a lot of the time I'm getting calls from people
who have already had things go wrong. If we're only talking about data problems caused by corruption, power loss, or anything other than an attacker actually attacking the NAS, I've only ever seen three cases of a volume having an issue. I'm not talking about a failed-drive issue; I mean a case where the volume went into read-only mode or something similar. I'm happy to report that in all of them, all the data came off successfully without any real issues, which could have been bad.

The first occurred while a NAS was being converted from RAID 5 to RAID 6, which is a time when data is being shuffled around and rewritten, and there was a power loss during that time. The volume went into read-only mode, but we were able to copy everything off of it. It did crash and it was slow at times, with a lot of alerts, but it kept the volume available in read-only mode, and we didn't even have to go to Synology support.

The next incident I know of was somebody who upgraded from one NAS to another and had a read-write SSD cache, but did not bring the read-write SSD cache over to the new NAS. They booted up and the volume was corrupted, because there was data on that read-write cache that had not been synced to the disks yet. Synology support was able to get the entire volume back up and running in practically no time, without any corruption or problems whatsoever. Remember: if you have a read-write cache and you're migrating, make sure to remove the read-write cache before pulling the disks out. I'd actually recommend removing it from the volume entirely.

The last one I know of was also caused by a read-write cache and a lack of power-loss protection. A client was using a read-write cache with some NVMe drives and had a power loss. The read-write cache had data on it, but the NVMe drives did not have what's called power-loss protection, so when that happened, the volume booted up into a
read-only mode. We were able to get all the data off of it; what it eventually took was copying everything off, then formatting it and setting it up again as a new volume. But none of those incidents ever caused irrecoverable data loss for any of the files, thankfully. That's not to say this won't happen to you, but it does show that a lot of very intelligent people have written some very nice software. Note that Btrfs is not a Synology product; it's open-source software. Synology does help maintain it, but there are a lot of people in the open-source community running these projects and building these very resilient file systems that you can really beat up.

The last thing I want to talk about is something you may have seen, especially in the early 2000s: the claim that RAID 5 is dead, that you can never have a RAID 5 volume over 14 TB because of unrecoverable read errors (UREs), so you'll never be able to rebuild it. Especially if you talk to people who have been IT guys for 15 or 20 years, they'll often say, "No, no, no, I'm never doing RAID 5, never doing RAID 6, they've got tons of problems, only ever RAID 10." That's just not something I would worry about anymore. I won't get into it in depth here, but RAID 5 and RAID 6 work, they work well, and they scale very well; the whole URE-at-14-TB problem never really came to be. Scrubbing essentially prevents it, and I have rebuilt massive volumes many times in the past without ever hitting the problems you'll read are statistically bound to occur. It's one of those things that never came to fruition, and RAID 5 is in fact quite stable. I do tend to recommend that once you get to roughly your seventh or eighth drive, you go from RAID 5 to
RAID 6, or from SHR-1 to SHR-2, for that added protection. Once you have a lot more drives, it's worth having a little extra parity.

All right, that's it for this overview. I hope this was useful. In summary, Btrfs and Synology NASes do a very good job of protecting you from file corruption, and you can rest easy with your data on these things as long as your drives are being checked and you're running your scrubs regularly. If you have any other questions, put them in the comments below, and if you want to hire me, there's a link for that in the description. All right, have a good one, bye.
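The crude estimate from earlier in the video, roughly a 1% chance per year of losing the whole NAS, compounds over the 10-year horizon the video talks about. Treating each year as independent (an assumption made only for this arithmetic), a few lines of Python show the cumulative risk; the function name is hypothetical.

```python
def cumulative_loss_probability(annual_risk: float, years: int) -> float:
    """Chance of at least one total-loss event over `years`, assuming independent years."""
    return 1.0 - (1.0 - annual_risk) ** years


# Inputs taken from the video's rough estimate: about 1% per year, over 10 years.
p = cumulative_loss_probability(0.01, 10)
print(f"{p:.1%}")  # prints 9.6%
```

Under that estimate, the odds of losing an un-backed-up NAS over a decade are close to one in ten, which is why the video insists that checksums are no substitute for backups.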
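The copy-on-write explanation from earlier in the video can be made concrete with a toy model. This is a minimal Python sketch under heavy simplifying assumptions, not how Btrfs or ext4 are implemented (it ignores journaling, caches, and real on-disk layout): the "disk" is a dictionary of blocks, one table maps each file name to its block IDs, and reassigning that table entry stands in for the atomic metadata (B-tree) commit. `ToyDisk`, `overwrite_in_place`, and `update_cow` are hypothetical names.

```python
from typing import Dict, List, Optional


class ToyDisk:
    """Hypothetical toy model: blocks live in a dict, and one table maps file names to block IDs."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.files: Dict[str, List[int]] = {}
        self.next_free = 0

    def _alloc(self, data: bytes) -> int:
        # Always write to a fresh block ID; existing blocks are never touched here.
        block_id = self.next_free
        self.next_free += 1
        self.blocks[block_id] = data
        return block_id

    def create(self, name: str, chunks: List[bytes]) -> None:
        self.files[name] = [self._alloc(c) for c in chunks]

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[b] for b in self.files[name])

    def overwrite_in_place(self, name: str, chunks: List[bytes], crash_after: Optional[int] = None) -> None:
        """In-place update in this model: rewrite the existing blocks one by one."""
        for i, (block_id, chunk) in enumerate(zip(self.files[name], chunks)):
            if crash_after is not None and i == crash_after:
                return  # simulated power loss: later blocks still hold old data
            self.blocks[block_id] = chunk

    def update_cow(self, name: str, chunks: List[bytes], crash_after: Optional[int] = None) -> None:
        """Copy-on-write update: write new blocks elsewhere, then switch the pointer."""
        new_ids = []
        for i, chunk in enumerate(chunks):
            if crash_after is not None and i == crash_after:
                return  # simulated power loss: new blocks are orphaned, mapping unchanged
            new_ids.append(self._alloc(chunk))
        # Stand-in for the atomic metadata commit: one assignment makes the new version live.
        self.files[name] = new_ids


old = [b"AAAA", b"BBBB", b"CCCC"]
new = [b"aaaa", b"bbbb", b"cccc"]

in_place = ToyDisk()
in_place.create("photo", old)
in_place.overwrite_in_place("photo", new, crash_after=2)
print(in_place.read("photo"))  # b'aaaabbbbCCCC': torn, half new and half old

cow = ToyDisk()
cow.create("photo", old)
cow.update_cow("photo", new, crash_after=2)
print(cow.read("photo"))  # b'AAAABBBBCCCC': the old version is still intact
```

The crash lands at the same point in both cases; the difference is that the copy-on-write path never touches the blocks the live file points to until a single metadata switch commits the new version.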
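The testing approach described in the video, recording a hash for every file so that any later change is obvious, is easy to reproduce for your own data. Below is a minimal Python sketch using SHA-256 from the standard library's `hashlib`; it is not the blog author's actual script, and the function names (`sha256_file`, `build_manifest`, `verify_manifest`) are hypothetical. It only detects corruption; repairing it is the NAS's job, or your backup's.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {str(p.relative_to(root)): sha256_file(p) for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose current hash differs from the manifest."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_file(p) != expected:
            bad.append(rel)
    return bad


# Example (hypothetical path): save the manifest once, re-check it later.
# manifest = build_manifest(Path("/volume1/photos"))
# print(verify_manifest(Path("/volume1/photos"), manifest))  # [] means nothing changed
```

A single flipped bit anywhere in a file changes its SHA-256 digest, so an empty result from `verify_manifest` is strong evidence the data read back is identical to what was written.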
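The "RAID 5 is dead" argument mentioned near the end of the video is usually built on arithmetic like the following. As an assumption for illustration, it treats the commonly quoted spec-sheet figure of one unrecoverable read error per 10^14 bits (10^15 for many NAS-class drives) as an independent per-bit probability, and asks how likely a rebuild is to read every remaining bit cleanly. The function name and the 12 TB figure are hypothetical examples, not measurements.

```python
import math


def rebuild_success_probability(terabytes_read: float, ure_rate_per_bit: float) -> float:
    """P(no URE) when reading `terabytes_read` TB, under the naive independent-bit model."""
    bits = terabytes_read * 1e12 * 8  # decimal terabytes, as drive vendors use
    # log1p keeps precision for tiny per-bit rates
    return math.exp(bits * math.log1p(-ure_rate_per_bit))


for rate in (1e-14, 1e-15):
    p = rebuild_success_probability(12, rate)
    print(f"rate {rate:g} per bit, 12 TB read: {p:.0%} chance of a clean rebuild")
```

Taken at face value, the 10^14 figure predicts that most large rebuilds fail, which is exactly the prediction the video says has not matched real-world experience. The spec is typically stated as a maximum rather than an observed rate, and regular scrubbing finds and repairs latent errors before a rebuild has to read them.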
Info
Channel: SpaceRex
Views: 6,068
Id: m0Kvc_w3H5o
Length: 24min 34sec (1474 seconds)
Published: Wed Jun 19 2024