How Disk Arrays and Parity work - Raid and Unraid

Captions
This video is another in the beginner core topics series, so if you want to know how parity works, how disk arrays work, and what the differences are between traditional RAID and Unraid, then this video is for you. Hi there. In this video I'm going to be talking about parity: what it is and how it works. I'll also talk about how the Unraid array uses it, why it uses it, and the differences between a regular RAID array and the Unraid main disk array and its pools.

So let's start. What is parity? The dictionary definition says it's the state or condition of being equal, or, in mathematics, the fact of being even or odd; similar words are equality, equivalence, uniformity and sameness. The definition in regard to computing says it's a function whose being even or odd provides a check on a set of binary values. So what does "providing a check on a set of binary values" mean? We all know that the language computers speak is binary, all ones and zeros. Basically, it means checking, when a binary message is sent from one place to another, whether an error is present or not.

Let's take two computers sending a message to each other over the network. Data is sent in packets, and packets can be different sizes, so let's take a one-byte packet. One byte is made up of eight bits, as represented here, so an 8-bit binary message may look something like this. If the sending computer on the left sends this packet to the receiving computer on the right, how does the receiving computer know that the message is correct? Noise may have corrupted the data during transmission; for example, the last bit here, a one, may have been flipped so that it arrives as a zero. As this data is sent at the moment, the receiving computer has no way of knowing that there's
an error, because all eight bits in our packet are data bits. But if we construct the packet differently, with seven data bits and one parity bit, we can check whether an error is present. Let's use the bit on the end as our parity bit and color it red.

Remember the definition of parity, with its talk of numbers being odd or even? To use a parity bit, our computers need to agree on whether they will use odd parity or even parity. In our example they use even parity, which means the number of ones in the packet must always be even. Looking at our packet, the seven data bits contain four ones, and four is even, so the parity bit is written as a zero and the total number of ones stays even. When this packet reaches the receiving computer, it knows even parity is in use, counts an even number of ones, and so the data checks out as correct.

Let's look at that again with some different data. This packet contains three ones, an odd number, so to make the packet even we set the parity bit to one. When it is sent, the receiving computer again counts an even number of ones, so again the data checks out as correct.

Now let's see what happens when something goes wrong and a bit gets corrupted. During transmission, this first one flips to a zero. When this packet is received, because we're using even parity, for the computer to know the
data is correct, the total number of ones must be even. How many ones have we got? Three, and three certainly isn't even, so the receiving computer knows that an error happened during the transfer. It replies to the sender, "Hey, I didn't quite get that, can you send it again?", and the sender resends the same data, hopefully getting through this time without interference. So the parity bit tells the receiver that there is an error, but not what the error is.

It's a bit like us having a conversation on the phone where I say to you, "Watch my new video on... and forget to subscribe to the channel." Your brain and common sense act as a kind of parity check: you think, "When did he say to watch the new video? I didn't hear that. And surely he couldn't have said forget to subscribe; that makes no sense." So you say "Pardon?", and I say, "Watch my new video on Monday, and don't forget to subscribe to the channel." In the same way, the parity bit can't correct the data itself; for the data to be corrected, the whole piece of data has to be sent again.

Parity sounds pretty good, but it can't catch every error. Suppose we resend the packet and this time two bits become corrupted: a one flips to a zero and a zero flips to a one. There is still an even number of ones, so as far as the parity check is concerned the packet is correct, even though the data is different. So a single parity bit can detect some errors (any odd number of flipped bits), but not all errors, and it can't automatically correct them; it can only tell us that an error is present. If you're interested in schemes that can automatically correct errors, look up something called Hamming codes, and
you'll see how multiple parity bits can automatically correct data, but I'm not going to cover that in this video.

Now that we know a little about parity, let's see how it can be used for redundancy when storing data on disks. Let's rotate this block around and think about how it applies to these disks: instead of a parity bit, we now have a parity disk. Disks store information as ones and zeros too, and they do that in sectors; the size of a disk determines how many sectors it has. To keep things simple in this video, I'm going to talk about parity being calculated over sectors, but parity is actually calculated per bit position on the disk. A sector has traditionally been 512 bytes, which is 4,096 bits (many modern drives use 4,096-byte sectors), so just for simplification we're going to treat each sector as if it held a single bit. Let's take these eight disks; in fact, let's set one of them aside, imagine that this is an Unraid main array, and choose three different sectors: sector 100, sector 5,000 and sector 10,000.
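The packet example from earlier, seven data bits plus one even-parity bit, can be sketched in Python. This is a minimal illustration, not a networking implementation: the function names and bit patterns are hypothetical, chosen only to reproduce the cases in the walk-through (four ones, three ones, one flipped bit, two flipped bits).

```python
def even_parity_bit(data_bits: list[int]) -> int:
    """Return the bit that makes the total number of ones even."""
    return sum(data_bits) % 2


def make_packet(data_bits: list[int]) -> list[int]:
    """Build an 8-bit packet: seven data bits followed by an even-parity bit."""
    if len(data_bits) != 7:
        raise ValueError("expected 7 data bits")
    return data_bits + [even_parity_bit(data_bits)]


def passes_even_parity(packet: list[int]) -> bool:
    """Receiver-side check: True if the packet holds an even number of ones."""
    return sum(packet) % 2 == 0


# Hypothetical data bits mirroring the two cases described above.
four_ones = [1, 0, 1, 1, 0, 1, 0]   # four ones -> parity bit 0
three_ones = [1, 0, 0, 1, 0, 1, 0]  # three ones -> parity bit 1

packet_a = make_packet(four_ones)
packet_b = make_packet(three_ones)
print(packet_a[-1], passes_even_parity(packet_a))  # 0 True
print(packet_b[-1], passes_even_parity(packet_b))  # 1 True

# One flipped bit (the first one becomes a zero): the receiver detects it.
one_flip = packet_b.copy()
one_flip[0] ^= 1
print(passes_even_parity(one_flip))  # False

# Two flipped bits (a 1 -> 0 and a 0 -> 1): the count of ones stays even,
# so this corruption slips through undetected.
two_flips = packet_b.copy()
two_flips[0] ^= 1
two_flips[1] ^= 1
print(passes_even_parity(two_flips))  # True
```

The check only counts ones, which is why it can report that something went wrong but not which bit, and why an even number of flips goes unnoticed.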
Now let's put some data on these disks. The Unraid array uses even parity, so when you do a parity sync, it looks at the ones and zeros across each sector of the data disks and writes either a one or a zero on the parity disk so that the total is always even. Across the disks on sector 100 we have four ones, which is already even, so the parity disk gets a zero for sector 100. On sector 5,000 we have three ones, which is odd, so the parity disk needs a one to make it even. Finally, on sector 10,000 we again have four ones, which is already even, so the parity disk gets a zero.

Here we're not checking for errors in data being transmitted as such, although some people would say that when we write data we are sending a message from the present to ourselves at some point in the future when we read it back. Let's simulate disk 2 failing. Unlike the network example, where we didn't know where the error was, when a disk fails we know exactly where the missing data is. Because we know where the error is, we can actually correct it and work out what the missing data should be, so we don't lose anything.

So let's calculate the data on the failed disk 2. On sector 100 the parity is a zero and the remaining disks have three ones; with even parity, three ones and a parity of zero means the data on disk 2 at sector 100 must be a one. On sector 5,000 we have four ones, counting the parity disk, which is already even, so disk 2 must have held a zero. On sector 10,000 we have three ones and a zero for parity, so to make it even, disk 2 at sector 10,000 must have been a one. So even though the failed disk hasn't been replaced yet, its data can be reconstructed and therefore
emulated, and we can still access all of the data. When we replace the failed drive, the emulated data is written onto the new disk, and we're back where we started with all of our data intact. So now you can see how a parity disk protects your array. It doesn't contain any real data, only the ones and zeros that make everything else add up to an even number. This is calculated using something called the XOR algorithm, XOR being short for exclusive OR. We'll talk more about exclusive OR in a moment, but first I want to talk about the differences between the Unraid array and a standard RAID array that also has one disk's worth of parity, so let's compare it to RAID 5.

On an Unraid array the parity isn't kept on any of the data disks at all; it's kept separately on its own disk. RAID 5 is different: the parity, shown here in orange, is striped across all of the disks along with the data. Because the data is striped too, one file may not reside on a single disk; it's cut into small pieces and written across all of the disks. This has both advantages and disadvantages. The advantage is that all the disks are used at once for reads and writes, so reads and writes are faster. The disadvantage is that, because parity is striped across all of the disks, they all effectively have to be the same size. Strictly speaking you can build a RAID 5 array with mixed-size disks, but each disk then only contributes as much space as the smallest disk. For example, with two 10 TB disks and one 2 TB disk, only 2 TB of each 10 TB disk would be used, so the array would give you a usable space of only four
terabytes. That's because, with a 2 TB disk present, each 10 TB disk would only act as a 2 TB drive itself. Another disadvantage of striping parity across the disks is that you can't simply add another data disk or parity disk to an existing RAID 5 array and carry on. Some RAID implementations can grow an array or convert it to RAID 6, but only by reshaping, that is restriping, all of the existing data and parity. The easiest way to show why is to show you how we can do it on an Unraid array, where the parity isn't striped.

So here we are back on our Unraid disk array. If you remember, earlier I said I was going to set one disk aside, disk 7. There's a reason for that: I wanted to show you how we add a disk. Imagine disk 7 is inside the server, ready to be added to the array. How can it be added without messing up the parity? That's easy: write a zero to every sector of the disk. Zeros don't change the parity calculation at all, so sectors 100, 5,000 and 10,000 all still have even parity, and because the parity is still valid, if any data disk fails its data can still be rebuilt. That's why, when you add a disk to an Unraid array, Unraid does what's called clearing: it zeros the whole disk before adding it to the array. My preferred way of doing this is to use the Preclear plugin, set it to zero the drive, and then read back all the zeros afterwards to check it's been done correctly. I do it this way for two reasons: first, reading the data back stress tests a new drive; and second, it makes me a hundred percent sure everything is zeroed before the disk goes into the array, so my parity will be valid. On a side note, another interesting thing about the Unraid array and zeroing is that you can remove a drive from the array without affecting the parity, as long as you zero the drive first and then remove it. I'll be showing how to do
that in an upcoming video later this week, where I show various ways to shrink an Unraid array.

Another advantage of a dedicated parity disk is that we can use disks of different sizes in the array. The only rule is that the parity disk must be at least as large as the largest data disk. Taking the example we have here, imagine the largest disks have 10,000 sectors and the others only 5,000; think of the 5,000-sector disks as 5 TB and the 10,000-sector disks as 10 TB. Of course these numbers are arbitrary, and real disks have far more sectors than that. So imagine the parity disk and disks 1, 2 and 3 are 10 TB, and disks 4, 5, 6 and 7 are 5 TB. With mixed-size drives the full capacity of each disk can be used without affecting parity, because the parity isn't striped across the data drives; beyond the end of a smaller disk, its missing sectors simply count as zeros in the parity calculation.

An Unraid array also doesn't stripe its data across the disks. It writes individual files and folders to individual disks, without breaking files up and spreading them across multiple disks. This has the advantage that you can take any disk out of the array, put it in another computer, and read the data off it. It also means that if, as in this example, more than one disk fails and I can't rebuild from parity, I've only lost the data on the two failed disks; all the other disks still have their data.

Now let's think about the advantages and disadvantages of a RAID 5 array versus an Unraid array. To read or write data on a RAID 5 array, all disks must be spun up. On an Unraid array, writing data needs two disks spun up (the data disk and the parity disk), and reading needs only one. Because of this, an Unraid array will use less electricity than a RAID 5 array, provided the disks are set to spin down during periods of inactivity. Also, because a RAID 5 array reads and writes to all of
the disks at once, the wear across the disks is uniform, because all disks are used equally. That isn't the case on an Unraid array: the parity disk sees the most activity, because it is written to on every write. This could be considered a disadvantage, since the more a disk is used, the sooner it reaches the end of its life. For this reason I'd really recommend using a very good quality disk for your parity drive, and not a shucked drive. By a shucked drive I mean one that came out of a USB enclosure and was sold as an external USB drive. Manufacturers sometimes put drives in these enclosures that don't meet the standard of their more expensive internal drives. This isn't always the case, but it's worth keeping in mind and thinking about why these disks cost less than their internal counterparts.

Going back to RAID 5, you might think it's an advantage for all the disks to wear equally, but look at it from another perspective: when one disk reaches the end of its life and fails, there's a higher chance that the other disks in the array will fail too, or soon after, because they've all had the same wear and tear and were probably all bought at the same time. That isn't the case in an Unraid server, because the disks won't have equal wear, and because you can add disks to the array over time, they probably won't all be the same age either. That brings us back to a point we talked about earlier: on an Unraid array you can add extra data disks as time goes on, and you can even add a second parity disk, which allows the array to survive two drive failures and still recover. I'll talk briefly about dual parity in Unraid towards the end of this video, but for now I'm going to ignore it. Going back to RAID 5, as we've seen, the array can't
be expanded simply by adding a disk, the way an Unraid array can.

Both a RAID 5 array and a single-parity Unraid array can recover from one failed drive. But what happens if more than one drive fails? On a RAID 5 array you lose the whole array and all the data on it, whereas on an Unraid array you only lose the data on the failed disks. As we saw earlier, that's because Unraid doesn't stripe data across disks; it writes each file to an individual disk.

Another thing to think about is that during a rebuild, on both a RAID 5 array and an Unraid array, every remaining disk has to be read. That sounds fine, but what if there are bad sectors you don't know about and you hit an unrecoverable read error, or URE? If that happens during a RAID 5 rebuild, on many implementations the rebuild will fail and stop. On an Unraid array, if you get an unrecoverable read error during a rebuild, it will skip that sector. Yes, the data in that sector will be corrupted, but the rebuild will continue, so at worst some of your data is corrupted and you still get the vast majority of it back.

The last thing to consider is that on a RAID 5 array, because data and parity are striped across all of the disks, reads and writes are faster than on an Unraid array, where they aren't. I've left this until last because it's a good segue into Unraid cache pools. To combat the slower write speed of the array, a number of years ago Lime Technology introduced the cache drive, and more recently cache pools. These let files be written first to a fast SSD; later, normally during the night, a program called Mover moves that data onto the main array. So what happens if the SSD fails? We can use pools of SSDs in a btrfs ("butter FS") RAID, and here, with these two SSDs, this is
a btrfs RAID 1, where the data is mirrored across the two drives. This gives us redundancy should one of the SSDs fail before the data has been moved to the main array. Btrfs RAID levels are very similar to conventional RAID: in RAID 0 the data is striped across the disks, so reads and writes are very fast; RAID 1, as we said a moment ago, is a simple mirror, which gives us redundancy; RAID 10 is basically a combination of RAID 0 and RAID 1; and RAID 5 and 6 are parity RAIDs with either one or two disks of redundancy. I don't really recommend using RAID 5 or RAID 6 in btrfs: a lot of people say btrfs RAID 5 and RAID 6 aren't stable enough to use, and I think the general consensus is to avoid them. We're also not limited to one of these pools; we can have multiple pools, and you can see a video about setting up multiple pools here.

It's rumored that ZFS is coming to Unraid soon, maybe in version 6.11. That's something a lot of people, myself included, are really looking forward to, as it will allow us to create ZFS pools in Unraid. In fact, we can already use ZFS on Unraid today with a plugin, and if you want to see how to do that, see my video here.

The one thing I haven't spoken much about is using two parity disks in the Unraid array. A lot of people mistakenly think the two parity drives are just mirrors of each other. If that were the case, the second parity disk could only protect against the first parity disk failing. But having two parity disks allows two drives in the array to fail, so the second parity disk is definitely not just a copy of the first. The first parity disk is called the P drive, and it uses the XOR, or exclusive OR, algorithm. The second parity disk is called the Q drive, and it uses a different algorithm, Reed-Solomon. This is the same family of algorithm used to correct for scratches when
playing back something like a CD or DVD. I'm not going to go into Reed-Solomon, but I did say I'd talk a bit more about exclusive OR.

You may have heard of bitwise AND and OR in computer programming, and their negated counterparts, NAND and NOR. Bitwise exclusive OR is from the same family. Computers use binary, so think of one as true and zero as false. You may want to find out whether conditions have been met, for example whether one thing is a one and another is a zero, and for two conditions we can write what's called a truth table, which looks like this. This is the truth table for ordinary OR. Imagine condition A is "Is it wet?" and condition B is "Have I slipped over?", and we want to know whether at least one of those conditions has been met. In the first column both are zero, false: it isn't wet and I haven't slipped over, so the result is false. In the second column, is it wet? No, a zero. Have I slipped over? Yes, a one. That's true, so the result is true. The next column is the reverse: it is wet but I haven't slipped over, so one condition has been met and the result is true. And in the last column it is wet and I have slipped over; both conditions are true, and since at least one of them is true, the result is a one. That's the bitwise OR operation.

Now let's look at bitwise exclusive OR. It's very similar, except for that last column: in exclusive OR, it has to be exclusively one or the other. So in the last column, where both conditions are met, the result is now false instead of true, because exactly one condition must be met for the result to be true, hence the name exclusive OR. You may be wondering how that relates to the Unraid array and parity. As I said, the exclusive OR algorithm is used to calculate odd and even, so imagine the result is the parity disk. If you look at
the first column, a zero and a zero is even, so the parity would be a zero. In the next column, a zero and a one is odd, so to make it even a one would be written to the parity, and the same goes for the column after that. In the last column, a one and a one is even, so a zero would be written to the parity. That's how the XOR algorithm calculates the result.

I'm sure some of you are thinking, that's all very well for just two conditions, which would be like two disks, but what about four disks? What we do is start at the top and XOR the first two values, disks 1 and 2. Looking at our truth table, a one and a one gives a zero. We take that result and XOR it with the next disk, disk 3: a zero and a zero gives zero. Finally, we take that result and XOR it with disk 4: a zero and a one is true, so the answer is one. Now all of the disks have been XORed together, so that final one is the value written to the parity disk; disks 1, 2 and 4 hold three ones, so a parity of one makes the total even. That's how parity is calculated with the XOR algorithm.

So now, at the end of this video, hopefully you know a bit more about parity, RAID and the Unraid array. This video is much longer than I thought it would be, so it's time to wrap up. I wanted to make this video about parity and the Unraid array because in my next video I'm going to demonstrate how to shrink an Unraid array, and I think understanding parity is really useful for that. Anyway guys, if you enjoyed this video, please hit the like button and subscribe to the channel, and don't forget to share the video with anyone you think might find it useful. I want to thank all of my patrons and supporters for supporting this channel; thank you so much, guys, for enabling me to make these videos. If anyone else out there wants to help support the channel, then
please find the links in the description below. Anyway guys, it's time for me to go now, but whatever you're up to for the rest of the day, I hope it's good, and I'll catch you in the next video.
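The parity sync, the disk 2 rebuild, the clearing of disk 7 and the four-disk XOR walk-through described in the video can be sketched in a few lines of Python. This is a toy model, not how Unraid is implemented: each disk is a hypothetical list of sector values, and the disk names and contents are made up to match the counts used in the example (four, three and four ones across sectors 100, 5,000 and 10,000).

```python
from functools import reduce
from itertools import zip_longest
from operator import xor

# Toy model (an assumption for illustration): each disk is a list of sector
# values. The video treats a sector as a single bit; because XOR works bit by
# bit, the same code also works if each value is a whole byte.
SECTORS = ["sector 100", "sector 5000", "sector 10000"]

# Hypothetical contents chosen to match the walk-through:
# four, three and four ones across the six data disks.
data_disks = {
    "disk1": [1, 1, 0],
    "disk2": [1, 0, 1],
    "disk3": [0, 1, 1],
    "disk4": [1, 0, 0],
    "disk5": [0, 1, 1],
    "disk6": [1, 0, 1],
}


def xor_across(disks: list[list[int]]) -> list[int]:
    """XOR the same sector position on every disk.

    Shorter disks are padded with zeros, mirroring the idea that sectors
    past the end of a smaller disk count as zeros in the parity calculation.
    """
    return [reduce(xor, column) for column in zip_longest(*disks, fillvalue=0)]


# Parity sync: P = disk1 XOR disk2 XOR ... XOR disk6 (even parity per sector).
parity = xor_across(list(data_disks.values()))
print(dict(zip(SECTORS, parity)))
# {'sector 100': 0, 'sector 5000': 1, 'sector 10000': 0}

# Simulate disk2 failing: XOR the parity with every surviving disk to
# reconstruct (emulate) the missing contents.
survivors = [bits for name, bits in data_disks.items() if name != "disk2"]
rebuilt = xor_across(survivors + [parity])
print(rebuilt == data_disks["disk2"])  # True

# Adding a cleared (all-zero) disk7 leaves the parity unchanged, since x ^ 0 == x.
with_disk7 = list(data_disks.values()) + [[0, 0, 0]]
print(xor_across(with_disk7) == parity)  # True

# The four-disk example: ((1 ^ 1) ^ 0) ^ 1 == 1, so the parity bit is 1.
print(reduce(xor, [1, 1, 0, 1]))  # 1
```

Rebuilding uses the same XOR as the parity sync: XOR is its own inverse, so XORing the parity with every surviving disk leaves exactly the bits of the missing one. This only works when a single disk's data is unknown, which matches the single-parity case discussed above.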
Info
Channel: Spaceinvader One
Views: 37,648
Keywords: linux, unraid, unraid array, raid 5, raid 6, parity, single parity, dual parity, xor, reed solomon, lime technology, lime tech, unRAID, nas, data, how data is stored on a harddrive, ssd, hdd, hard drive, how a hard drive works, home server, server, kvm, docker, qemu, All about the Array, how data is written and how the parity works in unRAID, tutorial, guide, how raid works, how unraid works, difference between raid and unraid, double parity, what is parity
Id: dX2PvD1qtKw
Length: 26min 54sec (1614 seconds)
Published: Wed Jan 26 2022