I built the fastest Raspberry Pi SATA RAID NAS!

Captions
Is this the fastest, most reliable Raspberry Pi NAS ever built? In this video I'll build a native SATA RAID array on the new Raspberry Pi Compute Module 4 and show you how to build the fastest, most reliable network storage for the world's most popular single-board computer.

I can't get one of these LSI SAS cards working on the Pi yet, so hardware RAID is currently out of the question. So I switched to testing with a SATA card instead; specifically, I bought this IO Crest 4-port SATA card, which has the Marvell 9215 chip with driver support built into the Linux kernel. In the last video I showed how I recompiled the kernel with SATA support and then connected four drives to the Pi. In this video I'll test everything out, show you how well everything works together, and point out some pitfalls to avoid when you build your own Pi storage array.

Before I tested all four drives together, I decided to do a few baseline tests. First I tested my cheap Kingston SSD: once in my fastest USB 3.0 SATA case with UASP through a USB 3 card on the Compute Module, and once directly connected through the SATA card. Every benchmark shows a noticeable improvement using native SATA, especially for 4K random writes. This is likely because there's one less translation layer between the drive and the rest of the system; there's no USB protocol slowing down the SATA bus.

Switching gears, I tested a single WD Green hard drive. To keep the competition a little more fair, though, since a hard drive is no match for even a slow SSD, I tested it against my favorite microSD card, the Samsung Evo Plus. And here's the sad news: for small-file random I/O, spinning hard drives are terrible. You might wonder how it could be so bad. Well, here's a picture I took over 10 years ago of an old IDE hard drive I ripped open. The problem is those little arms that extend out over the disk platters. Whenever you work with data on a spinning hard drive, that mechanical arm needs to physically move to the right location on the disk while the disk is spinning a hundred times per second, and that requires a lot more effort than on a flash drive, where the drive just needs to look up the correct address in its solid-state memory. You might not have ever thought much about it, but since SSDs became commonplace in the last decade, most people outside the DataHoarder subreddit don't think much about how much of a revolution the transition from spinning platters to SSDs really was. Get it? Revolution?

Anyway, now we have some performance baselines. There are much faster spinning disk drives available, but even the fastest ones I've used are still much slower than the average SSD for practical workloads. Because they cost a lot less per gigabyte, though, they're still useful for data storage, and that's why I'm testing them in my setup. One way to increase both the reliability and performance of data storage with these slower drives is to pool many drives together using RAID.

The first thing I did was partition all four drives using fdisk. To keep things simple, I created one primary partition on each of the four drives. Technically you don't have to partition the drives before using them in an array, but there are a few small technical reasons why most guides recommend partitioning them before you create the array.
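A minimal sketch of that partitioning step, assuming the four drives show up as /dev/sda through /dev/sdd (the device names are an assumption; check yours with lsblk before touching anything):

    # Confirm which block devices are the four RAID members
    lsblk

    # Repeat for each drive (/dev/sda, /dev/sdb, /dev/sdc, /dev/sdd):
    sudo fdisk /dev/sda
    #   n        (new partition)
    #   p        (primary)
    #   1        (partition number)
    #   <Enter>  (accept default first sector)
    #   <Enter>  (accept default last sector, use the whole disk)
    #   w        (write the partition table and exit)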
To create a pool of drives, I decided to set up a traditional RAID array using Linux's mdadm tool. But before I do that, I should probably explain RAID a little bit, because if you dive into multi-drive tech like RAID, ZFS, or Btrfs and you don't know what you're doing, you will end up losing data, and it's not a matter of if but when.

So what is RAID? Well, a long time ago, around the time I was born in fact, someone got the bright idea that it would be good to take multiple hard drives and put them together to make a bigger drive, a constantly backed-up drive, or a faster drive, or maybe a bigger, faster, constantly backed-up drive. And so we started seeing people put together multiple drives in redundant arrays of independent disks, initially in RAID 0 or RAID 1 configurations.

RAID 0 takes multiple drives and joins them together into a stripe. Striping increases the total volume size, and it also brings better performance, because data blocks can be written across and read from multiple drives at the same time. But what happens when any single drive is lost? It's, as I said, catastrophe: if any single drive fails, all the data is gone, and the more drives you have, the greater the chance one will fail. So then we have RAID 1: you can take many hard drives and mirror the data from the first drive onto one or more copies. That's much safer, because now any drive can fail and the data is still intact, and data reads are faster because they can be distributed to all the copies of the drive. But you're not gaining any performance benefit on writes, and you're not getting any extra storage.

Where do we go from there? Well, there are two main avenues to try to get more performance, storage, and redundancy with more than two hard drives. Some people may be willing to accept more risk to save money on extra hard drives, and they go with something like RAID 5 or RAID 6, but I don't generally use them. If my spidey sense tingles about a certain technology, like the parity that's used in these types of RAID arrays, I don't trust it with my life or with my most important data, and the way data parity is distributed among drives in RAID 5 and RAID 6 makes me a little nervous, especially if you have many terabytes of data. If you want one brilliant example to learn from, check out Linus Tech Tips' 2016 video "All of our data is gone". ("I'm only getting about 25 likes per second... 30... it's probably gonna crash, like, while you're doing this.") If you watch that entire video, which I highly recommend, you'll realize Linus was lucky to be able to get expensive data recovery specialists to painstakingly rebuild the storage array. But he also learned the number one lesson about all types of RAID: RAID is not a backup. If you're ever tempted by the RAID demons to feel like your data is safer because you're using RAID, stomp that thought right out. You still need to back up any data that's important to you, ideally with a local backup and at least one off-site backup.

I follow the 3-2-1 rule: keep at least three copies of your data, store two copies on different media, and keep at least one backup offsite. I won't go too much further than that, but I will say that I only use either RAID 1 or RAID 10 in any situation where I care about my data's availability, and on top of that I configure rclone to create scheduled backups to slower storage and cloud storage like Amazon Glacier. RAID 1 or 10 do require more hard drives, but I can rest easier knowing the odds are in my favor, more so than with RAID 5 or RAID 6, when any particular drive or controller fails.
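A minimal sketch of what a scheduled rclone backup like the one mentioned above could look like; the remote name, bucket, and paths here are placeholders, not the actual configuration from the video, and the remote would be set up first with rclone config:

    # crontab entry: sync the RAID volume to a cloud remote nightly at 3 AM
    # "glacier:pi-nas-backup" is a hypothetical remote and bucket name
    0 3 * * * rclone sync /mnt/raid10 glacier:pi-nas-backup --transfers 4 --log-file /home/pi/rclone-backup.log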
I should note here, before all the ZFS zealots start leaving comments below, that I'm not going to dive into ZFS in this video. That's a different rabbit hole that deserves an entirely separate video, since once you start going down that road you're combining file systems, volume managers, and backup and replication, and I'll get to that some other time. And yes, I know OpenZFS 2.0 was just released, and I'm sure 50 people are going to comment about how amazing it is anyway, but what can I do?

Well, we went on quite a tangent, but I feel like if you're going to go to these extremes for resilient storage on a Raspberry Pi, you should at least know what you're getting yourself into. Now, remember the earlier benchmark showing the abysmal random access performance of the single WD Green drive? Even a pokey old microSD card beats the pants off a budget spinning hard drive. So next I wanted to see the difference with RAID 0 and RAID 1.

To set up an array with mdadm, I first installed it with apt install mdadm. Then I ran the command sudo mdadm --create --verbose /dev/md0 (that's the device name I want to use) --level=0 (the RAID level I want to configure), then --raid-devices=2, followed by a list of the drives in the device tree, in this case sda1 and sdb1, the partitions I created earlier using fdisk. After that was done, I created a mount point at /mnt/raid0 and formatted the new array using sudo mkfs.ext4 and the path to the new array, /dev/md0. After it was formatted, I mounted the device on the RAID 0 mount point I set up previously. You can also use the mdadm tool to inspect the array and check on its health, using the command on the screen. If you want to make the array mount at startup, edit your /etc/fstab file and add a line that looks like what's on the screen. You'll also need to persist the mdadm device details into the mdadm configuration file at /etc/mdadm/mdadm.conf, using the command on the screen. Once that's done, you can reboot your Pi and the RAID array will mount on startup.
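A sketch of those mdadm steps, assuming the /dev/sda1 and /dev/sdb1 partitions from earlier; the fstab line and the configuration-persisting command stand in for what was shown on screen, so treat them as typical examples rather than the exact ones from the video:

    # Install mdadm and create a two-drive RAID 0 array
    sudo apt install mdadm
    sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1

    # Create a mount point, format the array, and mount it
    sudo mkdir -p /mnt/raid0
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /mnt/raid0

    # Inspect the array and check on its health
    sudo mdadm --detail /dev/md0

    # Mount at startup: add a line like this to /etc/fstab
    echo '/dev/md0 /mnt/raid0 ext4 defaults 0 0' | sudo tee -a /etc/fstab

    # Persist the array definition into mdadm's configuration file
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf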
After the RAID 0 array was up and running, I ran the same benchmark again, and here are the results compared to the single drive: all the benchmarks were around two times faster, and interestingly, the 4K write test was almost three times faster, which was a little surprising. I have some theories about that, but the main takeaway is that RAID 0 seems to scale performance proportionally for each new drive added. I also quickly confirmed that two drives in a RAID 1 array performed about the same as a single drive.

Before moving on, I tested with all four hard drives, setting them up both as RAID 0 and RAID 10. RAID 0 stripes all four drives together, and you can see that it provided blazing fast speeds compared to just one drive. The random write performance is even better than the microSD card now, and all four drives together can actually handle sequential data faster than my single low-end Kingston SSD. But remember, with RAID 0, if any drive fails you lose everything, so it's really only helpful in a situation where you need fast caching storage.

So next up, I built a RAID 10 array. RAID 10, which is short for RAID 1+0, means there's a striped set of mirrors: every drive in the array has a mirror, and all these mirrored sets are striped together for better performance. You sacrifice half the performance and storage capacity compared to just addressing the drives directly, but the speeds for these four drives are still double the performance you get out of any single drive on its own. Also, one thing I didn't realize was that when you create a mirrored RAID array with mdadm, like RAID 1 or RAID 10, you have to wait a long time for an initial resync operation to take place. This can take hours on larger and slower hard drives. Technically you could start writing data to the drives right away, but it's going to be a lot slower until the resync is done. Just something to keep in mind.

After all that testing with those four hard drives, I realized waiting hours for these slow hard drives to complete their testing was getting kind of boring. So what did I do? I went and bought three more Kingston SSDs so I could test an SSD RAID array. Since I already had stackable 3.5-inch drive cages, I just bought two 2.5-to-3.5-inch SSD adapters and dropped the four SSDs into two of the hard drive slots. It's probably better to build your own little cage for SSDs, or even go all out and buy something like an Icy Dock hot-swap case, but I needed something quick and cheap, and these Corsair dual SSD mounting brackets fit the bill. And after my last video, a few people mentioned that there are SATA cables with thinner, more flexible wires than the standard flat cables, so I picked up this five-pack of CableCreation SATA cables and wired the Pi to the SSDs using them. It's a much cleaner look, though I lose out on the +5 HP speed boost I get from the red cables.

Formatting the SSDs and creating mirrored RAID arrays with them was a lot faster than with the old spinning hard drives; I didn't time it, but it was about 10 times faster. I did start running into some strange "device or resource busy" warnings when I tried creating the arrays with SSDs, and after digging into it, I found it was a race condition, probably caused by the fact that the SSDs all respond so quickly compared to the hard drives. The fix, from a 2012 blog post I found, was to pause udev from processing events while mdadm was creating the array. So I did that using udevadm with the command you see here, then started it back up after the array was created, and everything worked out great.
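A sketch of that workaround, assuming a four-drive RAID 10 array built from /dev/sda1 through /dev/sdd1 (the partition names and RAID level are assumptions; the point is simply to pause udev's event queue while mdadm assembles the array):

    # Pause udev event processing so it doesn't grab the member devices
    # mid-creation and trigger "device or resource busy" errors
    sudo udevadm control --stop-exec-queue

    sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # Resume udev event processing once the array exists
    sudo udevadm control --start-exec-queue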
So now it's time to benchmark the SSDs. In RAID 0, the WD Green can actually keep up with the SSD array for sequential writes, but it gets torn to shreds on random performance. The laws of physics severely restrict the IOPS of physical spinning hard disks, so even bargain SSDs absolutely dominate random I/O performance. But that's just RAID 0; what happens if we build the array in the much safer RAID 10 mode? Compared to the hard disks in RAID 10, the SSDs finally win for sequential performance too, while still demolishing the hard drives for random I/O. And finally, I wanted to compare the SSDs against the baseline: one SSD versus four in RAID 0 and RAID 10. It looks like for SATA SSDs on the Pi, performance doesn't increase by as much with RAID compared to a single drive, and repeated testing showed that the difference between RAID 0 and RAID 10 is actually a lot smaller on an I/O-starved Raspberry Pi Compute Module than you might think.

My main takeaway from all this testing is that if you need any sort of random performance or high IOPS, like for virtual machine images or generic file storage, you should try as hard as you can to get SSDs for your storage array and skip the hard drives. But if you just need to store lots of large files, like for a media server, you might get sufficient performance even with slower, cheaper hard drives.

And that brings me to the last thing I want to talk about in this video: using a RAID array for a NAS, or network attached storage, device. There are a number of different things you can do with a NAS, like store and transcode media files with Plex, store virtual disk images for other Raspberry Pis to boot from using netboot, or just have extra storage space available using Samba or NFS. There are even full distributions you could install on the Raspberry Pi, like FreeNAS, which manage almost everything for you, turning the Pi into something like a Synology or QNAP appliance. But for my needs, at least right now, I just need more storage for my video files, and I want to see whether Samba or NFS is faster when I need to back up or access archived Final Cut Pro video projects.

Using the SATA drives in RAID 10, I installed and configured Samba and benchmarked it. All the different tests and commands I'm running in this video are documented in a blog post linked in the description, in case you want to try it on your own. First I installed the samba and samba-common-bin packages using apt. After that completed, I created a shared directory with 777 permissions, so anyone could write to it, inside the mounted RAID volume. Then, to create a Samba share, I edited the file /etc/samba/smb.conf and added a new configuration section with all the settings you see on the screen. After saving that file, I restarted the Samba daemon with sudo systemctl restart smbd. The share is now available, but before I could log in from another computer, I needed to set up a Samba username and password, so I ran sudo smbpasswd -a pi to create a pi user account and then entered a secure password.
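A sketch of that Samba setup; the share path and the [shared] section contents below are assumptions standing in for the settings shown on screen:

    sudo apt install samba samba-common-bin

    # World-writable share directory inside the mounted RAID volume
    sudo mkdir /mnt/raid10/shared
    sudo chmod -R 777 /mnt/raid10/shared

    # Add a share definition to /etc/samba/smb.conf, for example:
    #   [shared]
    #       path = /mnt/raid10/shared
    #       writeable = yes
    #       create mask = 0777
    #       directory mask = 0777
    #       public = no

    # Restart the Samba daemon and set a Samba password for the pi user
    sudo systemctl restart smbd
    sudo smbpasswd -a pi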
Then I went over to my Mac, connected to the Raspberry Pi in the Network location, and connected to the shared folder. I tested the speed of a few different file copies, including a single 8-gigabyte file and a folder with a lot of files totaling 2 gigabytes, and averaged the results. The speed for the large file copy was about 93 megabytes per second, while the folder copy averaged 25 megabytes per second.

Next I installed and configured NFS and benchmarked it. I installed NFS just like everything else, with apt-get install nfs-kernel-server. Once installed, I made a share directory called sharednfs inside the mounted RAID 10 array, then made sure the directory had 777 permissions so anyone on my network could read or write to that shared directory. Then I edited the /etc/exports file with nano and added the line you see on the screen. This file tells NFS what folders to share and how to share them. I won't get into the details in this video, but there are some great resources that go into more depth, like the NFS page in the Raspberry Pi documentation. Once the exports file was saved, I ran sudo exportfs -ra, which forces NFS to rescan for new exports and start sharing them. Then I switched to my Mac, chose Connect to Server from the Go menu in the Finder, put in the Pi's IP address with the NFS protocol, and added the path to the shared folder.
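A sketch of that NFS setup; the export options stand in for the line shown on screen, so they're an assumption rather than the exact settings from the video:

    sudo apt-get install nfs-kernel-server

    # World-writable share directory inside the mounted RAID 10 volume
    sudo mkdir /mnt/raid10/sharednfs
    sudo chmod -R 777 /mnt/raid10/sharednfs

    # Add an export line to /etc/exports, something like:
    #   /mnt/raid10/sharednfs *(rw,all_squash,insecure,async,no_subtree_check,anonuid=1000,anongid=1000)

    # Re-scan exports and start sharing them
    sudo exportfs -ra

    # On the Mac: Finder > Go > Connect to Server, then
    #   nfs://<pi-ip-address>/mnt/raid10/sharednfs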
After running the same file copy tests on the NFS share, I compared the results against Samba and found that NFS was about 10 percent faster for large file copies and a whopping 40 percent faster for folders with many, many files. In general usage I found the same thing, at least for my Mac: browsing files on the NAS, copying things using the Finder, and even deleting folders with tons of files all ran noticeably faster over the network if I connected with NFS. There are likely a number of reasons for this, but I also want to mention that I ran into some strange throughput issues, mostly with NFS, but sometimes even with Samba. If I saturated the one-gigabit network connection on the Pi for a long period of time, I would periodically see the data copy rate drop from 100 megabytes per second down to maybe 20 or 30 megabytes per second, and if I monitored the process via atop, I noticed that two of the drives in the RAID array would temporarily get slower and show as being very busy. I think what might be happening, though I can't confirm this yet, is that the Pi's system on a chip could be queuing data for the copy, and it just doesn't have enough I/O bandwidth between the network interface and the PCI Express lane to pump through a continuous stream of data at 100 megabytes per second indefinitely; it has to pause the copy every once in a while to catch up on queued writes. This is something to watch out for, but you might not even run into the problem on your own Pi NAS, depending on how you use it. It'd be interesting to hear if anyone else has this issue or has more ideas, so let me know in the comments.

I know how it performs now, but I also wanted to know how efficient it is to use a Raspberry Pi as a NAS, and I noticed one slightly alarming thing the first time I touched the card: how hot this card can get. Here's a thermal image of the card, with the area around the 2R2 inductor hovering around 120 degrees Celsius during one of my long file copy tests. For us Americans, that's 250 degrees Fahrenheit, and that's enough to cause an instant second-degree burn. So maybe be careful handling this card, and make sure to keep it well cooled with a fan. If I ran it with a fan a few inches away, the card stayed around 90 degrees Celsius, which is still hot enough to cause a burn if you touch it, but at least it didn't melt, and the chip itself, which is under this huge heat sink, seemed to remain at a more reasonable temperature. In terms of overall energy usage, I measured the wattage for both the Pi and the four SSDs with a Kill A Watt. The NAS uses six watts while idle and used a maximum of 12 watts during a huge file copy, and that compares pretty favorably to standalone four-drive Synology or QNAP NAS devices, which use between 30 and 40 watts while all drives are running. All things considered, though, a dedicated NAS doesn't use a ton of energy, and you should probably worry more about things like features, cost, and reliability when you choose to build or buy a NAS.

So: raw throughput on the drives can reach 300+ megabytes per second, which is the limit of the Pi's PCI Express 1x Gen 2 lane. Network file copies can saturate the Pi's one-gigabit network link. You can configure different types of RAID for redundancy or better disk performance. And when used as a four-drive RAID 10 NAS, the Pi sips only 12 watts of power. So what's not to love? Well, I know the first thing I want to improve is the way I have everything laid out on my desk. It'd be great to have a compact case and a smaller I/O board design with just the ports I need, and there are actually a few people working on that right now, like user mebs t on Reddit, who's trying to design an all-in-one enclosure with a built-in fan.

Anyway, that's all I tested for this video, but I'll continue testing lots more cards on the Pi in upcoming videos. I'm still trying to get that darn LSI card working, I also just got another GPU, and I finished testing this Wi-Fi 6 card, so expect more videos on those soon. Please consider subscribing, and until next time, I'm Jeff Geerling.

(Bloopers) "...followed by a list of the divi... oh. Every drive in the ray... in the array. That's hard to say." "Are you finished up there?" "After all that tested... test... uh." "...on an I/O-starved Raspberry Pi... Raspberry Kai." "Features, cost, and... I spelled it wrong... reliability. And you should probably worry about things... ah, you should probably not worry too much about talking, because you can't do it, apparently." "So expect more videos on those soon... I just showed you the back side of this card. That's... I mean, you can't really see it that well anyways, but, you know, that's the front."
Info
Channel: Jeff Geerling
Views: 281,639
Keywords: raspberry pi, nas, sata, raid, pcie, pci express, marvell, 9215, fast, performance, efficiency, qnap, freenas, synology, raid0, raid10, raid5, raid6, parity, zfs, btrfs, array, storage, plex, smb, samba, nfs, network file system, mdadm, linux, md, multi disk, card, pci
Id: oWev1THtA04
Length: 21min 22sec (1282 seconds)
Published: Fri Dec 04 2020