Fixing my worst TrueNAS Scale mistake!

Video Statistics and Information

Captions
Hey everybody, Christian here, and today I want to talk once more about my TrueNAS storage server. You may remember the video where I built it for my home lab. I'm still using it to store all my important files, such as YouTube videos, graphics, personal data and whatnot, but this thing has literally been a ticking time bomb, because I made a pretty big mistake setting it up. Some of you already mentioned it in the comments: the way I configured the storage pool was somewhat concerning and could result in bad performance or even cause data loss under specific circumstances. So I took all this incredible feedback from you and ignored it for about half a year. Lucky for me, nothing has happened to any of the drives yet, but of course I always wanted to change it and make the whole server setup more robust for the future. Thanks, everybody, for the honest feedback on that video; it always shows me that I still have so much to learn and that I have a very smart community, and I really appreciate that.

The whole story is: I had a bit of experience with storage and NAS solutions. I had set up a few smaller systems at work before, I knew about RAID and backups and ECC memory and all that stuff, but this project has shown me that there is a huge difference between installing a small home NAS and a huge storage server. Anyway, today is the day I want to show you my new setup for this storage server, and let's talk about what my mistake actually was, so you can learn how to avoid it whenever you want to build something like this, a NAS or a storage server.

This video is supported by Teleport, a free and open-source access proxy that helps you securely authenticate to all your IT infrastructure, like Linux servers, databases, Kubernetes clusters, web applications, or remote desktops. You can easily protect your accounts with modern security features such as two-factor authentication or passwordless login, and access your services through the browser or the CLI tool, with audit logging and session recording. And the best part: it's completely free in the Community version, so you can just download it and run it in your entire home lab. If you'd like to use it in your company, Teleport offers many professional features like auditing, single sign-on, and more. It's a great tool, so check it out; you will find a link to their website in the description of this video.

Before I start showing you what I've done, we first need to recap what my setup was like, and we also need to talk about ZFS and storage pools again. I know, complicated stuff, but this literally is the most critical part of a storage server, and I feel I neglected it a bit in my TrueNAS SCALE video because I was so excited about the other cool features like Kubernetes, Docker, and Linux. However, this is still first and foremost a storage solution, and although Kubernetes is available on that machine, I honestly don't use it much. What I mainly bought this server for was storage, and I ordered it with twelve 4-terabyte drives and put them all into a single RAID-Z1 storage pool, just like I had done previously with my small virtual server. However, you need to be careful doing it this way, because there are two major problems with this setup: performance, and fault tolerance. Let's break it down in a bit more detail.

When we're talking about ZFS performance, there are a couple of different metrics to look at: the streaming read speed, the streaming write speed, and the read and write IOPS, the input and output operations per second. There is, by the way, great documentation on the TrueNAS homepage about measuring these metrics and how to calculate the performance of ZFS pools. In the documentation it's described as follows: I/O operations on a RAID-Z vdev need to work within a full block, so each disk in the vdev needs to be synchronized and operating on the sectors that make up that block. No other operation can take place on that vdev until all the disks have finished reading from or writing to those sectors. Thus, the IOPS of a RAID-Z vdev will be that of a single disk. While the number of IOPS is limited, both the streaming read and write speeds will scale with the number of data disks: each disk has to be synchronized in its operations, but each disk is still reading and writing unique data and will thus add to the streaming speeds, minus the parity level, since reading and writing the parity data doesn't add anything new to the data stream.

So, if I understood everything correctly: in my case I added twelve 4-terabyte drives to a single RAID-Z1 vdev, and let's assume a single drive has 250 IOPS and 100 megabytes per second of streaming read and write speed. In reality it's a bit faster on these Western Digital drives that I bought, but that doesn't really matter for these calculations. According to this documentation and the theoretical numbers, the streaming read and write speed should be 1,100 megabytes per second: we have twelve drives, each with 100 megabytes per second of streaming speed, but we need to subtract one drive because we're using a single-parity layout. That gives us an impressive read and write speed of 1,100 megabytes per second, which is awesome, but the IOPS are still just 250, because we've put all these disks into a single vdev.
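As a rough back-of-the-envelope sketch of that rule of thumb in Python (the 100 MB/s and 250 IOPS per-disk figures are just the round numbers assumed above, not measured values):

```python
# Back-of-the-envelope estimate following the rule of thumb quoted above:
# streaming speed scales with the data disks (vdev width minus parity) across
# all vdevs, while random IOPS stay at roughly one disk's worth per vdev.

def pool_estimate(vdevs, width, parity, disk_mb_s=100, disk_iops=250):
    data_disks = width - parity
    streaming_mb_s = vdevs * data_disks * disk_mb_s  # sequential read/write
    iops = vdevs * disk_iops                         # random I/O
    return streaming_mb_s, iops

# My old layout: all twelve disks in one 12-wide RAID-Z1 vdev.
print(pool_estimate(vdevs=1, width=12, parity=1))  # -> (1100, 250)
```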
In reality that means the overall performance of this entire pool isn't that great under every condition. You just wouldn't immediately notice it, because if you're only transferring data over a network share, it's mostly reading and writing from the cache first. But as soon as you start any recovery process, for instance when one drive fails and you need to replace it, that recovery can take a pretty long time, because the entire pool is limited to the IOPS of a single drive. And this becomes more and more of a problem the bigger your pool is and the more storage it consumes. That's why it's usually recommended not to add too many drives to a single vdev. This is, by the way, also mentioned in the official Oracle Solaris ZFS documentation: RAID-Z configurations with single-digit groupings of disks should perform better.

The second, and I think even bigger, problem is the fault tolerance of this setup. As you probably know, a RAID-Z1 is equivalent to a hardware RAID 5, so no matter how many drives you add to a single vdev, only one drive can fail. If you lose a second one, all your data is lost. And call me crazy, but I hadn't really considered two disks failing at the same time a realistic case. Many of you warned me, though, that this actually happens, and it can become a problem especially during a recovery process, which takes a pretty long time here because I decided to put all the drives into a single vdev and limited the IOPS. If a second drive dies during that recovery process, which literally does happen sometimes, all my data, all my video files, my backups, everything is lost. Well, that doesn't sound great. Of course I wanted to change that, but what should I do about it? What's the best recommendation?

So I came up with the following setup instead: instead of single parity like in a RAID-Z1, I chose RAID-Z2, which adds double parity, and that means I can now lose two drives per vdev. I also didn't put all the drives into a single vdev; I split them up into two, which gives me a fault tolerance of four drives in the best possible case. I still can't lose more than two drives in a single vdev, but when I lose two drives in each vdev, it's still okay. This also doubles the IOPS, which gives me overall better performance from this entire pool. The only downside is that you need to sacrifice the storage capacity of four drives, which is 33 percent of the entire raw capacity, so instead of 48 terabytes I only have 32 terabytes left for data. But that's still okay, given that I can now sleep much better whenever anything goes wrong with any of the drives; even if a second drive fails in a vdev during a recovery process, it's still okay for me. That is also, by the way, the setup most of you recommended to me in the comment section, so good job everyone, you clearly were smarter than I was.
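Just to put the old and the new hard-drive layouts side by side, here is a small sketch using the same assumed per-disk numbers as before (4-terabyte drives, roughly 250 IOPS each) and ignoring ZFS metadata overhead; real-world figures will differ a bit:

```python
# Old vs. new HDD layout, using the same rule of thumb as above:
# usable capacity scales with the data disks, IOPS scale with the number of
# vdevs, and the parity level sets how many failures each vdev can survive.

def layout(vdevs, width, parity, disk_tb=4, disk_iops=250):
    data_disks = width - parity
    return {
        "usable_tb": vdevs * data_disks * disk_tb,
        "iops": vdevs * disk_iops,
        "survives_per_vdev": parity,           # guaranteed failures per vdev
        "survives_best_case": vdevs * parity,  # if failures spread across vdevs
    }

print(layout(vdevs=1, width=12, parity=1))  # old: 44 TB usable, 250 IOPS, 1 failure max
print(layout(vdevs=2, width=6, parity=2))   # new: 32 TB usable, 500 IOPS, 2-4 failures
```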
But that was only part one of my improvements. I also wanted to have an even faster storage pool, not primarily for storing big data, but for storing virtual disks and running containers on the TrueNAS system itself. Because magnetic hard drives aren't the best for this job, I added four brand-new SSDs to this storage server and built a second pool with them. For this pool I've chosen a slightly different layout, because it solves a different task. Fault tolerance isn't so relevant in this case; I can just back up this entire pool to the first one, so even if I lose the data on this pool, it's not critical. Instead, performance, streaming speed, and IOPS are more relevant. So what I came up with is a striped mirror: I haven't used RAID-Z1 or RAID-Z2 at all here; instead I've just split the four SSDs into two mirrored vdevs. This gives me the IOPS of two SSDs by sacrificing 50 percent of their storage capacity, with a fault tolerance of two drives in the best possible case, so one drive per vdev. I know a striped mirror is pretty expensive: out of two terabytes of raw capacity I only get one terabyte. But I get better performance while still having a fair amount of redundancy and fault tolerance. That's also why I bought a pretty cheap brand of SSDs. Honestly, I don't know how good these drives are, but I suppose they'll still perform better than the usual HDDs, so I can just use this pool for testing iX applications and the Kubernetes part on that TrueNAS SCALE server. It should also be fast enough for NFS shares, because I'm using that in combination with a 10-gigabit network interface. So this is more of a testing storage pool; it's not holding any critical data. I will test it a bit over the next couple of months, and of course I will let you know how that goes.

If you think I've made a bad mistake again, please just put it in the comments below and let's talk about it. And maybe one day I will make a comprehensive video about TrueNAS and storage servers with all the stuff that I've learned from these projects, covering some of the best practices like backup plans, storage layouts, and so on. Please tell me if that would be interesting for you and what you'd like to see in such a video. I think that's it for now. As always, thanks everybody for watching, thank you so much for your feedback, and I will catch you in the next video. Take care everyone, bye bye.
Info
Channel: Christian Lempa
Views: 128,375
Id: 10coStxT5CI
Length: 10min 40sec (640 seconds)
Published: Tue Nov 15 2022