Our data is GONE... Again - Petabyte Project Recovery Part 1

Video Statistics and Information

Captions
- When I say we store and handle a lotta data for a YouTube channel, I mean it. I mean, we've built some sick, hundred-plus terabyte servers for some of our fellow YouTubers, but those are nothing compared to the two-plus petabytes of archival storage that we currently have in production in our server room, storing all the footage for every video we have ever made, at full quality. For the uninitiated, that is over 11,000 Warzone installs' worth of data. But with great power comes great responsibility, and we weren't responsible. Despite our super dope hardware, we made a little oopsie that resulted in us permanently losing data that we don't have any backup for. We still don't know how much, but what we do know is what went wrong, and we've got a plan to recover what we can. But it is going to take some work, and some money. Thanks to our sponsor, Hetzner. Hetzner offers high-performance cloud servers for an amazing price. With their new US location in Ashburn, Virginia, you can deploy cloud servers in four different locations and benefit from features like load balancers, block storage and more. Use code LTT22 at the link below for $20 off. (upbeat music)

Let's start with a bit of background on our servers. Our archival storage is composed of two discrete GlusterFS clusters, each spread across two 45Drives Storinator servers with 60 hard drives apiece. The original petabyte project is made up of the Delta 1 and Delta 2 servers and goes by the moniker Old Vault. Petabyte Project 2, or the New Vault, is Delta 3 and Delta 4.

Now, because of the nature of our content, most of our employees are pretty tech literate, with many of them even falling into the tech wizard category. So we've always had a substantially lower need for tech support than the average company, and as a result, we have never hired a full-time IT person, despite the handful of times, perhaps including this one, that it probably would have been helpful. So, in the early days, I managed the infrastructure, and since then I've had some help from both outside sources and other members of the writing team. We all have different strengths, but what we all have in common is that we have other jobs to do, meaning that it's never really been clear who exactly is supposed to be accountable when something slips through the cracks. And unfortunately, while obvious issues like a replacement power cable and a handful of failed drives over the years were handled by Anthony, we never really tasked anyone with performing preventative maintenance on our precious petabyte servers.

A quick point of clarification before we get into the rest of this: nothing that happened was the result of anything other than us messing up. The hardware, both from 45Drives and from Seagate, who provided the bulk of what makes up our petabyte project servers, has performed beyond our expectations, and we would recommend checking out both of them if you or your business has serious data storage needs. We're gonna have links to them down below. But even the best hardware in the world can be let down by misconfigured software, and Jake, who tasked himself with auditing our current infrastructure, found just such a thing. Everything was actually going pretty well.
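As a quick aside on scale, here is a minimal back-of-the-envelope sketch of how a two-server cluster like this adds up to roughly a petabyte of usable space. It assumes a distributed (non-replicated) Gluster volume, the 15-drive RAID-Z2 vdev layout described a little further on, and a purely illustrative 10 TB drive size; only the drive counts and vdev layout come from the video itself.

    # Back-of-the-envelope capacity estimate for one two-server cluster
    # (e.g. Old Vault = Delta 1 + Delta 2). Drive size and the
    # distributed-volume assumption are illustrative, not confirmed.

    DRIVE_TB = 10            # assumed drive size in TB (hypothetical)
    DRIVES_PER_SERVER = 60   # 60 hard drives per Storinator, as stated
    VDEV_WIDTH = 15          # 15-drive RAID-Z2 vdevs, as stated
    PARITY_PER_VDEV = 2      # RAID-Z2 keeps two parity drives per vdev
    SERVERS_PER_CLUSTER = 2

    vdevs_per_server = DRIVES_PER_SERVER // VDEV_WIDTH                           # 4 vdevs
    data_drives_per_server = vdevs_per_server * (VDEV_WIDTH - PARITY_PER_VDEV)   # 52 data drives
    usable_tb_per_server = data_drives_per_server * DRIVE_TB                     # ~520 TB
    cluster_tb = usable_tb_per_server * SERVERS_PER_CLUSTER                      # ~1040 TB

    print(f"Usable per server: ~{usable_tb_per_server} TB")
    print(f"Usable per cluster: ~{cluster_tb} TB (~{cluster_tb / 1000:.2f} PB)")

Under those assumptions, each 60-drive server nets out to roughly half a petabyte after parity, and a two-server distributed cluster lands just over the petabyte mark, which is consistent with the "petabyte project" naming.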
Jake was setting up monitoring and alerts, verifying that every machine would gracefully shut down when the power goes out, which happens a lot here for some reason, but he eventually worked his way around to the petabyte project servers and checked the status of the ZFS pools, or zpools, on each of them. And this is where the caca hit the fan. Right off the bat, Delta 1 had two of its 60 drives faulted in the same vdev. And you can think of a vdev kind of like its own mini RAID array within a larger pool of multiple RAID arrays. So, in our configuration, where we're running RAID-Z2, if another disk out of that 15-drive vdev were to have any kind of problem, we would incur irrecoverable data loss. Upon further inspection, both of the drives were completely dead, which does happen with mechanical devices, and had dropped from the system. So, we replaced them and let the array start rebuilding. That's pretty scary, but not in and of itself a lost cause. More on that later though.

Far scarier was when Delta 3, which is part of the New Vault cluster, had five drives in a faulted state, with two of the vdevs having two drives down. That's very dangerous. Interestingly, these drives weren't actually dead; instead, they had just faulted due to having too many errors. Read and write errors like this are usually caused by a faulty cable or connection, but they can also be the sign of a dying drive. In our case, these errors probably cropped up due to a sudden power loss or due to naturally occurring bit rot, as the servers were never configured to shut down nicely while on backup power in the case of an outage. And we've had quite a few of those over the years.

Now, storage systems are usually designed to be able to recover from such an event, especially ZFS, which is known for being one of the most resilient ones out there. After booting back up from a power loss, ZFS pools, and most other RAID or RAID-like storage arrays, should do something called a scrub or a re-sync, which in the case of ZFS means that every block of data gets checked to ensure that there are no errors. And if there are any errors, they are automatically fixed with the parity data that is stored in the array. On most NAS operating systems, like TrueNAS, Unraid or any pre-built NAS, this process should just happen automatically, and even if nothing goes wrong, they should also run a scheduled scrub every month or so. But our servers were set up by us a long time ago on CentOS and never updated, so neither a scheduled nor a power-on recovery scrub was ever configured. Meaning the only time data integrity would have been checked on these arrays is when a block of data got read. This function should theoretically protect against bit rot, but since we have thousands of old videos, of which a very, very small portion ever actually gets accessed, the rest were essentially left to slowly rot and, with each power loss, slide further into an unrecoverable mess.

When we found the drive issues, we weren't even aware of all this yet. And even though the five drives weren't technically dead, we erred on the side of caution and started a replacement operation on all of them. It was while we were rebuilding the array on Delta 3 with the new disks that we started to uncover the absolute mess of data errors. ZFS had reported around 169 million errors at the time of recording this. And no, it's not nice. In fact, there are so many errors on Delta 3 that, with two faulted drives in each of the first two vdevs, there is not enough parity data to fix the errors.
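Since the core failure here was that no scheduled or post-power-loss scrub was ever configured, here is a minimal sketch of the kind of periodic check that would have caught the problem earlier. It wraps the standard zpool command-line tool from Python; the pool name "vault" is a hypothetical placeholder and the print-based alerting is just a stand-in, not the actual setup on the Delta servers.

    #!/usr/bin/env python3
    # Minimal scrub-and-check sketch: kick off a ZFS scrub and report pool
    # health. Meant to be run from a scheduler on the storage host itself.
    # The pool name below is a hypothetical placeholder.

    import subprocess
    import sys

    POOL = "vault"  # hypothetical pool name

    def run(cmd):
        """Run a command and return (exit_code, combined_output)."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode, result.stdout + result.stderr

    def main():
        # Start a scrub; ZFS walks every block and repairs bad ones from
        # parity wherever enough redundancy survives.
        code, out = run(["zpool", "scrub", POOL])
        if code != 0:
            print(f"Failed to start scrub on {POOL}:\n{out}", file=sys.stderr)

        # 'zpool status -x' prints only pools that have problems, so any
        # output other than the healthy message is worth alerting on.
        code, out = run(["zpool", "status", "-x"])
        if "all pools are healthy" not in out:
            # Hook real alerting (mail, chat webhook, etc.) in here.
            print("ZFS reports a problem:\n" + out, file=sys.stderr)
            sys.exit(1)
        print("All pools healthy; scrub started on " + POOL)

    if __name__ == "__main__":
        main()

Something along these lines, scheduled monthly via cron or a systemd timer and run again after any unclean shutdown, is roughly what appliance operating systems like TrueNAS set up for you out of the box.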
All of those unfixable errors caused the array to offline itself to protect against further degradation. And unfortunately, much further along in the process, the same thing happened on Delta 1. That means that both the original and new petabyte projects, Old and New Vault, have suffered nonrecoverable data loss.

So, now what do we do? In regards to the corrupted and lost data, honestly, nothing. I mean, it's very likely that even with 169 million data errors, we still have virtually all of the original bits in the right places, but as far as we know, there's no way to just tell ZFS, "Yo dawg, ignore those errors, you know, pretend like they never happened, take it easy ZFS" or something. Instead, then, the plan is to build a new, properly configured 1.2 petabyte server featuring Seagate's shiny new 20 terabyte drives, which we're really excited about. Like, these things are almost as shiny as our reflective hard drive shirt, lttstore.com. And once that's complete, we intend to move all of the data from the New Vault cluster onto this New New Vault. - [Jake] All three. - New New Vault. Then we'll re-set up New Vault, ensure all the drives are good, and repeat the process to move Old Vault's data onto it. Then we can reformat Old Vault, probably upgrade it a bit, and use it for new data. Maybe we'll rename it to New New New Vault. Get subscribed so you don't miss any of that. We'll hopefully be building that new server this week.

Now, if everything were set up properly with regularly scheduled and post-power-loss scrubs, this entire problem would probably never have happened. And if we had a backup of that data, we would be able to simply restore from that. But here's the thing: backing up over a petabyte of data is really expensive. Either we would need to build a duplicate server array to back up to, or we could back up to the cloud. But even using the economical option, Backblaze B2, it would cost us somewhere between five and ten thousand US dollars per month to store that kind of data. Now, if it were mission critical, then by all means it should have been backed up in both of those ways, but having all of our archival footage from day one of the channel has always been a nice-to-have, and an excuse for us to explore really cool tech that we otherwise wouldn't have any reason to play with. I mean, it takes a little bit more effort and it yields lower quality results, but we have a backup of all of our old videos. It's called downloading them off of YouTube, or Floatplane if we wanted a higher quality copy.

So, the good news is that our production Whonnock server is running great, with proper backups configured, and this isn't gonna have any kind of lasting effect on our business. But I am still hopeful that if all goes well with the recovery efforts, we'll be able to get back the majority of the data, mostly error free. But only time will tell. A lot of time, because transferring all those petabytes of data off of hard drives to other hard drives is gonna take weeks or even months. So, let this be a lesson: follow proper storage practices, have a backup, and probably hire someone to take care of your data if you don't have the time, especially if you measure it in anything other than tenths of terabytes, or you might lose all of it.
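To put rough numbers on the "backing up a petabyte is really expensive" point and the "weeks or even months" transfer estimate, here is a small back-of-the-envelope sketch. The roughly five-dollars-per-terabyte-per-month figure is an approximation of Backblaze B2's published rate around this time, and the sustained throughput is an assumed value purely for illustration; neither is a quoted figure from the video.

    # Rough cost and transfer-time math for ~1.2 PB of archival data.
    # Both the B2 rate and the sustained throughput are assumptions
    # for illustration, not quoted figures.

    DATA_TB = 1200                 # ~1.2 PB, matching 60 x 20 TB drives raw
    B2_USD_PER_TB_MONTH = 5.0      # approximate Backblaze B2 rate at the time

    monthly_cost = DATA_TB * B2_USD_PER_TB_MONTH
    print(f"Cloud backup: ~${monthly_cost:,.0f}/month for {DATA_TB} TB")

    # How long might moving that much data between servers take?
    SUSTAINED_GBPS = 2             # assumed effective sustained rate; degraded
                                   # arrays rarely saturate a 10 Gbit link
    seconds = DATA_TB * 1e12 * 8 / (SUSTAINED_GBPS * 1e9)
    print(f"Transfer at {SUSTAINED_GBPS} Gbit/s sustained: ~{seconds / 86400:.0f} days")

Under those assumptions you land around six thousand dollars a month for cloud storage and close to two months of wall-clock time to move the data, which lines up with the figures mentioned above.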
But you won't lose our sponsor, Lambda. Are you training deep learning models for the next big breakthrough in artificial intelligence? Then you should know about Lambda, the deep learning company. Founded by machine learning engineers, Lambda builds GPU workstations, servers, and cloud infrastructure for creating deep learning models. They've helped all five of the big tech companies and 47 of the top 50 research universities accelerate their machine learning workflows. Lambda's easy-to-use configurators let you spec out exactly the hardware you need, from GPU laptops and workstations all the way up to custom server clusters, and all Lambda machines come pre-installed with Lambda Stack, keeping your Linux machine learning environment up to date and out of dependency hell. And with Lambda Cloud, you can spin up a virtual machine in minutes and train models with four NVIDIA A6000s at just a fraction of the cost of the big cloud providers. So, go to Lambdalabs.com/linus to configure your own workstation or try out Lambda Cloud today.

If you liked this video, maybe check out the time I almost lost all of our active projects when the OG Whonnock server failed. That was a far more stressful situation. I'm actually pretty relaxed right now for someone with this much data on the line. - [Jake] Yeah, must be nice. - Yeah, I'm doing okay, thanks for asking. I mean, I'd prefer to get it back, you know. (chuckles)
Info
Channel: Linus Tech Tips
Views: 2,532,307
Keywords: lost data, corrupted data, storage array failure, we lost data, hard drives, seagate, 45drives, server, computer, amd, nvidia, intel, petabyte project, delta server, 1pb of data
Id: Npu7jkJk5nM
Length: 11min 55sec (715 seconds)
Published: Sat Jan 29 2022