this server WONT break

Video Statistics and Information

Captions
When you make as many videos as we do, you need a lot of fast, reliable storage. And our main editing server, Whonnock, has checked all of those boxes for years. It's a great little server. It's built out of high quality components and it even looks cool. But as our team has grown, we've reached the point where even a minute, one single minute of downtime costs over $50. And that's just in payroll. Now, practically speaking, the way to mitigate that is by adding redundancy. Now, our drives are already redundant. We've got 20 drives in there with data striping, but the problem is they all sit in one single server. And I'm sure you can see where this is going. It's been over a year in the making, but it's finally here, Whonnock's Final Form, and I'm calling it... Whonnock 10, because it's the last Whonnock ever. High availability Whonnock! Whonnock 10! I told you this like 10 times. Nobody even knows what high availability means. It means it's Linus proof. Just go ahead, unplug one. Do it, go for it. Okay, I should probably tell you the stakes before you do that.

Each of these two Grand Twin boxes has four entire servers inside of them that were provided by Supermicro. They sponsored this whole thing. And they're set up with Weka, a redundant, NVMe-first file system. In this config, it should sustain two entire servers dropping out without anyone even noticing. Except that we moved the entire team onto it last night without telling anyone. And it's the middle of the workday with a ton of high priority videos in progress. Do you really wanna test it right now? I haven't tried that. All right, here we go. Okay. What could go wrong? I mean a lot.

Naturally, a huge part of a project like this is the software, the stuff that's gonna handle distributing all of our hundred-ish terabytes of video projects, Word documents, and Linux ISOs to the multiple machines that we just showed you. But we can't install any software until we have some hardware. So why don't we start there?

Meet the Supermicro Grand Twin A+ Server AS-2115GT-HNTR. Despite its sort of ordinary looking appearance and unexciting sounding name, it is anything but ordinary and it is very exciting, because inside this 2U are four independent computers. But for what we're doing, four nodes? Please, we want eight. Inside each of these is a completely independent motherboard, 384 gigs of memory, an AMD EPYC Genoa processor with 64 cores, dual M.2 slots for redundant boot drives, six PCIe Gen 5 two-and-a-half-inch NVMe slots up front, and we've got I/O in the rear.

Now, this bit here could be a little confusing at first glance, but that is because not only do we have USB, but we have two full Gen 5 by 16 PCIe connections back here, along with display output and power for the entire server. This whole thing slides into the chassis, which holds a really cool modular backplane assembly that we'll take a look at in a minute, and then passes through, thank you, Jake, to the back of the server, where you've got a management port, a single USB port for each server. Nope. It's two and they're shared. What the? I was about to ask, 'cause we've also got a single VGA. You see the button? For two servers there, no way. This button toggles? Yeah. And, okay, before we talk about that a little bit more, look at these power supplies.
Each of these is 2200 watts, 80 Plus Titanium, which sounds like a lot, but when you're potentially handling four 400-watt EPYC Genoa CPUs along with a bunch of RAM, up to 24 NVMe drives and eight network cards, well, it seems downright reasonable, doesn't it? Is it 24 drives? Can't be. Yeah. Six times four, yes. Six times four is 24. And of course, that's just one of them. We've got two of those. And that means that in the event that one of these dies, the system should be able to continue to operate uninterrupted, which is a big part of the high availability goal that we have for this deployment.

Speaking of high availability, let's move on to our network cards. Each of those PCIe Gen 5 by 16 slots I showed you guys before terminates in one of these OCP 3.0 small form factor mezzanine slots. And what we're putting in them is these ConnectX-6 200 gigabit cards from Mellanox. Excuse me, from NVIDIA. Okay, these are the older Gen 4 ones, so they're gonna be limited by the slot speed of around 250 gigabit per second. But if we had newer cards, that means that each of these nodes could do 200... plus another 200, 400, up to 800 gigabit, which would of course be a complete waste for us, A, because our workload can't take advantage of it, and B, because our switch is only 100 gigabit. Sorry. Of course, the two ports are still helpful. We do have redundant switches. Except there's kind of a problem here. That's still a single point of failure. In a perfect world, we would have two single port NICs. So if a NIC were to die, it would still be okay. But because we have so many nodes, we're not really worried about an individual node. You know, they could have one boot drive and it die, or one NIC and it die. We still have an extra backup. How many nines do you want? I mean, I don't know, like one would be good. Nine percent?

Which, jokes aside, is a really good point. If we were architecting this properly, there are so many more considerations that we would need to make. Like the power coming into the rack would have to come from two independent backed up sources. The connectivity to our clients would have to be redundant as well. The connectivity between all of the systems would have to be architected in such a way that no matter what fails, everything will stay up. And realistically for us, we're not gonna get that deep into it because our goal is better than we had before, which was a single machine with its own built-in redundancies, but other than that, nothing. Now at least we should be able to lose a full machine out of these eight. We can restart one of our core switches, totally fine. Two machines out of these eight, and we can still be limping along. I mean, limping is a bit of a stretch. It's going to be very fast.

Now, normally if you buy a Supermicro machine, they're gonna pre-build it for you. They're gonna validate it for you. You can even have them pre-build an entire rack or racks of these things and then validate your application on it before it ships to you. In fact, we've got a whole video that we did about that that was sponsored by Supermicro a little while back. Of course, this is LTT, my friends. So we will be assembling this one ourselves. Do you like that spin of the screwdriver above the server? Don't worry, I won't miss. I'll never miss. See, I could do this a hundred times and I would never miss. No, it's fine, it's good. It's okay, we have seven more.
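A quick aside on the "how many nines do you want?" exchange above. Linus pegs a single minute of Whonnock downtime at over $50 in payroll alone, so here is a rough, purely illustrative bit of arithmetic (not anything from the video's actual setup) showing what each extra nine of availability means in yearly downtime and payroll cost at that rate.

```python
# Back-of-the-envelope: yearly downtime allowed by N "nines" of availability,
# and what that downtime costs at the ~$50/minute payroll figure from the video.
# Illustrative only; real availability math would account for planned maintenance.

MINUTES_PER_YEAR = 365 * 24 * 60
COST_PER_MINUTE = 50  # dollars, payroll only, per the video

for nines in range(1, 6):
    availability = 1 - 10 ** -nines                 # e.g. 3 nines -> 0.999
    downtime_min = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.5%} uptime -> {downtime_min:9.1f} min/yr "
          f"-> ~${downtime_min * COST_PER_MINUTE:,.0f}/yr in payroll")
```

Even three nines still leaves room for roughly nine hours of downtime a year, which is why the focus here is on surviving whole-node failures rather than on any individual boot drive or NIC.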
Anywho, for our CPU, we've gone with an EPYC Genoa 9534. This is a 64-core, 128-thread monster of a CPU. It'll do 3.7 gigahertz max boost. It has a quarter gigabyte of level three cache, a 300 watt TDP, it supports DDR5 memory, up to 12 channels, and it supports a whopping 128 lanes of PCIe Gen 5. Originally, we were intending to go with 32 core chips, but they were out of stock, so. Free upgrade. Lucky us. Compared to previous generation AMD EPYC CPUs, Genoa is a big step up in terms of IO performance, which makes it perfect for this application. And in the long term, I mean, if we've got all the extra CPU cores and a whole bunch of RAM anyway, why run Weka on the bare metal when we could install Proxmox and then use the other cores for, I don't know, a high availability Plex server. Yeah, or Linux ISOs. More realistically, it would be something like Active Directory. Yeah, which we don't really wanna do right now because if you run Active Directory on one server and it goes down, you're gonna have a really, really bad time. But if you run it on a bunch of servers. Yeah, it's great.

So normally, server CPU coolers would come with their own thermal paste pre-applied, but since we're doing this ourselves and... If you look carefully, it's not the first time that it's been installed. We are gonna be using... Okay, thank you for that. A piece of Honeywell PTM 7950. This stuff is freaking awesome. It has great thermal transfer properties and it can handle varying temperatures. Like seriously, I don't remember how many cycles it is. Not even just varying, but like a lot of temperatures. Huge cycles for a very, very long time. Now available at lttstore.com. Is that big enough? Does that cover all of the CCDs and CCXs and CCWs? Oh, there's a second piece of plastic. Am I stupid? Is there a second piece of plastic? No, there isn't. Should I go put one in the fridge? No, no, no, it's totally fine. I've done this like a bunch of times now. Oh, she's mint. Look at that, see? Easy. I would recommend putting it in the fridge before you use it. All right.

To ensure we're making the absolute most of our CPU, especially in this high throughput storage workload, we're gonna be populating all 12 of our memory channels with 32 gig DIMMs of DDR5 ECC running at 4,800 megatransfers per second. That's a total of 384. About three terabytes of memory. What? Across all eight. Oh.

Each of the cables Jake's removing right now is a PCIe by eight cable that feeds two of the drive bays in the front. But the reason he's taking them out is so that we can install our boot drives. These are consumer grade. Each system is getting two Sabrent 512 gig Gen 3 Rocket drives. And it's not because they're particularly special in any meaningful way. They're not even that fast by modern standards. But what they are is, from our experience, reliable enough, and they're fast enough for what we're going to be doing, which is just booting our operating system off of them.

Movie magic. All of the other nodes are already built. So... What do you mean, movie magic? Supermicro built them. Oh, I thought you built them. Supermicro built them for you. I took it apart. Okay, fine. I took that one apart. No secrets left anymore. Yep. No intrigue, no mystery. You know what is still mysterious is inside of here. I've actually never opened this before. Oh, okay, let's have a look. Woo! Holy s***! Oh, that's the power supplies. Yeah, this is so cool. So the whole computer is cooled by four fans. No way. There's the two power supply fans, and then these fans in their, what do they call this? Like IO module, I think is what they call it.
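An aside on the memory figures mentioned a little earlier, since the dialogue jumps between per-node and cluster-wide numbers. A quick check confirms that 12 channels of 32 gig DIMMs lands at 384 gigs per node and roughly three terabytes across all eight nodes.

```python
# Quick sanity check on the memory figures quoted in the video.
channels_per_node = 12      # DDR5 memory channels populated per EPYC Genoa node
dimm_size_gib     = 32      # one 32 GiB ECC DIMM per channel
nodes             = 8       # two Grand Twin chassis x 4 nodes each

per_node_gib = channels_per_node * dimm_size_gib      # 384 GiB per node
cluster_gib  = per_node_gib * nodes                   # 3072 GiB cluster-wide
print(f"{per_node_gib} GiB per node, {cluster_gib} GiB (~{cluster_gib / 1024:.0f} TiB) across the cluster")
```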
Look at the blades on this thing, counter-rotating. You're serious? That's what you're looking at? Not this? The most delicate of spaghet. Oh my God. There's not even connectors. No. Every one of these wires is soldered directly to the back of the OCP 3.0. What? Yeah.

For storage, we're installing two of Kioxia's speedy CD6 Gen 4 NVMe drives in each node. So we've got one that is seven terabytes and another one that is 15 terabytes. They're kind of placeholders for now. And in the long term, we're gonna switch to something in the neighborhood of about four 15 terabyte drives per node. But the drives we wanna use are currently occupied by, oh, that project. By a top secret pastry related project. So that's gonna have to wait. The good news is that when those drives become available, Weka supports live upgrading and downgrading, so we can just pull these drives, swap in the new ones, pull, swap, pull, swap, pull, swap. As long as we don't do it all at once.

Are we ready to fire these things up? Okay, there's a lot going on here. What is that? Is that a switch? Yeah. Hey, look, you can see the button now. Oh, that's cool. Boop, boop, boop, boop, boop, boop. What you're hearing so far is just the NVIDIA SN3700 32 port, 200 gig switch. Oh my God, it even says Mellanox on the front. I know, maybe it's an old like review sample and demo unit or something. No, we got it with the $1 million PC and I'm pretty sure that that was already Nvidia at that point. Can you hear that? You hear it getting louder? Yeah. Whoa! That one's just excited to see you.

This is the Weka dashboard. Maybe if I go over here, cluster servers, we can see all of our servers. We have two drives per, and then cores. This is a very interesting part of how Weka works. It's not like TrueNAS, let's say, where it just uses the whole CPU for whatever you're trying to do. They dedicate and like fence off specific cores for specific tasks. For instance, each drive gets a core. So we've got two drive containers. That means two cores. A full core per drive. Yeah. Damn. Yeah. You also have compute cores, which do like the parity calculation and the inter-cluster communication. And then there's front end, which you don't necessarily always have. Front end cores manage connecting to a file system. So if you just had drives and compute, you wouldn't be able to access the files on this machine. So you would have your backend servers, right? Those would run drives and compute, which is the cluster. And then on your like GPU box, you would run just the front end. And that would allow the GPU box to connect to the backend cluster servers. Oh, I see. But the backend cluster servers don't need to run a front end unless you wanna be able to access the files on that machine. Or from that machine, which we want to. Right. 'Cause we're using SMB. We're using it as a- A file server. Stupid NAS for our stupid Windows machines. Yeah. You can also have a dedicated front-end machine. Yes. So if you had like a hundred back-end servers. But then that's adding a single point of failure, which is what we're trying to avoid. You could have multiple of them. Okay. They thought of that. Yeah. I set it up so every single machine in the cluster, all eight of them are part of our SMB cluster, which means... It cannot go down. Yeah.

Realistically, there are a ton of other file systems out there that you could use for something like this.
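Before the alternatives, a brief aside to make the core-fencing idea above a bit more concrete. The sketch below is a purely illustrative Python model, not Weka's CLI or API, of the roles Jake describes: one dedicated core per NVMe drive, compute cores for parity and inter-cluster communication, and optional front-end cores on any node that clients (like their SMB users) need to reach. The specific per-role core counts are assumptions for the sketch.

```python
# Illustrative model of per-node core dedication as described in the video.
# Core counts per role are assumed for the example, not taken from Weka docs.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    drives: int          # NVMe drives in this node (two per node for now)
    compute_cores: int   # parity calculation and inter-cluster communication
    frontend_cores: int  # 0 on a pure backend, >0 if clients mount/share from here

    @property
    def drive_cores(self) -> int:
        return self.drives  # one dedicated core per drive

    @property
    def dedicated(self) -> int:
        return self.drive_cores + self.compute_cores + self.frontend_cores

# Hypothetical allocation: every node is both a backend and an SMB front end,
# matching "all eight of them are part of our SMB cluster".
cluster = [Node(f"whonnock-{i}", drives=2, compute_cores=2, frontend_cores=1)
           for i in range(1, 9)]

for n in cluster:
    print(f"{n.name}: {n.dedicated} of 64 cores fenced off "
          f"({n.drive_cores} drive, {n.compute_cores} compute, {n.frontend_cores} front end)")
```

A client-only machine, like the GPU box mentioned above, would be the degenerate case: zero drive and compute cores, front-end cores only.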
TrueNAS has their scale-out setup for clustered ZFS, which only requires three nodes and is something we'd be quite interested in trying out. Or if you're looking for object storage, there's a million options, but the main open source one, MinIO, requires only four nodes. Though, when we saw how nuts Weka was when we set up the million dollar server cluster... I mean, we had to try it out for ourselves. And try it out, we did.

So this is each node. Holy f***ing s***. Look, okay, the crazy thing is look at the read latency. Now guys, look, look, hold on, hold on, hold on. That's 70 gigabytes a second. We've seen numbers like this before, but we're talking with, in some cases, double the number of drives and- No file system. Without a file system. Like raw to each drive. This is- With a file system. With a file system over a network. And we're only using 100 gig ports. Like usually with a Weka setup like this, you'd probably use 200 gig. Well, yeah, 'cause we were, oh my God, we didn't know. 'Cause we didn't even have networking as a factor last time. All the drives were in one box. I know, I know. This is networking too. And the crazy part is we're not using RDMA. This is like some fancy, what's it called? DPDK, I think is the library. This is wild. Yeah, look at that. Read latency, 131 microseconds. That's 4 million read IOPS with a latency of one millisecond average. Are we able to keep using WekaFS? This is a trial. Okay. The software is quite expensive. This is unreal. 4 million IOPS. This is, it is unreal. It's way more than we could possibly ever need. But it's cool. It's so cool.

Don't they support tiering and everything? Oh yeah, here, I'll show you actually what that looks like. This is on mother vault, which I think right now has 400 tebibytes left. So let's say max capacity is 400 terabytes. Now, once we run out of the hundred terabytes of SSD capacity, which you can see here. It'll just- It'll tier. I mean, it automatically tiers anyways, and you do need to make sure that your object store is at least the same size as the flash or bigger, because they're gonna automatically tier everything to it. That makes sense. So in theory, we move, manually copy everything from Vault. One time. To Weka one time, because it stores in like 64 megabyte chunks. And then it just stays there forever. Stays there forever. And then we just have one network share, and when something needs to get vaulted, you just, you just move it from like, you just allow it to decay. Yeah, you would probably move it from pending projects to like done or something like that. We make a folder for done. Yeah, sure. And then it will just do it automatically. Wow. Or if it's a video that like somebody was working on and then, you know, it's been on hold for three months and we shot, you know. It'll decay. Half a terabyte of footage, it will just go away. And then when we're ready to work on it, it'll promote it back up. Holy s*** we could net boot off of this. Follow up video. Yeah, I mean, why not? It's so fast. So fast. You literally could not, we couldn't saturate this.

Now, a lot of you at this point must be thinking, gosh, mister, that's an awful lot of computers for high availability. Couldn't you do this with two? And you're not that far off. The old school high availability NetApp storage appliances, like that one we looked at recently, did have just two machines. But those were both connected to the same storage drives.
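An aside to put the benchmark numbers above in context: a rough comparison of the roughly 70 gigabytes per second of reads and 4 million IOPS against the aggregate bandwidth of eight nodes on 100 gigabit ports. It deliberately ignores parity and protocol overhead and which direction traffic actually flows; it is only meant to show the scale, not to model Weka's data path.

```python
# Rough sanity check on the quoted benchmark: ~70 GB/s and ~4M IOPS
# across eight nodes, each with two ports on a 100 GbE switch.
nodes          = 8
ports_per_node = 2
port_gbit      = 100                      # the switch is 100 gig, not 200

read_gbyte_s     = 70
read_gbit_s      = read_gbyte_s * 8       # ~560 Gb/s on the wire, approximately
aggregate_gbit_s = nodes * ports_per_node * port_gbit

print(f"Reads: ~{read_gbit_s} Gb/s vs ~{aggregate_gbit_s} Gb/s of total port bandwidth")

iops = 4_000_000
print(f"Per-node share: ~{read_gbyte_s / nodes:.1f} GB/s and ~{iops // nodes:,} IOPS each")
```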
If each system has its own drives, things can get out of sync, like let's say if one machine has downtime, and you can run into a situation where each system believes with all the conviction in its heart that it has the correct data. And then if all you have is two, how will they decide who's right? This is typically referred to as split brain. And that's why the majority of high availability systems have at bare minimum three servers. This allows the third system to be a tiebreaker of sorts in the case of a disagreement.

Now in our case, Weka, that stupid ultra-fast file system that we're using, which unlike anything that we've used before, has been built specifically for NVMe drives, not hard drives. Well, it requires a minimum of six nodes with a recommendation of eight, but running Weka can still be an advantage. Video editing with Adobe Premiere, like we use, is very latency sensitive, and even a small delay when going to access a clip can be enough to make the software crash. So any improvement there is huge, not to mention that a pair of these Grand Twins spec'd out to the max with 128-core EPYC Bergamo CPUs would get you, in just four rack units, a thousand CPU cores, actually a little more, 24 terabytes of DDR5, and up to three petabytes of NVMe storage. I mean, heh, that makes our setup seem downright reasonable.

Now, the average Weka customers are gonna be a little more demanding than us. Visual effects studios, AI developers, genomics labs, all the folks out there that need stupid fast, low latency storage. And Weka showed us screenshots of clusters that were reading in excess of one terabyte per second consistently. Obviously, that was a bigger cluster, but it shows you what can be achieved with this kind of hardware running on, I mean, what used to be the crappier option, software RAID. Man, I feel bad even calling it that these days.

I had an interesting idea with the Supermicro folks. So you know how we have like two petabytes of 13 years worth of footage? Thousands and thousands of hours of footage, thousands. It's really cool that we have it, but it's really hard to use, unless you just happen to know what video the thing you were looking for is in. But what if you could just like search for something? Linus Sebastian, I want every clip with Linus Sebastian. Wow, bam, look at that. Shut up. And let's say, you know, there's this one. It's detected that it's you throughout the entire clip. Yeah, you're in a chair. So you could search for clips of Linus sitting down, with a keyboard. Yeah. Like we're gonna be able to actually find stuff. Yeah, right now there is a finite number of objects that are trained. I mean- Chihuahua. Let me scroll through this. It's a lot. Eventually you'll be able to train it and tell it, "Hey, this is what a computer fan looks like." Or, "This is what an SSD looks like." Oh my God, that is so cool. So wait, is this running on these extra CPU cores? Okay, no, not right now. Faces and logos are running on CPU. Yeah. Objects, OCR, and scenes run on GPU. Got it. But they're not running on any of those machines. They're running on a GPU workstation that Supermicro sent that's sitting at my desk. It was heavy.

Anyways, what is happening on that new server is proxies. Because if we were to analyze the original clips, A, formatting is a huge problem. When you go into an AI model, it might not necessarily support the codec that you're filming in. Sure. But also, clips are like hundreds of megabytes a second, potentially. That would take forever.
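Circling back to the split-brain discussion above: the standard fix is a majority quorum, where a partition of the cluster may only keep serving if it can see more than half of all nodes, which is exactly why a two-node system can never break a tie. This is a minimal, generic sketch of that rule applied to an eight-node cluster; Weka's actual failure-domain logic is more involved than this, so treat it as the concept only.

```python
# Minimal sketch of majority quorum: a partition keeps serving only if it
# holds a strict majority of the cluster. With two nodes, neither side of a
# split ever has a majority, which is the split-brain problem.

def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this partition can see a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2

cluster_size = 8  # Whonnock 10's node count

for reachable in (8, 7, 6, 4, 3):
    verdict = "keeps serving" if has_quorum(reachable, cluster_size) else "must stop"
    print(f"{reachable}/{cluster_size} nodes reachable -> this partition {verdict}")
```

Note the 4-of-8 case: an even split leaves neither half with a majority, which is one reason odd-sized quorums (or a dedicated tiebreaker) are common.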
So instead, it generates proxies of everything first, which we're dumping to that new server. And then we can take advantage of the lightning-fast storage, plus the massive compute, and we can basically create, like, a proxy map of what everything is in the main archive. Right. That is so cool! So far, I've generated 2.6 terabytes of proxies, which might not sound like a lot, but they're only 5 megabit. So it's actually, like, a lot. This is gonna be a flippin' game-changer. News, sports... Yeah! Can you imagine you're CNN? You want that person wearing a red tie? Yeah! But right now, we've done 25,000, so 2.6 terabytes is 25,000 proxies.

Okay, well, let's try and find something. Oh, hold on. Once you've generated a proxy, you have to then analyze it, right? Ah. So the analysis is not done. No, not even close. I've analyzed 22 clips. Okay, everything with Elijah. Elijah. And this is every clip that Elijah's in. And you can even see... This is the actual MAM, as they call it. Media Asset Manager. Yeah. The axle ai guys built this before it was like AI as far as I'm aware. Back when you would have had to make comments like this manually, now it's just AI. So all of the data is in here now. And we can see here's Adam. And Elijah. Oh, that's so cool. Here's all the different objects. Chair, flower pot, microphone.

Oh, let me show you the scene understanding thing because that is so cool. This is like a brand new thing. They barely even worked it in. But it basically... No! It basically takes a snapshot every few seconds. Oh my god. Two men are working on a project in a room. There is a speaker, stereo equipment, there's a faucet, there's a tripod. There's the tripod. Some of these are a little less accurate. Two men are working on a robot in a room. It kind of looks like a robot, you know? I mean, yeah, sure. Two men are in a workshop looking at a laptop computer, looking at a machine. There is person Alex Clark. So this is just running right now in real time. Like, more stuff is getting processed as we sit here. Hey, see right here? Processing logos. There it is. Processing logos and faces. It's gonna take a while. Yeah, it's gonna take forever. They're still working on making it function on multiple GPUs. So once we can get it running on like four GPUs, say one GPU is doing face detection, one's doing scene analysis, one's doing object detection or something like that, we'll be able to go a lot faster, but right now it's just one GPU. Got it. But this is so cool.

All that's left is to deploy it. Linus had to run away to do some other stuff, so I've hired some backup cavalry. Shawn, our infrastructure administrator, except we've run into a bit of a problem. Linus and me, in our infinite wisdom, while we were making this rack so much better, ran a bunch of cables right where we need to put the server. Did we just start unplugging s***? No. Yeah. How are we even gonna do this? We have to like part the seas. I know, exactly. I started to try to move some of the cables out of the way, but they're all twisted together. So hopefully the LTT cable management thing, which you can finally get at lttstore.com, will save us. Beautiful, cable managed. We can slide a server in there now, I hope. You're in? Yeah, it's in. Ow, ow, ow, ow, ow, ow, ow. Okay, you're good. Just go. That wasn't so bad. Next. Hey, we're in. Now we just have to run a million cables.

Uh-oh, let's see. Do you notice anything different? Well, it's loud. Most of that's actually just the vent that's on.
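One more aside, on the proxy pipeline described earlier in this section: 2.6 terabytes for roughly 25,000 five-megabit proxies works out to about 100 megabytes per clip on average. Below is a hedged sketch of how one might batch-generate proxies like that with ffmpeg driven from Python; the paths and encoder settings are assumptions for illustration, not axle.ai's actual pipeline, and it assumes ffmpeg is installed and on PATH.

```python
# Sketch of batch proxy generation: walk a folder of camera originals and
# transcode each to a small ~5 Mb/s H.264 proxy. Settings are illustrative.
import subprocess
from pathlib import Path

SOURCE = Path("/mnt/archive/originals")   # hypothetical source of full-res clips
DEST   = Path("/mnt/whonnock/proxies")    # hypothetical destination for proxies

for clip in SOURCE.rglob("*.mov"):
    proxy = DEST / clip.relative_to(SOURCE).with_suffix(".mp4")
    proxy.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-n",                   # never overwrite an existing proxy
        "-i", str(clip),
        "-vf", "scale=-2:1080",           # downscale to 1080p, keep aspect ratio
        "-c:v", "libx264", "-b:v", "5M",  # ~5 megabit video, as quoted in the video
        "-c:a", "aac", "-b:a", "128k",
        str(proxy),
    ], check=False)                       # skip over clips that fail to transcode
```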
One of the air conditioners is broken again. But do you notice anything different? I mean, the sticker's here. That sticker's been there for years. Seriously, you haven't noticed anything else? Well, you guys screwed something onto the, oh, did you put Sonopan behind it? Yeah. But I thought this is supposed to be a vented door. My original plan was to get rid of the vent that you put in. But that vent was there as a backup in case the HVAC ever failed. Which is fine. It still works. So that fan is the exhaust and that's the intake. You see all the gaps? Oh my God. There's gaps. But do you notice the sound difference? Yeah, that's a big difference. It's huge. But that server is so loud. We basically ended up where we started. Yeah, but that's okay. I was just trying to normalize. I just mean, I didn't make it worse. It's not that. Okay.

Look at that. Woo! Cute, right? God, that's a lot of metal. If all goes to plan, we could get rid of this and this and just have these. So no more additional rack taken up, which is nice. Wow. It should sustain two entire servers dropping out without anyone even noticing. Do you really want to test it right now? I haven't tried that. All right, here we go. What could go wrong? Yeah! I mean a lot. The fact that all the fans just like turned down a bit is a little scary.

Let's go see if anyone noticed. Oh, hi Mark. Hi. I'm holding your file server. How's your edit going? Uh, what, huh? Is it working? It's working. Is this on wifi? Hey Emily. Hey. How's your edit going? I'm holding your server. That's cool. Everything's working. Is it working? Yeah. Are you sure? Yeah. Hoffmann. What's up? How's your edit going? This is your server right here. It's amazing. Look, feel it. It's still warm. Wow. Yeah, it's still warm. Well, how's it working? It's great. You know, I'm editing the video that we're shooting. You are? Yeah. We're gonna pull another one. Oh wait, no, Linus, you forgot one. Yeah, here. Here you go. Here's another one of your servers. Is it working? It's great though.

For reference, you're not supposed to do this. You should power off the system first, but we're just trying to simulate it failing. Yeah, a terrible catastrophic failure. I can't believe how smoothly it handled that. See all the lights. They never stopped blinking.

Big thanks to Supermicro for these awesome servers. Thanks to Weka for making this crazy software. Thanks to axle.ai for the awesome AI detection. If you liked this video, maybe check out the video series of us building our nearly three petabytes of archival storage, which we call the mother vault. That thing is awesome. And we showed it to you and it's faster now. Oh, and thanks to you for being an awesome viewer.
Info
Channel: Linus Tech Tips
Views: 1,401,180
Keywords: unraid, server, backup, server room, array rebuild, Linus' house, home lab, nas, storage server, data protection, data hording, truenas, home server, Supermicro, AI, Server Rack, Tech
Id: CcHevgjAnV0
Length: 27min 57sec (1677 seconds)
Published: Thu Mar 21 2024