Building a 100TB Folding@Home Server!

Reddit Comments

Real MVP here

👍︎︎ 7 👤︎︎ u/Jertzuu 📅︎︎ Apr 07 2020 🗫︎ replies

64 gigabytes of RAM and then a 960 gigabyte Optane 905p as two levels of caching for the mechanical drives should certainly speed up access!

Very cool what they're doing here.

👍︎︎ 6 👤︎︎ u/biciklanto 📅︎︎ Apr 07 2020 🗫︎ replies

Do F@H needs more servers or has it been taken care of ?

👍︎︎ 1 👤︎︎ u/b0urb0n 📅︎︎ Apr 07 2020 🗫︎ replies

How do you start using his F@H server? sometimes it takes hours to get a work unit.... Doesn't seem like it's helping.

👍︎︎ 1 👤︎︎ u/enkrypt3d 📅︎︎ May 10 2020 🗫︎ replies
Captions
- This is day two of isolation with a sore throat, and you're gonna be able to tell how long I've been at home by how long my facial hair gets. But that doesn't mean we're gonna stop making videos. So we did a Folding@Home call to arms last week, where we had people contribute computing power from their desktop machines to help run protein folding simulations to help in the fight against COVID-19, among other diseases. But as we mentioned in that video, one of the main problems the Folding project is having right now is not necessarily that there aren't enough willing volunteers to contribute their compute power, but that there aren't enough servers to intake all the data. So we said we were gonna work with the Folding guys to build ourselves a Folding@Home target server and put it in our server room. That's what I'm gonna be doing here at home today: I've got one of our old, decommissioned servers along with a care package from Jake that I'm gonna be opening up and using to upgrade it, and then Lysoling the heck out of this thing and sending it back to him. And this video is brought to you by GlassWire: instantly see your current and past network activity, detect malware and block badly behaving apps on your PC or Android device with GlassWire. Use offer code LINUS to get 25% off GlassWire at the link in the video description. So it's pretty obvious what all the volunteer contributors like you are doing for the Folding@Home project, but many of you probably won't know what role the server plays in all this. So here's the thing: when you're folding, your machine says, hey, I need a job to work on. What the server does is it says, all right, I've got a spot available, go ahead and connect to me. I'm gonna generate a job for you and send it off to you, then you go ahead and crunch those numbers, and when the job's ready to submit the server says, okay, I'm ready for you. You send it back and it stores it until the researchers who are working on the Folding@Home project are ready to grab it and do something with it. So for that to work, you actually need a few things. One is a decent amount of CPU power. Folding@Home recommends an eight-core CPU, which we actually do have, but for a couple of reasons I'll get into later, we're gonna be upgrading the one in here. It's a little pinner. They also recommend about 64 gigabytes of RAM. That's gonna let us handle anywhere from 1000 to 1200 clients connected to our machine. The other things we're gonna need are a buttload of storage in order to hold those completed jobs and, of course, a fast internet connection in order to connect to all those clients. Because we've got 10 gigabit ethernet already built into this motherboard we don't need to make any upgrades there, but we do need to swap out the 32 gigs of RAM we've got in here, that eight-core CPU, and of course we're gonna need to add some storage. Let's start with our CPU. The Xeon E5-2618L v3 that we've got in here was just fine for basic file server duties, especially given that we were running a RAID card in here when we originally deployed it, so the CPU didn't even have to handle any storage parity calculations or anything like that. But it's only got a 2.3 gigahertz base clock, which means that if we're actually loaded up with somewhere over 1000 clients, that thing is gonna be running darn near base speeds.
Not to mention that Folding@Home recommends a gigabit internet connection and we are actually gonna be running 10 gig, and we're hoping to provision off anywhere between about four and five gigabit of our internet connection to run to this machine, so we wanna see this thing melt if we can get away with it. Let's pull out this chip, and now seems like as good a time as any to open up my care package from Jackoo. What do we got in here? Ooh, this is looking like pretty good stuff. Here's our two power cables for the redundant power supply that's already built into this machine, and a VGA cable so we can test it and make sure it actually powers on after our upgrade. Oh shoot, these RAM sticks got kicked around a little bit. 32 gig registered ECC memory modules, so that's gonna give us a total of 128 gigs of RAM in our completed config. I will explain why we need so much a little bit later. Got some thermal paste, some screws, and it seems like there's more than just an Optane SSD in this box. Ahh, there it is! All right, so we are upgrading to an E5-2697 v3, that is a high performance, 14 core, 28 thread processor, or at least it was a few years back. Nowadays it's not really anything special but it should be more than enough for what we're doing here today. Go ahead, get that installed. So nice, he included a little cleaning pad, but he also sent me a clean CPU so I'm not gonna bother with that. I love these Thick Boy thermal paste tubes, look at that, it's a hand for scale. I mean it's a small hand but you know how it is. Oh wow, that's a lot, ah, ah, well the good news is we're not likely to run out anytime soon. This is a very early unit of this motherboard Supermicro actually sent us. I think it was one of the first ones off the line for this thing 'cause it was one of their first MATX server boards that had 10 gig ethernet built in. 10 gig used to be more of a, well, why wouldn't you just use an add-in card for that? And it's become, over the last few years, an option to have just pre-built into the motherboard. And our CPU upgrade is done. Now let's talk about the storage. Folding@Home recommends about 50 terabytes of storage for one of these servers, but because our network connection and our CPU, not to mention our memory, are all beefed up, we could quite possibly need more than that. So we're configuring ours with eight of these 12 terabyte IronWolf Pro NAS drives from Seagate. That's gonna give us about 96 terabytes of raw space, or 72 once we give up two of our drives for parity, and then once we've formatted, about 60 terabytes of usable storage. Fortunately this case is super easy to install drives in. It's actually one of the only things that I really, really like about it. You just pop these open, slide 'em in, no tools, no sleds, no nothing, just boom, just like that. That takes care of our capacity, but these are mechanical drives and even in ZFS RAID they're not gonna be particularly fast, so believe it or not, that's where our overkill memory upgrade comes in. We're gonna be using 64 gigs of our RAM to meet the recommended specs from Folding@Home, but we're gonna be using the other 64 gigs to act as an ARC, or cache, for our mechanical drives. These are gonna be running in a quad channel configuration so we should have plenty o' bandwidth, and I don't know, whatever, I haven't shouted out Kingston for sending these to us in a long time.
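For anyone doing the math along with the video, the capacity figures quoted above work out roughly like this (the exact usable number depends on the pool layout, TB-versus-TiB reporting, and filesystem overhead, so treat this as a back-of-the-envelope estimate):

  raw:    8 drives x 12 TB                 = 96 TB
  parity: 2 drives' worth lost to RAID-Z2  = 24 TB
  data:   (8 - 2) x 12 TB                  = 72 TB
  usable: ~72 TB minus ZFS metadata/partitioning overhead and unit conversion, roughly 60 TB, which lines up with the df -h figure quoted later.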
They actually sent them originally for the Six Workstations, One CPU project but we have used them for tons of things and they've worked basically in anything we've put them in, so good on them for that. You guys probably noticed though that inside that Optane box there was also an actual Intel Optane 905p 960 gig SSD. This is gonna act as a level two ARC, so between our RAM and our SSD we are hoping to accelerate our mechanical storage array quite significantly, alleviating any performance bottlenecks that we could run into there. Uh, where's my screwdriver? So we're just gonna take one of these open PCI Express Gen 3 x8 slots. I was gonna put it in the next slot over, but I realized that puts it right up against the edge of the chassis and there's no cooling fan there, so it's probably better off sitting next to the HBA and having a cooling fan near it, as opposed to being farther away from another heat source but not having any direct cooling. One thing we're not changing is this HBA. The difference between an HBA and a RAID card is that an HBA doesn't have a CPU built into it to handle RAID calculations. It's just a card that adds more ports to your motherboard, in this case eight 12 gigabit per second SAS ports, even though we're only running them at, say, six gigabit per second through this backplane over here. That's it then for the hardware upgrades, and it all comes down to software configuration to decide how best to use it. So one of the things we could do is, instead of using 64 gigs of our memory to accelerate our hard drives, we might cordon off just 32 gigs for that, giving ourselves the capability to handle over 1500 clients. That's something that those extra CPU cores might come in handy for. Or we might realize that we're better off using our extra CPU horsepower to enable realtime data compression on our hard drives, like maybe we run into a network limitation or some other system limitation and realize we just don't have enough storage, guys. That's something that you can do with ZFS. For now though, all that remains is to close this puppy up, there we go, screw it together, and give the whole thing a good Lysol wipe down. I like Jake, he's a good guy, he can have the lemon-scented wipes. This is gonna be like the best smelling server ever. Hopefully Matias is here to pick it up and we'll send it back to the office. - Er, er, er, er. Hey, let me open this. Okay, so I don't actually have the rails for this server so we're just gonna sit it on top of this Storinator. It's kind of ghetto but whatever. I also don't have a screwdriver so I'm not gonna look at what Linus did, but I kinda know what it is already. Man, for a little server, she sure is heavy, there we go. Woo, okay. Hey, he gave me the cables back, nice. He stole the thermal paste though, what the, what the heck? Okay, we got power. Plug one in for IPMI so we can remote control this thing. Ah, oh right, I need a display. Do we even have space left on this side? Sweet, okay I gotta try and find this back here, so this is P. Where do these cables even come from? Should probably label this, right? Ah, I'll leave that for the next person. Eh, it's working! Just make sure everything's here. 128 gigs of RAM, everything else, it's working! Yes!
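For context on what a "level two ARC" means in ZFS terms: the in-RAM ARC is the first read cache, and an L2ARC device is a second tier that spills over onto fast flash. The commands aren't shown on screen, but attaching one to an existing pool is a one-liner; the pool name "folding" and the device path below are assumptions for illustration, not taken from the video:

  # Attach the NVMe SSD to the pool as an L2ARC ("cache") device.
  zpool add folding cache /dev/nvme0n1

  # A cache device holds no unique data, so it can be detached again safely.
  zpool remove folding /dev/nvme0n1

  # Per-device read/write activity, including the cache device, refreshed every 5 s.
  zpool iostat -v folding 5

In this build Jake actually defines the cache device at pool-creation time instead, which ends up with the same layout.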
Like Linus mentioned earlier, we're gonna be using something called ZFS for our hard drive array. Not only does it do the work of a file system by controlling how your data is stored and retrieved, but it also handles how all the physical or virtual disks are partitioned and works to ensure the integrity of the data on said drives. Now that we've installed ZFS and verified that it's actually working, we're gonna create our zpool. We're gonna be using something called RAID-Z2, which works like RAID 6 in that it's gonna allow us to lose two total drives before we lose any data. I've already kind of preconfigured the command, written out all the disk names, so we don't have to wait for that, and I've also defined our cache on here, which means we're gonna have that 905p working right off the bat. So just copy this command in there, paste, enter and (grunts). Okay, let's check our zpool list now that it should be in there. We see 87 terabytes, which is a little lower than 96, but you know, drive partitioning and what have you, and then if we run df -h we can see we actually end up with 60 terabytes of usable space. That's a little bit down from 96 but it should be plenty for what we're trying to do. There's still a couple more things we need to check off the list before we can install our Folding software, namely I wanna set our ARC cache to a max of 64 gigs and then I'm gonna enable compression, because I think it's gonna be fine with the 14 cores we have. I think realistically our network is gonna be our bottleneck, but we'll enable it for now and if down the road we have to turn it off, no big deal. Now as a sort of last step, I'm just gonna take a look at our zpool, so we're gonna do zpool status folding, and we can see all eight of our drives, one, two, three, four, five, six, seven, eight, and our cache drive are all there and ready to go, and we have no errors, which is perfect. In order for Folders to actually be able to access our Folding server I've created some firewall rules. Now that was pretty easy, but we're also going to create something called a traffic limiter, because we don't want this Folding server to eat up all of our 10 gig of bandwidth, so we're gonna limit it to 5 gigabit, but I'm gonna show you guys how this works on the laptop. As a baseline, I'm gonna run a speed test on the system before we enable the traffic limiter so you can actually see the difference. Okay, we have our results in and we're looking at 230 down and 200 up, so I think I'm gonna limit it to five megabit, which will be a very stark change and you should be able to see it right away. We're gonna go into our two rules here. It's already set to megabit so we'll do five, and then we'll go to our 'in' rule, set that to five, then apply, and then we just need to find our IP, which I already did, so 10.20.3.35 for this laptop. So I'll just go add the rule in. So here's our results and yeah! Five megabit right on the dot. So now we're gonna change this over to five gigabit and then set it up on our actual server. Now that we've got our zpool configured and working, and we've got our bandwidth limiter and our firewall rules set up, it's just a matter of calling the Folding guys and getting them to install the software so we can get this work server going. - Well, it's been a week and as you can see, my isolation facial hair is quite long, and this is great, I've got Jackoo on the line and we're gonna be checking out how well our Folding server is running. So Jake, give me the good news.
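The exact commands aren't shown on screen, but the ZFS steps Jake describes (a RAID-Z2 pool with an L2ARC device, a 64 gig ARC cap, and inline compression) map onto standard OpenZFS tooling. A minimal sketch on Linux, run as root, with the pool name and device paths as placeholder assumptions rather than the ones actually used:

  # Create the pool: eight disks in one RAID-Z2 vdev (survives two drive
  # failures) with the Optane SSD attached as an L2ARC cache device.
  zpool create folding raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd \
      /dev/sde /dev/sdf /dev/sdg /dev/sdh cache /dev/nvme0n1

  # Cap the ARC at 64 GiB (value in bytes): apply now, and persist across reboots.
  echo 68719476736 > /sys/module/zfs/parameters/zfs_arc_max
  echo "options zfs zfs_arc_max=68719476736" > /etc/modprobe.d/zfs.conf

  # Enable inline LZ4 compression for everything in the pool.
  zfs set compression=lz4 folding

  # Sanity checks: vdev layout and errors, raw capacity, mounted usable space,
  # and (optionally) live ARC hit-rate stats refreshed every 5 seconds.
  zpool status folding
  zpool list
  df -h /folding
  arcstat 5

Capping the ARC at half of the installed memory leaves the other 64 gigs free for the work server processes themselves, which is the split Linus described earlier. The 5 gigabit traffic limiter is configured in the firewall's GUI in the video, so there's no equivalent command shown here.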
- Well, the good news is there's enough work units for most people now, which is great. - Did we help with that or is that just serendipitous? - Yes, but it's not only us helping. They've also got Microsoft's Azure, I think they donated some servers, Oracle, you know, the guys that make Java and lots of other things, they're doing stuff too, and then there's us. But if we look on the stats here, we can definitely see that our server does have jobs available, which means work units are available. - Nice, 19,000 jobs! - I mean, for the first five days it was up there was basically none 'cause they just couldn't create enough projects. It was no longer the servers but the people that were the bottleneck. - So how we doing now, how much usage have we got? - I was kind of expecting more to be honest, but if we look at the past history you can see, like - Ooh! - They spike up quite a bit, so like right there is half a gigabit and then that's - Nice! - 600, and then if we scroll out, actually, so that's like 1.6 gigabit. - Wow! - So from my understanding, these actual spikes, a lot of the time, are actually when the different work servers are communicating with each other. The actual network bandwidth of individual clients connecting isn't too much, but it's when they're trying to talk together that it really starts to hurt. - And how much storage have we used so far? - Last I checked this morning it was already two and a half or three terabytes of storage used. - That's awesome. - And we can look at our ZFS stats here, which actually paints a little bit more of a picture. So our RAM cache, which I set to 64 gigs, is completely full. You can see the level two cache is ramping up, slowly filling, so as things are being removed from the RAM cache, when they would normally get dumped back to the disks they actually go to the level two cache instead, so this is already up to 64 or 40 gigs and it's only been like an hour or something since I reset it. It does take a few days to fill, obviously, because it's only filling up with stuff that's being removed from the primary cache. - Fantastic! And I actually have an update for you: we have created an exclusive Folding@Home shirt design and all profits from the shirts are going to be contributed to causes within Canada around the research, relief and treatment of COVID-19. So you guys are gonna have to check that out at lttstore.com, and what's really cool is Intel is gonna match every dollar raised, dollar for dollar, to a maximum of 40 grand. - And so that 40 grand is actually gonna go directly to the Folding project so they can get servers and stuff, right? - Yes, that's correct, apparently you already knew about this. - I did already know, yes (laughs). This is actually one of my favorite shirt designs. - And the QR code on the back takes you to our team stats on extremeoverclocking.com - (laughs) - So join the cause guys, we're gonna leave this server running for the foreseeable future. I'll let you go now, Jake, and just share a word from our sponsors with the good viewers out there. Drop.com, the Koss GMR-54X-ISO Gaming Headset is the featured product this time around. These are audiophile approved and based on a popular Koss headset that was custom engineered for immersive 3D sound, giving you positional cues for where your enemies are coming from. There are some changes made from the original: they've reduced the tension in the lightweight headband for improved comfort.
They include a cord splitter, an inline microphone with remote, and a detachable boom mic, and the boom mic works with the PS4, Xbox, Nintendo Switch and more without hassle. Get it today, and new users who sign up on drop.com will get $20 off this headset. We're gonna have it linked in the video description. So thanks for watching guys. If you enjoyed this video, maybe check out the previous part of this project, where we built a monster Folding client machine as opposed to a server. That's some really good stuff too.
Info
Channel: Linus Tech Tips
Views: 1,513,638
Rating: 4.9566436 out of 5
Keywords: folding, folding@home, folding at home, folding project, distributed computation, servers, super computer, diy, server room upgrade, hard drives, mass storage, intel, seagate, kingston, ram, memory, ZFS, caching, ARC, RAM
Id: HaMjPs66cTs
Length: 17min 10sec (1030 seconds)
Published: Mon Apr 06 2020