What's inside a Facebook Datacenter Open Compute Rack?

Video Statistics and Information

Reddit Comments

F&%k Facebook and f&%k their data collection practices.

👍 4 · u/[deleted] · Dec 29 2019

This is awesome. Things are changing quick and I'm liking it.

👍 2 · u/Linuxllc · Dec 29 2019
Captions
For a number of years there has been one specific way to put a server inside of a rack and make a data center, but we're finding that there is a better way to build a mousetrap, and Facebook is doing it with the Open Compute Project. Now, we asked Facebook if they'd come on camera and show us how they're doing their data centers, and they said no, we can't go on camera. So the solution is this: William and I have just spent the last hour going through this entire rack with the Facebook people, and he and I are going to do our best to present it. So I guess, William, we'll start with this: what is the reason for having a new rack? We've always done it the same way; what makes this better?

So for a large hyperscaler, someone like Facebook, they can afford to change up almost all of the rack designs to meet the efficiency requirements and the cost requirements of their data centers in a way that is maybe more optimal than the standard 19-inch rack. For the rest of the industry it made more sense for everyone to have the same standard rack, and because they had to interoperate, they weren't really able to play with the form factors or with the different types of sled, switch, and power configurations. Someone with more budget and more flexibility, who runs their own data centers, now has the ability to play with this stuff.

Facebook has a ton of videos and pictures that people are sharing, so they need a lot of processing power to process those videos and a lot of storage to store all of that data. Amazon, for example, mostly needs processing power; they don't necessarily need much GPU performance, and they don't necessarily need a lot of storage. So this rack can be configured one way for Amazon and a completely different way for Facebook.

That's right, yeah, absolutely.

It's great. So Amazon can adopt this, Google could adopt this, Microsoft can adopt this. All the specs are online at the Open Compute Foundation, which has links to the Facebook designs, and every one of the designs you see here has PDFs, board schematics, everything. They've also put a lot of thought into how this rack goes together: you'll notice, as we go through it component by component, that all of these things are tool-less, which means the data center techs can get in, get the problem fixed, and get back out.

We're also going to talk a little bit about the onboard controllers, the OpenBMC modules, that allow these machines to talk out to the data center technicians. They can track tickets and identify problems before a human has even noticed they exist. So if a fan goes out in the back, they don't need a data center tech walking around to see whether the fans are running; the machine throws an alert, and that OpenBMC controller will file a ticket on its own and say, hey, help me, one of my fans has died.
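As a rough sketch of what that self-reporting loop could look like in practice: the BMC address, the ticketing API, the credentials, and the RPM threshold below are all hypothetical placeholders, and this assumes an OpenBMC that exposes its fan sensors over a Redfish sensor tree. It is an illustration of the idea, not Facebook's actual tooling.

import requests

# Hypothetical endpoints; a real deployment would use internal tooling.
BMC = "https://bmc.example.internal"          # assumed OpenBMC Redfish address
TICKETS = "https://tickets.example.internal"  # placeholder ticketing service
AUTH = ("svc-account", "token")               # placeholder credentials

def file_ticket(name, rpm):
    # The machine files its own repair ticket; no human has to notice first.
    requests.post(f"{TICKETS}/api/tickets", auth=AUTH, json={
        "summary": f"Fan {name} reading {rpm} RPM, below threshold",
        "component": "cooling",
    })

def check_fans(min_rpm=500):  # illustrative threshold
    # OpenBMC exposes chassis sensors via Redfish; fans read as "Rotational".
    base = f"{BMC}/redfish/v1/Chassis/chassis/Sensors"
    members = requests.get(base, auth=AUTH, verify=False).json()["Members"]
    for member in members:
        sensor = requests.get(BMC + member["@odata.id"],
                              auth=AUTH, verify=False).json()
        if sensor.get("ReadingType") == "Rotational" and \
                sensor.get("Reading", 0) < min_rpm:
            file_ticket(sensor.get("Name", "unknown"), sensor.get("Reading"))

if __name__ == "__main__":
    check_fans()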
Now, this is probably a showcase cluster compared to what you would see in production, where things would be more uniform, more homogeneous. You'll see in our overview that there are a lot of different types of machines in this rack, but you wouldn't normally see that kind of configuration in a Facebook data center. They space those things out: you'd have a whole bunch of GPU compute doing one specific set of tasks, or a whole bunch of storage, or a whole bunch of regular CPU compute nodes, all in one area.

We'll start with the actual rack design, and then we'll go sled by sled and show you what each one of these devices does and how it's put together. Let's take a look at that. [Music]

Let's start with the design of the rack itself. Now, the rack is a different dimension than your standard 19-inch rack.

That's right. If you look at this rack, it's actually 21 inches wide. You'll get a better shot from the front, where you can see they've installed switches that are 19-inch compatible but have little expanders to fit the 21-inch rack. The rack itself has a backplane, and inside of it you can see two silver bars; those provide 12-volt power to the entire rack. So this power supply unit, which also contains the battery backup for the UPS, is supplying DC power to the rack, and that eliminates a lot of the power inefficiencies you'd have in a traditional rack, because you're not converting from AC to DC, back to AC, and back to DC. As we get into the components, you'll see that all the components in the rack are DC and the backplane of the rack is DC, so we make that conversion only once, taking AC power in and converting it to DC, which also provides the DC power for the battery backup.

Yeah, that's definitely something interesting and different from your normal rack, because in a normal rack you have UPS battery banks going from AC to AC, so you incur a loss going through the UPS and incur a loss again going into the server. The other nice thing about doing DC in these centralized locations (you'll see there's one up here, and if I reach down, there's another one down here) is that you get more efficiency gains from larger rectifiers and larger battery banks, and you're never going back to AC and through all sorts of other points of inefficiency.
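To put rough numbers on that conversion-loss argument, here is a back-of-the-envelope comparison. The 95% per-stage efficiency is an assumed, illustrative figure, not a measured one.

# Traditional rack: AC -> DC (UPS) -> AC (UPS output) -> DC (server PSU)
# OCP-style rack:   AC -> DC once, at the rack-level power shelf
STAGE_EFF = 0.95  # assumed efficiency of each conversion stage

traditional = STAGE_EFF ** 3  # three lossy conversions in the path
ocp_style = STAGE_EFF ** 1    # a single conversion

print(f"traditional path delivers {traditional:.1%} of input power")
print(f"single-conversion path delivers {ocp_style:.1%}")
# ~85.7% vs 95.0%: roughly nine points of loss avoided per watt delivered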
As we get back around to the front side and pull the sleds out, you'll be able to see that certain sleds have a power rail that comes out toward the front, so some of these sleds can stay powered on even while they're being serviced. But we'll start with, I guess, William, why don't you show us these fan designs, a modular way to replace fans quickly?

Yes, so something really interesting they've done here is that the fans are mounted on the back and are not attached to the trays that pull out the front. All you have to do is unscrew the fan and it pops right out, so if a fan breaks, you just unscrew the back, no tools needed (there is a screw on here, but it's a thumbscrew), plug a new one right back in, and screw it down. They've optimized this further in their newer designs: down here, if I pull on this fan unit, they've added a handle and a lever so you don't even need the thumbscrew anymore. It carries both of the large fans, they plug into the backplane, and you get the same metrics, fan speed and all the rest, right to the BMC.

All the designs incorporated into this rack are meant to be cost-efficient, so they remove pieces they don't need. Everything is very industrial; I wouldn't call it unfinished-looking, but it's just not fancy like some of the enterprise hardware you see with weird, cool, whimsical designs going on. This is all very functional and robust, and it's cheap and easy for techs to replace, which is everything they want in a data center where they're managing thousands of these machines.

And the standards are open, so anybody can implement this.

Yep. And you can't necessarily tell whose it is unless you've seen it before; you couldn't know this is a Facebook server without having asked people at Facebook or people who are familiar with the hardware. It doesn't say Facebook on it or anything.

Let's dig into the actual components that go into the rack and show you how they work. All right, so let's start at the top of the rack. Again, like William said, these are standard 19-inch switches, but they have adapters that expand them out to the full width. Now, the truth is they're not standard 19-inch switches; these are actually something pretty special.

Yes. What's interesting is that they've adapted the power supply modules so that they pass the DC from the power poles at the back of the rack, which we just saw, straight through into the switch, without inverting to AC and then rectifying back down at the switch level.

And the switches aren't just normal switches; they're white-label switches designed to open standards as well.

That's right. There's a company called Barefoot Networks, as one of the vendors who will make these and sell them retail. And I believe, if I'm not mistaken from having looked at the designs in the past, they're Broadcom switch ASICs on the inside, with an x86 board based around an Atom or Xeon D platform and a BMC to go with it, so it fits into their typical server design model.

All right, let's take a look at this very first one. Go ahead and pull that out.

All right, so here is a disk tray that is just a disk tray and a controller, so that you can get SAS out to a machine above it.

Let's describe that. In a traditional server you would have a processor, a disk, a graphics card, and a network card. In this case, this entire tray is basically just one big disk: it's JBOD, just a bunch of disks. You've got a controller here for your external SAS, and it's probably got a switch chip so it can handle more SAS than is exposed on the port, and you would connect it up to one of the machines above it as a host, which would presumably serve all the disks to the network.

All right, now let's pull out the other tray, the one that actually has a host attached to it, because that is one of the options we have with this design. You'll see on this one the memory modules on top, with the CPU contained underneath. This host is attached to the same exact tray of disks, but now you get a network connection out, so you can connect the disks directly to a network with a presumably fairly low-power CPU like a Xeon D or an Atom, and you get all 15 disks of capacity in one of the Open Compute rack units of space.
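To make the host-attached-JBOD idea concrete: once the tray's disks are cabled in over SAS, the host just sees ordinary block devices. A minimal sketch of how such a head node could inventory them on a Linux host before exporting them to the network, using only standard sysfs paths (nothing here is Facebook-specific):

from pathlib import Path

def list_disks():
    """Inventory block devices the way a JBOD head node would see them."""
    disks = []
    for dev in sorted(Path("/sys/block").iterdir()):
        if dev.name.startswith(("loop", "ram", "dm-")):
            continue  # skip pseudo-devices
        sectors = int((dev / "size").read_text())  # size in 512-byte sectors
        model_file = dev / "device" / "model"
        model = model_file.read_text().strip() if model_file.exists() else "?"
        disks.append((dev.name, model, sectors * 512 // 10**9))
    return disks

if __name__ == "__main__":
    for name, model, gb in list_disks():
        print(f"/dev/{name:<8} {model:<24} {gb} GB")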
All right, we're back at the front of the rack. Let's take a look at the power supply units, because like I say, this is pretty interesting: it's handling both DC and AC. So this, I assume, is where the battery units would go, and above it is where the actual power supply units are. Tell me about those.

Yeah, that's right. You can see they haven't included the battery units here, but they are required to run a rack, and there are labels on the front (it may be too far away to read) showing that the batteries go below the power supplies. We can take these power supplies out without any tools, like everything else in the rack. It's a giant monster of a power supply that feeds, I don't know, maybe 15 to 20 machines between all three of these supplies. They're hefty on the back, with huge metal rods that connect to the 12-volt rails in the rack, and you can see some of the inputs there as well.

The average power draw, the power budget as they put it, is, I think you said, 10 to 12 kilowatts?

That's right. You have three of these up here and three below, so between all six you're doing 10 to 12 kilowatts.

Now, the batteries are required, like you said, to run the unit, but that's not actually a physical requirement; they've enforced it in firmware.

That's right. If you look at the back of this unit, there's a management controller that can tell you the status of the power supply units and the status of the batteries, and it can run tests on the batteries to make sure they're performing optimally: drain them down, bring them back up, make sure they don't explode, those sorts of things. So it can be testing all the time and ensuring this thing runs optimally.
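A quick sanity check on those numbers, taking the stated figures at face value:

PSUS = 6                  # three supplies up top, three below
for rack_kw in (10, 12):  # stated power budget range for the rack
    print(f"{rack_kw} kW rack -> {rack_kw / PSUS:.2f} kW average per supply")
# 10 kW -> ~1.67 kW each; 12 kW -> 2.0 kW each, leaving headroom for the
# remaining supplies to carry the load if one is pulled for service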
What's this next sled that we have down here?

Yes, so the next sled is similar to the first one we talked about. That one held hard disks, all SAS or SATA based; this one is all NVMe, your newer tier of storage. If we pull this guy out, it looks fairly similar to the one above: it's five disks wide and three deep, so you get 15 total. You'll see it has a similar but newer connector for the disks, and there's a lot of metal going on here. If we take one of these out, which apparently we are unqualified to do, you'll see it's just a metal tray, and it hooks up with your typical U.2-style connector on the back. Then we can take off the top piece to expose the insides, and it just fits standard M.2 drives inside.

So whereas the other sled was doing regular hard disks, this one is actually designed for flash storage.

It's designed for flash storage, and by the looks of it, for very high-heat-producing flash storage.

Something you might not think about in a traditional server is a lot of graphics power, but that's something Facebook needs. Let's take a look at this graphics sled. William, show me what we've got going on here.

We've got this sled, it's called Big Basin, and it's what they put eight graphics processing units in. If you see here, we've got all the power delivery stuff in the back, which, if you look, is an insane amount of power delivery. This is based on NVIDIA's reference design for their P100 platform, and you can see we have eight GPUs inside this tray. They all connect over PCIe, and in NVIDIA's case there are NVLink connections between some of them. In the back we have the PCIe switches that interconnect all of this, and it exports PCIe out the front via these little cards, which in some cases are retimers and in some cases are just really simple, dumb PCIe.

What's interesting about that is that a lot of the Open Compute rack is designed around commodity hardware and standard connections. So this is just a standard PCIe slot, and in this case they're using it as an interconnect to the rest of the sleds, but you could use it to add a network card, or, I assume, even a graphics card.

Yes. What's really interesting is you'll notice this tray only has GPUs. There's a management controller in the back to make sure the fans are running, to make sure the GPUs are up and online, and to communicate with the rest of the health-status network, but there are no CPUs in here. So they actually run PCI Express over these connectors to the host machines in the rack in order to do any useful work.
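Because the Big Basin tray presents itself to its host purely as PCIe devices, the host can discover the remote GPUs exactly as it would discover local ones. A minimal sketch, assuming a Linux host and standard sysfs paths (PCI class 0x0302xx is "3D controller", the usual class code for datacenter GPUs):

from pathlib import Path

def find_gpus():
    """Walk the PCI bus via sysfs and report devices with GPU class codes."""
    gpus = []
    for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
        cls = (dev / "class").read_text().strip()
        # 0x0300xx = VGA-compatible controller, 0x0302xx = 3D controller
        if cls.startswith(("0x0300", "0x0302")):
            vendor = (dev / "vendor").read_text().strip()  # 0x10de is NVIDIA
            gpus.append((dev.name, vendor))
    return gpus

if __name__ == "__main__":
    for addr, vendor in find_gpus():
        print(f"{addr}: vendor {vendor}")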
So we're making our way down the rack. We showed you the 15-disk regular hard disk sled and the flash sled. William, let's go ahead and open this one up. This is a monster: a 72-disk hard disk unit. William, tell me about this.

Yes, so this is what they call Bryce Canyon, version 2, and it holds 72 independent SAS hard disks. You can see they're all stacked vertically, the backplane is down in the back, and there are two different machines controlling it: one machine controls the front 36 disks, and a separate machine controls the back 36. They just put it all in one location so that it's easy to maintain.

So let's start taking this thing apart. All right, so here in the center area we've got the power cabling, and if you push the unit in and out, the cabling unrolls in this little plastic guide they've built. You'll also see these trays with very interesting-looking connectors: they feed SAS or SATA, depending on the types of disks you have, to the disks, and they feed PCIe from the SAS controllers on this board back to the host machines in the front part of the unit.

Now, the interesting thing about this cable assembly that rolls up and out is that the unit can actually stay powered on as it's being serviced. We talked earlier about the power rail that runs underneath some of these units; that would not work in this particular case, because of the sheer power requirements of running 72 disks, so they run separate cabling inside here so the unit can stay powered as it's being de-racked.

Yeah, that's right. One of the goals of all the equipment here is to be as maintainable as possible while also preserving uptime. In almost all cases you can pull the units almost completely out of the rack and service individual parts, like swapping disks or even PCIe cards, without taking the unit offline. Or, where you have multiple machines in one tray, which we'll show you later, you can pull one machine out without powering off the others in that same tray.

William, let's go ahead and dig into this. I want to see what's underneath these pieces here.

Yes, so under here we have both of the management computers. These are x86 machines, and you'll see later that they're taken from a similar design that's meant purely for compute. On these you get a single CPU, most likely a Xeon D (we're looking at some kind of BGA product under there), with four DIMM slots, probably two DIMMs per channel across two memory channels, that sort of thing. Right now it's populated, as you can see, with two sticks of memory, and it also has support for an M.2 boot drive. So this is essentially an entire computer in one unit.

That's right, it's an entire computer. If you provided just power and maybe networking, you could boot this and use it for something without the rest of the disks. Here it's acting as the host so you can get these disks onto the network: it plugs right into the backplane and then hooks up to the SAS controllers over PCIe.

If those self-contained computers interest you, they have an entire sled for those. Let's go ahead and pull this out, and William will show you. This sled contains up to four of those compute modules, and we'll dig into exactly what's in every one of them, but there's essentially a backplane that connects them all together, isn't there?

Right, that's right. If you look underneath these modules, there's a little baseboard with the management controller and a NIC hooked up, and that connects all four of these computers to the network using their centralized infrastructure. Something to note in the rack here: you can see from the unit that was taken out that there's a power rail running all the way from the back to the front of the machine. All of that is solid-state, so there's no extra cable wiggling around, and it provides power to these machines even when the unit is pulled all the way out of the rack like this. So when we pull it out this way, it stays on, and you could pull an individual machine out and service it without taking the other ones offline or even disconnecting them from the network.

OK, so we've pulled one of these compute sleds out, and we now have it sitting on a table. So let's go ahead and pull out one of these computers, and tell me what's inside.

Yes, so if you look, this again is fully tool-less: we can grab the machine and just pull it right out by opening these latches, and that allows us to service the individual unit. You'll see this unit is very similar in design to the one we pulled out of the storage server. It's got that individual machine flanked by eight DIMMs in this case; this is actually one of the newer ones. It's all self-contained, and this time there's no storage on the front. If we flip it over to the back, you'll see metal shrouds where you can put M.2 cards, and in this case it has three of them. Again, tool-less: you can pop these right off by pulling on the connector tabs, which exposes an M.2 connector, and then it goes right back in the way it came out.

So each one of these sleds essentially contains four independent servers, each with storage, processing, memory, the whole nine yards, and then it connects to a backplane. Now, it's interesting: these devices are actually sharing a single NIC.

That's right. There's one network interface cable coming in on the one NIC; on the front you can see a single port that serves the BMC and all four nodes, and they'll roughly share about ten gigabits per node. I think this might be a 40 or 50 gigabit NIC in here.
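That per-node estimate is just the shared NIC divided four ways; checking it against the 40 and 50 gigabit guesses:

NODES = 4  # four compute modules behind one shared NIC
for nic_gbit in (40, 50):
    print(f"{nic_gbit} Gb NIC -> {nic_gbit / NODES:.1f} Gb/s per node")
# 40 Gb gives 10.0 Gb/s per node and 50 Gb gives 12.5 Gb/s per node,
# consistent with the "roughly ten gigabits per node" figure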
This is absolutely a better way to build a mousetrap. This was a very cool piece of technology, and a huge congratulations not only to the Open Compute Project, who are designing this, but to companies like Facebook and Google and all of the other companies putting these in their data centers, because they are fundamentally changing the way we do servers and data racks. I mean, what do you think, William? Pretty cool, right?

Oh, I think it's absolutely cool, and it's absolutely going to change the way people do things in the data center.

From Mind Drip Media, this is William Kennington.

Hey, and I'm Noah Chelliah. If you like this video, check out the Ask Noah Show. It's a weekly talk radio show where we highlight more Linux and open source technologies. You can catch it every Tuesday at 6 p.m. at asknoahshow.com.
Info
Channel: Mind Drip Media
Views: 227,477
Rating: 4.8516445 out of 5
Keywords: internet, data, communication, web, virtual, energy, google, cloud, dump, amazon, apple, personal, IA, spy, emails, planet, impact, electricity, facebook, gafam, geek, gafa, data center, cloud security, ask noah show, server room, open compute foundation, open compute, open compute project, technology, conference, scale
Id: 2l6gI-ksdKs
Length: 18min 31sec (1111 seconds)
Published: Fri Mar 15 2019