NVIDIA REFUSED To Send Us This - NVIDIA A100

Video Statistics and Information

Captions
- We have looked at a lot of ballin' GPUs over the years, whether it's the six Titan Vs we had for the six editors project, three GV100 Quadros for 12K ultrawide gaming, or even this unreleased mining GPU, the CMP 170 HX. There are not a lot of cards out there that we have not been able to get our hands on in one way or another, except for one. Until now. The Nvidia A100. This is their absolute top dog AI, enterprise, high performance compute, big data analytics monster, and they refused to send it to me. Well, I got one anyway, Nvidia, so deal with it. Just like everyone's gotta deal with my segues. SmartDeploy provides out-of-the-box Windows imaging support for over 1,500 computer models. You can deploy one Windows image to any hardware model with ease, and you can get free licenses worth over $500 at smartdeploy.com/linus. (upbeat music) The first two questions on your mind are probably why we weren't able to get one of these, and what ultimately changed that resulted in me holding one in my hands right now. The answer to the first one is that Nvidia just plain doesn't seed these things to reviewers, and at a cost of about $10,000, it's not the sort of thing that I would just, you know, buy. 'Cause I got that swagger. You know what I'm saying? As for how we got one, I can't tell you. And in fact, we even blacked out the serial number to prevent the fan who reached out offering to get us one from getting identified. This individual agreed to let us do anything we want with it, so you can bet your butt we're gonna be taking it apart. And all we had to offer in return was that we would test Ethereum mining on it, send a shroud that'll allow 'em to actually cool the thing, and reassemble it before we return it. So let's compare it really quickly to the CMP 170 HX, which is the most similar card that we have. It's the silver metal, and it's not ribbed for my pleasure. Regrettable. - [Jake] Alright. - And we actually have one other point of comparison. This isn't a perfect one. This is an RTX 3090. And what would've been maybe more apt is the Quadro, or rather, they dropped the Quadro branding, but the A6000. Unfortunately that's another really expensive card that I don't have a legitimate reason to buy, and Nvidia wouldn't send one of those for the comparison either. The specs on this are pretty similar though, so we're gonna use it as a stand-in since we're not really looking at any workstation loads anyway. So the A100. This is a 40 gigabyte card. I'm gonna let that sink in for a second. And the craziest part is that 40 gigs is not even enough for the kinds of workloads that these cards are used to crunch through. We're talking enormous data sets, to the point where this 40 gig model is actually obsolete now, replaced by an 80 gig model. And these NVLink bridge connectors on the top here, let's go ahead and pull these off. These, there we go, are used to link up multiples of these cards so they can all pool memory and work on even larger data sets. Now the die at the center of it is a seven nanometer, TSMC-manufactured GPU called the GA100. We're gonna pop this shroud off. We're gonna take a look at it. It has a base clock of just 765 megahertz, but it'll boost up to 1410. That memory runs at a whopping one and a half terabytes a second of bandwidth on a massive 5,120-bit bus. It's got 6,912 CUDA cores and, what is it? 250 watt TDP. Woooh. She's packing. - [Jake] Oh, you're just going right for it. - I'm going right for it. - [Jake] Oh geez. - This is Linus Tech Tips.
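A rough sanity check of that bandwidth figure (not from the video): the roughly 1.5 TB/s number falls out of the bus width and the per-pin HBM2 data rate. A minimal back-of-the-envelope calculation, assuming the commonly quoted ~2.43 Gbps effective rate for the 40 GB PCIe card:

```python
# Back-of-the-envelope check of the ~1.5 TB/s memory bandwidth claim.
# The per-pin data rate is an assumption (the commonly quoted ~2.43 Gbps
# effective rate for the 40 GB PCIe A100), not an official spec sheet value.
bus_width_bits = 5120      # HBM2 bus width mentioned in the video
data_rate_gbps = 2.43      # assumed effective data rate per pin

bandwidth_gbs = bus_width_bits / 8 * data_rate_gbps
print(f"~{bandwidth_gbs:.0f} GB/s")   # ~1555 GB/s, i.e. roughly 1.5 TB/s
```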
- [Jake] And basically every part of this is identical to the CMP card. - It kinda looks that way. I mean, the color's obviously different. - Yeah, but it looks like the clamshell is two pieces in the same manner. There's no display outputs. The fins look the same. - Now here's something: the CMP card specifically didn't even contain the hardware for video encode, if I recall correctly. - Yeah, this doesn't have anything. - Okay, so it's not that it was fused off. It's just plain not on the chip. - Not on GA100, yeah. - Okay but, - GA102, which is like 3090. - Yes. - Does have it. - Ooh. - And A6000. - Okay, you ready? - Oh God! So yeah. - Hey. - It's like exactly the same on the inside. Same junk power connector. - Wow. That is super junk. Check this out, guys. It uses a single eight pin EPS power connector, which you might think is a PCIe power connector. So here, look, I'll show you guys. This is an eight pin, like normal GPU connector, but watch, cannot go in. But if we take the connector out of our CPU socket on the motherboard, there you go, although the clips are interfering a little bit. I mean, what the heck is going on here, ladies and gentlemen? - You need more power. - Yeah, exactly. - So you can combine two PCIe connectors into that. - [Andy] Can't remember how to get it outta here. I see the fingerprint of the technician who assembled the card, though. - I think we have to unclip this part first. Oh, there's a little screw, right? - There's a little screw. - Haha, third type of screws. - [Andy] Yourself. - Didn't see that one, nerd. - [Andy] You're a nerd. - [Jake] Your face is a nerd. - [Andy] Your butt, nerd. - [Jake] Whoa. - It's not coming off, Jake. - What? You gotta like tilt it out, buddy. Whoa, whoa, whoa. Don't pull the cooler off. - See? It's like it's caught back here. - Hey ho. Hey, how you doing? - Jesus. - Stressful. Look, maybe if we break it, you'll actually have to buy one. - I don't wanna buy one. That's not the goal. - What? - I thought you put your hand up for a high five. I was like, "Well, what are you talking about? I don't want to buy one." - Why not? Whoa, what is going on here? You see that? - It looks like there was a thermal pad there or something, but there isn't. It's like greasy. - It actually, no, look at it closer. It's not greasy. It's, you see how this is brushed almost? Or it looks like somebody sandblasted it. - That part's not. I don't remember that on this card. - Alright, so the spring loading mechanism is just from the bend of the back plate. That's kinda cool. - [Jake] So I checked the CMP thing. Doesn't look like it. - [Andy] I wonder why they wouldn't have, like, a pad. - [Jake] This doesn't look brushed at all. What did we, last time, we twisted? - [Andy] No, I don't think we did. - Yeah, we did. - [Andy] I'm pretty sure I just rimmed on it. - [Jake] Oh God! No. You were against rimming on it. And then we were like, just twist a little. - [Jake] Oh. God. Ah. It has an IHS. It looks basically the same. - [Andy] Yeah. - [Jake] We're gonna have to clean that off and see. There's not much alcohol. - [Jake] No, I like to go in dry first. So yep, that's the same thing, alright. I mean, this isn't the first time Nvidia has used the same silicon in two different products with two different capabilities. We see the same thing with their Quadro lineup versus their GeForce lineup, where things will just be disabled through drivers or by fusing off different functional units on the chip. What I wanna know then is, besides the lack of NVLink connectors on this one,
- Well, they are in there. They're just not accessible, and they probably don't work. - Right. What is the actual difference in function between them? (Jake sighs) - Well, this one doesn't have full PCIe 16x. - Right? - It has less memory. I think it has way less transistors, but it is still a GA100. - Yeah, so the transistors are there. - Yeah, they're probably just not functional. Let me see what the chip number is on that one. - Yeah, 'cause we weren't even able to find a proper Nvidia.com reference to this one anyway, so we're just relying on someone else's spec sheet. So the transistor count could just be wrong. - Okay, so this is, so the CMP card was a GA. - Look at this guy? - Yeah. What a weirdo. GA100-105F. And this is a GA100-833. - If it's a GA100, I guess it could be a different GA100. I don't know. - I mean, it used to be, back in the day, you would assume that it's just using the same silicon as the GeForce cards, because Nvidia's data center business hadn't gotten that big yet, but nowadays they can totally justify an individual, like, new die design for a particular lineup of enterprise product. - And interestingly enough, the SXM version doesn't have an IHS, at least it seems that way. But the SXM version is also like 400 Watts, and this is like 250. - [Andy] Yeah, totally different classes of capabilities, alright? Let's put it back together then, shall we? - I got your new goop. - Goop me. - I brought two goops. - We're going for the no look catch. - Oh yeah, baby. - Yes. X marks the spot, baby. My finest work. - Maybe it'll perform better now. - [Andy] Probably not. (Jake laughs) (Andy makes truck backup noises) We're backing it up. (Jake chuckles) - [Jake] Cool story, bro. - [Andy] Thanks. Thanks, bro. - Where's our back plate? Did you take it? Oh shoot. - Yes. - Black. I thought it was gold. I was looking for gold. (Jake laughs) - [Jake] Aren't we all. - I don't know about you, but I found my gold. - What's that? - Yvonne. - Shut up. (chuckles) - Alright. Alright. Let's get going here. Which one do you wanna put on the bench first? - What do you mean? We're not gonna compare to that thing. It doesn't do anything. We don't need this thing. - But here we go, boys. - We can't put this in the first slot, 'cause we don't have a display output. - You like the bottom one? - Yeah. - You're a bottom? - Sure. - This, okay. This is how you flex IT style. Now you might have noticed at some point that the A100 doesn't have any sort of cooling fan. It's just one big, fat, long heat sink with a giant vapor chamber under it to spread the heat from that massive GPU. So Jake actually designed what we call the shroud donator. It takes these two screws that are on the back of the card for securing it in a server chassis, because that's how it's designed to be used: it's passive, but there's lots of airflow going through the chassis. It lets us take those screw holes and mount a fan to the back of the card. It's frankly not amazing. (Jake chuckles) - What? No. That is aerodynamics at its peak. You should hire me to work on F1 cars, okay? - Yeah. Not so much. - Yeah. It probably blows more air out this end from the back pressure than it does on this end. (laughs) But it's enough to cool it, I swear. - It is. - Yeah. - Let's go ahead and turn on the computer, shall we? - Oh yeah, so a couple interesting points here. It wouldn't boot right off the bat. You have to enable Above 4G Decoding, and then I also had to go in, and I think it's called like 4G MMIO or something like that.
I had to set that to 42. - Okay. - The answer to the universe. - Yes. Thank you. And they are both here. A100 PCIe, 40 fricking gigabytes. - I installed the, like, Game Ready driver for the 3090, and then I installed the data center driver, and I think it overwrote it, but the Game Ready driver still showed as, like, active, and you could do stuff with the A100 and vice versa. So it's probably fine. - Now, interestingly, the A100 doesn't show up in Task Manager at all. - [Jake] Did the CMP? I can't remember. - No, no. I don't think it did actually, anyways. - What do you wanna do in Blender, Classroom? BMW? BMW's probably too short. - Yeah. Let's do Classroom. I think BMW on a 3090 is like 15 seconds or something like that anyway, so. - That's also like the spiciest 3090. - [Jake] That you can get. Yeah, pretty much. It's just so thick. Why would you ever use it? - Because you wanted, - Is it even doing anything? Like (chuckles) - Here's one reason, 'cause you can do Classroom renders in a minute and 18 seconds. That's why. - Okay. Well, what about the A100? You didn't plug the fan in, hey. - Oh, whoops. How hot is this? - Probably warm. - Fortunately it hasn't been doing anything. Time to beat is a minute and 18 seconds. So let's go ahead and see how it does. - It feels like this is the intake. I mean, it's hot. So like, - Oh yeah. But it's going. It's going, Jake. It's going. You did good. - It works enough. This should be like, this is all. - This should be way faster. - Way huge GPU, right? - [Andy] It's actually slower. - [Jake] How much? Not by much. - It's like a few seconds, but it's slower. - So it's worse in CUDA. What about OptiX? So the interesting thing is this card doesn't have ray tracing cores. The 3090 does, so you'd think that OptiX would only work on the 3090, right? - Do you want me to just try the A100? - Yeah, sure. It's still GPU compute. - I mean, you gotta give it to it in terms of efficiency. For real though, even running two renders to the 3090's one, the average power consumption here is still lower. - [Jake] Yeah well, and looking at it while it's running, it's like 150 Watts. - Yeah. - [Jake] Versus 350 or whatever it was on the 3090. - Yeah, ready to go again? - [Jake] Yep. - Okay. - [Jake] Oh my God. - Man, this thing is fast. - What's the power consumption? - [Andy] Holy bananas. - [Jake] 353. Still like, just, I want one of these. This thing is sick. (Jake laughs) It's way faster. - Yeah. There's no question. We don't even need to. - It's gonna be like thirty seconds. - Yeah. Not even close. - So do you wanna know why? - I would love to know why. - You said it earlier. You just weren't really thinking about it. This has half the CUDA cores of a 3090. It's like seven-thousandish, I think. - Right, so it's just full of, like, machine learning stuff. - Yeah, so it has basically half the CUDA cores. So the fact that it is even close is kind of crazy in CUDA mode. But in OptiX, what I found out is OptiX will use the Tensor cores for, like, AI denoising, - [Andy] But nothing else. - Which you'll see in there. So I think it's falling back to CUDA for the other stuff. - [Andy] Got it. - But the 3090 has ray tracing and Tensor cores, so. - Right. - It just demolishes. (chuckles) - Where's the thing where you can select apps and then tell it which GPU to use? Yeah, here we go. No, so it'll not allow you to select the A100 to run games, even if we could pipe it through our onboard, or through a different graphics card like we did with that mining card ages ago.
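If you want to reproduce the Classroom CUDA-versus-OptiX comparison above, a minimal sketch is to drive Blender's command-line renderer and time each device. The scene path here is an assumption (the Classroom demo file is a free download from blender.org), and Blender needs to be on your PATH:

```python
# Minimal sketch: time a single-frame Classroom render on CUDA and on OptiX.
# Assumes Blender is installed and on PATH, and the demo scene is downloaded.
# If both cards are installed, CUDA_VISIBLE_DEVICES can usually restrict
# which GPU Cycles sees before launching.
import subprocess
import time

SCENE = "classroom/classroom.blend"   # assumed local path to the demo file

for device in ("CUDA", "OPTIX"):
    start = time.time()
    subprocess.run(
        ["blender", "-b", SCENE, "-f", "1", "--", "--cycles-device", device],
        check=True,
    )
    print(f"{device}: {time.time() - start:.1f} s")
```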
- [Jake] It doesn't have DirectX Ray Tracing. - No DirectX support whatsoever. - [Jake] Let's check it in GPU-Z. - So, way fewer CUDA cores. You can see that we go from over 10,000 to a lot less than 10,000. Pixel fillrates are actually higher. I guess that's your HBM2 memory talking. - [Jake] One point five gigabytes per second. - What's a 3090? One point five terabytes per second. It's like, - [Jake] 50% or more. - 60% almost. - Holy banana. - But what about the supported tech? Yeah, so we can do CUDA, OpenCL, - [Jake] PhysX. (laughing) - Sure. - [Jake] We should set it as the PhysX card. - Dedicated PhysX card. All the rag dolls everywhere. - [Jake] And OpenGL, but not Direct-anything, or Vulkan even. - OpenGL. Now that's interesting. - [Jake] Go to the Advanced tab, 'cause you can select, like, a specific DirectX version at the top under General. Like, well, the DX12. What does it say? Device not found. It's the same as the mining card. It'll do OpenCL, so we can mine on it. (chuckles) - Alright. I mean, should we try that? - [Jake] Yeah, we could do mining or folding or. - Sure, I have a feeling that's gonna kind of suck for that too. - There's not, like, AI in mining. - I don't think so. It's still a big GPU, dude. - So you can't. - Well, suck is relative, right? Like, for the price, you'd never buy. - I think it might be better than the CMP card though. Just a little bit. - Shut up. - I think so. So the only thing you can adjust, I think this is the same with the CMP card, is the core clock and the power limit. You can't mess with the memory speed. - [Andy] And you can move the power limit only down, it looks like. - [Jake] Yeah. Top is the 3090, bottom is the A100. - [Andy] Wow. That is a crap ton faster than a 3090. - [Jake] It's pretty much the same as the CMP, but look at the efficiency. - 714 kilohash per watt. - [Jake] And I bet you if we lower the power limit to like 80, it's a little bit lower speed. Maybe we can go, I don't know. We probably don't have to tinker with this too much. I mean, it doesn't draw that much power to begin with, I guess. - Yeah. I think it's pretty fricking efficient right outta the box. - I mean, the efficiency is better. It's a little bit better, but before it was doing 175 megahash roughly at 250 Watts, so it's pretty, pretty good. 3090, you can probably do like 300 Watts with 120 megahash. We're running the folding client now. I've had it running for a few minutes, and it's kind of hard to say. The thing with folding is, based on whatever project you're running, which is whatever job the server has sent you to process, your points per day will be higher or lower. So it's possible that the A100 got a job that rewards fewer points than the 3090 did. It does look like it's a bit higher, but you can see ours, this is like a little comparison app thing, is 31% lower than the average. So it's probably just that this job doesn't give you that many points. - Got it. - The interesting part is the 3090's drawing 400 watts. - [Both] 400. - Holy shnikes. - [Jake] A100 is drawing. - 240. (Jake laughing) Man, that's efficient. And performance per watt? Maybe gamers don't care that much. Actually, we know for a fact gamers don't care that much. In the data center, that's everything, because the cost of the card is trivial compared to the cost of power delivery and cooling on a data center scale. - Especially when you have eight of these with a 400 watt power budget, like you would get on the SXM cards, in a single chassis, times 50 chassis, like, that's a lot of power. (chuckles)
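For the power readings and power-limit tweaking discussed here, a minimal sketch using the NVML Python bindings (nvidia-ml-py) could look like this. It is not the exact tooling from the video, the 200 W target is purely illustrative, and changing the limit requires admin rights:

```python
# Minimal sketch: read power draw / limit and (optionally) lower the limit,
# similar in effect to `nvidia-smi -pl 200`. Requires the nvidia-ml-py
# package; the 200 W value is just an illustrative target.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU in the system

name = pynvml.nvmlDeviceGetName(handle)
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000           # NVML reports milliwatts
limit_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"{name}: {power_w:.0f} W of {limit_w:.0f} W limit, "
      f"{mem.used / 2**30:.1f} GiB VRAM used")

# Lower the power limit (the only knob besides core clock on this card).
pynvml.nvmlDeviceSetPowerManagementLimit(handle, 200 * 1000)

pynvml.nvmlShutdown()
```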
- Let's try something: machine learning. - Unfortunately, for obvious reasons, most machine learning or deep learning, whatever you want to call it, benchmarks don't run on Windows. So instead, I've switched over to Ubuntu, and we've set up the CUDA Toolkit, which is gonna include our GPU drivers that we need to even run the thing, as well as Docker and the Nvidia Docker container, which will allow us to run the benchmark. We're gonna be running the ResNet-50 benchmark, which runs within TensorFlow 2. This is a really, really common benchmark for big data clusters and stuff. Except our cluster is just one GPU. In a separate window, I've got nvidia-smi running. It's kind of like the Linux version of MSI Afterburner, but it's made by Nvidia, so not quite, but what it's good for is at least telling us our power and the memory usage, which we should see spike a lot when we run this benchmark. I took the liberty of pre-creating a command to run the benchmark. So we're gonna be running with XLA on to hopefully bump the numbers a bit. We will do that for the A100 as well, so no worries there. It should be the same, as well as using, what do you want? Look, he left 'cause he didn't have time for this. And now he's back. This is the world's most expensive lint roller. (Andy chuckles) I don't even remember what I was saying, damn it. Distractions aside, we're gonna be running with XLA on. That'll probably give us a bit higher number than you would normally get, but it is still accurate, and we're gonna be running the same settings on the A100 as well, so no concerns there. We'll also be using a batch size of 512, as well as fp16 rather than fp32. So if you wanna re-create these tests yourself, you totally can. Let's see what our 3090 can do. Look at that, 24 gigs of VRAM completely used. God, I don't know if there's any application aside from, like, Premiere that will use all that VRAM. I'm sure Andy can attest to that. (strained laugh) Okay, 1,400 images a second. That's pretty respectable. I think, like, a V100, which is the predecessor to the A100, does like less than 1,000. So the fact that a 3090, which is a consumer gaming card, can pull off those kinds of numbers is huge. Mind you, the wattage: 412 Watts. That's a lot of power. It'll be interesting to see how much more efficient the A100 is when we try that after. The test is done now, and the average total images per second is 1,435. It's pretty good. I've gone ahead and added our A100, so we can run the benchmarks on that instead. And I'm expecting this is gonna be substantially more performant. So it's the same test. I'm just gonna run the command here. Gonna wait a few seconds. We got nvidia-smi up again. You can see that it's just running on the A100. The RAM on the 3090 is not getting filled. We're just using that as a display output. See, all 40 gigabytes used. That's crazy. (Jake laughing) If we thought the 3090 was fast, look at that, Andy. That's like a full 1,000 images more. We're getting like 2,400 instead of 1,400, and the icing on the cake: if you look at nvidia-smi, we're using like 250 Watts instead of 400, while getting almost double the performance. That is nuts.
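The exact container-based command isn't shown on screen, but a self-contained sketch of the same kind of test, stock TensorFlow 2 with synthetic data, fp16 mixed precision, and XLA enabled, could look like the following. It won't match the tuned NGC benchmark numbers, but the images-per-second comparison works the same way:

```python
# Minimal ResNet-50 throughput sketch: synthetic data, fp16 mixed precision,
# XLA on. This is a rough stand-in for the container benchmark from the
# video, not the same code; numbers will differ, the comparison method won't.
import time
import tensorflow as tf

tf.config.optimizer.set_jit(True)                              # enable XLA
tf.keras.mixed_precision.set_global_policy("mixed_float16")    # fp16 compute

BATCH = 512   # matches the video; lower this if you run out of VRAM
STEPS = 20

model = tf.keras.applications.ResNet50(weights=None)           # 1000-class ResNet-50
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)        # no loss scaling: fine
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()      # for a throughput test

# Synthetic batch, so we measure GPU throughput rather than disk I/O.
images = tf.random.uniform((BATCH, 224, 224, 3))
labels = tf.random.uniform((BATCH,), maxval=1000, dtype=tf.int32)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

train_step(images, labels)            # warm-up: traces and XLA-compiles the step
start = time.time()
for _ in range(STEPS):
    loss = train_step(images, labels)
loss.numpy()                          # force the last step to finish before stopping the clock
elapsed = time.time() - start
print(f"{STEPS * BATCH / elapsed:.0f} images/sec")
```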
- Probably the coolest thing about this whole experience, though, is seeing the Ampere architecture on a seven nanometer manufacturing process. 'Cause you gotta remember, while none of this is applicable to our daily business, what this card does do is excite me for the next generation of Nvidia GPUs. Because even though the word on the street is that the upcoming Ada Lovelace architecture is not going to be that different from Ampere, consider this: Nvidia's gaming lineup is built on Samsung's eight nanometer node, while the A100 is built on TSMC's seven nanometer node. Now, we've talked a fair bit about how nanometers from one fab to another can't really be directly compared in that way. But what we can do is say that it is rumored that Nvidia will be building the newer Ada Lovelace gaming GPUs on TSMC's five nanometer node, which should perform even better than their seven nanometer node. And if the efficiency improvements are anything like what we're seeing here, we are expecting those cards to be absolute freaking monsters. So good luck buying one. (Jake laughing) Hey, at least you can buy one of these. We've got new pillows, that's right. This is the, what are we calling it? - [Jake] Couch ripper. - The couch ripper, the couch rip. It's an AMD-themed version of our CPU pillow with an alpaca and regular filling blend. And this video is brought to you by our sponsor, ID Agent. 90% of data breaches start with a phishing email, so you can reduce your organization's chance of experiencing a cybersecurity disaster by up to 70% with security awareness training that includes phishing simulation. BullPhish ID by ID Agent is a phishing simulation platform that transforms your biggest attack surface into your biggest defensive asset. You can add every employee to your security team with security awareness training that empowers them to spot and stop phishing threats. You can automate training campaigns and reporting for stress-free, consistent training that gets results. Choose from a rich set of plug-and-play phishing campaign kits and video lessons accompanied by short quizzes, or you can create your own phishing campaigns and training materials easily. BullPhish ID provides effective, affordable, one-stop phishing resistance training that fits any business and budget. Get two months for free and 50% off setup with BullPhish ID at it.idagent.com/linus. If you guys enjoyed this video, maybe go check out our previous video looking in more depth at the CMP 170 HX. - [Jake] I like this silver better. - If we were smart, we'd be mining on this, but we're not that smart. - [Jake] Well, you know, mining is bad.
Info
Channel: Linus Tech Tips
Views: 9,199,341
Keywords: nvidia a100, nvidia a100 holy shit, Ai compute card, nvidia ai card, the fastest gpu nvidia, rtx 3090, machine learning, deep learning, ai compute
Id: zBAxiQi2nPc
Length: 23min 46sec (1426 seconds)
Published: Wed Feb 23 2022