Looking at NVIDIA H100 with High End Exxact System

Captions
Let's take a look at the NVIDIA H100. This is the new Hopper line for data centers, and Exxact has been kind enough to let me look at one of these systems, which I am accessing remotely. It actually has two H100 GPUs in it. This is an introductory video showing you the H100, and we'll talk about how this amazing GPU fits into the overall lineup of GPUs you might be using.

Let's begin by looking at the specs for this amazing system. It's a 4U rack mount with a 4,000-watt Titanium-level power supply. It has dual 96-core AMD CPUs running at 2.4 GHz and two NVIDIA H100 80 GB GPUs, so that's 80 GB on each; we'll talk more about that in a moment. This is some serious money in GPUs. There are three different variants of the H100, and these are the PCIe version; I've seen those go for around $40K US each, so this is easily a couple of cars' worth of hardware. Then there's a massive amount of RAM, 24 × 32 GB (768 GB total), and quite a few PCIe NVMe drives; "disk storage" isn't really the right term anymore, so call it fixed storage.

Here is the machine itself. You can see the GPUs here and here. This is the PCIe version of the card, and these use fifth-generation PCIe. There is no NVLink in this particular system, but you can see the three slots for NVLink bridges. Among current GPUs, NVLink really only exists on Hopper anymore; it has moved to the data center. I assume the cards are placed at opposite ends for cooling reasons, but if you were using NVLink you would definitely move them closer together. You can also see the massive amount of memory in this system and the dual AMD CPUs right here. This is an extremely advanced system, and of course the star of the show, the H100, is clearly visible at one end. Here is what the H100 would look like with NVLink, with those three slots made use of; in this computer they would be bridged in pairs.

If you look at the data sheet for this GPU, one of the first things you'll notice is that there are three columns giving the performance of each variant at different precisions. For certain scientific computing you'll probably want FP64, 64-bit floating point. But amazingly, for large language models and these extremely massive neural networks, what you're seeing more and more is FP16 and even FP8 performance, because you can really pack those parameters in. There is so much, I don't know if you want to say redundancy or sparsity, in the parameters that these models can power through at lower precision; neural networks are all about dealing with the redundancy of the different weights. So FP8 is really the performance people are looking at on those three lines (a small timing sketch follows below).

The one in the computer I'm dealing with, which I'm going to demo in just a second, is the H100 PCIe, so stick around to the end: we're going to run some stuff on this and push it to its limit. If I wanted to buy one of these and put it into my computer back there, the PCIe model is probably the one I would buy, because it plugs straight into a PCIe slot. Then there is the H100 SXM. It requires the special SXM GPU interface that I believe is used in systems from NVIDIA's partners and various
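To make the precision discussion above concrete, here is a minimal, hypothetical PyTorch sketch (mine, not from the video) that times a large matrix multiply at FP64, FP32, and FP16 on whatever CUDA GPU is available. On an H100 the lower-precision runs should be dramatically faster; FP8 matmuls need more specialized APIs (such as NVIDIA's Transformer Engine), so they are omitted here.

```python
# Hypothetical timing sketch: compare matmul speed at different precisions
# on a CUDA GPU. Assumes PyTorch with CUDA support is installed.
import time
import torch

def time_matmul(dtype: torch.dtype, n: int = 8192, iters: int = 10) -> float:
    """Average seconds per n x n matrix multiply at the given dtype."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()            # make sure setup has finished
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()            # wait for all kernels to complete
    return (time.perf_counter() - start) / iters

for dtype in (torch.float64, torch.float32, torch.float16):
    print(f"{dtype}: {time_matmul(dtype):.4f} s per matmul")
```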
certified systems that can take them. Obviously, you can see the interconnect is quite fast over NVLink. Then there is the H100 NVL, and that column is for a pair of cards together; this is a special variant designed primarily for large language models. The memory across these is pretty amazing: 80 GB. I have 48 GB on the RTX 6000 Ada that I'm using, so this is close to twice that, and the NVL pair has 188 GB between the two cards, with NVLink effectively letting you almost bind them into the same GPU.

A video from NVIDIA shows how these fit together. NVLink allows the GPUs to connect and exchange data over those three connections. NVSwitch, within an individual computer, allows 8 or 16 of these GPUs in the same machine to have direct GPU-to-GPU communication when you need more than the direct NVLink bridges can do. Then the NVLink Switch System takes it to the whole next level, where you can have up to 256 GPUs spread across multiple hosts. This is the type of system you train these gigantic large language models on, where you just connect hosts upon hosts together and, obviously, spend some serious money on compute.

So let's see what it's capable of. If I pop into the CentOS install that Exxact preconfigured for me, you can see nvidia-smi here, with the two GPUs ready for action. Let's give them some action and see what this 80 GB of RAM is actually going to get us; let's see if I can make it run out of memory.

You'll see here that I am running Stable Diffusion, using AUTOMATIC1111 as my interface. I've got a number of things installed on this, and I'm going to do another video where we play with this a bit more, but let's do one generation really quickly: "server computer sitting on Mars." I'm not being very creative with my prompts here, but that's not the important part; I'm neither a textual nor any other sort of artist. If I just generate this with the default parameters, it's extremely fast, especially if you've run this with DiffusionBee or something on a Mac.

First things first, let's bump this up to a 2048 × 2048 image. I can do this on my own GPU with its 48 GB of RAM, but it takes a little longer to generate. It's running right now, and the estimated time is about one minute; I'll jump ahead, although it's ticking down really fast. This is great. You can also bump up the batch size: if I max out the batch size on the 48 GB GPU I typically work with, I can absolutely kill it, and you can see it's starting to render there. I'm not going to blather on for the whole run; we'll get to the next experiment in a second. And there it is, at the higher resolution. Obviously I could do with some prompt engineering here; I think it's trying to make the servers look like lunar landers, or maybe a Mars probe.

So why not: let's bump the batch size up as high as the UI will let me go, which is eight. This might blow the memory, but we'll see (a script-based equivalent of this run is sketched below).
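As a rough point of reference, here is a hypothetical sketch of the same experiment using the Hugging Face diffusers library instead of the AUTOMATIC1111 web UI shown in the video. The checkpoint name is an assumption; the demo doesn't reveal which model is loaded.

```python
# Hypothetical script-based version of the demo: a 2048 x 2048, batch-of-8
# run of the video's prompt. The checkpoint is an assumption, not something
# confirmed in the video.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed checkpoint
    torch_dtype=torch.float16,          # half precision to save VRAM
).to("cuda:0")

images = pipe(
    "server computer sitting on Mars",
    height=2048,                        # the high-resolution run
    width=2048,
    num_images_per_prompt=8,            # the batch size pushed in the video
).images

for i, img in enumerate(images):
    img.save(f"mars_server_{i}.png")
```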
We're generating, and it appears to be alive even looking on the Linux side. It's estimating approximately ten minutes, but this is not something I was able to do on a 48 GB GPU, so this is pretty amazing. Meanwhile, you've got an entire H100 sitting in the second GPU slot asking, "why are you not using me?", because out of the box I have not seen a way to do Stable Diffusion, at least text-to-image, on two GPUs. Has anybody done that? Let me know in the comments; I'm not typically dealing with dual GPUs when I'm running Stable Diffusion (one naive approach is sketched below, after the captions).

We'll fast forward through this, though I do want to make sure it actually gets there. Okay, it is still cranking along, and this is quite typical for Stable Diffusion, at least from what I've seen: it's going to finish in less than its initial estimate. I did not keep track of exactly when this started; I will certainly do a more controlled benchmark in the next video, when I talk about Stable Diffusion directly. I do have the start time from the clock, so I will put that in in post.

I did log in with another session, so if we run nvidia-smi we can see it running right there. We can see the two GPUs, and they're definitely not both busy. And wow, I'm only using 60 gigabytes; "only," but I'm not using anywhere near the entire 81 GB I have available. Utilization is at around 100 percent, and the card is at 76 °C, so it's working. Almost done here; we'll fast forward to the end, and there we are: it generated the batch of eight. All right, I'll do more on benchmarking in a future video where I time this a little better; I'll put the number in in post so you can see exactly how long that took. It did not feel like ten minutes to me.

At any rate, thank you for watching this video, and definitely check out the systems at Exxact if you're in the market for this kind of thing. A huge thank you to Exxact as well for providing this system for this video. Was this video helpful to you? Click the like button and subscribe so that you don't miss any of the other artificial intelligence videos that I put out. Thank you very much.
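On the dual-GPU question raised above: one naive approach (my own sketch, not anything shown in the video) is simple data parallelism: load an independent pipeline on each GPU and split the batch between them. The checkpoint name and the 4/4 batch split are assumptions.

```python
# Hypothetical data-parallel sketch for the "why is the second H100 idle?"
# question: one independent Stable Diffusion pipeline per GPU, each
# rendering half of the batch of eight. Checkpoint name is an assumption.
from concurrent.futures import ThreadPoolExecutor
import torch
from diffusers import StableDiffusionPipeline

MODEL = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint

def render_on(device: str, count: int):
    """Load a pipeline on one GPU and generate `count` images there."""
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL, torch_dtype=torch.float16
    ).to(device)
    return pipe("server computer sitting on Mars",
                num_images_per_prompt=count).images

# One worker thread per GPU; the CUDA work from the two pipelines overlaps.
devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    futures = [pool.submit(render_on, d, 4) for d in devices]
    images = [img for f in futures for img in f.result()]
print(f"Generated {len(images)} images across {len(devices)} GPUs")
```

Note that this only splits identical work across cards; it does not pool the two cards' memory the way an NVLink-bridged pair can.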
Info
Channel: Jeff Heaton
Views: 10,024
Id: 3STJyioJr7k
Length: 12min 17sec (737 seconds)
Published: Mon Aug 07 2023