Nvidia's Breakthrough AI Chip Defies Physics (GTC Supercut)

Video Statistics and Information

Captions
The rate at which we're advancing computing is insane. Over the course of the last eight years, we've increased computation by 1,000 times. Eight years, 1,000 times. Remember back in the good old days of Moore's Law: it was 10x every five years, 100x every ten years, in the middle of the PC revolution. In the last eight years we've gone 1,000 times, and we have two more years to go. The rate at which we're advancing computing is insane, and it's still not fast enough. So we built another chip. Hopper is fantastic, but we need bigger GPUs. And so, ladies and gentlemen, I would like to introduce you to a very, very big GPU. Ladies and gentlemen, enjoy this. [Music]

Blackwell is not a chip; Blackwell is the name of a platform. People think we make GPUs, and we do, but GPUs don't look the way they used to. This is the most advanced GPU in the world in production today: this is Hopper. Hopper changed the world. This is Blackwell. It's okay, Hopper. 208 billion transistors. And so you can see, I can see, that there's a small line between two dies. This is the first time two dies have abutted like this together, in such a way that the two dies think it's one chip. There's 10 terabytes of data between them, 10 terabytes per second, so that these two sides of the Blackwell chip have no clue which side they're on. There's no memory locality issue, no cache issue. It's just one giant chip. When we were told that Blackwell's ambitions were beyond the limits of physics, the engineers said, "So what?" And so this is what happened. And so this is the Blackwell chip, and it goes into two types of systems. The first one is form-fit-function compatible to Hopper: you slide out Hopper and you push in Blackwell. That's the reason why ramping is going to be so efficient, even though it's one of the challenges: there are installations of Hoppers all over the world, and it's the same infrastructure, same design; the power, the electricity, the thermals, the software: identical. Push it right back in. And so this is a Hopper version for the current HGX configuration. The second one looks like this. Now, this is a prototype board, and this is a fully functioning board, and I'll just be careful here. This right here is, I don't know, $10 billion. The second one's $5 billion. It gets cheaper after that. So the way it's going to go to production is like this one here: two Blackwell chips and four Blackwell dies connected to a Grace CPU. The Grace CPU has a super-fast chip-to-chip link. What's amazing is that this computer is the first of its kind where this much computation fits into this small of a place. But we need a whole lot of new features in order to push the limits beyond, if you will, the limits of physics. And so one of the things that we did was we invented another Transformer Engine. And so, this new Transformer Engine. We have a fifth-generation NVLink. It's now twice as fast as Hopper, but very importantly, it has computation in the network. And the reason for that is because when you have so many different GPUs working together, we have to share our information with each other, we have to synchronize and update each other. Having extraordinarily fast links, and being able to do mathematics right in the network, allows us to essentially amplify even further. So even though it's 1.8 terabytes per second, it's effectively higher than that, and so it's many times that of Hopper. Overall, compared to Hopper, it is two and a half times the FP8 performance for training, per chip. It also has this new format called FP6, so that even though the computation speed is the same, the amount of parameters you can store in the memory is now amplified. FP4 effectively doubles the throughput. This is vitally important for inference. The amount of energy we save, the amount of networking bandwidth we save, the amount of wasted time we save, will be tremendous. The future is generative, which is the reason why we call it generative AI, which is the reason why this is a brand-new industry. The
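The FP6/FP4 point is just arithmetic about bits per parameter: halve the bits per weight and twice as many parameters fit in the same memory, with a matching doubling of math throughput per cycle. A back-of-envelope sketch in Python; the 192 GB memory budget is an illustrative assumption, not a spec quoted in the keynote:

```python
# How many parameters fit in a fixed memory budget at each precision?
# MEM_BYTES is an illustrative HBM size (an assumption, not a quoted spec).
MEM_BYTES = 192 * 10**9

BITS_PER_PARAM = {"fp16": 16, "fp8": 8, "fp6": 6, "fp4": 4}

for fmt, bits in BITS_PER_PARAM.items():
    params = MEM_BYTES * 8 // bits        # total bits / bits per weight
    print(f"{fmt}: ~{params / 1e9:.0f}B parameters")
# fp8 -> fp4 halves the bits per weight, so the parameter count that fits
# (and the tensor-core throughput per cycle) doubles.
```

This is why FP4 "effectively doubles the throughput" relative to FP8: the same memory and the same datapath width simply carry twice as many weights.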
way we compute is fundamentally different. We created a processor for the generative AI era, and one of the most important parts of it is content token generation. We call it, this format, FP4. That's a lot of computation: 5x the token generation, 5x the inference capability of Hopper. Seems like enough, but why stop there? And so we would like to have a bigger GPU, even bigger than this one. And so we decided to scale it, so we built another chip. This chip is just an incredible chip. We call it the NVLink Switch. It's 50 billion transistors. It's almost the size of Hopper all by itself. This switch chip has four NVLinks in it, each 1.8 terabytes per second. What is this chip for? If we were to build such a chip, we can have every single GPU talk to every other GPU at full speed at the same time. That's insane. And as a result, you can build a system that looks like this. This is what a DGX looks like now. Remember, just six years ago I delivered the first DGX-1 to OpenAI. That DGX, by the way, was 170 teraflops; that's 0.17 petaflops. This is now 720 petaflops, almost an exaflop for training, and the world's first one-exaflop machine in one rack. Just so you know, there are only a couple, two or three, exaflop machines on the planet as we speak. And so this is an exaflop AI system in one single rack. Well, let's take a look at the back of it. So this is what makes it possible. That's the back: the DGX NVLink spine. 130 terabytes per second goes through the back of that chassis. That is more than the aggregate bandwidth of the internet. We could basically send everything to everybody within a second. 5,000 NVLink cables, two miles in total. Now this is the amazing thing: if we had to use optics, we would have had to use transceivers and retimers, and those transceivers and retimers alone would have cost 20,000 watts, 20 kilowatts, of just transceivers alone, just to drive the NVLink spine. As a result, we did it completely for free over the NVLink switch, and we were able to save the 20 kilowatts for computation. This entire rack is 120 kilowatts, so that 20 kilowatts makes a huge difference. It's liquid-cooled. What goes in is 25°C, about room temperature; what comes out is 45°C, about your jacuzzi. So room temperature goes in, jacuzzi comes out: 2 liters per second. We could sell a peripheral. 600,000 parts. Somebody used to say, you know, you guys make GPUs, and we do, but this is what a GPU looks like to me. When somebody says GPU, I see this. Two years ago, when I saw a GPU, it was the HGX: it was 70 pounds, 35,000 parts. Our GPUs now are 600,000 parts and 3,000 pounds. Okay, so 3,000 pounds: a ton and a half, so it's not quite an elephant. Now let's see what it looks like in operation. If you were to train a GPT model, a 1.8-trillion-parameter model, it took about three to five months or so with 25,000 Amperes. If we were to do it with Hopper, it would probably take something like 8,000 GPUs, and it would consume 15 megawatts. 8,000 GPUs on 15 megawatts; it would take 90 days, about three months. If you were to use Blackwell to do this, it would only take 2,000 GPUs. 2,000 GPUs, same 90 days, but this is the amazing part: only four megawatts of power. So from 15... yeah, that's right. Blackwell would be the most successful product launch in our history, and so I can't wait to see that.

Let's talk about the next wave of robotics, the next wave of AI: physical AI. So far, all of the AI that we've talked about is one computer. Data comes into one computer. We take all of the data, we put it into a system like DGX, and we compress it into a large language model: trillions of tokens become billions of parameters, and these billions of parameters become your AI. I just described, in very simple terms, essentially what just happened in large language models, except the ChatGPT moment for robotics may be right around the corner. And so we've been building the end-to-end systems for robotics for some time. I'm super, super proud of the work. We have the AI system, DGX. We have the lower system, which is called AGX, for
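The power claim in that training comparison is easy to sanity-check: both runs last the same 90 days, so energy scales directly with the megawatt figures. A quick check in Python, using only the numbers quoted in the keynote:

```python
# Energy for the 1.8T-parameter training run, per the keynote numbers:
# 8,000 Hoppers at 15 MW vs 2,000 Blackwells at 4 MW, both for 90 days.
HOURS = 90 * 24

hopper_mwh    = 15 * HOURS   # megawatt-hours for the Hopper run
blackwell_mwh = 4 * HOURS    # megawatt-hours for the Blackwell run

print(f"Hopper:    {hopper_mwh:,} MWh")     # 32,400 MWh
print(f"Blackwell: {blackwell_mwh:,} MWh")  # 8,640 MWh
print(f"~{15 / 4:.2f}x less energy with {8000 // 2000}x fewer GPUs")
```

So, taking the keynote's figures at face value, the Blackwell run uses 3.75x less energy with a quarter of the GPUs, for the same wall-clock time.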
autonomous systems: the world's first robotics processor. When we first built this thing, people asked, what are you guys building? It's an SoC, so it's one chip. It's designed to be very low-power, but it's designed for high-speed sensor processing and AI. And so if you want to run Transformers in a car, or anything that moves, we have the perfect computer for you. It's called Jetson. And so: the DGX on top for training the AI, the Jetson as the autonomous processor, and in the middle we need another computer. We need a simulation engine that represents the world digitally for the robot, so that the robot has a gym to go learn how to be a robot. We call that virtual world Omniverse, and the computer that runs Omniverse is called OVX, and OVX, the computer itself, is hosted in the Azure cloud. Okay, and so basically we built these three things, these three systems, and on top of them we have algorithms for every single one.

Now I'm going to show you one super example of how AI and Omniverse are going to work together. The example I'm going to show you is kind of insane, but it's going to be very, very close to tomorrow. It's a robotics building. This robotics building is called a warehouse. Inside the robotics building are going to be some autonomous systems. Some of the autonomous systems are going to be called humans, and some of the autonomous systems are going to be called forklifts, and these autonomous systems are going to interact with each other, of course, autonomously, and it's all going to be watched over by this warehouse to keep everybody out of harm's way. The warehouse is essentially an air traffic controller, and whenever it sees something happening, it will redirect traffic and give new waypoints, just new waypoints, to the robots and the people, and they'll know exactly what to do. This warehouse, this building, you can also talk to. Of course, you could talk to it: "Hey!" And all of this is running in real time. What about all the robots? All of those robots you were seeing just now, they're all running their own autonomous robotics stack.

Let's talk about robotics. Everything that moves will be robotic; there's no question about that. It's safer, it's more convenient, and one of the largest industries is going to be automotive. Beginning of next year we will be shipping in Mercedes, and then shortly after that, JLR. Today we're announcing that BYD, the world's largest EV company, is adopting our next generation. It's called Thor. Thor is designed for Transformer engines. Thor, our next-generation AV computer, will be used by BYD. The next generation of robotics will likely be humanoid robotics. We now have the necessary technology to imagine generalized humanoid robotics. In a way, humanoid robotics is likely easier, and the reason for that is because we have a lot more training data that we can provide the robots, because we are constructed in a very similar way. It could be in video form; it could be in virtual-reality form. We then created a gym for it, called Isaac Reinforcement Learning Gym, which allows the humanoid robot to learn how to adapt to the physical world. And then an incredible computer, the same computer that's going to go into a robotic car: this computer will run inside a humanoid robot, called Thor. It's designed for Transformer engines. The soul of Nvidia, the intersection of computer graphics, physics, and artificial intelligence: it all came to bear at this moment. The name of that project: General Robotics 003. I know, super good, super good.

Well, I think we have some special guests. Do we? Hey guys! So I understand you guys are powered by Jetson. They're powered by Jetson, little Jetson robotics computers inside. They learned to walk in Isaac Sim. Ladies and gentlemen, this is Orange, and this is the famous Green. They are the BDX robots of Disney. Amazing Disney Research. Come on, you guys, let's wrap up. Let's go. Five things. Where are you going? What are you saying? No, it's not time to eat. It's not time to eat! [Music] I'll give you a snack in a moment. Let me finish up real quick.
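The "gym to learn how to be a robot" idea, a policy improving itself against a simulator before it ever touches hardware, can be sketched generically. This is a minimal toy, not Nvidia's Isaac Gym API; the environment, the one-parameter policy, and all names here are invented for illustration:

```python
import random

# A minimal, hypothetical "gym" loop illustrating a robot policy learning
# in simulation. This is NOT the real Isaac Gym API; all names are invented.

class ToySimEnv:
    """Stand-in for a physics simulator: the goal is to keep state near 0."""
    def reset(self):
        self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action          # apply the policy's action
        reward = -abs(self.state)     # closer to 0 means a better reward
        return self.state, reward

def train(episodes=200, steps=10):
    """Crude random search over a one-parameter policy a = -gain * state."""
    env = ToySimEnv()
    best_gain, best_return = 0.0, float("-inf")
    for _ in range(episodes):
        gain = random.uniform(0.0, 1.0)   # sample a candidate policy
        state = env.reset()
        total = 0.0
        for _ in range(steps):
            state, reward = env.step(-gain * state)
            total += reward
        if total > best_return:
            best_gain, best_return = gain, total
    return best_gain

print("learned gain:", train())
```

Real simulators like the one described in the keynote replace the toy dynamics with GPU-accelerated physics and the random search with reinforcement learning, but the loop structure, reset, act, observe reward, keep the better policy, is the same.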
First: a new Industrial Revolution. Every data center should be accelerated. A trillion dollars' worth of installed data centers will become modernized over the next several years. Second: the computer of this revolution, the computer of this generation, generative AI, trillion parameters. This is what we announced to you today. This is Blackwell: amazing, amazing processors, NVLink switches, networking systems, and the system design is a miracle. This is Blackwell, and this, to me, is what a GPU looks like in my mind. Everything that moves in the future will be robotic; you're not going to be the only one. And these robotic systems, whether they are humanoids, AMRs, self-driving cars, forklifts, or manipulating arms, they will all need one thing. Giant stadiums, warehouses, factories: there are going to be factories that are robotic, manufacturing lines that are robotic, building cars that are robotic. These systems all need one thing: they need a platform, a digital platform, a digital twin platform, and we call that Omniverse, the operating system of the robotics world. Thank you. Have a great GTC. Thank you all for coming. Thank you.
Info
Channel: Ticker Symbol: YOU
Views: 1,279,455
Keywords: nvidia gtc 2024, nvidia keynote, jensen huang keynote, nvidia robots, jensen huang robots, nvidia blackwell gpu, blackwell, nvidia, nvda, nvidia stock, nvda stock, jensen huang, openai, chatgpt, gpt4, msft, microsoft stock, msft stock, googl, google stock, artificial intelligence stocks, nvidia stock news, semiconductor stocks, gpt-4, nvidia news, ai copilot, omniverse, ai stocks, best ai stocks
Id: odEnRBszBVI
Length: 19min 43sec (1183 seconds)
Published: Tue Mar 19 2024