NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)

Video Statistics and Information

Captions
This is the new computer industry: software is no longer programmed just by computer engineers. Software is programmed by computer engineers working with AI supercomputers. We have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI. And we are so, so, so excited to be in full volume production of the H100. This is going to touch literally every single industry. Let's take a look at how H100 is produced. [Music]

35,000 components on that system board, eight Hopper GPUs. Let me show it to you. All right, I would lift this, but I still have the rest of the keynote I would like to give. This is 60, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is two hundred thousand dollars, and as you know, it replaces an entire room of other computers. It's the world's single most expensive computer that you can say: the more you buy, the more you save. This is what a compute tray looks like. Even this is incredibly heavy, see that? So this is the brand new H100, the world's first computer that has a Transformer Engine in it. The performance is utterly incredible.

There are two fundamental transitions happening in the computer industry today. All of you are deep within it and you feel it. There are two fundamental trends. The first trend is that CPU scaling has ended. The ability to get ten times more performance every five years, at the same cost, is the reason why computers are so fast today, and that trend has ended. It happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today. Accelerated computing and generative AI, as a way of doing software, a way of doing computation, are a reinvention from the ground up, and it's not easy. Accelerated
computing has taken us nearly three decades to accomplish. Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a ten million dollar server: ten million dollars gets you nearly a thousand CPU servers, and to train, to process, this large language model takes 11 gigawatt-hours. 11 gigawatt-hours, okay? And this is what happens when you accelerate this workload with accelerated computing: for that same ten million dollars, you buy 48 GPU servers. It's the reason why people say that GPU servers are so expensive. However, the GPU server is no longer the computer; the computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would have been a reasonable thing to do, but today the computer is the data center. So for ten million dollars you buy 48 GPU servers; it consumes only 3.2 gigawatt-hours, with 44 times the performance. Let me just show it to you one more time: this is before, and this is after. We want dense computers, fast computers, not big ones.

Let me show you something else. This is my favorite. If your goal is to get the work done, and this is the work you want to get done, iso-work, okay? This is iso-work. All right, look at this: before, after. You've heard me talk about this for so many years; in fact, every single time you saw me, I've been talking to you about accelerated computing. Now why is it finally the tipping point? Because we have now addressed so many different domains of science, so many industries, in data processing, in deep learning, in classical machine learning; so many different ways for us to deploy software, from the cloud to enterprise to supercomputing to the edge.
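For readers following along, the before/after comparison above reduces to simple arithmetic. The dollar, energy, and speedup figures below are the ones quoted in the keynote; the per-server costs are derived from them for illustration.

```python
# Back-of-the-envelope tally of the two $10M data-center builds quoted above.
# All input figures are from the keynote; derived values are illustrative.

budget_usd = 10_000_000

# CPU-only build: ~1,000 CPU servers, 11 GWh to process the LLM workload
cpu_servers = 1_000
cpu_energy_gwh = 11.0

# Accelerated build: 48 GPU servers, 3.2 GWh, 44x the throughput
gpu_servers = 48
gpu_energy_gwh = 3.2
gpu_speedup = 44

cost_per_cpu_server = budget_usd / cpu_servers   # $10,000 each
cost_per_gpu_server = budget_usd / gpu_servers   # ~$208,000 each -- "expensive" per box

energy_ratio = cpu_energy_gwh / gpu_energy_gwh   # ~3.4x less energy for the same work
print(f"GPU server looks {cost_per_gpu_server / cost_per_cpu_server:.0f}x pricier per box,")
print(f"but the data center does {gpu_speedup}x the work on {energy_ratio:.1f}x less energy")
```

Per box, the GPU server is roughly 21 times the price, which is exactly the point of the "the computer is the data center" argument: the right unit of cost is the whole facility, not the individual server.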
So many different configurations of GPUs, from our HGX versions to our Omniverse versions to our cloud GPU and graphics versions; so many different versions. Now the utilization is incredibly high. The utilization of NVIDIA GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using it. So we have now reached the tipping point of accelerated computing; we have now reached the tipping point of generative AI.

People thought that GPUs would just be GPUs. They were completely wrong. We dedicated ourselves to reinventing the GPU so that it's incredibly good at tensor processing, and then to all of the algorithms and engines that sit on top of these computers. We call it NVIDIA AI, the only AI operating system in the world that takes you from data processing to training to optimization to deployment and inference: end-to-end deep learning processing. It is the engine of AI today. We connected GPUs to other GPUs with NVLink to build one giant GPU, and we connected those GPUs together using InfiniBand into larger-scale computers. The ability for us to drive the processor and extend the scale of computing made it possible for the AI research community to advance AI at an incredible rate. So every two years we take giant leaps forward, and I'm expecting the next leap to be giant as well.

This is the new computer industry: software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. These AI supercomputers are a new type of factory. It is very logical that the car industry has factories; they build things you can see: cars. It is very logical that the computer industry has computer factories; they build things you can see: computers. In the future, every single major company will also have AI factories, and you will build and produce your company's intelligence. It's a very sensible thing. We are intelligence producers
already; it's just that today the producers of intelligence are people. In the future we will be producers of artificial intelligence, and every single company will have factories, and the factories will be built this way: using accelerated computing and artificial intelligence.

We accelerated computer graphics by 1,000 times in five years. Moore's Law is probably currently running at about two times in five years. A thousand times in five years is one million times in ten, and we're doing the same thing in artificial intelligence now. The question is: what can you do when your computer is one million times faster? What would you do if your computer was one million times faster? Well, it turns out that we can now apply the instrument of our industry to so many different fields that were impossible before. This is the reason why everybody is so excited.

There's no question that we're in a new computing era; there's just absolutely no question about it. In every single computing era you could do different things that weren't possible before, and artificial intelligence certainly qualifies. This particular computing era is special in several ways. One, it is able to understand information of more than just text and numbers; it can now understand multi-modality, which is the reason why this computing revolution can impact every industry. Two, this computer doesn't care how you program it: it will try to understand what you mean, because it has this incredible large language model capability. And so the programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now; you just have to say something to the computer. Third, this computer is not only able to do amazing things for the future, it can do amazing things for every single application of the previous era, which is the reason why all of these APIs are being connected into Windows applications here and there, in browsers, in PowerPoint and Word.
Every application that exists will be better because of AI. This computing era does not need new applications; it can succeed with old applications, and it's going to have new applications too. The rate of progress, because it's so easy to use, is the reason why it's growing so fast. This is going to touch literally every single industry. And at its core, just as with every single computing era, it needs a new computing approach. The last several years I've been talking to you about the new type of processor we've been creating, and this is the reason we've been creating it.

Ladies and gentlemen, Grace Hopper is now in full production. This is Grace Hopper: nearly 200 billion transistors in this computer. Look at this. This is Grace Hopper. This processor is really quite amazing. There are several characteristics about it. This is the world's first accelerated computing processor that also has a giant memory: it has almost 600 gigabytes of memory that's coherent between the CPU and the GPU, so the GPU can reference the memory, the CPU can reference the memory, and any unnecessary copying back and forth can be avoided. The amazing amount of high-speed memory lets the GPU work on very, very large data sets. This is a computer; this is not a chip. Practically the entire computer is on here. It uses low-power DDR memory, just like your cell phone, except this has been optimized and designed for high-resilience data center applications.

So let me show you what we're going to do. The first thing, of course: we take the Grace Hopper superchip and put it into a computer. The second thing we're going to do is connect eight of these together using NVLink. This is an NVLink switch. So eight of these connect, through three switch trays, into an eight-Grace-Hopper pod. Each one of the
Grace Hoppers is connected to the other Grace Hoppers at 900 gigabytes per second, eight of them connected together as a pod, and then we connect 32 of those together with another layer of switches, in order to build this: 256 Grace Hopper superchips connected into one exaflops. One exaflops: you know that countries and nations have been working on exaflops computing and only just recently achieved it. 256 Grace Hoppers, for deep learning, is one exaflops of Transformer Engine performance, and it gives us 144 terabytes of memory that every GPU can see. This is not 144 terabytes distributed; this is 144 terabytes connected. Why don't we take a look at what it really looks like? Play, please. [Applause]

This is 150 miles of cables, fiber-optic cables; 2,000 fans moving 70,000 cubic feet per minute; it probably recycles the air in this entire room in a couple of minutes; forty thousand pounds, four elephants; one GPU. If I can get up on here: this is actual size. So this is our brand new Grace Hopper AI supercomputer. It is one giant GPU. Utterly incredible. We're building it now, and we're so excited that Google Cloud, Meta, and Microsoft will be the first companies in the world to have access, and they will be doing exploratory research with us on the pioneering front, the boundaries of artificial intelligence. So this is the DGX GH200. It is one giant GPU.

Okay, I just talked about how we are going to extend the frontier of AI. Data centers all over the world, over the next decade, will be recycled and re-engineered into accelerated data centers and generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, cloud, video and graphics, generative AI for enterprise, and of course the edge. Each one of these applications has different configurations of servers, different application focuses, different deployment methods; the security is different, the operating system is different, how it's managed is different.
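The DGX GH200 scale-out described above (eight superchips per NVLink pod, 32 pods) reduces to simple arithmetic. Note the per-chip memory figure below is an assumption chosen to match the quoted 144-terabyte total, since the keynote only says "almost 600 gigabytes" per superchip; the per-chip FLOPS figure is likewise derived, not quoted.

```python
# Sketch of the DGX GH200 scale-out arithmetic described above.
chips_per_pod = 8      # Grace Hopper superchips per NVLink pod
pods = 32              # pods joined by a second layer of NVLink switches
total_chips = chips_per_pod * pods        # 256 superchips -> one giant GPU

# Assumed per-chip coherent memory ("almost 600 GB" in the keynote):
memory_per_chip_gib = 576
total_memory_tib = total_chips * memory_per_chip_gib / 1024   # 144 TiB, all visible to every GPU

# One exaflops aggregate (Transformer Engine deep-learning math):
per_chip_pflops = 1_000 / total_chips     # ~3.9 PFLOPS per superchip (derived)
```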
Well, this is just an enormous number of configurations. And so today we're announcing, in partnership with so many companies here in Taiwan, the NVIDIA MGX. It's an open, modular server design specification, designed for accelerated computing. Most of the servers today are designed for general-purpose computing; their mechanical, thermal, and electrical design is insufficient for a very highly dense computing system. Accelerated computers, as you know, take many servers and compress them into one. You save a lot of money, you save a lot of floor space, but the architecture is different, and we designed it to be multi-generation standardized, so that once you make an investment, our next-generation GPUs, next-generation CPUs, and next-generation DPUs will continue to easily configure into it, for the best time to market and the best preservation of your investment. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains.

Now, this is the basic chassis; let's take a look at some of the other things you can do with it. This is the Omniverse OVX server: x86, four L40S GPUs, a BlueField-3, two CX-7s, six PCI Express slots. This is the Grace Omniverse server: Grace, the same four L40S, a BlueField-3, and two CX-7s. This is the Grace cloud graphics server. This is the Hopper NVLink generative AI inference server. And of course Grace Hopper, liquid-cooled, for very dense servers. And then this one is our dense general-purpose Grace superchip server. This is just CPU, and it can accommodate four Grace CPUs, or two Grace superchips: enormous amounts of performance. At iso-performance, Grace consumes only 580 watts for the whole server, versus 1,090 watts for the latest-generation x86 servers. It's basically half the power at the same performance; or, put another way, at the same power, if your data center is power constrained, you get twice the performance.
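The iso-performance power claim above works out as follows. The wattages are the keynote's figures; the servers-per-megawatt framing is an illustration of the power-constrained case, not a vendor figure.

```python
# Iso-performance power comparison quoted above.
grace_watts = 580        # whole Grace Superchip server
x86_watts = 1_090        # latest-generation x86 server at the same performance

power_ratio = x86_watts / grace_watts     # ~1.88x -> "basically half the power"

# In a power-constrained data center, a fixed power budget buys proportionally
# more servers, hence roughly twice the performance at the same power:
budget_w = 1_000_000                      # hypothetical 1 MW power budget
grace_servers = budget_w // grace_watts   # 1,724 servers
x86_servers = budget_w // x86_watts       # 917 servers
```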
Most data centers today are power limited, and so this is really a terrific capability. We're going to expand AI into a new territory. If you look at the world's data centers, the data center is now the computer, and the network largely defines what that data center does. There are two types of data centers today. There's the data center used for hyperscale, where you have application workloads of all different kinds: the number of GPUs you connect is relatively low, the number of tenants is very high, and the workloads are loosely coupled. And there's another type of data center, like supercomputing data centers, AI supercomputers, where the workloads are tightly coupled, the tenants are far fewer, sometimes just one, and its purpose is high throughput on very large computing problems. So supercomputing centers and AI supercomputers, and the world's hyperscale clouds, are very different in nature.

The ability of Ethernet to interconnect components from almost anywhere is the reason why the world's internet was created; if it required too much coordination, how could we have built today's internet? So Ethernet's profound contribution is its lossy capability, its resilient capability; because of that, it can connect almost anything together. However, a supercomputing data center can't afford that. You can't interconnect random things together, because on a billion-dollar supercomputer, the difference between achieving 95 percent networking throughput and 50 percent is effectively 500 million dollars. Now, it's really important to realize that in a high-performance computing application, every single GPU must finish its job before the application can move on; in many cases, where you do all-reduces, you have to wait for the results from every single one, so if one node takes too long, everybody gets held back. The question is: how do we introduce a new type of Ethernet that's of course backwards compatible with everything, but engineered in a way that achieves the kind of capabilities that let us bring AI workloads to any data center in the world?
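The "95 percent versus 50 percent" point above can be made concrete with a rough effective-value model: if the network only lets the cluster do a fraction of its possible work, that fraction of the purchase price is all you effectively get. This is a simplification for illustration; the keynote rounds the resulting difference to about 500 million dollars.

```python
# Rough effective-value model for network throughput on a $1B AI supercomputer.
cluster_cost_usd = 1_000_000_000

def effective_value(achieved_fraction: float) -> float:
    """Value delivered if the network sustains this fraction of peak work."""
    return cluster_cost_usd * achieved_fraction

delta = effective_value(0.95) - effective_value(0.50)
print(f"${delta:,.0f} of the machine is lost to the network")  # $450,000,000
```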
First, adaptive routing. Adaptive routing basically says: based on the traffic going through your data center, depending on which port of a switch is congested, it will tell BlueField-3 to send to another port; the BlueField-3 on the other end will reassemble the data and present it to the GPU without any CPU intervention. Second, congestion control. It is possible for certain ports to become heavily congested, in which case each switch will see how the network is performing and tell the senders: please don't send any more data right away, because you're congesting the network. Congestion control requires basically an overarching system, which includes software, with the switch working with all of the endpoints to manage the overall congestion, the traffic and the throughput of the data center. This capability is going to increase Ethernet's overall performance dramatically.

Now, one of the things very few people realize is that today there's only one software stack that is enterprise secure and enterprise grade, and that is the CPU software stack. The reason is that in order to be enterprise grade, it has to be enterprise secure, enterprise managed, and enterprise supported. Over 4,000 software packages is what it takes for people to use accelerated computing today, from data processing and training and optimization all the way to inference. So for the very first time, we are taking all of that software and we're going to maintain it and manage it, like Red Hat does for Linux; NVIDIA AI Enterprise will do it for all of NVIDIA's libraries. Now enterprises can finally have an enterprise-grade and enterprise-secure software stack. This is such a big deal, because otherwise, even though the promise of accelerated computing is within reach for many researchers and scientists, it is not available to enterprise companies.
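The adaptive-routing and reassembly behavior described above comes down to two moves: per packet, pick the least-congested egress port, and let the receiving DPU restore order before handing data to the GPU. The sketch below is purely conceptual; it is not Spectrum-X's actual algorithm, and the function names are invented for illustration.

```python
# Toy sketch of adaptive routing: send each packet out the least-loaded
# port, then reassemble in sequence order at the receiver (conceptual only).

def pick_port(queue_depths: dict[int, int]) -> int:
    """Sender side: route the next packet out of the least-congested port."""
    return min(queue_depths, key=queue_depths.get)

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Receiver side: packets may arrive out of order over different paths;
    sort by sequence number before presenting the stream to the GPU."""
    return b"".join(payload for _, payload in sorted(packets))

# Example: port 1 has the shortest queue, so it carries the next packet.
assert pick_port({0: 5, 1: 2, 2: 7}) == 1
assert reassemble([(1, b"world"), (0, b"hello ")]) == b"hello world"
```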
So let's take a look at the benefit for them. This is a simple image-processing application. If you run it on a GPU with NVIDIA AI Enterprise rather than on a CPU, you're getting 31.8 images per minute, basically 24 times the throughput, or you pay only five percent of the cost. This is really quite amazing. This is the benefit of accelerated computing in the cloud, but for many enterprises it is simply not possible unless you have this stack. NVIDIA AI Enterprise is now fully integrated into AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. It is also integrated into the world's machine learning operations pipelines. As I mentioned before, AI is a different type of workload, and this new type of software has a whole new software industry around it; a hundred of these companies are now connected with NVIDIA AI Enterprise.

I told you several things. One, we are going through two simultaneous computing industry transitions: accelerated computing and generative AI. Two, this form of computing is not like traditional general-purpose computing: it is full-stack, it is data-center scale, because the data center is the computer, and it is domain specific. For every domain you want to go into, every industry you go into, you need to have the software stack, and if you have the software stack, then the utilization of your machine, the utilization of your computer, will be high. So, number two: it is full-stack, data-center scale, and domain specific. We are in full production of the engine of generative AI, and that is HGX H100. Meanwhile, this engine that's going to be used for AI factories will be scaled out using Grace Hopper, the engine we created for the era of generative AI. We also took Grace Hopper, connected 256 of them with NVLink, and created the largest GPU in the world, DGX GH200. We're trying to extend generative AI and accelerated computing in several different directions at the same time.
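The image-processing comparison above also hints at what the GPU instance might cost per hour relative to the CPU instance: paying k times the hourly rate for 24 times the throughput gives k/24 of the per-image cost, and the quoted five percent corresponds to k of about 1.2. The hourly-rate premium below is inferred for illustration, not a quoted figure.

```python
# Relationship between throughput gain and per-image cost, from the figures above.
gpu_images_per_min = 31.8
throughput_gain = 24
cpu_images_per_min = gpu_images_per_min / throughput_gain   # ~1.3 images/min on CPU

per_image_cost_ratio = 0.05                 # quoted: 5% of the CPU cost per image
implied_hourly_premium = per_image_cost_ratio * throughput_gain   # ~1.2x (inferred)
```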
Number one, we would of course like to extend it in the cloud, so that every cloud data center can be an AI data center; not just AI factories and hyperscale, but every hyperscale data center can now be a generative AI data center, and the way we do that is with Spectrum-X. It takes four components to make Spectrum-X possible: the switch, the BlueField-3 NIC, the interconnects themselves (the cables are so important in high-speed communications), and the software stack that goes on top of it. We would like to extend generative AI to the world's enterprises, and there are so many different configurations of servers; the way we're doing that, in partnership with our Taiwanese ecosystem, is the MGX modular accelerated computing systems. And we put NVIDIA into the cloud so that every enterprise in the world can engage us to create generative AI models and deploy them, in an enterprise-grade, enterprise-secure way, in every single cloud. I want to thank all of you for your partnership over the years. Thank you. [Applause]
Info
Channel: Ticker Symbol: YOU
Views: 462,622
Keywords: nvidia, nvda, nvidia stock, nvda stock, nvidia gtc 2023, jensen huang, nvidia keynote, openai, chatgpt, gpt4, gpt3, msft, microsoft stock, msft stock, goog, googl, goog stock, google stock, artificial intelligence stocks, nvidia stock news, semiconductor stocks, tsmc, tsm stock, asml, asml stock, gpt-4, stable diffusion, nvidia news, jensen huang keynote, nvidia 2023, ai copilot, computex 2023, nvidia computex 2023, nvidia keynote 2023, omniverse, ai stocks, best ai stocks
Id: 0EIwhvqCX1c
Length: 26min 7sec (1567 seconds)
Published: Sun Jun 11 2023