Ladies and gentlemen, please welcome NVIDIA founder and CEO Jensen Huang.

Dàjiā hǎo — hello everyone. Welcome to GTC Taiwan. This is a conference that is dedicated to researchers and scientists whose groundbreaking work is simply impossible with normal computers. They need a supercharged form of computer to solve the grand challenges that they're tackling. That's what this conference is about. This conference is also about the ecosystem of system makers, supercomputer makers and software developers who are creating the infrastructure to enable their work. I've got a lot of things to tell you today, so let's get started.

Computing is the most important invention of humanity. It is the single most important tool that we have ever created. Over the last 25 years the computer has advanced in performance one hundred thousand times — a hundred thousand times in 25 years — and then it stopped. Moore's law ended. However, application demand is greater than ever. Scientists and researchers are at the brink of discovering solutions for precision medicine. They're at the brink of being able to solve weather prediction and understand climate. We're at the brink of being able to discover the next groundbreaking material that's light and strong, or new ways to store energy. We're at the brink of discovering a way for machines to operate themselves. We're at the brink of discovering artificial intelligence. Computing demand is greater than ever, so more than ever we need this computing performance to continue to extend. We need to extend Moore's law, and that's the reason why so many people have jumped onto the computing approach, the computing architecture, that we pioneered over ten years ago. We call it NVIDIA GPU computing.

Developers are jumping onto this platform in droves. In just the last five years, developers have increased by ten times — ten times — to eight hundred and fifty thousand CUDA developers all around the world. CUDA is NVIDIA's software architecture, our computing architecture, and CUDA has been downloaded eight million times, an increase of five times in the last five years. Incredible. The number of attendees — this is our record crowd in Taiwan. We do GTC all over the world: in Santa Clara, in Japan, in China, in Europe and Israel, in Washington DC, and here. This last year the number of attendees grew to 25,000 people, the largest ever. This has in fact become one of the largest computing developer events — seven times larger in the last five years. If you just look at the number of computing flops that are in supercomputers all over the world, in just the last five years it has increased by 15 times. That is so much faster than Moore's law. The reason for that, of course, is that without GPUs it is simply impossible to deliver the type of performance that is necessary in high-performance computing.

It is instructive to estimate the gap that we would create if we didn't find a path forward. In the last 25 years computing has advanced and improved in performance 100,000 times, and then stopped. So for the next ten years — which is just around the corner — if Moore's law does not continue, if we do not find a path forward, applications will continue to demand more performance. In the next ten years applications will require another 100 times more performance using historical methods. In fact I believe in the next ten years computing demand will grow faster than a hundred times, but even if it's just a hundred times, let's estimate how much computing the world would be missing in just ten years' time. This year, let's estimate that we ship 20 million server-class CPUs. Twenty million server-class CPUs at 0.5 teraflops per CPU is 10 million teraflops. By 2028 the demand will be 100 times more, and 100 times 10 million teraflops is essentially 1,000 exaflops.
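A quick back-of-the-envelope version of that estimate, using only the figures quoted on stage; the per-Volta throughput at the end is my assumption, roughly the chip's mixed-precision rating:

```python
# All figures as quoted on stage; the per-Volta number is an assumption.
cpus_per_year = 20_000_000                      # server-class CPUs shipped this year
tflops_per_cpu = 0.5                            # teraflops per CPU
shipped_now = cpus_per_year * tflops_per_cpu    # 10,000,000 TFLOPS shipped in one year

demand_2028 = shipped_now * 100                 # 100x more demand in ten years
print(demand_2028)                              # 1e9 TFLOPS = 1,000 exaflops per year

volta_tflops = 100                              # assumed per-GPU throughput, order of magnitude
print(demand_2028 / volta_tflops)               # ~10,000,000 Voltas -- the "10 million of these" below
```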
The world does not have an exaflop supercomputer yet, and in the year 2028 we would need to ship the equivalent of 1,000 of those. An incredible amount of computing is essential to ship in just one year, 2028, ten years from now. So the gap is enormous. The gap in computing demand is absolutely enormous.

Just to put it in perspective, take the highest-performance GPU we have — hey Paul, could you hand me, umm — take the highest-performance GPU we have. This is the Volta GPU. This is the most complex processor the world has ever created. In the year 2028 we would need to ship the equivalent of 10 million of these. Ten million of these would be equivalent to 1,000 exaflops. An unbelievable amount of computing is missing, and that's why we need to find a solution.

Today I'm going to talk about four things. By the end of this presentation I will fill in this schematic. This is the only keynote from a CEO that has a schematic — an engineer's keynote. This is what you get when you come to an engineer CEO's keynote. Okay, so we're going to talk about several things: all of the developments in the last year as we prepared for GTC, the advances in the NVIDIA accelerated computing stack, our ecosystem consisting of applications and system partners, and I'm going to talk about six large markets.

The singular weakness of GPU computing is this: you cannot just put a GPU — a chip — in a computer and magically it becomes faster. There is no magical rearrangement of transistors that will allow you to put something inside the computer that magically makes it go faster. If it were possible, CPU makers would be doing it right now. There are no more ways to reorganize all of the transistors inside the CPU, or even to add more CPUs, that will allow you to make applications go faster. We now need to optimize the entire stack. What NVIDIA GPU computing pioneered was thinking across the entire stack, from architecture to processor to systems, system software, APIs, libraries and application solvers. We optimize across the entire stack, one domain at a time — one domain at a time — and it is incredibly hard work. That's one of the reasons why it's taken us almost 10 years to get here. There are six vertical markets today that we have optimized and accelerated, and I want to talk to you about these markets, because they're very, very large if you're able to think across the entire stack.

My voice disappeared this morning. I'm not yelling at you, I just want you to hear me. One second. We are family, it's okay, right? Well, I'm always so happy when I come to Taiwan. [Applause] That's why I arrived early. This is the only conference I go to where I arrive two days early. Everybody asks me, why are you going so early? I said, I don't know, I just have to get there early so that I can see Spencer, my son, and we can eat everything. So I already gained ten pounds in two days. So good.

Okay, so accelerated computing. Remember, accelerated computing is really about the architecture, which we call CUDA; the GPU design and all of the microarchitectural innovations that go along with it; the process technology that we get from TSMC and all the packaging technology that we
work with you on; advanced memory technology; system architecture; system software — CUDA and NCCL, NCCL for connecting multiple GPUs together and getting them to cooperate and synchronize, a very, very complicated problem to solve; on top of that cuDNN and TensorRT, and I'll talk about many of them; and then the applications on top of that. NAMD is for molecular dynamics, AMBER is for molecular dynamics, there are quantum chemistry applications, and so on and so forth — so many different types of applications on top. Okay, so we optimize the entire stack.

In the year 2013, you add a GPU to the CPU and you accelerate the application many times, just by adding a GPU with that entire computing stack. Now go forward literally five years — Fermi, Kepler, Maxwell, Pascal and now Volta, five GPU generations later. Notice every single layer of software has been innovated, from 5.0 to 9.0. The entire computing stack has been modified, improved, enhanced — new inventions — and as a result of all that hard work we improved an application that was sitting at 1.0x in 2013 to 100x in 2018. Think about that. We took an application that was at 1.0x in 2013; you add a GPU to it, and with more and more generations, more and more software, by 2018 we've accelerated it by 100 times. Without that, the CPU would have lagged behind Moore's law: Moore's law, in the historical days, would naturally have increased performance by a factor of ten in five years, but it didn't achieve that — only three or four times. And yet with the GPU we've been able to accelerate by a hundred times in the last five years; the GPU-accelerated stack itself has improved by 25 times. What that basically says is that we're now moving faster than Moore's law, but it requires us to innovate across the entire stack. I'm going to come back to that over and over again: we innovate across the entire stack, and there are so many new ideas for innovating across the entire stack.

This supercharged form of computing, as with all successful technologies, ultimately does something that was impossible in the past — but it becomes super successful only if it is also more cost-effective. This is what a high-performance computing cluster looks like: in the past, 600 traditional dual-CPU servers would consume 360 thousand watts — 360 kilowatts. If it is GPU-accelerated, it looks like that. Can you see that? Before, after. Did you guys see that? Did you understand it? Before, after. Okay: 30 quad-GPU servers, 48 thousand watts. One-fifth the cost, one-seventh the space and one-seventh the power. One-fifth the cost — how is it possible? What a tremendous breakthrough. And that's why people tell me: the more you buy, the more you save. The more you buy, the more you save. NVIDIA's accelerated computing stack — this is the promise of the future of computing, but it requires a tremendous amount of work. The amount of R&D that we dedicate to one singular computing stack is like no other computing stack in the world today, but the benefit is really quite extraordinary.
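The before-and-after cluster slide as arithmetic, using the figures quoted on stage; that the GPU-accelerated side is 30 quad-GPU servers is an assumption:

```python
# Figures as quoted on stage; 30 quad-GPU servers is an assumption.
cpu_cluster = {"servers": 600, "watts": 360_000}   # 600 dual-CPU servers
gpu_cluster = {"servers": 30,  "watts": 48_000}    # 30 quad-GPU servers

print(cpu_cluster["watts"] / gpu_cluster["watts"])     # ~7.5x less power, the "1/7 the power"
print(cpu_cluster["servers"] / gpu_cluster["servers"]) # 20x fewer boxes
# The cost claim on stage -- roughly one-fifth -- comes from server pricing, not from this math.
```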
Now the markets. The first market we started to cultivate — and this was, if you will, the tip of the spear of GPU computing — we went to the people who needed more performance the most. In fact, when I came to Taiwan once, one of the most important things, one of the most inspiring things that anybody ever said to me came from a quantum chemist. His son lives in California and told him about this brand new architecture we created called CUDA. His father was trying to do quantum chemistry on a supercomputer, a massive IBM Blue Gene machine, but it was not going fast enough. So he went down to basically the retail store, the gaming store, bought a whole bunch of GeForces, came back, and programmed CUDA into his quantum chemistry code. He was so amazed by the speed-up that he thought the answer was wrong, and he had to wait many more days for his IBM supercomputer to finish its job so he could compare the answer — and realize that it was in fact right. We created for him a time machine. We created the world's first time machine. And he said to me, "Jensen, because of your work, I can now do my life's work in my lifetime." It almost brought me to tears. It was just the most amazing thing anybody's ever said to me, and it's just super, super inspiring.

So if you take a look at this industry, this is the tip of the spear, this is where we started: a ten-billion-dollar industry. I believe every single supercomputer in the future will be accelerated. It is impossible to build a next-generation exaflops machine on any kind of reasonable budget, with any kind of reasonable power, using traditional methods. You have to have acceleration. The future of supercomputing is accelerated, and we now have the entire stack. All of the world's top supercomputing codes have now been accelerated with CUDA. We have 550 high-performance computing applications now ported on top of CUDA and accelerated. NVIDIA's GPUs are now in thousands of high-performance computing data centers and in some of the world's most advanced and largest supercomputers, including the recently announced supercomputer that will be built here in Taiwan for researchers, for high-performance computing and AI. Thank you. [Applause] If I talk loudly and fast enough, the frog in my mouth will come out.

The stack — GPU computing basically works in several steps. The first step, of course, is to build an amazing GPU. That's the first step: building an amazing GPU. The second step is to create the libraries for that domain — the system software, the systems architecture, the APIs and the accelerated libraries for that domain. In the case of high-performance computing it's linear algebra, it's FFTs, it's all kinds of different types of libraries, and we have all the libraries created; now with deep learning it's cuDNN, and with inference it's TensorRT — the libraries are in place. The third step is to work with all of the application developers, the solvers: technical teams work hand in hand to refactor the algorithms of their applications and run them on our libraries. If you're able to achieve that, you can achieve a hundred-times speed-up over five years, just as I showed before. And then lastly, technical teams all around the world work with the end markets to adopt the technology. So: architecture, full stack, application optimization, ecosystem teams working with the market — one, two, three and four. This basic methodology has been applied in our company for over ten years. So the first application space of GPU computing, a ten-billion-dollar annual supercomputing market, I believe will be completely accelerated in the near future.
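As a small illustration of that second step — domain libraries such as linear algebra and FFTs — here's a hedged sketch using CuPy, a GPU-accelerated, NumPy-like library (not named in the talk; it assumes a CUDA GPU and CuPy are installed). The point is that the accelerated call looks just like the CPU call:

```python
import numpy as np
import cupy as cp                         # assumption: CuPy + a CUDA GPU are available

x_cpu = np.random.rand(4096, 4096).astype(np.float32)
x_gpu = cp.asarray(x_cpu)                 # copy the data into GPU memory

y_gpu = x_gpu @ x_gpu                     # dense matrix multiply on the GPU (cuBLAS underneath)
f_gpu = cp.fft.fft2(x_gpu)                # 2-D FFT on the GPU (cuFFT underneath)

y_cpu = cp.asnumpy(y_gpu)                 # copy results back only when the CPU needs them
```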
The amount of computing that's needed in the world is growing, not slowing. Although CPU performance has slowed, demand in the marketplace is in fact growing. This just came out — this is from OpenAI. OpenAI measured the amount of computation that's necessary to train the most extensive neural network each year, and plotted it — it's Ilya, at OpenAI — he plotted it over five years. Moore's law in five years is 10x. Moore's law in five years is 10x. This is five years, starting with AlexNet. In literally five years' time, the amount of computation required to train the network just one time increased 300,000-fold. Three hundred thousand. This is the groundbreaking neural network that started everything, called AlexNet — 300,000 times in just five years. The amount of computation that's essential to the future of computing is really quite extraordinary.

Here is the reason why. Before this time, software was written by humans, and software engineers can only write so much software. But machines can write enormous amounts of software. A machine doesn't get tired, and it types very fast — with GPUs it can type very fast. So long as there's data, so long as there's knowledge of how to create the architecture, the creativity, we can create absolutely enormous software, and this is the future of computing: gigantic software. Every single company in the world that develops software will need a supercomputer. NVIDIA today has 1,000 petaflops in operation for our software engineers, and we need a lot more. We absolutely need a lot more, and we're building it as fast as we can. For the very first time in our company, over the last five years we had to go build supercomputers so that our supercomputers could write software. Every single company in the world that believes software is its future will have to use AI, and anybody who develops AI will have to have supercomputers to write the AI. It makes a lot of sense. This is one of the reasons why software is going to get larger, it's going to get more complex, it's going to turbocharge into a double exponential — it'll move faster than before — and all companies that develop software will need AI supercomputers.

It is for this reason that we reinvented the GPU. Our GPU has been advancing with more and more performance, more computer graphics capability, and more general-purpose capability over time. Every single generation we made our GPUs more general-purpose, so that the number of applications we can accelerate could continue to grow. However, with Volta we fundamentally re-engineered, reinvented, reimagined what a GPU is. We created a brand new breed of GPU, a brand new species of GPU. We call it a Tensor Core GPU. A Tensor Core GPU computes in two fundamental ways, fused in one architecture. The traditional way of computing is to understand the cause of a physical phenomenon and the effect — from the effect, figure out the cause. The scientific method. And from that scientific discovery you have a simple way of predicting or simulating the outcome based on the inputs: a transfer function, a well-known equation. Maybe it's Maxwell's equations, maybe it's Newton's laws, maybe it's Schrödinger's equation — all based on first principles, the laws of physics. Through the scientific method we discovered what the cause was and what the effect is, a simple set of equations, and through very large supercomputers we can predict the outcome from the input. Whether it's fluid dynamics methods or finite element methods, all of them are based on fundamental first principles of physics.
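A toy contrast between this first-principles style and the data-driven style he turns to next — my own illustrative example, not something from the talk:

```python
import numpy as np

g = 9.81  # gravity

# 1) First principles: Newton gives a closed-form transfer function for a falling object.
def height_physics(h0, t):
    return h0 - 0.5 * g * t ** 2

# 2) Learned from effect only: we observe noisy (time, height) pairs and fit a model.
t_obs = np.linspace(0.0, 1.0, 50)
h_obs = height_physics(10.0, t_obs) + np.random.normal(0.0, 0.05, t_obs.shape)
height_learned = np.poly1d(np.polyfit(t_obs, h_obs, deg=2))  # structure discovered from data

print(height_physics(10.0, 0.8), height_learned(0.8))        # the two predictions nearly agree
```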
There's a new approach today, for problems where we simply do not yet know the cause — we only know the effect, and we have a lot of data for it. Maybe it has something to do with the restaurants you like to eat at, maybe it has something to do with the books you would recommend somebody read, maybe it has something to do with the music that is the most popular. Large social effects have no fundamental root cause; it's very difficult to write a simple equation. With machine learning methods, you discover the features and the important structure from data: through an enormous amount of data, you discover a way to predict the outcome. Deep learning. These two forms of computing will not only solve different problems — in the future they will come together to solve similar problems. New problems will be solved by unifying these computational approaches.

The unification of these computing approaches needs a new type of processor, and so we invented Volta, the world's first Tensor Core GPU. It is a multi-precision, multi-architecture approach. It has the ability to process using traditional scientific methods — FP64, FP32 — as well as the new inference methods of computing, probabilistic methods of computing, where the precision is a combination of FP32 and FP16, and one of its most primitive operations is a 4x4 matrix multiply-and-accumulate, which we can do at incredible rates. Volta is the world's first processor that unifies all of these methods into one architecture, and we call it a Tensor Core GPU: fusing HPC, the traditional scientific methods, with the new methods of deep learning — machine learning, probabilistic methods for inference, for categorization, classification, grouping, regression, predicting the outcome of the future in a world that is too complex and not based on first-principles laws of physics. Okay, so these two approaches have been fused together into our new GPU, the Volta Tensor Core GPU.

But the world wants even bigger ones. In fact, Volta pushed every limit of physics. It is as large a chip as anybody on the planet can manufacture. I still remember the first day we started talking to TSMC about Volta — the yield would have been zero chips per wafer. And because of all the high-speed memories, and because it's 3D-stacked, the yield of a module would be exactly zero. And because we have to connect all of them together and operate them for hours and hours and hours — because it's a high-performance computing problem — the FIT rate, failure in time, would mean it never works. When you combine all of these factors together, Volta was an impossible machine, and that's why we thought it would be perfect to work on. So all of our teams dedicated ourselves for several years and we created the world's most advanced GPU, the most advanced processor — we call it a Tensor Core GPU. And as soon as we launched it, everybody told us: can we have it bigger? So we said, why don't we make it even bigger? We challenged ourselves to create a new programming architecture and a new switching architecture, and we made it possible to connect 16 Volta Tensor Core GPUs together, all at once.

Here's the challenge. Many researchers would like to do model parallelism. There's model parallelism and data parallelism. Model parallelism is to take a very, very large model and break it up into small pieces, and to have a whole lot of processors working together to simulate that model — which means that every single processor has to coordinate with every single other processor. Then there's the other type of processing, which we do very well: data parallelism — one model that fits into one GPU, and 16 GPUs or a thousand GPUs all running different data on that one model. One GPU, one model, as many different types of data as you like. Data parallelism, model parallelism.
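A minimal PyTorch sketch of the two styles he just contrasted, assuming a machine with at least two GPUs; the layer sizes are arbitrary and this is an illustration, not the DGX software stack:

```python
import torch
import torch.nn as nn

# Data parallelism: one model replicated, each GPU gets a slice of every batch.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)            # scatters the batch across visible GPUs
out = model(torch.randn(256, 1024).cuda())    # 256 samples split among the GPUs

# Model parallelism: one model too big for a single GPU, split across devices,
# so activations must cross the interconnect on every step.
class TwoGpuModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        h = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(h.to("cuda:1"))     # hand-off between GPUs happens here
```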
There are no shortcuts. You simply have to create an architecture where, when all the processors are synchronizing with each other, it continues to scale. So we created a new type of computer with a new type of switch. This new switch, called NVSwitch, has the ability to transmit the programming protocol, the memory protocols and the semantics of memory all the way through the bus, so that all of the other GPUs look like one GPU and everybody else's memory looks like your memory. Connecting all of this together unfortunately requires an enormous amount of bandwidth, because these GPUs are so fast, they get work done so fast, that they're constantly communicating — the communication overhead is very, very high. Well, we went off to solve that problem. We created a brand new system built around the NVSwitch, and as a result this virtual GPU — this gigantic virtual GPU — has 512 gigabytes of memory. The frame buffer on a gaming card today is like two or three or four gigabytes; this is 512 gigabytes of high-speed memory, all working together, 80,000 processor cores in this one virtual GPU, and two petaflops of computing, all appearing as one GPU.

It's made out of a brand new type of switch that we call the NVSwitch. It's 2 billion transistors and it has 18 NVLink ports; each port can connect to a GPU or to another switch. We use six of them on the top tray and six of them on the bottom tray to connect all eight GPUs on each tray, which basically says every one of the GPUs can talk to every one of the other GPUs simultaneously, at a bandwidth of 300 gigabytes per second — 10 times PCI Express. It's basically like every single GPU has 16 PCI Express connections at 10 times the bandwidth, connected to each other so that everybody can talk to everybody, all at the same time. It's built on TSMC's 12-nanometer process, 2 billion transistors, the fastest SerDes that I know of. Altogether each chip has 900 gigabytes per second of what is called cross-sectional bandwidth — the aggregate bandwidth that crosses that chip at one time, 900 gigabytes per second — and multiply that across all 16. Let's do some simple math: a movie is like 50 gigabytes, so across this backplane we could move essentially 300 4K movies in one second. Three hundred movies. None of us can even name 300 movies, but we can put 300 movies on this backplane and they will be transferred just like that. Okay, kind of crazy. But by doing so we've now created one virtual GPU, and a programmer can program this one gigantic virtual GPU.

Let me show you what it looks like. It looks like this. This is the world's largest GPU. It can even play a game. The world's largest GPU: 30 terabytes of non-volatile storage — I already told you 50 gigabytes is one movie, so hundreds of 4K movies could fit into the non-volatile storage. PCIe switches connect the GPUs to the CPUs. One and a half terabytes of system memory — a regular PC is about 32 gigabytes, so that's a few hundred PCs' worth of system memory. Two of the highest-performing Xeon Platinums. And each one of the trays is eight GPUs.
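Those aggregate numbers fall out of the per-GPU Volta figures. A quick reconstruction — the per-GPU specs are my assumptions about the 32 GB Volta, not numbers quoted in the talk:

```python
# Assumed per-GPU Volta (V100 32 GB) figures; the totals match the numbers above.
gpus          = 16
hbm2_gb       = 32      # memory per GPU
cuda_cores    = 5120    # cores per GPU
tensor_tflops = 125     # mixed-precision tensor throughput per GPU
nvlinks, gbps = 6, 50   # NVLink ports per GPU and GB/s per port

print(gpus * hbm2_gb)        # 512 GB of pooled high-bandwidth memory
print(gpus * cuda_cores)     # 81,920 cores -- the "80,000 processor cores"
print(gpus * tensor_tflops)  # 2,000 TFLOPS = 2 petaflops
print(nvlinks * gbps)        # 300 GB/s per GPU into the switch fabric
```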
And it's connected through this backplane that joins the top tray and the bottom tray, all NVSwitch protocol, all running at 300 gigabytes per second per GPU for every connection. As a result you have two petaflops, 512 gigabytes of HBM2, 10,000 watts. Ten thousand watts — this is what the world's largest GPU looks like. 350 pounds, ladies and gentlemen — two of me. [Applause] I was going to say this weighs as much as two of me, but I think I'm being too generous to myself. Three hundred and fifty pounds, ten thousand watts, two petaflops, one operating system. One operating system. This is the fastest single-node computer humanity has ever created: one operating system, one programming model. You program this as one computer. This is like a PC, except it's incredibly fast.

What you're looking at — this is also fun. You know what our NVIDIA creative department did? Do you guys know stop-motion animation? You move something a little bit, you take a picture, you move it again a little bit, you take another picture. Stop-motion animation, like back in the good old days — claymation. So our creative department, instead of rendering it — it's so beautiful when we blow it up — did stop-motion animation. You want to see it again? It's so great — watch. Only NVIDIA people get so excited about this, right? We have a fun job.

But you know what's amazing? When you look at this entire stack, look what happened. We took a new GPU called Volta and we added 32 gigabytes to it, so now it's the largest graphics chip in the world. Then we created this thing called NVSwitch and we connected 16 of them together into one virtual GPU. We had to create a whole system architecture, a brand new operating system, and create the entire stack of software on top of it. But what's the benefit? The benefit is amazing. The benefit is this: one year ago we announced DGX-1; six months ago we upgraded it with the 32-gigabyte Volta — and this is all of our software. Six months later we created this whole new computer. The entire stack has been updated, everything has been updated, everything enhanced, new technology invented, and as a result we improved the training speed of deep learning by a factor of 10 in literally six months. There's a new law in town. You guys didn't understand that joke? Yeah, there's a new law in town. This new law of computing says: if you are able, and if you are willing, to optimize across the entire stack, the performance improvement you can achieve is incredibly fast — but you have to know how to operate across the entire stack, one stack at a time. So, 10x in just six months — does that make sense? Well, Ilya at OpenAI — OpenAI measured 300,000 times in five years. If it's ten times every six months, five years is ten periods, right? Ten to the tenth. If we can keep going like this, computing is going to see no boundaries. There is no end to the future of computing — you just can't do it the same way anymore. You have to do it the GPU-accelerated way. 10x in six months.

DGX-2 is three hundred and ninety-nine thousand dollars. Three hundred and ninety-nine thousand dollars. [Applause] Three hundred and ninety-nine thousand dollars — this is the highest-priced graphics card NVIDIA has ever made, and it is available in Q3.
But the value is incredible. You guys — hǎo piányí, so cheap! Did I say that right? Only three hundred and ninety-nine thousand dollars — what an incredible value. Let's take a look. If you had to compare this computer — this sexy computer right here; you could take this computer to bed, it would fit nicely; I love this computer so much I will have one — okay, $399,000. Look at this: it would take this many computers — 300 dual-CPU servers, three million dollars, a hundred and eighty thousand watts. Are you ready for this? Boom. [Applause] Here, take another look. Look at that. Before, after. Oh, so sweet. And you save so much money — you save two and a half million dollars. In fact, you will save more money on the cables in the back of that computer than it costs to buy this computer. So beautiful. Hǎo piányí — so cheap, so good. Ladies and gentlemen, the world's largest GPU, the DGX-2. Incredible value, $399,000, and it will be able to run models — and do model parallelism — like nobody has ever seen before.

Now let's compare against Ilya's chart of 300,000 times, shall we? Just six years ago, Alex Krizhevsky at the University of Toronto, with Geoff Hinton as his advisor, created this brand new network called AlexNet, and he trained it on two NVIDIA GPUs — two of our high-end GPUs — and it took him six days. He submitted his result and won the world's leading computer vision contest without understanding computer vision. He became the world's leading computer vision researcher by inventing a new way of writing software called deep learning. The time to train AlexNet was six days, six years ago. It is now 18 minutes — roughly five hundred times faster. In fact, the reason why deep learning has advanced three hundred thousand times is because the NVIDIA GPU-accelerated computing platform has advanced approximately the same amount. They used every drop of performance that we delivered, every step of the way, and every neural network breakthrough still takes about six days. So today you would get a DGX-2, you would run an amazing neural network model to discover the next future software, and it would still take you six days. That's why it's so vital for all of us to continue to push the limits.

Well, pushing the limits and driving everything to the speed of light has enabled us to make great contributions to deep learning, and today I want to announce something — it's really quite something. We're announcing today that we have achieved five speed records. The fastest single chip for training — this is training ResNet-50, much more complex than AlexNet; this is today's modern, state-of-the-art computer vision network — a single-chip record. A single-node record on DGX-2: 15,000 images per second for training. The fastest training time at scale: given all the computers in the world, how fast could you train ResNet-50? Fourteen minutes.

Once you're done developing the network, you have to run it in operation. Just like when you write software in C++ you end up with a .exe, there is a run time — and the run time is right here. This is the run time of deep learning: it's called inferencing.
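Latency and throughput are the two numbers that matter for this run time, and he takes them up next; they pull in different directions. A small, generic sketch of how you might measure both — a matrix multiply stands in for a real network, and nothing here is NVIDIA-specific:

```python
import time
import numpy as np

weights = np.random.rand(2048, 2048).astype(np.float32)

def infer(batch):                      # stand-in for one forward pass of a network
    return batch @ weights

one = np.random.rand(1, 2048).astype(np.float32)
t0 = time.perf_counter()
infer(one)
print("latency:", (time.perf_counter() - t0) * 1e3, "ms per query")

big = np.random.rand(256, 2048).astype(np.float32)
t0 = time.perf_counter()
infer(big)
print("throughput:", 256 / (time.perf_counter() - t0), "queries/sec")
# Batching raises throughput but makes each individual query wait longer.
```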
The latency is really important, because when you're doing speech synthesis, or when you're talking to a neural network, that neural network has to respond pretty fast. The latency of a neural network is really important. This is a speed record: 1.1 milliseconds. People have always said that NVIDIA GPUs have very high throughput but that the latency was a problem. That's just because they don't understand our GPU — our Tensor Core GPU. One point one milliseconds. No CPU could touch it, no FPGA could touch it, no ASIC could touch it.

And then this, at the hyperscale data center level: you have millions of people querying the internet, querying the data center, running their applications, and each one of those applications will touch many neural networks in the future. So the throughput of the data center will directly affect the number of customers you can support, the quality of service you can provide them, and, very importantly, the cost of the data center. And the throughput is incredible. Just to put this in perspective: a CPU does four — not thousand, not hundred, just four — images per second. We can achieve six thousand images per second. And as a result of all of this: the more you buy, the more you save. That's right — because in the past, when microprocessor performance went up, cost came down; Moore's law is about decreasing the cost of computing. In the future, the new law is data center performance, and when data center performance goes up, the cost comes down. We are going to drive performance in the data center up like nobody has ever seen before — obviously because we can drive costs down, money for everybody, but more importantly because I believe the number of applications and the type of software that machines will write in the future will be gigantic, and we need the performance. So, five speed records.

The types of neural networks being developed all over the world are just amazing, so we thought we would show you one example. NVIDIA has one of the great deep learning research teams, and Guilin has been so gracious to come to Taiwan for the very first time. He's been doing research on a type of neural network that's really, really clever. This new form of neural network doesn't just recognize — it can predict, it can synthesize, it can generate images at almost superhuman levels. Okay, Guilin, let's take a look at that.

Okay, here we're going to show you left and right — let me set it up and then you can show us. These two pictures are the same. Guilin is going to modify the one on this side, and you will see the result on this side. Of course, he's going to modify it over here, and you can decide whether the modification is almost superhuman or not. All right, go ahead. Okay, so let's try to remove the cable — yeah, cables are so ugly, let's get rid of that. Let's also try to remove — oh come on, okay, really? We need to keep some, because we need people to know this is Taiwan. Okay, let's also try to remove the yellow taxi. Automatically — automatically — it figured out what it should look like, and it drew what we would have drawn. Pretty amazing, superhuman levels. Let's get rid of this car — this car is running a red light — let's protect that guy. Oh come on. See, I gave you a trick one, because that car had a light on it, right? Okay, yeah, I gave you a trick one. That's totally gone? Really?
No way — no way you could do that. See, I can trick scientists. [Applause] Ladies and gentlemen, before, after. So good, right? Let's do another one. Let's see if we can find the limits of your artificial intelligence network. Okay, show the before and after — that's so good. Okay — yeah, come on, this is so beautiful. Get rid of this thing. Get rid of this — why is that building there? It doesn't make sense. Oh, so beautiful. Get rid of this thing too. Oh — add more forest. I want more forest. Can you give me more forest? Wow, that's incredible. Let's do another one. That's beautiful — before and after, look at that. This is 2017 and this is 1800. Okay, let's do another one, Guilin. Oh, I love this one. I love this hotel, but that light pole is terrible — let's get rid of it. There's no way you could do this one, because this light pole is in front of a tree and in front of the hotel. How could you do it? Okay, let's see if he can get rid of the light pole. Wow, that's amazing. So this new network has learned how to generate images. It looks at all of the surrounding pixels and works out what the right pixels to replace them with would be. It's essentially what we would do: we would imagine what surrounds the missing area and try to predict how to fill it in. Wow — this is Taiwan before cars, before electricity. What do you guys think — how is that for artificial intelligence? Let's do the last one. Okay, one more. That's not me, I'm younger. Okay, let's make him look a little younger, yeah, and make him lose a little weight. [Aside in Mandarin] He's very attractive, huh? Look at that — oh, that's perfect. Wow. Okay, good — set it translucent so we can see before and after. We took ten years off his life. Look at that. That's incredible. Okay, Guilin, I want you to go through all the internet photographs of me and fix them, okay? All right, good job, you guys.

Ladies and gentlemen, people are doing amazing work in artificial intelligence all over the world, and here in Taiwan we're seeing some really great work as well. Foxconn is training a neural network to do visual inspection, looking for manufacturing defects — and you know how many things they manufacture and at what speed they're manufacturing them. The ability to detect the slightest anomalies using visual inspection, using an artificial intelligence network, will help them improve their productivity, improve their throughput and reduce cost. Healthcare: doctors at CMUH, the China Medical University Hospital, are developing a neural network to detect cancer metastasis. At National Taiwan University they're developing a neural network to segment organs, looking for potentially at-risk organs. Public safety: Taiwan AI Labs is working with the Tainan City government — apparently there are a whole lot of bridges, 1,600 bridges in Tainan, I had no idea — and because of typhoons there could be potential structural damage, so they teach drones how to fly around and look for structural damage. The Taoyuan City government is working on autonomous vehicles; they would like every closed-path, constant-route vehicle to be autonomous in
the future. It makes perfect sense. So, as you can see, whether it's in healthcare, in manufacturing or in smart cities, you're seeing all kinds of really great work being done in AI.

It all started six years ago with AlexNet — Alex Krizhevsky, whom I mentioned before. It was only eight layers deep, with about a million parameters, a million numbers. This computational graph was eight layers deep, about a million parameters, and in just that short time this one network ignited the revolution of modern AI. It was able to solve a problem that is essential for artificial intelligence, for software to interact with the real world: discovering the important features of a large amount of unstructured data automatically. The front layers, the convolutional neural network layers, were able to discover the essential patterns by which we classify an object, and it did it automatically. You don't have to write any software; you just have to give it a lot of data — and it needs a lot of computation, a supercomputer — to discover the interesting patterns in the information by itself. It's like you and me discovering patterns from experience: we live, we experience, we do something, and over time you say, hey, this pattern exists and it keeps emerging, and then we generalize that pattern into knowledge — otherwise known as intelligence. The ability to recognize features, recognize important patterns, create structure out of the data, and generalize it hierarchically into knowledge revolutionized modern AI.

From there it just took off. Look at that — people thought that the CNN was the beginning; in the end, it was just barely the beginning. Since then, a Cambrian explosion: literally every single type of neural network has happened. CNNs — not only have they gotten more complex, all kinds of different layers have been created. RNNs recognize sequential patterns: CNNs recognize spatial patterns, RNNs recognize sequential patterns, and the combination of the two does amazing things as well. You also have generative adversarial networks — a neural network that's essentially two networks. One network is trying to generate something — let's use an image as an example; it could generate an image, generate a sound or generate a story — and the other one is testing the output of the generation to see: is it good, is it real, is it good enough? The two of them are competing with each other; that's why they call it adversarial. It's a generation network, but it's adversarial, and as a result, when one network becomes super good at generating and the other at testing whether it's good or real, the GAN generates outputs that are basically superhuman. So, generative adversarial networks. Reinforcement learning is like the way we learn: trial and error — try this, try that. When we're learning how to walk, we use reinforcement learning: we try it, and if we do a good job we get a reward, and if we do a bad job we get a disincentive, so we try over and over and over again. This is how we're going to teach robots how to be good robots. And then there are all kinds of new species still coming out. In just six years' time, thousands of neural networks have been invented, and the complexity of them is incredible.
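To make the generator-versus-discriminator tug-of-war described a moment ago concrete, here is a hedged, toy-sized GAN in PyTorch. The generator learns to turn noise into samples that look like a simple Gaussian — my own minimal example, not anything shown on stage:

```python
import torch
import torch.nn as nn

# Toy GAN: G turns 8-D noise into samples that should end up looking like N(3, 1).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 3.0 + torch.randn(64, 1)               # "real" data
    fake = G(torch.randn(64, 8))                  # generated data

    # Discriminator: label real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the (updated) discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())      # should drift toward 3.0
```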
A lot of people ask me this: the GPU is fantastic at training — the performance you have is incredible at training — but what about inference? In fact, we know this: the inference problem is even harder than the training problem. The reason is that, in a large-scale data center — now, for a little tiny device, for an appliance like a Siri speaker or a smart microphone, it's pretty trivial, but for a large-scale, high-performance data center — the types of networks that you're running inside that data center are very diverse, they're all very complicated, they're growing constantly, and they're getting more complicated over time.

I created a simple way for you to think about inference. Inference is like PLASTER. Are you guys ready? PLASTER. The P stands for programmability: in order for the data center to support all kinds of neural networks that are being created by all the engineers inside and outside your company, that data center cannot only run CNNs — that data center has to be programmable. So the first thing you need is programmability. The second thing is you need a processor that has very, very low latency — as I mentioned earlier, the speed record for latency is 1.1 milliseconds. The latency translates directly: when you say "OK Google," you want it to come back instantaneously; when you say "Hey Siri," you want it to come back instantaneously. You want the inference to happen with very low latency. Third, accuracy: you could sit there and return answers as fast as you like, but if the answers are wrong it's very frustrating, so the accuracy of the network has to be high. You trained it with 32-bit floating point; the question is, when you run it in the data center, what kind of precision are you willing to accept? The accuracy has to be maintained. Next is size: the smaller the network, the more you can put inside the data center. The size of the network has everything to do with the speed and also the power of the data center. T is for throughput: ultimately you have millions and millions of people sending queries of all kinds — they're taking pictures, they're asking questions, they're talking to their computers — and in the future you're going to have all kinds of smart microphones; everybody's going to be talking to everything, and they're all going to be connected to this hyperscale data center. The throughput of the data center is directly related to the cost of the data center. E is energy efficiency: one-third of the total cost of operations is energy. And then lastly, R, the rate of learning. The thing that we know about software is that software needs to be fixed. There are always bugs; there's always something that it didn't recognize properly or something that it's not doing properly. So once you deploy the neural network into operation, you're probably going to have to fix it as quickly as possible, and so the rate of learning and redeployment into the data center is really important. If you have to change the design of an ASIC every single time, imagine your rate of learning, imagine your rate of fixing bugs. So PLASTER: programmability, latency, accuracy, size, throughput, energy efficiency, rate of learning. There will be a quiz at the end of the keynote. The first quiz is: if I say "the more you buy, the more you…" — save, that's the right answer. The second is PLASTER: what does the P stand for? Programmability. Very good. Good students — all the world's best students right here in Taiwan.

In order to serve this market, in order to allow all of these complex neural networks to go into the data center, we have to solve several things. The first thing, of course, is that the types of neural networks are really quite diverse. You've got
speech recognition and speech synthesis; you have natural language understanding — what did you mean; you have natural language translation, translating from one language to another, the universal translator — in the future you'll talk to it in Chinese and it'll come out the other side in English, so when you're making a telephone call you could talk to anybody in any language; you have recommendation — when you buy something it makes a recommendation for you, what kind of news you would want to read, what kind of movies you would like to watch, those kinds of recommender systems; and of course image and video — it could be captioning the video, it could be detecting inappropriate video. All of these types of content are coming into the data center, and we want to be able to run all of these different types of neural network models well.

These models — the output of a training framework — are gigantic files. It's not source code; it's a computational graph. This computational graph has weights and biases and gradients; after you're done training it has a whole bunch of weights, all connected to different nodes, and they're massive, with all kinds of different types of layers. What we need to do is take the output of that training and make it run PLASTER: it has to be low latency, we have to improve the performance, we have to reduce the precision wherever we can but protect the accuracy wherever we must, get rid of unnecessary nodes, compress, fuse, remove layers. So the optimizing compiler studies the graph and creates a new graph. The output graph of a framework and the graph that we run are fundamentally different, but they produce the same result. That compiler is called TensorRT, a new type of optimizing compiler: whereas we used to have C++ compilers, we now have neural network compilers, and this neural network compiler is called TensorRT. It's a big piece of software that is very, very difficult to write — a lot of groundbreaking work that we're doing here. We're supporting more and more types of networks and graphs, we're supporting more and more types of NVIDIA GPUs — from traditional CUDA to INT8, 8-bit integer, to Tensor Cores — and we're supporting more and more compiler techniques.

At GTC we announced that this generation has four new features. TensorRT 4.0 has now been integrated: Google has integrated TensorRT 4.0 into TensorFlow, so that it is native inside TensorFlow. Kaldi, which is the world's most popular speech recognition framework, has been optimized for TensorRT as well. ONNX, which is the export format of PyTorch and MXNet, is now accelerated by TensorRT. And WinML, which is Microsoft's format, is also accelerated by TensorRT. The result of it is pretty amazing: whether it's image or video, 190 times; natural language understanding, 50 times; recommendation systems, 45 times; speech synthesis, 36 times; speech recognition, 60 times. Basically what's happening here is that by recompiling the neural networks you're running in your data center and adding a Tensor Core GPU to your data center, you can save just an enormous amount of money. Let me show it to you.
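As a small, hedged example of the hand-off just described: a trained PyTorch model exported to ONNX, the interchange graph that an optimizer such as TensorRT can then fuse and re-precision downstream. The file name and input shape are illustrative:

```python
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()   # any trained network
dummy = torch.randn(1, 3, 224, 224)                            # one example input

# The .onnx file is the "computational graph" hand-off point; a downstream
# optimizer can fuse layers and lower precision without touching training code.
torch.onnx.export(model, dummy, "resnet50.onnx",
                  input_names=["images"], output_names=["logits"],
                  opset_version=11)
```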
But before you can run it in the data center, you have to do something else. The hyperscale data center's infrastructure software is very complicated. There's a piece of software called Kubernetes which takes all of these different accelerated stacks — and we containerize them. NVIDIA's GPU is the only accelerated processor that has a Docker container associated with it, the only accelerated Docker. So NVIDIA GPU containers, accelerated with Docker, can now be deployed as containers: you run the container on top of Kubernetes and you can scale it out onto all of the resources within the data center — in fact, even outside of the data center. As more workload arrives, more servers are spun up; as the workload disappears, the servers are released; and if anything were to happen to a server, it moves the job somewhere else. So Kubernetes is, if you will, the hyperscale operating system, and Kubernetes is now GPU-accelerated. If you look at this entire stack — from the GPU to all of the APIs and libraries that we created, so that you can accelerate applications, put them into a Docker container, and run them on top of Kubernetes — that entire software stack is so complicated it has been the work of hundreds of our engineers for several years. Let me show it to you.

Okay, this is a CPU doing image recognition with ResNet-152, right? Yep. This is ResNet-152, the most advanced image recognition network out there today. The Skylake CPU can do 4.5 images per second. Not bad. Pretty amazing — it can recognize flowers of all these different types. Let's stop at one. These flowers — incredible. Garden phlox. I bet no one has ever said the word "phlox" before; if you asked me what a garden phlox is, I'd have no idea. Snapdragon — isn't Snapdragon a chip? Thorn apple — a thorn apple is an apple, not a flower. Lenten rose. None of us would have passed this test. This neural network can detect flowers at superhuman levels — none of us would have been able to pass this test, impossible — and yet a neural network running on a CPU can do that. Millions and billions of people could take pictures of whatever they want, look for a particular object they want to buy, send it up to the cloud, and the cloud will say: that is a brand new leather jacket, and here's where to buy it. So imagine all of the people that are using neural networks in the cloud.

Last year we were here and I showed you the state of the art in inference performance, and it was 580 images per second — this CPU does four and a half. When I showed you 580 images per second, the audience went "wow." That was one year ago. But as you know, we have many engineers, and they've been working very hard. At GTC a few months ago — are you still with me? Yep — at GTC a few months ago: 970 on one Tensor Core GPU. One GPU. One GPU is now 200 — well, 250 — times faster. How could you add something to the most advanced CPU in the world and run 250 times faster? Think about that for just a second — and think about the amount of throughput you can enjoy in your data center. But see, that's not enough. We have so much more architectural mojo, magic, that we're going to show you. Our latest result, ladies and gentlemen: the brand-new TensorRT running on our Volta Tensor Core GPU — 2,500 inferences per second. Now, if we make this go any faster — [aside in Mandarin] — thank goodness the world's internet is consuming so much machine-learning capability.
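The demo coming up next scales GPU inference pods up as load arrives. A rough sketch of what that looks like with the official Kubernetes Python client — the image name and deployment name are placeholders, and the cluster is assumed to already expose GPUs via the `nvidia.com/gpu` device-plugin resource:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

container = client.V1Container(
    name="trt-infer",
    image="nvcr.io/example/tensorrt-inference:latest",   # placeholder image, not a real artifact
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="trt-infer"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "trt-infer"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "trt-infer"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)

# Later, when the customers show up, scale from 1 GPU pod to 4.
apps.patch_namespaced_deployment(
    name="trt-infer", namespace="default", body={"spec": {"replicas": 4}}
)
```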
Our job is to simply drive the cost down as fast as we can and improve the performance as fast as we can, with all of the software — layers and layers and layers of software on top of our GPUs — making this possible. So this is TensorRT 4 on a Tensor Core GPU. But that's not enough — we can do more. And so we finally ported Kubernetes, accelerated with our GPUs. We call it Kubernetes on NVIDIA GPUs — KONG. Okay — Kong. So, Kong, let's show them.

So imagine you need more performance. Let's step back and see what happens. In this case our load has increased, so now we need more compute — we need to bring in more processors to do the computing. So let's add those in. There we go — we brought four new V100s online. So just now, all of a sudden, all of your customers showed up, and you're so happy, right? But if the customers show up, you cannot send an IT manager to the data center to add more GPUs — this will naturally scale by itself; Kubernetes will simply scale it up. Okay, so four GPUs now. Wow, wow, look at that — that's all customers. [Applause] You can have four customers or you can have 10,000 customers. This is much better, right? [Aside in Mandarin] Okay. And Kubernetes doesn't just scale for your on-prem data center — you can also burst out into the cloud. So if you need to add more capacity, I'm going to add four nodes from AWS. You'll see them pop in, and we can add those into the load balancer, and we're going to see another doubling of performance. Pretty amazing. Now, the amazing thing is this — look at this: these four GPUs are in Amazon, and these four GPUs are in a DGX computer in Santa Clara, and we can run all of this work live, completely invisibly, and scale it up, all running Kubernetes on GPUs. Okay, thanks a lot, Ryan. Good job.

So I talked about high-performance computing, a ten-billion-dollar market. We have a specialized acceleration stack for all of the high-performance computing and supercomputing applications: molecular dynamics, weather simulation, particle physics, fluid dynamics, physics simulators, quantum chemistry, you name it. We also have our AI stack for deep learning — for training — with all the different frameworks; we support every single framework in the world. And now we have an accelerated stack for hyperscale inference: you take the same GPU and you can now run it across the entire data center. The world's data centers in the cloud hold about 30 million servers today. I believe every single node will be accelerated in the future — every single node — and the reason is that the value proposition is too compelling. From 4 images per second to 2,500 images per second, the throughput benefit of adding a GPU is so great. We simply needed a compiler — so complicated — TensorRT 4.0 running on top of CUDA, running with cuDNN. We have all the necessary infrastructure software developed, and on top of that, container software — Docker — running on Kubernetes, accelerated by GPUs. This entire stack is fully accelerated. The number of customers jumping onto this platform is really quite exciting: we have customers in the United States and China, we have internet companies, we have people who are doing smart cities, we have people who are designing cars — applications of all kinds. Okay, so that's the third market: AI for training, AI for inference, high-performance computing.

Today we're announcing something new.
These servers are really hard to design — they're supercomputers, hard to design and hard to deploy. The architecture is complicated, it's very high performance, and the power density is very high: notice we took several hundred thousand watts and reduced that down into one or two racks, so the power density has increased and the compute density has increased, and so the server designs are complicated. We've been working with all of the leading server companies in the world — they happen to all be right here in Taiwan — and so today we're announcing a brand new architecture for future hyperscale data centers. We call it the NVIDIA HGX-2. Let me show it to you.

All right, this is the HGX-2. Can you guys see this? This is 350 pounds — I'm very strong. Look at how many TSMC transistors — anybody from TSMC here? A hundred-plus billion transistors, all right here. Eight GPUs connected by six NVSwitches; the NVSwitches connect each one of them to each other, and then, through the back, all of them to all of the others. So that's 300 gigabytes per second of communication between every single GPU. Every GPU is about 300 watts — 300 times eight — and the NVSwitch protocol comes in through the back. This is basically — as you know, the ATX motherboard revolutionized the PC industry, and just as Taiwan was at the center of the PC revolution and at the center of the cloud computing revolution, we are gearing up together to start the AI revolution. The type of computer is different for each era; every computing revolution required new computers. You remember the PC AT and the ATX motherboard — the standardization of that motherboard accelerated the velocity of computing tremendously. This is a hyperscale data center motherboard standard, and we will ship, literally, an entire motherboard just like this — and this is incredibly heavy. Here we go, Paul. Yeah, be careful — that is two hundred thousand dollars. And they're connected in the back by this high-speed backplane, which connects all of the NVLinks directly and ties two motherboards together,
And then you clamp it into the supercomputer with just three motions: one, two, three. Okay, the future of computing, made easy. You have to be a good athlete to be a CEO. Wow, okay.

All right, so we are at the epicenter of the future of the AI revolution, and our partners — I'm so grateful for our partners. In the last 10 years, in the last 10 years this country produced 100 million servers, 100 million servers for data centers and the cloud, representing 90 percent of all of the servers produced in the world, produced right here in Taiwan. The world's leaders have all adopted the NVIDIA GPU acceleration architecture: Quanta, Foxconn, Wiwynn, Inventec, ASRock, Acer, Gigabyte, ASUS. With your partnership, anybody who wants to use this future of fused computing, with HPC, high performance computing, and AI, can. It doesn't matter where they are, it doesn't matter what industry they're in, it doesn't matter what markets they serve or what data centers they run their operations from; we have servers of every single kind. I want to thank you for your partnership, and all the engineers that work with us on making this revolution possible. Thank you.

We now have, with the HGX-2, a growing family of servers. Look at all the different configurations: the HGX-T1, 4 or 8 GPUs, 2 CPUs, for training; the new HGX-T2 for training and for supercomputing; the HGX-I1 for inference; the HGX-I2 for inference — so depending on whether you want 1U systems at lower power or you want to aggregate more computing into a node — and you have the SCX-E1, the SCX-E1... otherwise, otherwise I call her sexy one, sexy two, sexy three, and sexy four. Okay, these are all of our different types of servers, and notice all the different applications that they serve, from different configurations, different power levels, for different rack power densities. They can be used for training, inference, high performance computing, for smart city, for VDI — NVIDIA has a full software stack for that, we call it GRID, and Quadro Virtual Workstation, which allows you to take all your PCs and put them into the data center, VDI and remote workstation — and for rendering. Just about every way you would like to enjoy the future of high performance computing is represented here in Taiwan, and so I want to thank all of you for your partnership. That's really fantastic, thank you. [Applause]

Every server maker is adopting the architecture. Every IT company has, from IBM to Dell, HPE, to Supermicro and many others. Every single market, whether it's enterprise or the cloud, all industries, all countries are represented. We can now take this computing model to literally anybody and everyone. Okay, so widespread adoption by IT companies.

Now, I just described the architecture, the system architecture, the interconnect, but what about the software? How are you going to get the software? Software delivery is one of the most complex things, because there are so many different layers: the operating system, the container, the containerization, all of the APIs, the libraries, and the accelerated stacks are all so complicated. You saw all of the different version numbers, 5.0, 9.2, so many different versions. How do we possibly deliver the software to the marketplace? There are so many fragmented markets, from life sciences to deep learning to people who are designing cars to manufacturing to smart city — how do we possibly deliver all the software? So we created a new way of software delivery. We call it the NGC cloud. NGC is a registry, simply a store of all of our software in container formats. They've all been containerized; every single layer of the software has been accelerated, has been tuned, has been tested; we can reproduce the results; it's all up in the NGC cloud.
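Here is a hedged sketch of what pulling one of those tuned containers from the NGC registry could look like with the Docker SDK for Python. The nvcr.io/nvidia/&lt;framework&gt; path pattern is real, but the tag below is a placeholder, and an NGC login is assumed, so treat this as an illustration of the delivery model rather than exact instructions.

```python
# A hedged sketch of pulling an accelerated container from the NGC registry with
# the Docker SDK for Python ('pip install docker'). The tag is a placeholder you
# would replace with a real NGC release tag, and pulling from nvcr.io first
# requires 'docker login nvcr.io' with an NGC account.
import docker

REPO = "nvcr.io/nvidia/tensorflow"   # one of the framework containers in the registry
TAG = "<release-tag>"                # placeholder; pick an actual NGC release tag

client = docker.from_env()
image = client.images.pull(REPO, tag=TAG)   # downloads the tuned, tested container
print("pulled:", image.tags)
# Running it with GPUs attached additionally needs the NVIDIA container runtime
# on the host (e.g. 'docker run --gpus all ...' from the command line).
```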
20,000 companies have come to download from NGC, because if you can get a really wonderfully tuned version of the acceleration software stack, the performance is so fantastic. And so the NGC container registry now contains all of the deep learning frameworks, a whole bunch of the most popular high performance computing codes, visualization codes, and we're going to keep on going — 30 applications, 30 containers in there today. We've certified it on a number of clouds — AWS, Google's cloud, Alibaba's cloud, Oracle's cloud — and also for the DGX, and we're in the process of certifying systems of all different kinds. So anybody who would like their systems certified, just contact us and make sure that we prioritize it. GPU containers. So from the architecture to the systems to all the software, we're putting it together.

So let's do a bit of a status report. Initially I said I was going to fill out this schematic, okay? The first part has to do with the accelerated computing stack: a brand new type of processor called the Tensor Core GPU, NVSwitch, a new type of system called the DGX-2, and the entire stack on top of it, 850,000 developers, 550 applications — without killer apps a new computer is useless — and then, without the ability to deliver the architecture and the systems to the marketplace, of course people cannot enjoy it. We have all the leading server companies in the world, all the IT companies in the world, and every single cloud has NVIDIA's architecture inside. This has been the work of 10 years. It has taken us a decade and billions and billions of dollars to get from there to here. Creating a new computing architecture, creating a new computing platform that everybody and anybody in the world can use, has not happened since the PC, has not happened since AWS, and it's happening again. Creating a new computing architecture is incredibly difficult, and it's taken us a decade to get here. All of the software will be delivered in our registry; it will run on any cloud, it will run in every cloud, and it will run in every data center.

I talked about two markets, AI for training and AI for inference, and we started with supercomputing. Now I want to talk to you about four new markets. The AI market alone is the future of software; there's not one company, not one industry, that's not going to be affected by it. High performance computing is 10 billion dollars. Computer graphics has been the driving force of GPUs since the beginning of our company. Computer graphics is a really unique application: it is both computationally incredible and the volume is great. But up to now we have not served every single segment of computer graphics. The segment that we have never served is called CGI, computer-generated images: photorealistic images that are used in several applications. This is all rendered on CPUs today; they call them render farms. Games: every single game that looks beautiful is partly rendered in real time, but partly the global illumination — I'll show that to you in just a second — the global illumination tends to be pre-rendered; it's very complicated to do, and it's baked into the textures. Movies: several hundred thousand frames per movie, say 300 thousand frames per movie, and each frame of the movie takes several hours to render.
So just rendering the film from the beginning to the end, one time, will take, well, hundreds of millions of hours, and that's the reason why supercomputers called render farms are created, to render the movies. Every single car advertisement you see today: they're not taking the car and flying it to Paris, flying it to Shanghai, flying it to Taipei. They're not doing that. It's all computer-generated images, and so it has to be photorealistic. The buildings of today are no longer square; a building today is like a product, except it's gigantic. It's not a square box, and so as a result the only way to understand the feeling of it, the dynamics of it, is to render it photorealistically. Every single one of them is rendered that way. We would like to take that rendering and put it into virtual reality, so that you can enjoy photorealistic rendering in real time. These things are simply not possible today.

Ladies and gentlemen, after ten years of R&D, literally ten years of R&D, trying, prototyping, trying, prototyping quietly in our labs, we're going to announce this year the most important breakthrough in computer graphics, in my opinion, since the beginning of computer graphics, and this is going to be the greatest contribution we've made to this industry in the last 15 years, since we invented the real-time programmable shaders that revolutionized modern computer graphics. It's called NVIDIA RTX. NVIDIA RTX fuses three fundamental methods of generating images: one, the GPU architecture has been revolutionized, as I mentioned before — a brand new type of real-time computer-generated graphics; second, real-time ray tracing; and third, deep learning, artificial intelligence. By combining these three technologies we've been able to achieve something that we thought wasn't going to happen for another ten years: real-time, film-quality rendering. Let me show you what that means.

This is what GeForce renders. It's beautiful. As you guys know, if not for the programmable shading technology we invented, modern computer graphics would not look like this. An enormous amount of R&D has allowed us to get here, and this is just beautiful — this is modern computer graphics. And yet, when you compare this to photorealistic images... this is what NVIDIA RTX looks like. Can you guys see that? You want to see it one more time? Before: state-of-the-art real-time GPU technology. After.

Let me break it down a little bit for you. So here's one: ambient occlusion. Ambient occlusion is the intensity reflected on the surface of a material from the ambient light, and the ambient light is coming from every direction. Unlike a spotlight, which comes from one direction and is easy to compute, ambient light comes from every single direction. And notice the shadows in the creases — the shadow in the creases comes from ambient light; it's called global illumination, and that's called ambient occlusion: the part of the scene that gets low intensity from the ambient light. And so here you can see, this right here is not a shadow, it's ambient occlusion. It's very difficult to do. Let me show you what ray tracing does. Look how beautiful that is. It's like grape juice versus Château Latour: grape juice, great wine; potstickers, Din Tai Fung. Okay, you see this? Look at this, look how beautiful that is. Look at it. Can you see that? Subtle, subtle little things, but when I put this in motion you can all see it. Okay, so that's called ambient occlusion; you need global illumination for that.
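To illustrate why ambient occlusion is so difficult to do without ray tracing, here is a minimal Monte Carlo sketch — not NVIDIA's RTX implementation — that shoots random rays over the hemisphere above a surface point in a toy one-sphere scene and measures how much of the ambient light is blocked, which is exactly the crease-darkening effect described above.

```python
# A minimal sketch (not NVIDIA's RTX implementation) of Monte Carlo ambient
# occlusion: shoot random rays over the hemisphere above a surface point and
# count how many are blocked by nearby geometry. Scene = one sphere over a plane.
import math, random

SPHERE_C, SPHERE_R = (0.0, 1.0, 0.0), 1.0   # sphere resting on the ground plane y=0

def hits_sphere(origin, direction):
    """Return True if the ray origin + t*direction (t>0) intersects the sphere."""
    ox, oy, oz = (origin[i] - SPHERE_C[i] for i in range(3))
    dx, dy, dz = direction
    b = 2.0 * (ox*dx + oy*dy + oz*dz)
    c = ox*ox + oy*oy + oz*oz - SPHERE_R**2
    disc = b*b - 4.0*c                      # a == 1 for a unit direction
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    return (-b - root) / 2.0 > 1e-4 or (-b + root) / 2.0 > 1e-4

def random_hemisphere_dir():
    """Uniform direction on the upper hemisphere (surface normal = +y)."""
    while True:
        x, y, z = (random.uniform(-1, 1) for _ in range(3))
        if 0 < x*x + y*y + z*z <= 1.0 and y > 0:
            n = math.sqrt(x*x + y*y + z*z)
            return (x/n, y/n, z/n)

def ambient_occlusion(point, samples=512):
    """Fraction of ambient light reaching 'point' (1.0 = fully open sky)."""
    unblocked = sum(not hits_sphere(point, random_hemisphere_dir()) for _ in range(samples))
    return unblocked / samples

# A ground point in the crease next to the sphere is darker than one far away.
print(ambient_occlusion((0.9, 0.0, 0.0)))   # near the sphere -> noticeably < 1
print(ambient_occlusion((5.0, 0.0, 0.0)))   # far away        -> close to 1
```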
Reflection and refraction: for a translucent material, light from all the different sources passes through the material, and some of it is reflected, some of it is refracted. Okay, and so this is the traditional method: we use a technique called screen-space reflection and refraction, which is very advanced, and we invented it. But this is the better way of doing it. You see that? GTX... RTX. And as a result you get this effect called caustics, naturally. When you have sunlight over a swimming pool, you see the beams on the bottom, okay, that's called caustics: light comes through and bends through the curved surfaces, and the light intensity collects in particular areas to create this beautiful effect, and it does it all naturally using ray tracing. (A small sketch of the refraction math behind this appears a little later, as an aside.)

This is called subsurface scattering: light goes into a material, it goes into the surface, underneath the surface of the material, it bounces around, and it comes back out at all kinds of angles, different than the incident angle of the light. And so as a result it looks like jade, it looks like gummy bears. Milk has that effect; human skin has that effect — it has subsurface scattering. This is what subsurface scattering looks like with ray tracing. You ready? Okay, so this is before... oh, yummy gummy bear. Not-so-yummy gummy bear... yummy gummy. All right, so all this technology, all this research, is so that we can make yummy gummy, but as a result computer graphics and computer-generated images can be completely photoreal.

Let us show it to you now. Geoffrey is going to show you the work that we've done in a partnership between us, Industrial Light & Magic, and Epic. What you're seeing right now is completely done in real time. This is not a movie; this is computer-generated images in real time, running on four Voltas. [Music] NVIDIA RTX technology. Okay, so, hey Geoffrey, let's quickly go through this one time. This is all completely in real time. As you know, this is a technology called area lights — light that's coming from many directions, very hard to do. Notice the shadows, notice the light. Go ahead. Yeah, we're changing the ambient lighting right now; you'll see on the reflections on the helmet, on the shoulders, that it changes with the light source. Hey Geoffrey, Geoffrey, can you hear me? Yes, I can hear. Okay, so I just want to let you know that, unfortunately, because I took too much time, you don't have any time. Got it. Okay, it's my fault, it's my fault. Don't worry, nobody's watching. Okay, go.

All right, so notice how beautiful the light is, look how beautiful the shadows are — they're not all crunchy so that it looks fake; look how soft it is, and it's all dynamic, and the shadows automatically cast on top of each other. Okay, let's go into it, let's go somewhere else, this is fantastic. Let's see something shiny. All right, let's add a new character: Phasma. This is all completely in real time. Look at Phasma — you can see the reflection of Phasma on Phasma, and all of that. She's the stormtroopers' boss; her name is Phasma. And so let's go down the elevator. Look at the reflections, look at the reflections, all completely generated in real time. This is the power of ray tracing: you don't have to trick the computer graphics, but you do have to create a brand new technology. It literally took us ten years to do this. Look at this. Incredible, right? What video gamer doesn't want this? [Applause] But with this technology we can revolutionize the way films are made.
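A quick technical aside before the demo continues: the sketch promised above, showing the refraction math behind the caustics shots. It is Snell's law applied to a ray crossing from air into water; the vector formula is standard optics and is in no way the renderer shown on stage.

```python
# A small sketch of the physics behind the refraction and caustics shots:
# Snell's law gives the bent ray direction when light crosses into a material
# (e.g. air -> water, n1=1.0, n2=1.33). Not production renderer code.
import math

def refract(direction, normal, n1, n2):
    """Refract a unit ray 'direction' through a surface whose unit 'normal'
    points back toward the incoming ray; returns None on total internal reflection."""
    cos_i = -sum(d*n for d, n in zip(direction, normal))
    ratio = n1 / n2
    sin2_t = ratio*ratio * (1.0 - cos_i*cos_i)
    if sin2_t > 1.0:
        return None                       # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(ratio*d + (ratio*cos_i - cos_t)*n for d, n in zip(direction, normal))

# Sunlight hitting a flat water surface at 45 degrees bends toward the normal,
# which is what concentrates light into the caustic patterns on the pool floor.
incoming = (math.sin(math.radians(45)), -math.cos(math.radians(45)), 0.0)
print(refract(incoming, (0.0, 1.0, 0.0), 1.0, 1.33))
```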
Okay, and so how about we stop the elevator. Let's go to the engine room. Why don't you walk out — did you see the door open? You can see the door opening reflected on her chest. All of this is being generated in real time: the parts of the surface of Phasma where light is absorbed, it's absorbed; where it needs to reflect, it's reflected; wherever it's refracted, it's refracted. The material on the ground, notice, is less shiny, but we can make it shinier. Incredible, right? Ladies and gentlemen, NVIDIA RTX. Okay, good job, Jeff. [Applause]

NVIDIA RTX for the film industry. For the film industry we've now created this entire software stack so that it can run and synthesize film, for these billions and billions of images that are being created. And so let me show you what that looks like: on the left, traditional CPU farm rendering; on the right, NVIDIA RTX. Until now, it was simply impossible to use a GPU to render the image, because the images would simply not be as fine and as beautiful as software running very, very slowly on CPUs. And for the very first time we can create images that are completely photoreal, and yet completely accelerated. The benefit to the customer is so good: a traditional render farm of 280 servers versus the accelerated one — the more you buy, the more you save. That's right. Okay, so go and tell all your friends: the more you buy, the more you save. Remember the basic equation: we create the architecture and the acceleration stack, in this case the RTX technology; we work with all of the software developers, from Pixar's RenderMan to V-Ray to Clarisse to Arnold; and then we have a team of technical experts in the ecosystem, in the world, working with all of the leading publishers and leading developers and filmmakers to create their films in the future with this accelerated stack. And everybody loves saving money — nobody I've met doesn't like to save money. An enormous industry, today completely using render farms, in the future can be accelerated, and the more they buy, the more they save. A new industry we're going into.

This is the work of almost a decade as well. For a decade we've been working with the medical imaging industry. The best way to extend life is to detect a disease early, and medical imaging is the fundamental and essential tool of doctors. There are all kinds of medical imaging approaches: there's PET, there's MRI, there's CT. MRI is good for soft tissue, CT is good for bone structure, and you have ultrasound, of course. Working with the leading medical imaging companies — from GE to Philips to Siemens to Canon in Japan — we have been working over the last 10 years to put computational methods and CUDA GPUs inside their instruments. The latest-generation Philips ultrasound, incredible technology, has CUDA inside. GE's Revolution, this incredible CT machine, has GPUs inside. And so we're computationally accelerating medical imaging. However, in the last several years advances have moved so fast, and there are millions of medical imaging machines around the world that unfortunately will take decades to upgrade. And so what we decided to do is create a virtual, remote supercomputer that sits in a data center, or could eventually sit in the cloud. This data center will run the entire software stack of a medical imaging system, from CT reconstruction to applying AI for detection to visualization. We virtualized the entire software stack, and we created what we call Clara, and this will just run in the data center.
And so you have the GPU server, the containerized and virtualized software, all of the APIs and acceleration layers, and the software that goes on top of it. Let me show it to you. On this side, what I'm showing you is the input for both of these sides. This is a traditional CT machine running filtered back... propagation, or projection? Sorry — projection, filtered back projection, yes. Okay, so this is the traditional technique, filtered back projection. It assumes that your body does not attenuate x-rays, but that's not true; the way they solve this problem is simply to increase the dosage of x-rays so that your body does not attenuate the signal, so that they can collect the back projection and reconstruct the image from it.

This is a technology called iterative reconstruction. It doesn't assume that simple x-ray model, the Radon model. This iterative reconstruction method, one beam at a time, calculates, estimates, and corrects, and over time creates the image. It's essentially the ray tracing, if you will, of medical imaging, and computationally it is several hundred times more than the traditional method. Filtered back projection is what's inside today's CT machines, except for the latest-generation ones; the vast majority of the world's CT machines are like this. If we could use this technique, iterative reconstruction, we could reduce the dosage of x-rays tremendously while increasing the resolution and fidelity of the reconstruction.

And so what we're seeing here is simply this: what goes in is a sinogram, which comes out of a CT machine — the raw data of the CT machine comes out, and it goes into this Clara server. It runs a piece of software called ASTRA; ASTRA is running filtered back projection, FBP, and this one is running iterative reconstruction. (A small side-by-side sketch of these two methods appears at the end of this demo.) Okay, so let's go ahead — Mike, fire away. All right, we're just going to go ahead and launch this, since that's already been set up, but what you're going to see here, on the left versus the right, is the CPU versus the GPU reconstruction, and this is all happening within the browser, and as the layers are calculated you can see them start to come back in. So what's happening here, as you know: this side is doing 400 times more computation than this side, and it's doing it faster. This side, running on GPUs, is running several hundred times more computation than that, and look at the results: with the same amount of input from a sinogram, you can create much higher-fidelity images. The other way to think about it is this: if this is the level of fidelity that you're comfortable with, you could reduce the x-ray dose by a factor of 6. Another way of saying it is this: we can now use CT even for children. Today you can't use CT for children because the x-ray dose is too high. So the opportunity to use medical instruments with computational advances is really quite groundbreaking. That's why GE calls this particular machine, with iterative reconstruction, Revolution.

Okay, so now that we can do that, we could do so much more. This is what the old machine looks like. Now that we can put this on Clara, we can apply AI to it, and so what Mike is doing here: he's applied a neural network that has been trained to detect organs volumetrically. We trained this neural network, called V-Net, to detect 3D volumetric organs. I don't know any of the organs, okay. And so now we can use AI to detect the organs, but it still looks kind of difficult to understand.
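For readers who want to see the two reconstruction approaches side by side, the sketch mentioned above: a minimal example using scikit-image on a synthetic Shepp-Logan phantom. SART stands in here for the iterative reconstruction used in the Clara demo, and the few-view sinogram is a crude stand-in for a low-dose scan.

```python
# A minimal sketch of the two reconstruction approaches in the Clara demo,
# using scikit-image on a Shepp-Logan phantom: filtered back projection (FBP)
# versus an iterative method (SART, standing in for the IR technique shown).
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, iradon_sart

image = shepp_logan_phantom()
angles = np.linspace(0.0, 180.0, 60, endpoint=False)   # few views ~= low dose
sinogram = radon(image, theta=angles)                   # what the scanner measures

fbp = iradon(sinogram, theta=angles)                    # one-shot analytic reconstruction
sart = iradon_sart(sinogram, theta=angles)              # iterative: estimate...
for _ in range(3):                                      # ...then correct, a few passes
    sart = iradon_sart(sinogram, theta=angles, image=sart)

for name, rec in [("FBP", fbp), ("iterative (SART)", sart)]:
    err = np.sqrt(np.mean((rec - image) ** 2))
    print(f"{name:16s} RMS error: {err:.4f}")
```

With only 60 views, the iterative result typically comes out with lower error than filtered back projection, which is the same trade described above: the same raw data, a lot more computation, a better image.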
So we can now apply ray tracing to render this image. All right, you guys, look at this — are you guys ready for this? Before... after. Before... after. [Applause] Without, and with, NVIDIA acceleration. Okay, wow. Good job, Mike, thank you.

So, the seven trillion dollar healthcare industry: we've now created a computer for healthcare. You can use it for imaging, you can use it for genomics, you can use it for AI, and it sits in your data center, virtualized. We've revolutionized healthcare with modern computing. Once again, a fully accelerated stack, specialized libraries, working with the application developers in each one of the industries, and a technical team developing it with partners.

Safe city, same strategy. Billions of cameras will monitor public spaces in the future, keep us out of harm's way, keep traffic moving, figure out where the energy density is the highest and divert energy from the energy grid, and as a result we can create safe cities using exactly the same architecture. However, we need a specialized stack. We have the DeepStream SDK. The DeepStream SDK takes video from hundreds, thousands, and in the future millions and billions of cameras; it comes in, it has to be decompressed, image-processed on CUDA, and inferenced on all of these different types of networks, whatever networks come, and then the metadata is passed to the VMS, the video management system. And we have a technical team working with developers all over the world as they create smarter cities — a two trillion dollar smart city market.

Everything in the future that moves will be autonomous. AI for autonomous vehicles is, in my opinion, the single greatest contribution we can make to humanity: safer roads, reduced cost of transportation, reduced traffic congestion, keeping people out of harm's way, saving lives. And not to forget that a billion more people are going to come into the world, and because of our online shopping habits, the Amazon effect, we have so few truck drivers; there's a great demand for truck drivers, as we all want everything delivered to our house instantaneously — we want food delivered to our house, we want a television delivered to our house, we want everything delivered to our house. Incredible. And so the number of truck drivers, the number of drivers in the world, simply can't keep up with that. We believe that the answer is using AI to create an autonomous computer. This computer is a one-of-a-kind type of computer. We created, of course, the driving-car computer, and I'll show you that in a second. But as I mentioned, the software development methodology of the future is fundamentally different. You're collecting an enormous amount of data; in our case we have test cars all over the world, and we collect a few petabytes of data per car, from all of the sensors, per day. We then run an artificial intelligence network on our supercomputer to figure out which of the images we should label. We then take the output of that AI and we take it into a label factory, a data factory — people who are precisely labeling, identifying the objects inside the images. We then have a deep learning team with their supercomputer, and this deep learning team is creating new artificial intelligence networks and training the networks. And then, after you're done developing the software, of course you have to simulate it, you have to test it.
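The "run an AI to figure out which images we should label" step above is essentially uncertainty-based data selection. Here is a hedged toy sketch of that idea, with made-up frame scores and a stand-in predict function rather than anything from the actual DRIVE pipeline.

```python
# A hedged sketch of the "figure out which images we should label" step in the
# data pipeline described above: rank incoming frames by the model's uncertainty
# (entropy of its class probabilities) and send only the top ones to the labeling
# factory. The frame scores below are made up; 'predict' is a stand-in model.
import math

def entropy(probs):
    """Shannon entropy of a probability distribution; high = model is unsure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labeling(frames, predict, budget):
    """Return the 'budget' frames the model is least certain about."""
    scored = sorted(frames, key=lambda f: entropy(predict(f)), reverse=True)
    return scored[:budget]

# Toy example: fake per-frame class probabilities from a hypothetical detector.
fake_predictions = {
    "frame_001": [0.98, 0.01, 0.01],   # confident -> probably not worth labeling
    "frame_002": [0.40, 0.35, 0.25],   # unsure    -> send to the label factory
    "frame_003": [0.70, 0.20, 0.10],
}
picked = select_for_labeling(list(fake_predictions), fake_predictions.get, budget=1)
print(picked)    # -> ['frame_002']
```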
The world drives ten trillion miles per year, okay — so the world drives a lot of miles — however, the best we can do with our test cars, if we're lucky, is millions of miles. And so we need to find a way to simulate a billion miles or so inside virtual reality, and as you know, we have the technology to do that. So: simulating, testing, and then that results in a drive computer. The drive computer looks like this — hey Paul — DRIVE PX Xavier. Okay, so this is a future drive computer for an autonomous car, and so if you have a branded car, it would have a computer like this. This chip is called Xavier; it's the largest SoC the world has ever made. And this computer has two Xaviers and two Voltas on there, and this is going to be in a driverless taxi. These two architectures are exactly the same. Okay, and so as you can see, this motherboard, this computer, goes into a self-driving car. There are going to be hundreds of millions of cars in the future that are going to be powered by this — a hundred million cars made each year, trucks, shuttles. And then of course, in order to develop the software for it — give me the HGX-2, yeah, give it to me — okay, all right, ladies and gentlemen, this goes into the supercomputer, here, for training and for simulation, and then when you're done with the software, from here you put it in here, and you put it in the car. Does that make sense? Okay, one architecture. One architecture: the software that runs here and the software that runs there are binary compatible. Incredible, right? Wow. Hold on... so now, okay, all right. Can you do it? I bet you can't do it, you're not strong enough, not like me. Okay, all right.

And so our engineers are developing all the software from beginning to end, all the driving software that runs on that car. We have hundreds of partners now all over the world. Let me show you — every time I come, I show you the latest update of our car. This time you will notice the three stages of autonomous machines: perception; reasoning, which in the case of a car is localization; and then action, which in the case of a car is path planning, driving. Okay: perception, localization, and planning. You will see that we use camera; we will also use lidar. And as we go across these three different stages, there's one more thing that we will show you this time, and that's that the car is going to create its own map. You know, when you visit a place the first time, you're a little bit not very confident, you're not sure where everything is, but the second time you come, you're very confident, and the reason for that is because you created a map in your head. Our cars are going to do the same: the first time you drive a new place, we're going to create a map, and the next time you come back we're going to use that map, along with all of the other algorithms I described, to enhance your safety. Okay, so: mapping. And then lastly, I'm going to show you, for the very first time, location-to-location driving. Okay, let's fire it up. [Music]

The car has so many different neural networks — we're already up to ten or twelve or something like that; we'll have twenty before the end of the year — handling any weather condition, figuring out where all of the other cars are, lane assignment — it's called lane assignment, and it does lane assignment — and surround lidar object detection and perception. It's now mapping using visual odometry, basically camera only; we can map with camera and lidar. The AI is also watching inside the car to make sure that you're not falling asleep and that you're paying attention. This is Janine, one of our employees.
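Stepping out of the demo for a moment, here is a purely schematic sketch of the perception, localization, and planning loop described above. Every class, function, and threshold in it is a hypothetical placeholder, not DRIVE software.

```python
# A schematic sketch (not DRIVE software) of the three stages described on stage:
# perception, localization, and planning, run in a loop. All names here are
# hypothetical placeholders; the sensor/actuator interfaces are stubbed out.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:            # output of the perception networks
    kind: str               # "car", "pedestrian", "lane", ...
    position: Tuple[float, float]

@dataclass
class Pose:                 # output of localization (e.g. against the self-built map)
    x: float
    y: float
    heading: float

def perceive(camera_frame, lidar_scan) -> List[Detection]:
    """Run the detection networks; stubbed out here."""
    return [Detection("car", (12.0, 1.5))]

def localize(detections: List[Detection], prior: Pose) -> Pose:
    """Match what we see against the map the car built on its first drive."""
    return prior                       # placeholder: keep the previous pose

def plan(pose: Pose, detections: List[Detection]) -> Tuple[float, float]:
    """Return (steering, throttle); slow down if anything is close ahead."""
    too_close = any(d.position[0] < 15.0 for d in detections)
    return (0.0, 0.1 if too_close else 0.5)

def drive_step(camera_frame, lidar_scan, prior: Pose):
    detections = perceive(camera_frame, lidar_scan)
    pose = localize(detections, prior)
    return plan(pose, detections), pose

print(drive_step(camera_frame=None, lidar_scan=None, prior=Pose(0.0, 0.0, 0.0)))
```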
Now the car has mapped, and then it uses that map to drive by itself the second time. [Music] It came out of our company, stopped at a red light, stopped at another red light, made a turn all by itself. It's driving on top of no HD map at all, completely mapped by the car. It goes onto a highway. [Music] No lidar, no HD map. [Music] Ladies and gentlemen, the future of autonomous vehicles.

And so, first of all, the important thing to realize is there are probably going to be a hundred and fifty million cars and taxis and shuttles and trucks and, you know, vans and buses that are going to be autonomous each year, okay, so we're going to build a ton of these vehicles. However, the thing to realize is that the software needs to be developed, and every car company will have supercomputers in the future, and these supercomputers are one of the HGX-Ts or HGX-Is or SCX-Es, okay. And so there are going to be different classes of computers used by different phases of developing the software, but there will be billions of dollars of servers, billions of dollars of servers, that will go to car companies all over the world as we all develop autonomous driving software. NVIDIA's platform is open, and that's one of the reasons why so many companies are working with us.

Let me show you one more thing. This is really fun, and I want to expand your horizon. I showed you this at GTC; it's called Project Wakanda. Let's run it, just keep running it. This is Project Wakanda: you go into a holodeck, you sit down, the holodeck creates a virtual reality car around you, and inside your holodeck experience you see the world outside — you see out this window, you see this truck right here, it's right there, can you guys see that? This person is inside our company, in a holodeck. This car is not driving by itself; it's driven by this guy in the holodeck, just like Black Panther, okay, and that's why we called it Project Wakanda. Can you guys see this? There's Tim, driving the car in virtual reality. He sees everything, steering the car — look how accurately he can steer the car. Oh, come on, park it... oh, look at that, perfect parking, completely remotely, in virtual reality. Okay guys, we call that Project Wakanda, and we brought it to Taiwan.

The thing that I wanted to do is this — you know Black Panther, Project Wakanda, you guys know, right? Okay, imagine you're Black Panther, but we took it into Ant-Man. Can you imagine this? Black Panther, Ant-Man, the two movies come together. Justin calls this Project Wee-Kanda — small Wakanda. I want to show it to you. Are we going to be able to show it to them? I think so. You think so? I hope so. Ladies and gentlemen, Project Wee-Kanda, a little tiny miniature. Okay, so there's Justin right in front of you; he's in virtual reality. Here's Justin — ladies and gentlemen, Justin Ebert. He's driving this car in virtual reality, okay, but watch, look how small he is, okay. So yeah, go ahead. Yeah, so actually I'm on top of this building, in a small car, sorry. And we built a little wooden mini city here, and we've got a little tiny quarter-scale car there — can you see this? This is what Justin sees in his virtual reality world: he's sitting inside a virtual reality car, and outside the car he sees all this, and you see all those guys up there. It's a miniature car. So I think somebody needs to activate it, because I'm unable to drive... let's see if I can. Can you drive? I would love to drive, yeah, there's nothing I would love more than to drive right now. No — apparently I'm just... I'm dumb, look at me, I didn't put it in drive, so I think that's what I did wrong.
Hey, there we go. Yeah, there we go. Can you see him? All right. So what is the application for this? Well, in the future, as you know, we're going to have a bunch of little tiny pizza-delivery robots, right? But sometimes they're going to get stuck, so we will be in virtual reality, and we will go into that robot and help it get unstuck. If you have a driverless taxi, there's nobody inside the car; if something happens and it gets stuck, we can go through virtual reality into that car. In the future, if we make robots, and a robot is guiding people in your hotel from place to place and it gets stuck, you can go into the robot and navigate the robot. In the future, if you want to have telepresence — meaning you're in Taipei but you would like to attend a meeting with me in California, and you would like to sit virtually in the conference room — you will go through virtual reality into that robot, and you will sit there and look around, just like this, and maybe the glasses are even very, very small in the future. In the future you will be able to merge with a robot — hopefully the robot is an attractive one — but you will merge with a robot, you can have telepresence, you can go anywhere you want. Right now Justin is upstairs, but he's not upstairs, he's right there, and he's up there enjoying controlling that car in virtual reality. Does that make sense? The future of virtual reality, telepresence, and autonomous machines are going to come together in a new way, and Project Wakanda was to explore that. This is just to expand your horizons on some of the things that we're working on. Justin Ebert, thank you very much. Guys — Ryan, Geoffrey, Mike. [Applause] Well... [Applause]

So that's it. Moore's law has come to an end; we need a computing approach going forward. If we just consider that applications will continue to grow in demand on exactly the same trajectory, then in just 10 years' time, in the year 2028, we will need to deliver, as an industry, 10 million Volta-equivalent GPUs — incredible amounts of computing. And yet I believe that, because of AI, computers will write ever-larger software, and we'll need even more computing than that.
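The ten-year estimate above implies a steep growth rate; here is the quick arithmetic, with the post-Moore CPU improvement rate being my own assumption purely for illustration.

```python
# Quick arithmetic behind the "100x more performance in ten years" claim:
# what annual growth rate does that imply, and how far do CPUs fall behind if
# they only improve ~10% per year (an assumption, not a figure from the talk)?
demand_growth_10y = 100
years = 10
annual_demand = demand_growth_10y ** (1 / years)          # ~1.58x per year
cpu_annual = 1.10                                          # assumed post-Moore CPU gain
gap_after_10y = demand_growth_10y / (cpu_annual ** years)  # ~38x shortfall

print(f"demand must grow ~{annual_demand:.2f}x per year")
print(f"CPU-only computing would fall ~{gap_after_10y:.0f}x short after {years} years")
```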
GPU computing starts with a great chip, and we created this new processor called the Tensor Core GPU, but that's just the beginning. What accelerated computing is all about is the full stack: number one, creating a new type of processor that fuses HPC and AI; number two, optimizing across the full stack — developing switches, developing motherboards, developing systems, systems software, APIs, and libraries. Full stack. We have now reached critical mass: every developer in the world who needs more computing has jumped onto GPUs; this is how they're going to do their groundbreaking work. 850,000 developers around the world, 550 applications accelerated, and literally every single leader in servers has adopted our architecture — I want to thank you all for that. HGX-2: Foxconn, Inventec, Quanta, Wistron, and so many others. We are working with literally the entire computer industry, because everybody needs a solution forward and everybody understands the importance of AI; it's going to be the future of developing software. Delivery of the software is complicated, so we created a registry in the cloud, a cloud of clouds; we call it NGC, and all of the acceleration stacks are there. And great computers and great platforms need applications and need demand, and now, with all the work that we've done over the last ten years, one domain after another, one vertical after another, we have created proprietary, specialized software stacks that accelerate the most important applications in each one of those verticals: AI for training, AI for inference, CUDA for high performance computing and supercomputing, DRIVE — that entire server reference design for training, simulation, and testing for autonomous driving — and IT infrastructure. I call it tostada; if you can't remember what this is, just say "tostada" and an NVIDIA salesperson will help you, and then they'll take you to a Mexican restaurant. And so each and every one of these vertical markets — trillions of dollars of vertical markets — can now be served by GPU-accelerated computing. I want to thank all of you for your support and your interest in the work that we do. The future of computing is rich with opportunities. This is the beginning of a new revolution. I want to thank you all for being here today. Have a great GTC. [Applause]