NVIDIA CEO Jensen Huang keynote address at GTC China 2019

Captions
I am AI. I am a creator, freeing our imaginations and breathing life into our wildest dreams. I am a visionary, anticipating the needs of others, simplifying our busy lives and bringing us closer. [Music] I am a protector, steering creatures out of harm's way, watching over as we explore the world, and helping our heroes make it home safely. I am a helper, revealing more together, picturing with precision, finding smarter answers to complex tasks and working in harmony. [Music] I am even the composer of the music you are hearing, brought to life by NVIDIA deep learning, everywhere. [Music]

Ladies and gentlemen, please welcome NVIDIA founder and CEO Jensen Huang. [Applause]

[Chinese greeting] Welcome to NVIDIA GTC. That's all the Chinese I'm going to speak. One day I will be able to speak to you through a universal translator: I will speak in English and you will hear it in any language you choose. But today I'm going to have to speak in English. I have so much to tell you, because we've been very busy this year, so let me get started.

NVIDIA is 25 years old. We've dedicated our company to building computers that solve problems normal computers cannot solve. We built computers for the Einsteins, the Michelangelos, the Leonardos of our generation. We built our computers for you, and this is your conference. GTC this year has 6,100 registered attendees, growing 250 percent in just three years' time. No conference in the world covers this much about the future of technology and the future of society. The work being shared here is incredible. Look at all the topics: artificial intelligence, inference, cloud, tools, data science, edge computing, automotive, autonomous machines, gaming, 5G, rendering, design, finance, high-performance computing, healthcare, life sciences, graphics virtualization, AI frameworks, and industrial applications. When you say all of that out loud in one sentence, the impact of the computing technology we have created together is really quite mind-boggling. I want to thank you all for that. In particular, I want to thank all of our partners and sponsors for making GTC possible; please join me in thanking them for supporting us.

Accelerated computing is now recognized as the path forward. As Moore's law comes to an end, it is very clear we need another computing approach. Accelerated computing, which we have been pioneering for over 20 years, is a very sensible and logical approach: use the right tool for the right job. A program has sequential and parallel components; run the sequential part on the CPU, and run the parallel components on a processor designed for parallel processing. That was the great insight of our company: not to replace the CPU, but to accelerate the portions of the application that another processor can do much better.

However, accelerated computing starts with an amazing chip, a revolutionary processor we call the GPU, and that's just the beginning. Accelerated computing is a full-stack engineering challenge. To gain its benefits you have to re-engineer and re-optimize the stack, from the processor to the design, the algorithms, the system software, the tools, and even the applications. By working together as an ecosystem, we've refactored the most important applications in the world and achieved speed-ups that are simply unimaginable: 10 times, 25 times, 500 times, sometimes incredible speed-ups. The journey of refactoring the software allows us to scale performance beyond the processor itself, so that the whole system contributes to accelerating the application.
As a result, NVIDIA has become a system-architecture company, thinking about accelerated computing at very large scale, even data-center scale.

The single most important thing about accelerated computing, though, is architecture, and the key point is one architecture. It is so important for developers to have one coherent, consistent, reliable architecture that they can count on and develop software for, engaging the entire installed base. Since the very beginning of CUDA we have dedicated ourselves to being consistent: one CUDA, and CUDA everywhere. Every single GPU NVIDIA produces, whether it's for gaming, self-driving cars, the cloud, supercomputers, your laptop, your desktop, even your embedded systems, has one architecture, all compatible. As a result, a developer can dedicate themselves to improving the software stack knowing that 200 million people can benefit from it. There are now over 1.5 million CUDA developers around the world, and the number of CUDA downloads continues to grow every year. It is very clear we have reached the tipping point and this architecture is here to stay. When each developer optimizes performance, the entire installed base benefits. This positive feedback loop is the reason we are now seeing such an incredible embrace of NVIDIA's architecture. And this is not just a concept. The reality is that when you have one architecture, you are future-proof; the software continues to improve, and your computer keeps getting better over time even though you purchased it long ago.

Our company has dedicated itself, as I mentioned, to the full stack, and this full stack is complicated. The software in this stack is some of the most incredible software available in computing today. CUDA, of course, is now at version 10.2. Every single application developed for CUDA will run on computers we have not built yet, and every piece of software written in the future will run on the CUDA of today. That capability is really quite remarkable. On top of it sit some of the richest computational algorithms and libraries in the world. This is, in fact, my greatest source of pride: the fact that this computing stack has benefited so many people in so many fields of science and so many industries is really quite phenomenal. This year alone we released 500 new and enhanced libraries, and in each case the computer got faster. For example, without changing the computer, training performance improved by a factor of four in two years; just the continuous refinement of the software and libraries, between us and our developers, enabled the installed base to quadruple its performance. In the case of inference, something we've been working on for the last couple of years, performance doubled in just one year. The computer didn't change, the silicon didn't change, the GPU didn't change; the software across the entire stack allowed performance to improve by a factor of two. What took many weeks became days with GPU acceleration, and what took a couple of days can now be done in just a few hours.
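As an illustration of the split Huang describes, keeping the sequential part on the CPU and offloading the parallel part to the GPU, here is a minimal sketch. It assumes CuPy as a stand-in CUDA-backed array library; the workload and array sizes are hypothetical.

```python
import numpy as np

try:
    import cupy as cp  # CUDA-backed, mostly drop-in replacement for NumPy arrays
except ImportError:
    cp = None

def simulate(n=10_000_000):
    # Sequential portion: setup, control flow, I/O-style work stays on the CPU.
    x = np.linspace(0.0, 1.0, n)

    if cp is not None:
        # Parallel portion: millions of independent element-wise operations
        # are exactly what the GPU is designed for.
        xg = cp.asarray(x)                    # copy to GPU memory
        yg = cp.sqrt(xg) * cp.sin(xg * 50.0)  # executes as CUDA kernels
        return cp.asnumpy(yg)                 # copy the result back
    # Fallback: the same math on the CPU.
    return np.sqrt(x) * np.sin(x * 50.0)

result = simulate()
print(result[:5])
```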
This continuous refinement of the software, this continuous dedication to one architecture, is what has put us here today. I want to thank all of you, and all of the developers around the world, for working with us to continuously improve the stack. Thank you very much. [Applause]

NVIDIA innovates at the intersection of three main technologies: computer graphics, high-performance computing (scientific simulation), and artificial intelligence. NVIDIA is a simulation company. We simulate the world, we simulate physics, we simulate human intelligence; we are a simulation company at the core, and we do it so fast that new applications become possible. Today my talk is in three major chapters. The first chapter is computer graphics, then high-performance computing, and then a topic I'm sure many of you are interested in, artificial intelligence, and some of the work we're doing there.

Computer graphics. Last year we reinvented and redefined the future of computer graphics by realizing a long-term dream, a dream one of our researchers had 35 years ago: making real-time ray tracing possible. We call the technology NVIDIA RTX. With RTX we can simulate light in a much more natural way, so shadows, reflections, and all of the subtle, beautiful things we see in the world become possible, without all the software workarounds that modern computer graphics normally requires, such as light probes, reflection probes, and pre-baked global illumination. Those techniques make modern computer graphics attractive, but they also make it inflexible.

I'm about to show you a demonstration of a game that is very simple on the surface: Minecraft. It is the single most popular game in the world, with hundreds of millions of players, 300 million here in China alone and 100 million active players each month. The reason Minecraft is so incredible is that it allows you to create any world you like; Minecraft is really a world simulator, and that's one of the reasons we love it so much. But because the world is created by you, it's impossible to pre-bake the lighting offline to improve the visual quality. Everything has to be done in real time: we simulate the light, the way it bounces around the environment, the way it interacts with materials, the way it reflects and refracts. As a result Minecraft can be beautiful, but it is computationally intense. What I'm about to show you is something we've been dreaming about for a long time, and it's all completely real time. This is a project we're working on with Microsoft, and hopefully over the course of the next several months we'll be able to let more people enjoy it. Ladies and gentlemen, let's take a look at a video of the work we've been doing in Minecraft. [Music]

What do you guys think? Minecraft, the amazing creativity of gamers, and NVIDIA RTX coming together for that incredible experience. There are so many amazing game developers here in China doing beautiful work and telling wonderful stories; these are just a few to show you. It is very clear now that RTX has taken off. The ability to compute reflections and shadows in real time, without artificial light probes and environment probes and pre-baked global illumination, makes everything more realistic.
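To make the idea concrete, here is a tiny, hypothetical sketch of the computation behind a ray-traced shadow: for every visible point, a shadow ray is cast toward the light and tested against the scene geometry every frame, instead of being pre-baked into a lightmap. The scene (one sphere above a ground plane) and all numbers are illustrative only.

```python
import numpy as np

# Illustrative scene: a point light, one sphere, and a patch of ground plane (y = 0).
light = np.array([4.0, 6.0, 2.0])
sphere_c = np.array([0.0, 1.5, 0.0])
sphere_r = 1.0

def blocked(origin, target):
    """Return True if the segment origin->target intersects the sphere."""
    d = target - origin
    dist = np.linalg.norm(d)
    d = d / dist
    oc = origin - sphere_c
    b = np.dot(d, oc)
    disc = b * b - (np.dot(oc, oc) - sphere_r ** 2)
    if disc < 0.0:
        return False
    t = -b - np.sqrt(disc)
    return 0.0 < t < dist  # the hit lies between the surface point and the light

# Shade a small grid of ground points, each frame, by casting shadow rays.
res = 9
for z in np.linspace(-2, 2, res):
    row = ""
    for x in np.linspace(-3, 3, res):
        p = np.array([x, 0.001, z])               # nudge the point off the surface
        row += "#" if blocked(p, light) else "."  # '#' marks a point in shadow
    print(row)
```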
We can make it so easy for these video game worlds to appear the way the real world does. The reflections are wonderful, the shadows are wonderful; these are just incredible games and I can't wait to see them. It is now a foregone conclusion that ray tracing is the future, and RTX has been a home run. We have redefined computer graphics.

This next one I'm going to show you is really quite amazing. This video game was created by one developer. I don't mean one development studio; I mean one person. That is simply not possible unless you have an army of artists to help you with light probes, reflection probes, and pre-baked lighting for global illumination and ambient occlusion. It simply isn't possible with just one person, but it is now, with RTX. Ladies and gentlemen, let's take a look. [Music]

Ladies and gentlemen, Bright Memory. This is the work of one extraordinary developer. But of course, it is one developer, so I want to show you something else.

For the last couple of years we've been working on a project called Max-Q. We've always imagined: wouldn't it be amazing if we had our gaming system with us all the time? Unfortunately gaming requires powerful PCs, and powerful PCs tend to be large. When we started working on gaming notebooks, they looked like this; they were quite large, because powerful GPUs need a lot of thermal headroom and electricity, so the systems tend to be big. But today we love computers that are thin and sleek and beautiful, so we dedicated ourselves to creating a whole new category of products we call Max-Q. From the architecture to the design to the system software and system engineering, working with all of our partners, we made it possible to put the highest-end GPUs into sleek, beautiful gaming notebooks. The success has been incredible: just this last year we sold five million gaming laptops here in China alone. This is now unquestionably the fastest-growing new gaming platform. Let me show you one example right here. Thanks, Paul. You saw that: this is Bright Memory. Can you see this? Ray tracing in my hand. What was completely impossible just two years ago, we're now doing in my hand. And because I've been working out, I can hold this up for a long time. Paul, thank you.

Okay, Max-Q. We don't just want to put incredible gaming into a notebook; we also want to help everybody with a weak PC enjoy amazing games. We estimate the NVIDIA installed base of active gamers at about 200 million, enjoying beautiful games on their PCs, most of them on desktops and, increasingly, on notebooks. However, we estimate there are some 800 million gamers whose PCs are low-end, whose laptops aren't powerful enough or don't have the necessary graphics; maybe they have a MacBook or a Chromebook, and the PC games they want to play don't run on those computers. So we've been working on cloud gaming for some time. I'm super excited to announce today that the largest game publisher in the world, one of the great internet companies of the world, Tencent, is going to launch a new cloud gaming service called START, and it's going to be powered by NVIDIA. We're going to extend the wonderful experience of PC gaming to all of the computers that are underpowered today. The opportunity is quite extraordinary. They're starting beta trials all over China, and we can extend PC gaming to the other 800 million gamers in the world who can't play today. So let me welcome Tencent; let's thank them for their support and look forward to their service.
Ray tracing in video games is of course going to make them more beautiful and more fun, and the special effects are going to be more incredible. One application people might have assumed NVIDIA was always part of is film-quality rendering. Photorealistic rendering has historically been done only on CPUs, because the programs are too complicated and too large; the algorithms did not map well to fixed-function graphics, and even programmable shaders had their limits, until we created NVIDIA RTX. With RTX we can now do ray tracing in a general way, and with it path tracing, ray tracing, rasterization, and all the various hybrid techniques of computer graphics become possible. We've been working with the world's leading developers to bring the most important rendering software packages to NVIDIA RTX, and today I'm delighted to announce that the top rendering packages in the world are now RTX-accelerated: Autodesk, Chaos Group V-Ray, and the open-source Blender. Ladies and gentlemen, let's welcome them to RTX.

These rendering packages will run out of the box on the brand-new systems we've created. We even created a special line of computers for creators, who have been underserved for a very long time even though the creative workflow is so computationally intensive, from the amount of data they have to work with to the rendering software they run. So we created a whole line we call NVIDIA Studio. NVIDIA Studio ranges all the way from beautiful, sleek, thin laptops to desktop computers to powerful workstations with four RTX 8000 GPUs, each with 48 gigabytes of memory, connected in one system, and up to a server with eight RTX 8000 GPUs. The entire range is architecturally compatible: every application runs on every computer, and every application is now incredibly sped up. We want to make it possible for all the creators in the world to enjoy their creativity and their art without the burden of the heavy computation they have to carry.

We announced Omniverse earlier this year. Omniverse solves a really great problem. High-quality 3D animation is one of the most complex workflows and most heavyweight computational pipelines we know. When you think about the workflow of 3D animation, from the concept art to the geometry to the rigging of the characters to the animation to the texturing and the lighting, each of these steps and stages requires different tools, and the number of tools in use is really quite incredible. Because 3D animation requires so much labor, so much art, and so many experts in so many different tools, the work is spread across many studios around the world; it is not possible for one studio to create one large movie by itself. You can imagine the amount of data they create is gigantic, terabytes and terabytes of it. The tools they use are complicated and different, the expertise is diverse, and the studios are spread all over the world.
How is it possible for them to get their work done? They don't have the benefit of a Google Docs, a shared cloud document, where they can collaborate on content, until now. We created a universe we call Omniverse: a place where all of the content creators can open portals from their applications into a shared world. By creating a portal into this world, they can share their data, their content, and their designs across different tools and different pipelines, across the entire workflow, and they can do it from anywhere in the world, because Omniverse lives in the cloud, in the data center, remotely. It also gives you one universal view, so you can see everybody's work against one ground truth.

It looks basically like this: Omniverse sits in the middle. It is a database that updates as people create, with portals built on USD, Universal Scene Description, the scene description language created by Pixar that has become incredibly popular. Every application that has created a USD portal into Omniverse can attach to it, and developers in different parts of the world, working on different parts of the pipeline, see one common view of the content. We also created a viewport that renders physically accurate, physically based, photorealistic ray tracing and path tracing, with physics integrated, and with the state-of-the-art material definition language we created, MDL, all integrated into this world.
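As a small illustration of the portal idea, here is a hedged sketch using Pixar's USD Python bindings (the pxr module), the interchange format named above. Two "tools" share one scene: one authors a prim, the other opens the same stage and reads it back. The file name, prim paths, and attribute values are hypothetical.

```python
from pxr import Usd, UsdGeom, Gf

# Tool A (say, a modeling application) authors geometry into a shared USD stage.
stage = Usd.Stage.CreateNew("shared_scene.usda")
UsdGeom.Xform.Define(stage, "/Tower")
facade = UsdGeom.Cube.Define(stage, "/Tower/FacadePanel")
facade.GetSizeAttr().Set(2.0)
facade.AddTranslateOp().Set(Gf.Vec3d(0.0, 10.0, 0.0))
stage.GetRootLayer().Save()

# Tool B (say, a lighting or review application) opens the same stage
# and sees the ground-truth scene graph authored by Tool A.
shared = Usd.Stage.Open("shared_scene.usda")
for prim in shared.Traverse():
    print(prim.GetPath(), prim.GetTypeName())
```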
The industry has been so enthusiastic and so excited to work with us, and today we're announcing a brand-new part: we are bringing Omniverse to architecture, engineering, and construction. This industry is booming, for two reasons that excite me, and maybe more. The first is that future buildings are essentially gigantic products, gigantic machines. They are such spectacular marvels, with so much complexity, that it is impossible to achieve them without designing them completely digitally and simulating every step of the way, with software that is interconnected from design to simulation; they have to be created just like a machine. These future marvels are going to make it possible to welcome another billion people into the world's cities over the next 15 years, so this area of engineering is going through a complete revolution. The second reason the industry is growing so fast is that more and more people are moving to cities, so buildings are becoming more sophisticated, cities are being reinvented, and these buildings are works of amazing technology as well as beautiful works of art. Let's take a look at one example, and I'll come back and explain how this all works.

This is a rendering from the Omniverse viewport I just described, and the lead of Omniverse, the chief architect and engineering lead, Rev Lebaredian, is going to talk us through it. Rev, why don't you show us Omniverse?

All right. What we have here is a tool used by many architects and industrial designers, Rhino, by Robert McNeel and Associates. Inside this tool we can see a building in Shenzhen: the China Resources Tower, designed by KPF, a leading architectural firm that has designed four of the ten tallest buildings in the world. Rhino is an excellent tool these designers use to create and design the most complex buildings and architecture. Inside this world we can take a look at the building in all its glory. It's a special building because, unlike normal buildings, it has an exoskeleton, with 58 vertical columns on the outside to maximize the amount of space on each floor. Those 58 columns converge to 28 columns at the bottom and at the top. There's a lot of complexity here; this isn't just a beautiful render, it's the actual design of the building. With our new tooling we created a plugin, a portal, to go from Rhino into Omniverse, so let's take a look at what happens when we go into Omniverse.

A typical workflow for architects early in the design is to create a foam model, usually on a tabletop, so they can see what the building will look like without being distracted by the materials and details. Architects also like to see it in the context of the city, and this is Shenzhen Bay, with the building placed accurately among the other buildings. One thing we can do inside this virtual world that you can't do in the real world is simulate what the sun will do in terms of shadows and lighting. We can do that here, in real time: change the time of day and match exactly where the sun will be at this time of year, relative to the buildings, at that latitude and longitude.

First of all, before Omniverse came about, this tool, Rhino, and that tool weren't able to communicate; now they communicate through the portals both of them have created into Omniverse. Second, the ray-traced global illumination is done completely in real time. You've probably noticed the beautiful shadows being cast, the indirect shadows, the indirect lighting; it's all completely real time. Otherwise it would have taken hours to render each frame, and that's the reason people made clay models or foam models.

All right, let's go to some beauty shots of what the building looks like after we apply the materials and go to the next stage of design. You might have thought that last scene was a photograph or a pre-rendered image; in fact you just saw Rev make the move, completely in real time. Let's take a few more. Everything here is done in real time; all Rev is doing is changing the camera angle inside that world, and we can also change the light source, the sun, in real time. In the last one you probably noticed the flowers look like flowers, because they have subsurface scattering. Look at the beautiful surfaces. Rev, isn't that amazing? Everything was done in real time, rendered on just eight GPUs, rendering this entire scene in real time, and Rev was able to change the camera angle anywhere inside that scene and it instantaneously recreates that photoreal image. Now designers and architects can enjoy their building before they build it and know exactly what it looks like and exactly how it feels. Omniverse makes all of this possible. Thanks a lot, guys; that was great.
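The sun study described above comes down to computing the sun's direction for a given date, time, and location. A hedged sketch, using the pvlib library's solar-position routine and approximate coordinates for Shenzhen (roughly 22.5 N, 114.1 E); the times and site are illustrative and not from the talk.

```python
import numpy as np
import pandas as pd
import pvlib

lat, lon = 22.54, 114.06  # approximate Shenzhen coordinates (illustrative)
times = pd.date_range("2019-12-18 08:00", "2019-12-18 18:00",
                      freq="2h", tz="Asia/Shanghai")

sp = pvlib.solarposition.get_solarposition(times, lat, lon)

for t, elev, az in zip(times, sp["apparent_elevation"], sp["azimuth"]):
    # Convert elevation/azimuth to a unit vector a renderer could use as the sun direction.
    e, a = np.radians(elev), np.radians(az)
    d = np.array([np.cos(e) * np.sin(a), np.sin(e), np.cos(e) * np.cos(a)])
    print(t.strftime("%H:%M"), f"elevation {elev:5.1f} deg", "sun dir", np.round(d, 2))
```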
What you were seeing were different applications, each with a portal into Omniverse, this other universe shared by all of the users, different designers with different tools in different locations, and they all see one viewport. That viewport can be streamed to any device; it can be streamed remotely to a Chromebook, a MacBook, a PC, a laptop, or a phone. It doesn't matter where you are; it can stream to you, so everybody can enjoy the ground truth, what it really looks like, in exactly the same way, from anywhere. Ladies and gentlemen, this is Omniverse for AEC. Early access is available now, and the enthusiasm across the industry is just incredible. Thank you very much, guys.

We want to put rendering everywhere. We would of course like to make the computers themselves faster, but one of the best things ever is cloud computing: anybody with even a small budget has the opportunity to enjoy what a supercomputer can do, in the cloud. Here in China the single largest cloud rendering platform is called Rayvision. Rayvision renders for 85 percent of the studios and designers here in China; the top three movies were made with Rayvision. This is a gigantic rendering cloud. Today we're announcing that Rayvision has adopted NVIDIA RTX, so that they can accelerate rendering in the cloud for all the designers here in China. Let me show you the results. In the end, what a creator wants is to create their art as cost-effectively as possible. This is a comparison between running the rendering on the CPU and accelerating it on NVIDIA RTX GPUs: 485 hours for a sequence of shots, costing 310 dollars, comes down to 39 hours and just 40 dollars. Unbelievable: from three weeks to just a couple of days, at roughly one-seventh the cost. You've heard me say this before: the more you buy, the more you save. That's right, and this is the perfect example of it.
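Taking the quoted figures at face value, the improvement can be checked with a line of arithmetic (a throwaway calculation, not part of the talk):

```python
cpu_hours, cpu_cost = 485, 310  # CPU rendering of the sequence, as quoted
gpu_hours, gpu_cost = 39, 40    # RTX-accelerated rendering, as quoted

print(f"speed-up: {cpu_hours / gpu_hours:.1f}x")        # ~12.4x faster
print(f"cost ratio: {cpu_cost / gpu_cost:.1f}x lower")  # ~7.8x cheaper
```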
Okay, so rendering in the cloud. That was the first chapter: RTX, games, rendering, creators, Omniverse. Let me change pace now and talk about the second chapter, high-performance computing. I want to talk about high-performance computing in the context of a few new applications that weren't possible before, things you've probably not experienced.

The first, of course, is simulation. NASA and NVIDIA have been working together on simulating NASA's Mars lander. By the end of 2030, NASA will send astronauts to Mars. Six astronauts will be inside a lander about the size of a condominium; imagine putting six astronauts in a condominium, firing them into space, and having them land on Mars. When they arrive, they will be traveling at 12,000 kilometers per hour, incredibly fast, as they enter the thin Martian atmosphere, which is just a fraction of Earth's. The amount of propulsion necessary to stop the lander in time, within just six minutes, so that it can land safely on the surface of Mars, is incredible. It has to fire the retrorockets at exactly the right time, at exactly the right angle, at exactly the right intensity. So we've been working with them on hundreds of thousands of fluid dynamics simulations, so they can experiment, understand what it's like, and design their propulsion system, their lander, and their landing algorithm to land these astronauts safely.

However, the simulation generates an enormous amount of data, about 150 terabytes. The question is what to do with that data and how to analyze it. So we created a platform based on our DGX and a brand-new software stack we call Magnum IO, able to stream data at very high rates from storage with GPUDirect Storage, connected to a whole bunch of Mellanox NICs, with rendering on top done by a software stack called NVIDIA IndeX, our distributed volumetric rendering software. All of that technology, the DGX, IndeX, Magnum IO, and the Mellanox NICs connected directly to DDN storage, makes what you're about to see possible. It looks like a movie, because it is a movie, but it is completely rendered in real time. Ladies and gentlemen, take a look at landing on Mars. [Music]

Everything you saw was simulated. There was no art; all of it is fluid dynamics simulation, 150 terabytes of data, and now you can fly through it with a supercomputer. So the first application I want to describe is basically this: HPC will be used for analytics in the future, whether for scientific simulation, because you create so much data, or for data science, because you have so much data to analyze that you need a supercomputer to do it.

Second, we've been interested in this field for a very long time: whole genome sequencing, the ability to analyze the human genome completely. The benefit is that you can detect alterations in your DNA, find potential inherited disorders early, find the cancer mutations driving a disease's progression, or discover a widespread disease. The ability to sequence the human genome in its totality is incredibly powerful. It can be used to deepen our understanding of life and improve our health, but also in agriculture and livestock, so we can protect livestock and enjoy better crops from farms that produce more effectively. There are so many applications for whole genome sequencing, WGS.

The breakthrough that makes it possible is recent advances in NGS, next-generation sequencing machines. One of them is the BGI life-science supercomputer, which can sequence 60 whole genomes per day. The first human genome took 15 years; a decade ago it still cost millions of dollars; now it is possible to sequence 60 whole genomes per day. To sequence and understand human DNA, the first step generates a whole bunch of small fragments of the DNA, called short reads. These short reads have to be reassembled into your genome, and the way that's done is by comparing them to a reference: each tiny segment is compared to the reference to figure out which segments go where and which ones connect, so the genome can be reconstructed from all these little fragments. Then, most importantly, the goal is to do what is called variant calling, to identify variations. Identifying variations lets us detect, as I mentioned earlier, mutations, potentially cancer-developing cells, or a disorder inherited from your parents. That reconstruction and analysis process is done with a toolkit called GATK, the Genome Analysis Toolkit.
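To make the reassembly-and-variant-calling idea concrete, here is a deliberately tiny, hypothetical sketch: naive alignment of short reads against a reference string, a pileup, and a variant called wherever the consensus base disagrees with the reference. Real pipelines such as GATK are vastly more sophisticated; this only illustrates the concept, and the sequences are made up.

```python
from collections import Counter, defaultdict

reference = "ACGTACGTTAGCCGATTACA"  # toy reference sequence
reads = ["ACGTACGT", "CGTTAGCC", "TAGCCGAT", "CGATTACA",
         "GTACGTTA", "AGCCGATG", "CCGATGAC"]  # two reads carry a T->G variant

def align(read, ref):
    """Return the offset with the fewest mismatches (naive alignment)."""
    return min(range(len(ref) - len(read) + 1),
               key=lambda i: sum(a != b for a, b in zip(read, ref[i:i + len(read)])))

pileup = defaultdict(Counter)  # position -> observed base counts
for read in reads:
    start = align(read, reference)
    for offset, base in enumerate(read):
        pileup[start + offset][base] += 1

for pos in sorted(pileup):
    consensus, depth = pileup[pos].most_common(1)[0]
    if consensus != reference[pos]:
        print(f"variant at position {pos}: {reference[pos]} -> {consensus} (depth {depth})")
```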
GATK is the industry standard. It was created at the Broad Institute, it is used all over the world, and it is incredibly computationally intensive, as you can imagine, because the human genome has three billion base pairs. It's like a sentence with three billion characters; each of us carries a very large book that encodes our DNA, and that genome comes back from the sequencer as a pile of tiny fragments whose exact order, and any potential variants, we now have to figure out. GATK is the toolkit used to do that, and it is incredibly CPU-intensive: it runs on one compute node, it is not very scalable, and it takes about 30 hours, a day and a half, to process one of the 60 genomes being generated and sequenced each day by the BGI life-science supercomputer. Once that is done, petabytes of data are analyzed to understand what's going on. This is a classic machine learning problem; in fact, bioinformatics was one of the first fields to use machine learning. Before Python, their data science was done in a language called R, and they used machine learning to find clusters, irregularities, deviations, and so on. Petabytes and petabytes of data are analyzed to find anomalies.

These are the three stages of what is called bioinformatics: gene sequencing with a next-generation sequencing machine; a genome analysis toolkit so you can assemble the whole human genome; and analysis of that data against all previously sequenced genomes. That is the bioinformatics pipeline, one of the most important workflows in the world, as you can imagine, and so we have dedicated ourselves to finding a way to make it more productive. One thing you already know is that we're working on machine learning, so all of the work we're doing with RAPIDS, which I've talked about before, high-speed accelerated data science on large data, goes directly into helping here. However, one area we had to dedicate ourselves to is the genomics analysis pipeline, GATK. We've been working with a small company called Parabricks. Parabricks are amazing scientists who focus on this space, and they created a toolkit, also called Parabricks, that accelerates GATK, the industry standard, while producing precisely the same answers. Their work, and our work with them, was so exciting that we decided to join forces and have Parabricks join us, so we could really double down and put this technology in the hands of every genomicist and genomics scientist in the world. Ladies and gentlemen, today we're announcing that NVIDIA Parabricks, our genomics analysis toolkit, is available to everybody. The Parabricks software pipeline is available on NGC, and the acceleration, as you can imagine, is a speed-up of 30 to 50 times. So instead of a server grinding for 30 hours per genome while the sequencing machine generates 60 per day, we can now keep up with the data rate of the BGI life-science supercomputer.
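A quick back-of-the-envelope check of why the 30-to-50x speed-up matters for keeping up with the sequencer (a rough calculation using the figures quoted above, assuming one accelerated server replaces roughly one CPU node times the speed-up; not from the talk):

```python
genomes_per_day = 60  # sequencer output, as quoted
cpu_hours_each = 30   # GATK on one CPU node, as quoted
speedup = 30          # low end of the quoted 30-50x range

cpu_node_hours_per_day = genomes_per_day * cpu_hours_each  # 1800 node-hours per day
cpu_nodes_needed = cpu_node_hours_per_day / 24             # ~75 nodes just to keep pace
gpu_servers_needed = cpu_nodes_needed / speedup            # ~2.5 accelerated servers

print(cpu_nodes_needed, round(gpu_servers_needed, 1))
```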
This is such an important first step, because in the future we're going to sequence more and more people. We've discovered that humans are not all exactly the same: depending on our ethnicity and where we grew up, different regions have different standards and different references, and using one common reference for all humans is not accurate enough. So we're going to create many more references, and we're going to make it possible to compare against all of those references and do variant calling against all of them. The amount of computation for genomics is going to go up, and of course we would like to do it more frequently. This is an area that is really, really important, and I'm super excited about it.

This year, not counting the first application for HPC that I mentioned earlier, ray tracing, we added two new applications to the NVIDIA CUDA stack, two applications of extraordinary importance to the future of society. The first is BGI using our Parabricks stack for genomics analysis. The other I announced several months ago: NVIDIA CUDA is now able to process very large-scale 5G, and the first user of our platform, which we call Aerial, is Ericsson. It makes sense to all the computer scientists in the room that CUDA should be good at the radio, at virtualizing the radio and running it completely as a software stack. Now we can put that virtual radio inside a data center instead of at the edge, make it much more flexible in adapting to traffic patterns that change over time, and layer on AI functionality that was impossible in the past. Imagine if we could use AI to optimize for traffic patterns and adjust the beams in real time, so that the energy used for 5G goes down while bandwidth and quality of service go up. All of those benefits are possible. Two brand-new applications that bring NVIDIA accelerated computing into two new industries: one telecommunications, the other genomics. Thank you very much.

Arm is the most pervasive CPU in the world: 150 billion shipped, almost everybody in the world has an Arm processor, 1,700 licensees, and 95 percent of the world's custom SoCs are based on Arm. It is the most configurable CPU core and CPU instruction-set architecture we have in the world. It is therefore sensible that we see so many industries, in so many countries, with so many companies building Arm servers. There are two reasons for that. The first is that high-performance computing requires very good energy efficiency and very good power, so that we can scale up to a gigantic machine; energy efficiency is vital to high-performance computing. The second is that the world is moving to the cloud, hyperscale is largely open source, hyperscale is a brand-new stack anyway, and hyperscale does not care about the CPU instruction set. That is why you see hyperscalers and supercomputing centers adopting and creating custom Arm CPUs. These custom Arm CPUs let them change the ratio of cores to memory bandwidth to I/O bandwidth, and as a result solve a whole lot of problems that a single CPU design cannot. You see all kinds of amazing CPUs: Ampere is building a CPU they call eMAG; Amazon recently announced Graviton2, a really exciting-looking CPU; Marvell's ThunderX2; Fujitsu, whose Arm A64FX powers the most energy-efficient supercomputer in the world (we're number two, but they're number one); and Huawei with their Kunpeng 920, dedicated to edge computing as well as hyperscale and AI.
All of these processors are super exciting, and the industry has been asking us: is it possible to accelerate all of these processors? Because, as you know, acceleration is vital to the future of computing. So this year, just a few months ago, we announced that we're bringing CUDA to Arm, to fill that hole, so that Arm can be a fantastic platform for high-performance computing as well as AI. This is our Arm reference system. It has two external PCI Express connectors, with every two CPUs connected to four GPUs, and the reference system is based on Marvell's ThunderX2. In just a few short months, look at the ecosystem, all the tools and all the amazing applications now running on Arm: molecular dynamics, quantum chemistry, fluid dynamics, finite element analysis, computer graphics, ray tracing.

Let me show you one of them. This application visualizes the simulation results of NAMD, a molecular dynamics simulator. After you're done simulating the molecular machine, you want to visualize it so you can understand this biomechanical machine, and the application for that is called VMD, Visual Molecular Dynamics. VMD is essentially a computational microscope: you look through it and understand how molecular biological machines function. You simulate, then you put the simulation result into essentially another supercomputer and use VMD to understand it visually. VMD is made possible by two libraries from NVIDIA: one is OptiX, and the other is streaming, our ability to capture, encode, and remote the computer graphics, which makes what you're about to see possible. VMD is a vitally important tool, so let me show it to you now.

We are looking at live ray tracing of the Rift Valley fever virus, running in VMD on an Arm-based computer. This virus is a biosafety level 3 pathogen, and it's fatal to approximately 20 percent of infected people. The problem is that we don't have a vaccine yet, and to design vaccines it is important to understand the structure of the viral proteins. Professor Li from Tsinghua University managed to solve the structure of the virus, so this can potentially help a lot of people in the future. The data set used to solve the structure we are seeing here was about 50 terabytes, and it took one week to process with 300 NVIDIA Tesla GPUs; they probably couldn't have done it without them. This is really fantastic. Let's thank Professor Li and Tsinghua University for helping us demonstrate this incredible capability, and for the important work they're doing. Guys, thank you. NAMD and VMD, on CUDA, on Arm.

Now, the single most important HPC application in the world: TensorFlow. TensorFlow is used, of course, for machine learning and artificial intelligence, but it is also used for scientific computing, and now it's used across industries and in business. This framework is pervasive. Scaling up TensorFlow is a high-performance computing challenge; it is just not easy to do, so we've dedicated ourselves over the last five or six years, working with Google, to enhance TensorFlow so that it can be accelerated to the limit. This is TensorFlow 2.0, which came out just a couple of months ago, and ladies and gentlemen, it now runs on CUDA on Arm, and the performance at scale is very close to the state of the art.
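Scaling TensorFlow across GPUs is the part the stack handles for you; here is a minimal sketch of what that looks like from the user's side, assuming TensorFlow 2.x and its built-in MirroredStrategy for single-node, multi-GPU data parallelism (multi-node scaling would use a different strategy; the model and data are placeholders):

```python
import tensorflow as tf

# Data-parallel training across all visible GPUs on one node.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Placeholder dataset; each global batch is split across the replicas.
(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0
model.fit(x, y, batch_size=256, epochs=1)
```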
This says something about the power of CUDA and the power of refactoring all of this software so it can be accelerated. It runs in multi-GPU, multi-node configurations, the entire stack is identical to the stack we use on x86, and the performance is fantastic. I want to thank the folks at Google for working with us on TensorFlow, and all the people who worked with us to make CUDA on Arm possible. Thank you.

Let's talk about artificial intelligence. In 2012 the confluence of several factors kicked off modern AI: the overwhelming abundance of data, and researchers using our GPUs to run deep learning, these new algorithms that automatically detect features and, because of their deep structure, can hierarchically learn a representation of knowledge that generalizes. The effectiveness of deep learning has been extraordinary. The confluence of these three factors, big data, deep learning, and NVIDIA GPUs, kick-started modern AI. Since then, amazing things have happened. In just a few years we've seen superhuman levels of image recognition, superhuman levels of speech recognition, and now some very important breakthroughs in natural language understanding.

When you look back on 2012 and that incredible breakthrough, deep learning and the winning of ImageNet, almost no one realized the potential impact of that moment on the industries around it. ImageNet, or rather AlexNet, in 2012 really kick-started innovation in so many different industries as a result of computer vision finally being solved. It's not completely solved, of course, but computer vision has achieved incredible results, as I mentioned, superhuman levels. That moment kick-started the self-driving car revolution and the work happening there; it kick-started the amazing photographs now taken on phones, to the point where you can take pictures in the dark; radiologists can now use AI to detect disease; manufacturing robots are going to be advanced. The number of applications and the number of industries that will be affected is truly daunting, and it all started with ImageNet.

Last year something equally important happened, and arguably, in the long term, it will be even more important: the creation of BERT. This natural language model is pre-trained on a large corpus of information, and somehow, by looking at all of this data in Wikipedia, all of this text, all these stories and sentences, it learned the structure of language. Just as ImageNet decoded computer vision, BERT is in the process of decoding the code of human knowledge: natural language. Language is the way we communicate and transmit knowledge to each other; we encode knowledge to protect it, to remember it, to transmit it and share it. Language is the code of knowledge, and with BERT we are decoding that code. Understanding the code of knowledge, the way we understand codes like H.264 or JPEG, unleashes all kinds of incredible innovations. Over the next several years we're going to find the ability to do amazing things in the way we deal with text and language. At the core of what makes that possible, of course, is the engine for learning from large data.
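A small illustration of what a pre-trained language model like BERT has absorbed from raw text; this hedged sketch assumes the Hugging Face transformers library and its published bert-base-uncased checkpoint, neither of which is mentioned in the talk:

```python
from transformers import pipeline

# BERT is trained with a masked-word objective, so we can probe what it learned
# by asking it to fill in a blank.
fill = pipeline("fill-mask", model="bert-base-uncased")

for guess in fill("The doctor listened to the patient's [MASK] with a stethoscope."):
    print(f"{guess['token_str']:>10}  score={guess['score']:.3f}")
```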
In the last five years we have accelerated the training of deep learning by 300 times. Just think about that: Moore's law over five years is about 10x, and we've accelerated training by 300x, along with the ability to train tremendously large models. We did so by innovating at every single level: the GPU architecture, V100; inventing the brand-new Tensor Core; the CoWoS packaging system, where chips are layered on top of chips and wafers; the 3D packaging of memories; the use of high-speed HBM memory; connecting all these GPUs together with a super-high-speed link we call NVLink; and creating a system we call DGX. All of that so deep learning researchers can continue to explore the boundaries, the limits, of what's possible with AI. It has been the fastest computer on the MLPerf benchmark two years in a row. This engine has really helped researchers all over the world, and it arrived just in time.

If you look at what's happening, everything was moving along just fine, computing demand was growing at traditional rates, and then all of a sudden deep learning emerged: learning from all of this data, computers that write software by themselves, computers that automatically learn important features and patterns from large amounts of data. It caused computing demand to skyrocket. The workload rising this quickly makes sense, and the reason is this: if AI is about machines and software that learn to write software by themselves, why wouldn't you want the world's fastest computers writing the world's best software? The number of HPC installations around the world has skyrocketed in just the last several years, and the computational workload of training is now doubling every three and a half months. This trend is likely to continue, and one reason is multimodal learning: taking different modes of information, combining them, and finding patterns across them, whether it's vision and language, video and language, images and language, or your previous history and language. All kinds of modes can be mixed together so we can learn patterns and connections from them.
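To put the two growth rates side by side (a rough calculation based on the figures above, not part of the talk):

```python
# Training-compute demand doubling every 3.5 months vs. a Moore's-law-style
# pace of roughly 10x in five years (about an 18-month doubling).
months = 12 * 5                   # a five-year window
ai_growth = 2 ** (months / 3.5)   # ~145,000x if the 3.5-month doubling held
moores_law = 2 ** (months / 18)   # ~10x, the figure quoted in the talk

print(f"{ai_growth:,.0f}x vs {moores_law:.1f}x")
```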
Our AI journey starts with the creation of the platform for training models: DGX is our training system, and that is the beginning of the journey. The most important part, of course, is applying AI, so the question is how we bring this capability to the world. The first step, sensibly, is the cloud: the availability of data and the kinds of services there are simply not possible without AI, without machine learning, so our first stop was the cloud, and the platform for cloud computing, for hyperscale, is HGX. DGX was our deep learning GPU system, designed for training; HGX is designed for the cloud. EGX is the next stop on our journey as we move AI closer to where the action is, which we call the edge. Eventually we want to bring AI all the way out into the world, where the AIs are autonomous and moving among us; that platform is called AGX. AGX for autonomous machines and robotics, EGX for the edge, HGX for the cloud, DGX for training: one architecture, but the software stack and the computing stack on each are completely different. The reason is very obvious: each of these computing platforms, whether it's autonomous at the far edge or in the cloud, has very different requirements; the way you manage it, the way you operate it, and its form factor are all different. So this represents the taxonomy, the language if you will, the mile markers of the rest of my talk; I'm going to talk to you from this side to that side. We are now in the AI chapter; the first section, training, we've already talked about, and I've got three things I want to talk to you about.

The first is the single most important AI model on the internet. This is my schematic of it, and without this model it is impossible for us to enjoy the internet. The reason is this: the internet gives us access to the world's information, and unfortunately the world has a lot of information. The amount of data out there is trillions of web indices, hundreds of billions of TikTok videos (I think it's hundreds of billions, right?), billions of products on Taobao, news in the millions, books in the millions, movies in the millions. The era of search has ended. If there are a trillion things, a billion things, a million things, and they're changing all the time, how can you possibly find anything? The era of search is over; the era of recommendation is here. Everything has to be recommended.

Of course, the internet companies have known this for some time. It started with simpler versions of recommender systems using matrix factorization, whether collaborative filtering or content filtering. Most of those approaches were light on computing, but they had limited capabilities: they couldn't understand unstructured information, for example, which takes deep learning, and they had limits on how large a corpus, how large a catalog, how large a database they could learn from.

This is a block diagram of a canonical recommender system. The most important thing is this: there are users and there are items. There are six billion users in the world, and there are tons of items, as I mentioned, two billion products on Taobao alone, plus all kinds of movies and social videos and websites and stories and books; every single one of those items is something someone might be interested in, and it is in the trillions. There are billions of people and trillions of choices, and somehow we have to find the things we like, the things that make sense to us. How do you solve this computer science problem? With a recommender system. On one side are billions and billions of choices, which go through a candidate generation system, a filtering system, to go from billions down to hundreds. Then comes the most important part: the system has to figure out how to rank those candidates for you. The way it ranks them depends on what are called your implied preferences; it has to learn your implied preferences, what you like, and from those learned preferences it rank-orders the list and presents you a few choices out of billions and trillions of items that change over time: news, books, restaurants you should go to, products, videos you will likely enjoy, tweets you want to read. There are so many things it can recommend.
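A toy sketch of the two-stage structure described above, candidate generation followed by ranking, using random embeddings and NumPy; the sizes, the "freshness" feature, and the scoring are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, dim = 100_000, 32                  # tiny compared with real catalogs
item_emb = rng.normal(size=(n_items, dim))  # learned item embeddings (here: random)
user_emb = rng.normal(size=dim)             # one user's learned embedding

# Stage 1: candidate generation, from the full catalog down to a few hundred,
# scored by embedding similarity (dot product).
scores = item_emb @ user_emb
candidates = np.argpartition(-scores, 200)[:200]

# Stage 2: ranking, re-scoring only the candidates with a richer model.
# Here a stand-in: similarity blended with a per-item "freshness" feature.
freshness = rng.random(n_items)
rank_scores = 0.8 * scores[candidates] + 0.2 * freshness[candidates]
top10 = candidates[np.argsort(-rank_scores)[:10]]

print("recommended item ids:", top10)
```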
The recommender system is extraordinary. As I mentioned, it used to be based on CPU-based collaborative filtering and content filtering techniques, but now the industry is moving to deep learning. Moving to deep learning brings the ability to learn preferences from unstructured data and the ability to scale to a much, much larger system; however, the processing is also much more intensive. A couple of things have to happen. The input is the billions of dimensions I mentioned, billions of people and billions of things: extremely high dimensionality. It goes through a concept called embedding, a bit like word2vec embedding, which takes that high dimensionality and reduces it. From those embedded items and users we can learn relationships, interactions, and connections. It is the reason why king minus man gets you close to queen, and why pear and apple end up related the way they do: embeddings allow these relationships to be learned. The embedding system and the ranking system are learned together using deep learning, and it is incredibly computationally intensive, as you can imagine. These recommender systems, which used to run on CPUs, are now moving to GPUs, and that is a really, really important moment.

So, Baidu. Baidu wanted to move to deep learning, and they created this new thing called AIBox, based on a wide-and-deep model. It takes wide vectors, really super-sparse entries that encode known human preferences, preferences we know people have, so we don't need deep learning to discover them. That sparse table is a hundred billion wide; the dimensionality is very large and the sparsity is very high. On the other hand they have the embedding table, ten terabytes large. Between the wide part and the deep part, they have to train this ranking model. Well, it turns out that a dimensionality of a hundred billion and a ten-terabyte embedding table are simply impossible to train cost-effectively on CPUs, so we worked together to move it onto GPUs, and they were able to reduce the training time and reduce the cost by 90 percent, to just one tenth of the cost. As a result they can of course reduce their cost, but very importantly, they have so many models to train, and they would like to move all of them to deep learning: hundreds of models, from products to news to websites to banners. This is really a great achievement. Their poster is outside, their researchers are outside, and they would love to talk with you more about it. It solves the problem of taking the enormous amount of data available on the internet, the many choices, and filtering them through the recommender system so that you see only ten. How do you take this massive amount of data, hundreds of billions in dimensionality and in the future trillions, and reduce it to just ten items? That is frankly a bit of a miracle, and it is the miracle of deep learning, the miracle of AI. Baidu calls it AIBox, and the results are really fantastic.
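A hedged sketch of the wide-and-deep idea described above, in Keras: a linear ("wide") model over sparse known-preference features added to a deep network over learned embeddings, trained jointly. The vocabulary sizes are toy-scale stand-ins for the hundred-billion-dimensional, ten-terabyte production case, and the feature names are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_wide = 10_000   # multi-hot cross-feature vocabulary (production: ~100 billion, sparse)
n_items = 50_000  # item-id vocabulary feeding the embedding table (production: terabytes)
emb_dim = 64

wide_in = tf.keras.Input(shape=(n_wide,), name="wide_features")  # multi-hot floats
item_in = tf.keras.Input(shape=(1,), dtype="int64", name="item_id")

# Deep part: embedding table plus a small MLP.
deep = layers.Flatten()(layers.Embedding(n_items, emb_dim)(item_in))
deep = layers.Dense(256, activation="relu")(deep)
deep = layers.Dense(64, activation="relu")(deep)
deep_logit = layers.Dense(1)(deep)

# Wide part: a linear model over the sparse known-preference features.
wide_logit = layers.Dense(1)(wide_in)

# Joint training: the two logits are summed and trained against clicks.
out = layers.Activation("sigmoid")(layers.Add()([wide_logit, deep_logit]))
model = tf.keras.Model([wide_in, item_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```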
And the thing I would like to say is that, because of this incredible reduction in cost, the more you buy, the more you save. Chinese people love saving money, right? I love saving money; it makes me so happy. Speaking of saving money, Alibaba's recommendation system is also powered by NVIDIA. You all know Alibaba Singles Day: the single largest e-commerce event, the single largest shopping event in the world, in the universe, in the galaxy. It is unbelievable what happens here in China on Singles Day (why are there so many single people?): two billion products, 500 million shoppers on one day, 1.5 times the population of the United States, everybody shopping (who is working?), 500 million people shopping on one day, trying to decide which of two billion products to choose, billions of queries per second.

Now, one of the things I mentioned earlier is that this recommendation model is fairly universal; it is a canonical architecture, and there are lots of different algorithms, architectures, and ways to refactor it, but the concept is basically the same: filter all those choices down to just a few, and do it with deep learning. The computational load of deep learning was too high for CPUs. When Alibaba moved to deep learning, the effectiveness was higher: the click-through rate was higher, and you know that click-through rate in e-commerce directly contributes to the success of the commerce. However, the computation time also went up. Using a CPU, Alibaba's model could only serve three queries per second, and I mentioned that there are billions of queries per second; those 500 million shoppers, plus all the people who interacted with Alibaba but did not shop, generate a lot of queries, and they have to be answered quickly and cost-effectively. If one CPU can only serve three queries per second and there are billions of queries per second, how many CPUs do you need? The entire world's CPUs would not be enough. So we worked together to accelerate their deep learning model on our GPUs.

That is the power of deep recommender systems. Unlike traditional collaborative filtering or content filtering, matrix factorization was very good for CPUs but unfortunately not very good for GPUs. Deep recommenders are more accurate, can handle unstructured data, can include many more features of your implied preferences, can use much larger data, and can be GPU accelerated. Whereas the CPU could do three queries per second, a T4 GPU can do 780 queries per second versus three, a very big difference. This is the perfect example of the more you buy, the more you save. This is also very interesting: this is one of our brilliant data scientists, and this is what Taobao recommended to him, this music player and the world's largest hamburger; he must be very, very hungry. Incredible. All right, so the number one thing I want to say is that deep learning inference is wonderful for deep recommender systems, and the recommender is the engine of the internet. Everything we do now, and everything we will do in the future, passes through a recommendation system, and it's going to be based on deep learning.
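The CPU-versus-GPU queries-per-second comparison above can be approximated with a simple batched-inference timing loop. This is a hedged sketch with a toy ranking network; the 3 and 780 queries-per-second figures come from the talk, not from this script, and real serving systems measure latency and throughput far more carefully.

```python
# Rough throughput-measurement sketch (illustrative; numbers quoted in the
# talk are not reproduced by this toy script).
import time
import torch

def queries_per_second(model, batch, device, iters=50):
    model, batch = model.to(device), batch.to(device)
    with torch.no_grad():
        model(batch)                               # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(batch)                           # batched ranking queries
        if device == "cuda":
            torch.cuda.synchronize()
    return iters * batch.shape[0] / (time.time() - start)

ranker = torch.nn.Sequential(torch.nn.Linear(256, 512), torch.nn.ReLU(),
                             torch.nn.Linear(512, 1))
batch = torch.randn(1024, 256)                     # 1024 toy queries per batch
print("CPU queries/sec:", queries_per_second(ranker, batch, "cpu"))
if torch.cuda.is_available():
    print("GPU queries/sec:", queries_per_second(ranker, batch, "cuda"))
```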
One of the most important tools we create is called TensorRT. TensorRT is a computation-graph compiler and optimizer for CUDA GPUs. It takes the output of TensorFlow, goes through it, finds optimizations, nodes and edges that can be optimized, shared, and fused, and generates optimized CUDA code to run on any of our GPUs. TensorRT makes it possible not only to train with TensorFlow, but to take the output of TensorFlow and run it very fast on GPUs, so TensorRT is very important. Last year we announced TensorRT 5 here in China. TensorRT 5 has the ability to handle CNNs: it can fuse horizontally within the same layer of the graph, combining different nodes and edges; it can fuse vertically; and it can automatically detect places where it can reduce the precision of the mathematics and use different parts of our Tensor Core GPUs, FP32, FP16, or INT8, to accelerate the deep neural network application while reducing power and not sacrificing accuracy. That entire loop was made available last year as TRT 5, with 30 different optimizations and transforms.

Unfortunately, as you know, some of the most important applications are not CNNs; they are based on RNNs. A CNN is a feed-forward network: data propagates forward through the network. An RNN is complicated; it is a feedback network, where the previous state, the previous memory, along with the current data, affects the next output. It is a state machine, and supremely complicated. Well, ladies and gentlemen, I'm really excited to announce something that is really important, probably one of our greatest achievements: TensorRT 7. TensorRT 7 has the ability to handle CNNs, of course, but also transformers, RNNs, and autoencoders, and to do it automatically, for all of the different configurations of RNNs that need to be generated. The first thing it does is code generation, automatic kernel generation: mathematical kernels have to be generated for CUDA that could not be written in advance, because RNNs have so many different configurations, so many different activation functions, and so many different ways of remembering the past; the state machine is very complicated. We fuse horizontally wherever we can, we fuse vertically wherever we can, and we auto-generate code for all of the custom RNN kernels. We even fuse over time: we look for opportunities in the processing pipeline across a period of time to share computation, reduce the load on memory, and reduce traffic. And lastly, of course, we still automatically detect all the places where we can reduce precision while keeping accuracy.
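As a concrete, hedged illustration of building an optimized inference engine with reduced precision, here is a sketch using the ONNX-import path of TensorRT's Python API roughly as it existed in the TensorRT 7 era. Exact method names and flags vary between TensorRT versions, and the model file name is a placeholder.

```python
# Hedged sketch of building a TensorRT engine from an ONNX model with FP16
# enabled (roughly the TensorRT 7-era Python ONNX path; APIs differ by version).
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, workspace_gb=1):
    builder = trt.Builder(LOGGER)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):            # import the computation graph
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = workspace_gb << 30
    config.set_flag(trt.BuilderFlag.FP16)         # allow reduced-precision kernels
    # TensorRT fuses layers, selects kernels, and emits an optimized engine.
    return builder.build_engine(network, config)

# engine = build_engine("model.onnx")             # hypothetical model file
```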
If you look at the number of models we supported last year, it was basically ResNet-50 and SSD and the other versions of CNNs; those are among the top networks being used in the world, and those were the only ones supported by TensorRT 5. All of a sudden this year, with TensorRT 7, it will automatically compile and optimize all varieties of RNNs, LSTMs, and transformers of all kinds, in fact the most important neural networks of our time. It now has over a thousand different ways of optimizing kernels and fusing operations, from 30 to over a thousand. TensorRT 7 is available now; it's going to be put up on NGC in the near future, and then you'll be able to develop the networks you like, take the computational graph, optimize it into a CUDA runtime, and get the benefit of really, really fast inference. Ladies and gentlemen, TensorRT 7.

So what can you do with TRT 7 that you could not do before? Of course you could do some of these things slowly, or with more cost, but there are some things you simply cannot do if you don't accelerate the entire pipeline of neural networks. One of them is one of the most important developments in AI today: conversational AI. Several breakthroughs have made conversational AI possible for the very first time: speech recognition at a superhuman level; natural language understanding models that correct what was heard wrong, so that the precision is really high; the ability to understand your intention, make recommendations, do searches and queries for you, then come back and summarize what the AI learned into a text-to-speech system; and finally, synthesizing the voice in a very natural and pleasing way. That loop is now possible. It takes 20 to 30 models to make it work, and all of the technology is now in place. The challenge, of course, is that it has been too slow. Conversation happens in real time; if you ask a question and the system doesn't respond fast enough, if you have to wait several seconds, it doesn't even feel like the other party is interacting with you. Low-latency computing is really vital, and with TensorRT 7 we have now achieved the ability to compile and optimize every one of those networks, end to end, and to do it in 300 milliseconds. It is now possible to achieve very natural, very rich conversational AI in real time. That is what TensorRT 7 can do. Thank you.
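To make the 300-millisecond end-to-end target tangible, here is a toy latency-budget check. The stage names and per-stage millisecond figures are hypothetical placeholders, not NVIDIA measurements; only the 300 ms total comes from the talk.

```python
# Toy latency-budget check for a conversational-AI pipeline
# (stage timings are hypothetical placeholders).
PIPELINE_MS = {
    "speech_recognition": 60,
    "language_understanding": 80,
    "recommendation_or_search": 70,
    "response_generation": 40,
    "text_to_speech": 40,
}

BUDGET_MS = 300   # the real-time target quoted in the talk

total = sum(PIPELINE_MS.values())
for stage, ms in PIPELINE_MS.items():
    print(f"{stage:>26}: {ms:4d} ms")
verdict = "within" if total <= BUDGET_MS else "over"
print(f"{'total':>26}: {total:4d} ms ({verdict} the {BUDGET_MS} ms budget)")
```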
Since last year, our journey to accelerate inference has made enormous progress. When I stood here in front of you last year, I said that inference is a super-complicated problem, and the reason is obvious. Inference starts with many, many people developing very complicated software, and that software is written by a computer; that gigantic computer generates very complex computational graphs, the largest the world has ever seen. Somehow we have to have another program understand that graph, optimize it, and target it to a computer so that it can run at very high speed and be deployed broadly at very low cost. Targeting that generic computational graph, already complicated and made more complicated by RNNs and feedback, onto a machine is quite a technology undertaking, and that is why I have always felt inference would be a great challenge. There is another reason: the rate of innovation and the number of ideas people are coming up with in deep learning is growing, not slowing, and so the number of architectures, ideas like BERT, is growing, not slowing.

It is impossible to know whether a computer will be ideal for inference unless you know it is ideal for training; if you can learn on a computer, of course the model will run on that computer someday. That is one of the reasons I felt so strongly that this is an area where NVIDIA should put its energy: inference needs to be future-proof. Across the data centers of the world there are tens of millions of GPUs and CPUs, and we need to know that this body of computers will be compatible with, and optimal for, software that is going to be written three or four years from now. It has to be future-proof, and one of the great things about CUDA is that it is. We are on CUDA 10; CUDA 10 is compatible with CUDA applications written today and with architectures we shipped long ago. CUDA is future-proof: applications can continue to grow on top of it, and we will continue to develop and optimize software that runs wonderfully on the installed base years from now. As a result, one by one, as the internet companies in China move toward deep learning and recognize the power of conversational systems, natural language systems, and recommendation systems, we have seen enormous success here. I want to thank all of our partners who worked with us to make this possible. Thank you.

Very quickly: our industry has enjoyed the AI moment, and two things made it possible, the smartphone and the cloud. The smartphone and the cloud made the iPhone moment possible, and the industry and society were completely reshaped. Well, it is now time for every industry to enjoy its smartphone moment; we are going to see the smart-everything moment. Everything will be smart in the future, because sensors will soon be connected to everything, what some people call IoT, and that sensor information will be streamed and processed by AI to automatically recognize patterns and changes, to reason about what actions should be taken, and to take them. We are going to automate everything, and automation is going to be one of the great forces making every industry more productive. But the AI for this smart-everything revolution cannot run in the cloud; it has to run at the edge, at the point of action. First, sensor data is streamed continuously, 24/7, and many of these sensors will be high-resolution: cameras, lidars, radars, imaging radars, all kinds of infrared cameras, streaming continuously from everywhere; it is not possible to stream all of that data to the cloud. Second, the latency of traveling to the cloud, processing, and coming back is too long; it is impossible to have a robot working with you and responding to you while the processing is done far away. And lastly, data sovereignty and data ownership cannot always be guaranteed in the cloud; sometimes you don't want to put private data there. So there are a lot of reasons to put intelligence at the edge. We call that system EGX. We started working on EGX a few years ago, and the momentum has been fantastic; we now realize there are so many applications, far beyond our dreams, and every single industry now has the opportunity to apply AI to automate itself.
Whether it's in healthcare, where a third of the world's population is elderly and should be monitored and watched over so that if they fall, if something were to happen to them, we can send help right away; there is no way to have people watch all of that, so we put sensors everywhere. Two million factories will be automated. Thirteen million stores, a 27-trillion-dollar retail industry, have the opportunity to be automated and made more productive and more profitable. Universal-translator call centers that understand every single language. There are so many different applications: smart cities, more convenient airports, all of these places have an opportunity to be automated. We call that intelligence at the edge, and the computing platform is called EGX.

We've seen incredible success with EGX recently; let me tell you about a couple of examples. One of them is Walmart, which is deploying smart retail and smart checkout systems into its stores to make them much more productive; if you can save even a small percentage of a 27-trillion-dollar industry, the returns are fantastic. The United States Postal Service handles half a billion mail pieces per day, and using computer vision at very large scale and very high rate, basically streaming AI, we can now help them do a much better job of sorting mail. There are all kinds of interesting applications, and one I'm very excited about is the ability to put our GPUs into the fabric of wireless communications, so that AI can run not just on the internet but also on telco networks; the work we're doing with Ericsson is really important.

Let me now talk about robotics. Robotics is a special type of computing. Autonomous systems basically have to do three things, the same three things we all do: we sense the environment; we reason about it, perhaps reconstructing the environment in our mind and reasoning about it relative to our goals; and then we plan an action. Sense, reason, plan: that loop runs continuously in intelligent systems, and we would like to put that high-performance, real-time loop of sensing, reasoning, and planning right at the edge so that robotics becomes possible. We call this basic system Jetson.
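Here is a minimal sketch of the sense, reason, plan loop just described, with placeholder sensing, world-modeling, and planning functions; these are illustrative stand-ins, not Jetson or Isaac APIs.

```python
# Minimal sense -> reason -> plan control-loop sketch (illustrative pseudo-robot).
import time
import random

def sense():
    # Stand-in for reading a camera / lidar frame.
    return {"obstacle_distance_m": random.uniform(0.2, 5.0)}

def reason(observation, goal):
    # Build a tiny "world model": are we blocked on the way to the goal?
    return {"blocked": observation["obstacle_distance_m"] < 1.0, "goal": goal}

def plan(world):
    return "stop_and_replan" if world["blocked"] else "move_toward_goal"

goal = "dock_station"
for step in range(5):                      # the real loop runs continuously
    action = plan(reason(sense(), goal))
    print(f"step {step}: {action}")
    time.sleep(0.01)                       # placeholder for actuation time
```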
We also created several application stacks for different types of robotics applications. One of them, the most important at the moment, is DRIVE: using robotics technology to make autonomous vehicles. The second is putting AI into instruments; in the future your medical instruments will be self-driving. Self-driving medical instruments mean that while you are acquiring a sensor image, the system helps and assists you in acquiring the proper, best image; the AI then improves the image and also helps detect disease. AI-powered, self-driving medical instruments, from acquisition to processing to the detection pipeline, really profound; we call that Clara. And lastly, for robots maneuvering inside an unstructured world: cars can drive on lanes, but robots and manipulators have to operate in unstructured worlds, and those robotics algorithms are different. So we created several different stacks: Jetson for general-purpose embedded, DRIVE for autonomous vehicles, Clara for medical instruments, and Isaac for robots.

In each one of these cases our offering is end to end. Our mission is not to create self-driving cars; our mission is to create the infrastructure, the computer, and the software so that every company in the world can build self-driving cars. We believe that everything that moves in the future will have autonomous capability, whether passenger vehicles or vans or trucks or shuttles or delivery bots; it could be completely autonomous or have a human in the loop, and the algorithms and the computing structure are basically the same. For self-driving cars we create everything from data collection and data labeling, to training the models, to simulating the self-driving car, to the in-car computing platform, and we operate it ourselves as if it were our own car. However, we make the entire computing platform open: the software stack from the operating system to the middleware, the reference applications, and the pre-trained networks, the networks we use to create this autonomous vehicle, are all made available to our partners, from the infrastructure to the computing stack to the pre-trained networks. This is our body of work, and when we work with our partners, all of it is made available to them.

Today we're announcing something new: we're going to make our pre-trained models available, the models we create, and there are tens of them; there are many different types of models needed for self-driving cars to be possible. We have been asked for them time and time again because of the quality of our networks: we treat them as industrial-strength networks that we will ship and maintain for as long as we shall live, built with extraordinary care to the highest possible quality, and their capabilities are quite incredible. Our partners have asked repeatedly whether we could share our models with them, and the answer I'm announcing today is that from now on, all of our partners who are using DRIVE can come to us and we will share our pre-trained models with them. What can they do with that? It comes with a tool. The pre-trained models are, of course, designed and optimized for our car configuration. With the transfer learning tool, you can download these pre-trained models from NGC, the NVIDIA GPU Cloud registry, and adapt them to your own configuration. You would of course collect your own data; we're happy to share our labeling system and the standards by which we label data. The data you collect can then be used to refine and adapt our pre-trained networks into a new network, which is then optimized and compiled using TensorRT onto your platform. Now, all of a sudden, if you're developing something where the sensor locations are slightly different from ours, maybe you're creating a truck while we're creating a car and the camera location is slightly higher, or maybe you would like to collect data for a particular use case or region where you feel our data collection is under-represented, there are many reasons you might want to take our pre-trained networks and adapt them to your vehicle. We call it DRIVE transfer learning, and this is the first step.
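To illustrate the adapt-a-pretrained-network workflow in generic terms, here is a hedged PyTorch sketch of freezing a pre-trained backbone and fine-tuning a new head on your own data. It is an analogue of the idea, not NVIDIA's transfer learning tool or its formats; the class count and data are placeholders, and the torchvision pretrained-weights argument may differ by version.

```python
# Generic transfer-learning sketch (analogue of the workflow described above,
# not NVIDIA's transfer learning tool).
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a large dataset...
backbone = models.resnet18(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False               # freeze the pre-trained features

# ...and adapt the head to your own configuration (your own classes,
# your own camera mounting, your own collected and labeled data).
num_my_classes = 4                        # hypothetical
backbone.fc = nn.Linear(backbone.fc.in_features, num_my_classes)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy fine-tuning step on random stand-in data:
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_my_classes, (8,))
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
print("fine-tuning loss:", float(loss))
```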
The really exciting part is federated learning. We have been asked repeatedly whether it is possible to partner together so that your data, your data collection and labeling, and our data and our data collection could effectively come together to train a common network, without moving the data, because the amount of data is so gigantic, or because it simply may not be possible to share data. We could still share the common training of a model, and we call this federated learning. We've now developed the system; all the infrastructure is in place. When you collect your data and train with our transfer learning tool, it is now possible to return the weights, the delta weights of the network, back to the master server, where we do federated averaging across all the different partners, combine that into a new network, and update all of the partner networks. This has the ability to protect data, keep privacy, reduce the movement of gigantic datasets, and enable cross-company collaboration, and collaboration across different countries as well. Federated learning is really, really powerful.
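Here is a minimal sketch of the federated-averaging idea just described: each partner trains locally, only weight deltas travel to the server, and the server averages them and broadcasts an updated model. The partner count, scaling factors, and update rule are illustrative; real systems add weighting, secure aggregation, and communication details omitted here.

```python
# Minimal federated-averaging sketch (illustrative of the idea described above).
import numpy as np

def local_update(global_weights, local_data_scale):
    # Stand-in for a partner fine-tuning on its own private data;
    # only the resulting weight delta leaves the partner, never the data.
    noise = np.random.randn(*global_weights.shape) * 0.01
    return global_weights + local_data_scale * noise

global_w = np.zeros(10)
for round_ in range(3):
    deltas = []
    for partner_scale in (1.0, 0.5, 2.0):            # three hypothetical partners
        local_w = local_update(global_w, partner_scale)
        deltas.append(local_w - global_w)             # send only the delta weights
    global_w = global_w + np.mean(deltas, axis=0)     # federated averaging on server
    print(f"round {round_}: mean |w| = {np.abs(global_w).mean():.4f}")
```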
Today I'd also like to announce the next generation after Xavier. Xavier was the world's first robotics processor, because we designed it for one purpose: the computing stack of real-time sensor processing, mapping, localization, and planning. That loop is the fundamental robotics loop, and we created Xavier for that fundamental purpose; it was the world's first SoC designed with only that application in mind, and it has been a gigantic success that I'm very proud of. Today we're announcing our next generation, a giant leap we call Orin. This is AGX Orin. It is seven times the performance of Xavier. What industry, from generation to generation, increases performance by seven times? The reason is that the robotics stack and self-driving cars are a very hard problem: we would like to increase the resolution of the sensors, add more sensors, and increase the processing speed so that the reaction time of the car is faster; we would like to make it software-defined so more software can be put onto the computer; and, most importantly, we would like to make it safer than ever. The amount of redundancy inside this computer is much greater: every CPU can run in lockstep with other CPUs, and every GPU can run in lockstep with other GPUs, so we can run these processors concurrently and check their results. Inside Orin is a special security processor so that all data in motion and all data at rest are encrypted; we want this machine to be safe against cyberattack and tampering. The robotic systems of the future must put safety first, and Orin was designed with these characteristics: 17 billion transistors, 12 CPU cores, 200 trillion operations per second; Xavier was 30, and this is over 200, an incredible amount of performance.

This is what the Xavier stack looks like: one architecture that scales from L2 all the way up to L5, completely software-defined. All the software that runs on Xavier runs on Xavier plus a GPU, and runs on Pegasus, which is multiple Xaviers and multiple GPUs. Some customers chose to build L2 systems by taking a fixed-function ADAS chip and adding a CPU, so they could have lane keeping and ACC; some customers decided to use Xavier for L2, so they could add surround cameras and do lane changes. This basic stack-up is consistent across the world: almost everybody doing software-defined self-driving vehicles is using Xavier, and then there are fixed-function ADAS systems, sometimes with CPUs. Looking forward, we believe many more people will use the software-defined approach, because operating a fleet of cars is like operating a fleet of phones: it is not possible to operate the fleet without continuous CI/CD. You want to continuously enhance, continuously update and add features across the whole fleet, and fix problems as quickly as possible; CI/CD is vital to the future of IoT systems and this smart-everything revolution. The stack that Orin enables essentially increases performance by a factor of seven, or by a factor of four while reducing power at the same time. One of the things we're doing with Orin is creating, for the first time, a cost-reduced Orin, a cheaper version, so that L2 companies, companies with just one camera and maybe some surround radars, can have an entry-level AV that is also software-defined. One of the things we and our partners really love is that all of the software you develop here will carry over to Orin. As everybody in this audience knows, software is the vast majority of the engineering cost: in the future you ship the computer once, but you maintain the software forever, and you enrich it with new AIs forever. This is going to happen in autonomous vehicles; the capability is so complicated that we will be developing software for as long as we can see, and we would like to use the same architecture so that the software can be updated across the board, across many years and many cars in your fleet. One architecture, completely software compatible, from L2 to L5, from generation to generation. Xavier was designed for cars starting production in 2020; Orin is designed for cars starting production in 2022, three years from now.

Today I'm also announcing a very special partner we've been working with for some time. We have worked with them across AI as they connect customers, riders, to drivers; they are one of the world's largest AI companies, and the body of work they do is quite extraordinary. There is a company here that connects millions and millions of drivers and riders every single day, and that company is DiDi, the world's largest ride-hailing company. We work with them across data analytics and across developing the AI that recommends which driver should connect with which rider, and now we're working with them to bring autonomous capabilities to their fleet of cars. DiDi has selected DRIVE AV for their autonomous vehicle systems, and they're going to be testing their fleet in the near future in Shanghai. Let's welcome DiDi. This is the largest industry in the world: ten trillion miles driven per year, a hundred-trillion-dollar industry, and we believe that everything that moves will be autonomous someday, or will have autonomous capability someday.
The number of things that move, from cars to trucks, mobility services, startups, and delivery vehicles, is enormous, and the ecosystem needed to enable this autonomous future is large. This is not the work of one company; it is the work of one industry. We've created an open platform so that we can all team up to realize this autonomous future, and our ecosystem is really rich. You can see all the companies working on this: incredible companies building all kinds of innovative products in mobility, from building cars all the way to simulation at the other end. This rich ecosystem is a testament to the openness of the platform and its software-defined nature, so that all of these companies can innovate on top of it and rely on it for their entire fleet, and for generations to come. I want to thank all of our ecosystem partners for helping us and partnering with us to realize this amazing future of autonomous transportation. Thank you.

Now I'm going to show you a quick video of our latest work; every year when I come, I show you one. This one is an address-to-address drive: from a stop, the car drives across 17 miles, three highways, and four interchanges with lights, autonomously merging into traffic and changing lanes. The car had to create its own map: if a car is a great self-driving car, it's also a great mapping car, so the first time it drove, it mapped the roads, fusing multiple drives together; now it remembers it has been here before, and that map is a guide to help it localize, after which it does real-time perception with all of the models I showed you earlier. This was captured just last week, our BB8 test car driving in California. [Video] What do you guys think? An open platform, BB8 driving by itself. The car even makes sure you're paying attention, with sophisticated AI posture recognition; all kinds of interesting AIs are coming to the cockpit, as well as the confidence view, so that the AI can show you what's in its brain. The confidence view is very important: it gives you a sense that the AI is doing the right thing. We need the intelligence to give us feedback, and the confidence view is what provides it. So that's our latest BB8; the platform is open, and we love all the work we do with you. I want to thank you for that.

We've done the same thing for robotics: the same three parts, the AI development infrastructure, the computing platform (the computer, the software stack, the algorithms), and the reference applications. In the case of self-driving cars, DRIVE AV is the reference application for autonomous driving and DRIVE IX for the intelligent user experience. In the case of robotics, our two reference applications are Carter and Leonardo. Carter is designed for indoor navigation, navigation in unstructured worlds. The reason that is so different and so hard is that a self-driving car, while very hard, has lanes and signs; inside a warehouse or a building there are no lanes and signs, so the robot has to navigate in a different way.
How it understands where it is, and how mapping, localization, and path planning work, are different, and so Carter is the reference robotics application for indoor navigation. We also have the Leonardo reference application, which is for manipulation. Recognize that for these two applications, even the way you collect data is fundamentally different. A self-driving car's degrees of freedom are basically forward, left, and right, so we can collect data for lanes, signs, lights, cars, trucks, and people, and carefully label it. For robotics, unfortunately, the perception has to be completely free-form; it is impossible to collect and label every angle of information so that robots can learn perception in six degrees of freedom. That has to be done in simulation. As you know, NVIDIA is very good at computer graphics, so we created a world, the Isaac world, where robots can learn how to be robots; they learn how to recognize and perceive things in six dimensions. The ability to simulate to train, as well as simulate to navigate, is available on our platform, and then you can develop the software; it is designed to be very easy to use, so you can create these robots yourself. If you have special needs for deep learning, we also have pre-trained models, and again these models can be made available to you; you can use the NVIDIA transfer learning tool and adapt them to your needs. Let me show you what our engineers did with this SDK in just a weekend; this next video shows how easy it is to create robots now. [Video] Isn't that cool? With the Isaac SDK you can create all of these, and it's so simple. Of course there is nothing easy about robotics, but the algorithms, the applications, and the technology have been encoded and embedded into the Isaac SDK, which is open to you; you're welcome to use as much or as little as you like, to customize it and create your own networks, and hopefully together we can create some magical, amazing robotics.

Let me show you one more thing. This next one is really quite a miracle, and it is impossible without simulation. We know that articulation, human-like articulation, is one of the great challenges: the ability to teach a robot and to generalize, so that independent of the environment and independent of how you interact with it, it can respond to you, and respond quickly, gracefully, and safely. It can understand and perceive six-degree-of-freedom pose, it can work in seven degrees of freedom, and of course it can interact with you, meaning it can replan whatever it was trying to do in the most clever way, reaching around, interacting, backing away as it interacts with you. We call this robot Leonardo, and it is the work of some amazing researchers at the NVIDIA robotics research lab in Seattle. To show it to us is its lead architect, Nathan Ratliff. Hey Nathan. "Hey guys, we saved the best for last. This is a fundamentally collaborative robot; it's one of the first robots built from the ground up, the system engineered from the ground up, to be fundamentally interactive."
"That's really because, in the very near future, a lot of the applications we want to do will be around people. Right now, something like 90 percent of the things we would like robots to do can't be done, because robots are so dangerous; so here we use real-time perception and real-time reactivity to make sure they're safe, reactive, and perceptive around people." Now Nathan, Leonardo has an RGB camera and a depth camera. "Yes, and it has inside-out perception as well as outside-in perception: egocentric perception coming from the robot itself, you can see it reacting to my hand, so it's always reacting to what I'm doing, and perception from the outside, which gives it a broader perspective of what's happening. It can recognize each one of these blocks and constantly has feedback about where they are." Are you sure you didn't pre-program that? Come here... oh come on, it's okay, I know I'm not Nathan, I'm not your daddy, but I'm nice. How about this one? Really? Wow. What else can it do? "It's going to start pushing these around, clearing out the space." You gave it a mission. "I gave it the mission, yes; it's going to start picking these things up and stacking them as precisely as it can. Here it's making sure everything is nicely organized, because Leonardo wants to stack the blocks in that spot underneath, so it's moving some things out of the way."

So one of the things you realize is that Leonardo has to understand six-degree-of-freedom pose; it's not like the car, which looks at the world from essentially one angle. It looks at the world from many different angles, and the blocks are all in different orientations, in all kinds of wonky shapes, and now it has a mission to stack them together. The first thing it did was clear the space, just as we would. It's pretty cute. "It's taking a look at where that block is, it recognizes the pose, and then it goes for it." It's a little bit frightened by the audience and the bright lights... please, the clock is ticking... this one here... oh no, do it again... there we go. Wow. "It's interesting: remember, its eyes are very close to its hands, so once it places something it has to take a step back and look at the world again." That's actually really interesting. "It grabs almost as if with its mouth, so it comes in close, it can't really see much, it picks the block up, places it, not entirely precisely, and then it pinches and aligns it." In the future we need some separation between our hands and our eyes, right? "Yes, which is part of why that separation makes sense." Wow, all right, fantastic, thanks.

Wait, first of all, Leonardo had to learn, and you can't learn just by physically doing it, so we had to create a virtual reality simulator, and you know we're quite good at virtual reality simulation. It has to be sufficiently real that Leonardo thinks it's really learning, and it has to be physically accurate, meaning it has to obey the laws of physics. What you're about to see is a virtual reality simulation of Leonardo learning how to be Leonardo, and it's photoreal, so Leonardo can't tell whether it's in reality or not.
It obeys the laws of physics, and what you're seeing runs the entire computing stack; it is all completely hardware-in-the-loop, so the entire computing stack is working right in front of you. Okay, let's take a look. This is Leonardo in virtual reality, and Leonardo is learning: these are all the different paths it could have taken, and then the path it did take. It's just quietly in this lab learning how to be a good robot. We can put objects in its way, look at that, and it moves away from them; can you see that? And once we stop getting in its way, it gets back to work. It is running hardware in the loop, which means the computer that is here running Leonardo and the computer running that virtual reality simulator are the same computer, the same AI; everything is working: the perception from the cameras, the perception from the depth camera, the laws of physics, the weight of the cubes, all exactly the same. Simulate everything; this is how we're going to create robots in the future, and the simulator is part of the Isaac SDK. Without simulation it is impossible to create the robots of the future; they need this virtual reality environment to learn six-degree-of-freedom pose perception, visual perception, and perception for touch. Now, of course, we would like Leonardo to learn a lot faster, and one of the great things about Leonardo is that it has a whole lot of friends; it has replicated itself, and every single one of them is running on a real Leonardo computer running the entire Isaac stack, connected to this simulator, and learning. Every so often we checkpoint, select the smartest Leonardo, replicate its AI to all the others, and start again. Ladies and gentlemen, Leonardo, and Nathan Ratliff, an amazing AI researcher; congratulations, this is a great achievement and a great milestone for robotics.

So this is our stack: the Xavier robotics processor, and I happen to have one here in my pocket, a little tiny computer; the Isaac software stack; the reference applications Leonardo and Carter, Carter for navigation and Leonardo for manipulation; and the simulation environments, one to simulate navigation that we call Isaac Sim, and the other called Isaac Gym, so that the robot can learn how to be a good robot. All of it is connected hardware-in-the-loop, so you're looking at real software running. Today we'd also like to announce that universities across China are going to adopt Isaac, so that we can teach robotics in schools and so that researchers can advance this field and help us discover the future of artificial intelligence, the next great adventure.

It has been an amazing year. Accelerated computing, the path forward, has made some fantastic achievements because of our collaboration. We discovered three new applications: one is ray tracing, another is 5G, and another is genomics analysis. We invented RTX, redefined the future of computer graphics, and partnered with Tencent to put it in the cloud, so that 800 million gamers who don't have access to sufficiently powerful computers can now enjoy PC gaming at the level they should. We invented a new way of sharing and collaborating on very complicated workflows; we call it Omniverse.
Portals from applications can be connected into Omniverse, so that designers from around the world, across different workflows, can collaborate together; we created an application for AEC, and Omniverse AEC is now available for early access. We brought CUDA to Arm, along with the whole body of work and all of the applications we accelerate, including one of the most important today, TensorFlow, now available on Arm. We made enormous progress in inference over the last year: we are accelerating the single most important model on the internet, the recommendation system, as the world moves toward deep recommenders, giving us the ability to power the internet. And of course TensorRT 7 makes it possible to compile all kinds of neural networks, from CNNs to transformers and, very importantly, RNNs, which makes it possible for the very first time to have interactive, real-time conversational AI from end to end. We announced our Orin processor, the next-generation robotics processor, seven times more powerful and including all kinds of new technology for functional safety and security. And lastly, we announced a brand new SDK we call Isaac, to enable the next generation of AI we call robotics, and demonstrated Carter and Leonardo. I want to thank all of you for your collaboration in making accelerated computing amazing, and thank you for coming. Enjoy GTC. [Applause]
Info
Channel: NVIDIA
Views: 72,159
Rating: 4.4923234 out of 5
Keywords: NVIDIA, artificial intelligence, HPC, accelerated computing, computer graphics, robotics
Id: uPOI4T2SwOo
Length: 139min 8sec (8348 seconds)
Published: Thu Dec 19 2019