GTC 2019 Keynote with NVIDIA CEO Jensen Huang

Captions
I am AI, accelerating your discoveries to solve the great challenges of our time. I am a visionary, bringing characters to life with more natural movement, generating brilliant new worlds for them to explore, and inventing new ways to bring out the creative genius in us all. I am a protector, leading the way into the most dangerous environments and searching for signs of life. I am a guardian, listening for the sounds of destruction to save our forests, and using satellites to bring freedom to those who are enslaved. I am a navigator, finding safer paths for cross-country deliveries and taking personal travel to new heights. I am a scientist, exploring oceans of data to understand extreme weather patterns, and studying the building blocks of life to save a community from hunger. I am a healer, giving hope to those who suffer from the most challenging diseases, and tapping into the brain to rejuvenate paralyzed limbs. I am even the composer of the music you're hearing. I am AI, brought to life by NVIDIA, deep learning, and brilliant minds everywhere. [Music]

Ladies and gentlemen, please welcome NVIDIA founder and CEO Jensen Huang. [Music] [Applause]

So this is what it's like to be Kobe, playing in the great stadium. Hey guys, thanks for coming out this way. You know, we got so packed in the old venue, the old stadium, that we had to get you guys out here, so I really appreciate you making the trip. I have so much to tell you guys today, so many fun things, so I'm going to get going right away.

The accelerated computing approach that we pioneered has really taken off. If you take a look at what we've achieved last year, the momentum is absolutely clear: 50% more developers on NVIDIA GPUs, and 50% more CUDA downloads than a year ago. We now have 140 supercomputers powered by NVIDIA, 50% growth: the number one fastest supercomputer in the world, Summit; the number one fastest supercomputer in Europe, the Swiss supercomputing center's Piz Daint; and the number one fastest in Japan. 22 of the most energy-efficient supercomputers are powered by NVIDIA. We now have 600 applications powered by CUDA in high-performance computing, and 15 of the top 15 most popularly used applications are now powered by CUDA. This year we saw some really important new applications: CryoSPARC, the reconstruction of particles from cryo-electron microscopy; FUN3D, a CFD simulator; GROMACS, molecular dynamics; Microvolution, a deconvolution method to enhance imagery; Parabricks, genomics analysis; and WRF, the world's most frequently used weather simulator.

The thing that's really great about a software-defined method, an accelerated computing approach like ours, is that we never give up on the applications. Together with the developers we continue to advance them: using exactly the same GPU, exactly the same infrastructure, applications continue to improve in performance. If you take a look at some of the most popular applications from a year ago, year over year we improved their performance from 25X to almost 40X. We continue to enhance those applications, continue to refine them, continue to squeeze more performance out of them, so that you can continue to be more and more productive, or run larger and larger simulations.

Accelerated computing is not just about the chips. The chips are really important, there's no question about that, and we certainly build the world's largest chips. But accelerated computing is a collaboration, a co-design, a continuous optimization between the architecture, the chips, the systems, the algorithms, and the applications.
The NVIDIA stack looks basically like this, and today I'm going to refer back to it repeatedly. Starting from now, we're going to take all of our libraries and put them together into one body of work, one suite if you will, under one umbrella name: CUDA-X, the CUDA acceleration libraries. It's built on top of the architecture that all of you know very well - for many of you, the reason why you're here - called CUDA. CUDA runs on top of all of our GPUs: our graphics GPUs called RTX, our deep learning systems called DGX, our hyperscale systems called HGX, and even our autonomous machine systems, the little embedded systems, called AGX. It's architecturally compatible with all of them. On top of it we build domain-specific acceleration libraries, one domain after another: computer graphics, high-performance computing, artificial intelligence, DRIVE for autonomous driving, Isaac for robotics, Clara for healthcare and medical imaging, and the last one, Metropolis, for smart cities. All of these domain-specific acceleration libraries are now contained as part of what we call CUDA-X. And on top of it is NGC, the NVIDIA GPU Cloud. Whenever it's possible for us to containerize those libraries, we do so and we put them into the NVIDIA GPU Cloud. You can download them into any cloud, any data center, any computer that has been certified, and they will just run - the entire acceleration stack, fully integrated, fully optimized, and enhanced all the time.

Now, the reason we do this is for several characteristics that we really love about accelerated computing - and remember, it's not just about the chip, it's about the entire stack. One way to think about it is this. The first thing you want for accelerated computing is for it to be programmable. I created a whole bunch of acronyms here just to simplify it for you - and more importantly, that's right, there's going to be a pop quiz at the end.

It's programmable. The reason is that most algorithms today want to be software-defined. You want your computers to be software-defined because so much innovation is going into computer algorithms today, and we want to put the architecture, the computer, the capabilities of that machine in the hands of the software developers. Making it software-defined makes that computer forever more powerful, improving all the time. If it's software-defined, if it's programmable and has great tools, then the time to solution - the time to finding that algorithm - is shortened. And of course you see growth in the number of domains: if it's programmable, you can use it for more and more things. The number of algorithms is diverse and endless, but the fundamental computational methods tend to be very similar, and so by using a software-defined acceleration approach we can expand the number of domains the computer can be used for.

Acceleration gives you time to answers, or size of problems: you can take a very large problem and do it as fast as you can, or, within a certain amount of time, increase the size of the problem you can solve. It also gives you the lowest cost of infrastructure: if you solve a problem quickly, if you have a computer that can achieve the necessary solution as quickly as possible, of course you can buy fewer computers.
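To make the "programmable, software-defined" idea concrete, here is a minimal sketch of a custom GPU kernel written in plain Python using Numba's CUDA JIT - an assumed toolchain for illustration, not something announced on stage; the kernel and sizes are illustrative.

```python
# A minimal sketch of software-defined GPU acceleration, assuming the Numba
# CUDA toolchain (pip install numba, plus a CUDA-capable GPU and driver).
# The same Python source runs unchanged across CUDA-compatible GPU generations.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)              # this thread's global index
    if i < x.size:                # guard threads past the end of the array
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](2.0, x, y, out)  # Numba copies arrays to/from the GPU
assert np.allclose(out, 2.0 * x + y)
```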
You've heard me say: the more you buy, the more you save. The reason is that doing things efficiently and doing things quickly is the most cost-effective way of doing something.

The number of domains: it turns out a computer is only successful if there's a whole ecosystem around it, and if the number of partners that take it to the marketplace is large, because the world that uses computers is huge. Almost every field of science, every industry, every company in the world uses computing of some kind, and we believe that in the future they will all be high-performance computing customers - they will rely on high-performance computing to achieve their mission. The cost of deployment is lowest when a computer serves multiple domains, and the reason is very simple: a computer maker will be much more enthusiastic to take computers out to the world if you can serve finance and healthcare and manufacturing and transportation and retail and insurance. The larger the number of domains a computer serves, the lower the cost of deployment. It is the reason why today the x86 is the world's most affordable and most popular computer: it has so much use, the space of applicability is gigantic, its domain of use is large. Programmability gives us the largest possible domain, and as a result it also reduces the cost of infrastructure.

Architecture. Architecture is a funny word in our company. Architecture means this: an application that was written yesterday has to run on a computer tomorrow, and an application that we enhance today will run on computers everywhere that have CUDA in them. Architectural compatibility has enormous and powerful implications - it is the reason why architectures either succeed or don't. Backwards compatibility and a large installed base all help drive the cost of the infrastructure down. So the architecture gives us continuous improvement in performance, it expands the domains, and it drives the cost of infrastructure down. All of these things create a positive feedback system, and that's what we're experiencing today: the reason why an architecture grows faster and faster, and adoption grows faster and faster all the time, is because of these characteristics.

These characteristics, as it turns out, are incredibly hard to remember, and so, as in every keynote, I try to introduce you to a new word. And I come up with it at the very beginning of every keynote, just in time. So this is the word of the day: PRADA. At the end of the keynote I will ask you this word again. PRADA stands for Programmable Acceleration of multiple Domains with one Architecture. Does that make sense? PRADA. Can you remember this? Okay, say it out loud: PRADA. Okay, thank you very much, my job is done, have a nice day.

Okay - PRADA. Accelerated computing, the approach that we pioneered a dozen years ago, has really taken off, and it's taken off because CPUs are not scaling as fast as they used to, and because the number of workloads that are really important to the future of computing and the future of society has just gone berserk. The amount of computation that is necessary for the future is incredible, and I want to talk about some of those workloads today.

But first: my talk is in three chapters, three fun-filled chapters. The first chapter is computer graphics - the driving force of our company. It is the simulation of virtual reality, one of the most challenging computational problems in all of the industry.
After 25 years of pursuing real-time virtual reality, we have not achieved it - not even close - but we're getting closer every single day. Today I want to show you something that's really great. This is by one of our partners, and I'll come back and tell you about it in just a second.

"Invent yourself and then reinvent yourself; don't swim in the same slough. Invent yourself and then reinvent yourself, and stay out of the clutches of mediocrity. Invent yourself and then reinvent yourself; change your tone and shape so often that they can never categorize you. Reinvigorate yourself and accept what is, but only on the terms that you have invented and reinvented. Be self-taught, and reinvent your life, because you must; it is your life, and its history, and the present, belong only to you."

Okay - was that real, or virtual reality? Let's take a look. All right, audience participation: is the left real, or is the right real? How many people say the right is real? This is not real. So how many people think this one is real? It's got to be the rest, come on. Okay, so everybody thinks this one's real. All right, next one: how many people think this one's real? Getting a little confused. How many people think this one's real? Okay, so this one's real. This is one of those rock-paper-scissors things. All right, how many people think this one's real? Okay - well, it turns out they were on the same side. This one's not real. This one is RTX; that one is real. Is that amazing?

By mimicking the physical properties of light, RTX has made it possible for us to perform real-time ray tracing for the very first time. It turns out ray tracing can be done in software - it is true - and the reason is that the first implementation of ray tracing was done in software, by one of our researchers, a long time ago. In fact, most films are software ray traced today. They're run on a bunch of CPUs - they call them render farms - and there are about a million and a half of those render servers running around the world, making movies all around the world. So it stands to reason that ray tracing can be done in software. But as in everything else, our goal is to accelerate this particular domain of application. What we want to do is find a way to accelerate enough of it, as quickly as we can, so that it can be done in real time.
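To ground the point that ray tracing is, at heart, just software, here is a minimal sketch of its core operation - a ray-sphere intersection test - in plain Python with NumPy. The scene values are illustrative, and the ray direction is assumed to be unit length; a real renderer (or an RT core in hardware) performs billions of such tests per frame.

```python
# A minimal sketch of the intersection test at the heart of every ray tracer.
# Assumes `direction` is a unit vector, so the quadratic's leading term is 1.
import numpy as np

def ray_sphere(origin, direction, center, radius):
    """Return the distance to the nearest hit, or None if the ray misses."""
    oc = origin - center
    b = 2.0 * np.dot(direction, oc)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c            # discriminant of t^2 + b*t + c = 0
    if disc < 0:
        return None                   # ray misses the sphere entirely
    t = (-b - np.sqrt(disc)) / 2.0    # nearer of the two intersection points
    return t if t > 0 else None       # only hits in front of the origin count

hit = ray_sphere(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                 np.array([0.0, 0.0, 5.0]), 1.0)
print(hit)  # 4.0 -- the ray hits the sphere four units away
```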
Chad is going to show you a couple of these things. Chad? "Yep." What are you going to show us here? "So the thing I'm going to show you, which is the new thing, is this: normally, all those rendered images you would see would have been done offline, over a few minutes, a few hours, even days. But that video you saw at the beginning, that we did with Unity - all the rendered images were done in real time on our RTX hardware, and a lot of those scenes were rendered at about 30 fps. What we're showing you right here now is not a cut from the commercial we put together - it's actually Unity running in real time." Wow. And so that's amazing. "Yeah. So this would be really useful for, say, a professional designer who says: I want to see what my materials look like, simulated for real, using ray tracing. They can swap materials, they can play with the compartments, open them up, see how the lighting works." Guys, this is all done in real time. "Yeah. We can even go to the exterior - you'll see some of these shots we're familiar with. Again, you can change the materials, make sure you didn't create any distracting reflections for other drivers or anything like that, and kind of play around. And we can turn some of the RTX effects on and off to show you what's really going on: ray tracing gives you a real shadow, and then we turn it back on, and there you go." That's beautiful. Chad, thank you very much. "One quick thing that's really exciting: we're showing this in Unity, it looks awesome, and it's actually going to be available in an experimental package April 4th. So it's coming soon for everyone here to play with." April 4th, you guys - Unity. Great work, thanks a lot. Danny, you're in the audience - you must be very proud of Chad. Chad Dibley - all right, we practically raised him.

Okay, so this is the NVIDIA Turing RTX architecture. It is the greatest leap in computer graphics in 15 years, since we started the programmable shader revolution. If you take a look at the imagery today, it's really at the limits of what you can do with programmable shading, and we need to kick it up a notch, take it to the next level of realism. What you saw was a 2080 Ti, the highest end of the Turing architecture: 18 billion transistors, 32 trillion operations per second - 16 trillion of it floating point, 16 trillion of it integer. The reason is that for the first time, the Turing architecture splits the integer and floating-point execution so we can do them concurrently. And the reason for that is this: back in the good old days, we would largely use our GPUs for shading, but in the future, if you want to use them for ray tracing, the programs are really complicated, and when you do a lot of complicated program execution, the integer math - the address calculation of the program - starts to dominate. By making it concurrent, we can overlap the shading operations and the ray tracing operations with concurrent floating-point and integer execution.

We also have a new technology called variable rate shading: depending on what part of the screen it is, maybe the shading doesn't have to be nearly as precise - maybe the texture is relatively coarse, maybe it's moving relatively fast and you wouldn't notice anyway. Using variable rate shading, we can reduce the amount of shading necessary. We have another new technology called mesh shaders, which allows us to render really, really complicated scenes - rocks, the geometry of mountains, enormous fields of grass - you know, stuff, stuff in the world, to make the world more interesting. It's the incredibly fast stuff-rendering system, otherwise known as mesh shaders, and it allows us to do things like level-of-detail adjustment. And of course the big-ticket item is ray tracing: our first GPU able to do ray tracing and intersection testing in real time.

Lastly, one of the most important features of this GPU is the Tensor Core architecture. It allows us to do artificial intelligence for generating images - creating pixels that otherwise would have had to be rendered, fully calculated. Notice the number of things you can now do with artificial intelligence, like super resolution: if we can create algorithms and neural network models that process frames fast enough, we can reduce the number of pixels we calculate, infer the rest, and as a result achieve very high performance. We're working on this - it's called DLSS - and we're making progress all the time. I'm super excited about its future. There's no question in my mind this is going to be a huge success; we just have to continuously innovate and create these new models.
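As a sketch of the idea behind DLSS-style super resolution - render fewer pixels and let a neural network infer the rest - here is a toy upscaler in PyTorch. This is purely illustrative: it is not NVIDIA's actual DLSS network or training scheme, and the layer sizes are arbitrary.

```python
# A toy super-resolution model illustrating the concept described above:
# take a low-resolution frame and infer a frame with 4x the pixels.
# This is NOT the DLSS architecture; it only demonstrates the idea.
import torch
import torch.nn as nn

class ToyUpscaler(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into 2x spatial size
        )

    def forward(self, low_res_frame):
        return self.net(low_res_frame)

frame = torch.rand(1, 3, 540, 960)        # a quarter of 1080p's pixels
print(ToyUpscaler()(frame).shape)          # torch.Size([1, 3, 1080, 1920])
```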
Creating the GPU, inventing the technology, is the first step, but ultimately what's going to drive its adoption is the entire ecosystem supporting it. The ecosystem starts with the most pervasive computing platform for 3D in the world: Microsoft Windows. Microsoft Windows DirectX with ray tracing, called DXR, is now available. But that's just the API, the lower-level API. On top of it are the game engines. The two most important game engines in the world, Unreal and Unity, represent 90% of the world's games. You just saw Unity with DXR and RTX acceleration - as Chad said, available April 4th for people to download and experiment with. We also mentioned that Unreal is developing their version of the game engine - Unreal Engine 4.22 - which will incorporate DXR. And lastly, Vulkan RT. These represent basically the vast majority of the entire computer graphics industry, and it is all coming together for a future of ray tracing.

Then we have to create all the products. We now have Turing RTX graphics cards from top to bottom, from what you saw earlier, which is a 2080 Ti, down to a Turing GPU we recently announced for just over $200 - $219. The lower-end GPUs have no RTX, just the Turing shading architecture; for the higher-end GPUs, starting with the 2060, whenever it says RTX it has the ray tracing acceleration hardware. The Turing architecture now spans from $219 all the way to the highest performance in the world. And this year we have more notebooks than ever. The gaming notebook marketplace is growing super fast - it grew 50% just at NVIDIA last year, year over year - and we expect this year to be pretty great as well. Forty new laptops based on the Turing architecture have been announced.

Let me show you a couple more demos. It is very clear that ray tracing is the next step in real-time computer graphics, and the next generation of video games has started. This next game I want to show you is from a developer called Nexon, and it's built on top of Unreal Engine 4.22. [Music]

Really, really amazing. It doesn't matter how the technology works - many of you in the audience understand how it works, and it's really a marvel that we're seeing real-time ray tracing - but what really matters, in the final analysis, is that things just look so much more beautiful and alive. Everything is just alive somehow, because the reflections work properly and the shadows work properly. It just looks much more realistic, much more alive, and as a result developers can tell much better stories.

It all started about 20 years ago. For the very first time, hardware 3D acceleration was possible because of a game named Quake, which used OpenGL for the very first time. Frankly, if it wasn't for Quake, NVIDIA wouldn't be here today. Every company needs a bit of a killer app, and in our case it was the video game industry, and the one that really kicked it into high gear was Quake. It was so hard to render Quake back then - it was just so hard to do - that you needed an accelerator to do it, and for the very first time we were able to do Quake in real time. Now, our engineers wanted to do something, to make a technological contribution to the community that has since worked on Quake. It turns out Quake has gone through several iterations, and what we're going to show you is Quake II - it's open-sourced.
This carries on some of the work that was done by Christoph Schied, one of the interns working at NVIDIA Research. We took it all the way, and what we're going to show you here is an original game, now done with state-of-the-art computer graphics. Manuel, let's show it to them.

"All right, so what you're looking at over here is a resurrection of the original single-core, single-thread Quake II game engine running on the CPU, and this is one of the deathmatch maps. Our CPUs are so fast now, compared to 20 years ago, that you can just render this stuff in software on a CPU. Of course, at the time, on a single CPU, you just really did not have the compute power to do global illumination; instead they used light maps, so all the lighting is baked and static, and this is what you get. Now, what we added on top is RTX, so that we can do this-" No, no, no - ladies and gentlemen, wait for it. This is not the end, this is the beginning. Wait for it. All right, now go. "Everything is dynamic. We can change the time of day, we can move around, all the reflections and the materials are physically based." Go ahead, keep going - look, we'll just keep going. "Yeah, at this point I'm just letting it run, because it's just so beautiful. So Alexey over here, the lead programmer, has added a lot of features. The first thing we did was add high dynamic range, because without physical units nothing works. Obviously the direct lighting, the indirect lighting, the reflections, the refractions - all based on ray tracing, all based on path tracing. Once again: the CPU rasterizer over here, with baked global illumination - and RTX. All right, we're going to take you to one of the actual game levels for gameplay. So this is before... and this is after. As you can see, with physically based materials, we can get glass which reflects everything around it. We also added little fun things like volumetric lighting, which you can see here with beautiful light shafts, and we have more interesting materials coming up - like the metal grates over here, which start reflecting all the environment around them. No tricks - this is actually real, we're not faking it. And here's a little Kodak moment - there we go, beautiful volumetric lighting, very moody. Again: before, and after." Ladies and gentlemen, this is the definition of beauty for a computer scientist. "And we have one more little surprise for you, something we added at the last minute - we weren't sure if it was going to work today, so I'm going to let them show it. And of course, this is not Quake without the BFG."

Okay - thank you, Alexey, good job; Manuel, good job. Thank you. This is really incredible work, and obviously it's a work of love. We're going to contribute it back to open source; the engineers are going to finish it off over the next month, so stay tuned and we'll post it as soon as we can. This is genuinely state-of-the-art computer science and computer graphics coming together, and we're doing it in real time. When we put it out in open source, it's going to be a topic of great research and great discussion to advance the future of computer graphics.

So that's real-time computer graphics. Offline computer graphics - graphics rendered using render farms so that we can generate photorealistic images - is used in every single industry today, whether it's architecture or product design.
There's a whole community of 3D artists who use 3D graphics and rendering software to create amazing imagery. This was created by somebody - a fan out there who just loves to create art with this medium - who arduously, meticulously turns their imagination into something that comes alive. And of course media and entertainment uses it for making movies. A modern movie might have something along the lines of 2,500 to 3,000 shots, and each one of those shots is a few seconds long - just a few seconds. Those few seconds take a team of tens and tens of artists and designers, and it takes them literally the length of the film's production schedule to create them. A whole bunch of studios come together, stitch all their shots together, and create what we all know as a wonderful movie.

Well, this process is incredibly arduous, and we wanted to bring accelerated computing to the rendering process for the very first time. This has largely been the domain of CPUs, because the algorithms are so complicated and the data sizes are so large; it has taken us literally 25 years to get here. Finally, this year, we announced RTX, and we started working with all of the leading film-quality design tool companies and firms to accelerate their tools and rendering systems. I'm happy to announce that as of now, over 80 percent of the world's leading tool makers and film studios have adopted RTX, and by the end of this year we should have all of them in production. We have some of it in production now - Arnold is ready - and we're going to show you some examples of the work. This is from a studio called Image Engine, and this is work they did for Lost in Space - I can't wait for the second season. Let's take a look at it. [Music]

Image Engine, Lost in Space. You can see the work is arduous and meticulous - it takes so many people working together to create that one shot, layer by layer by layer, a meticulous process, for what we ultimately see as the final film. What you saw there was really quite amazing, and it represents just a few seconds - one shot.

Our early adopters, our early developers, have now benchmarked it for us. Take a look up here: this is the CPU case, a dual-Skylake server setup. 25 nodes takes 38 hours to render a shot of a few seconds - a shot of a few seconds is, of course, 30 frames per second, with each one of those frames meticulously rendered. 25 nodes, 38 hours; the power bill is about $70,000 over the course of five years, and the total cost of those 25 nodes is about $250,000. We've now done the same work - that one shot - with one node of what we call an RTX Server, with four RTX 8000s. It took six hours instead of 38 hours, and the cost of the server is $30,000, not $250,000. And the amazing thing is this: with the power bill that you would save, you could buy the server - just with the power bill you save. I used to say the more you buy, the more you save. I think I was wrong: RTX Servers are free [Applause] - and you get another $30,000 to spare at the end of five years. So it's completely free. Okay - I said earlier that doing work efficiently is the most cost-effective way of doing anything, and energy efficiency and getting things done quicker really is a great savings.
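The arithmetic behind the "free server" joke, using the figures quoted above. One caveat: the RTX Server's own five-year power bill is not stated on stage, so the $10,000 figure below is an assumption chosen to match the "$30,000 to spare" claim.

```python
# Back-of-envelope check of the "RTX Servers are free" claim, using the
# numbers quoted on stage. rtx_power_5yr is an ASSUMPTION (not stated)
# chosen so the result matches the "$30,000 to spare" line.
cpu_cluster_cost = 250_000   # 25 dual-Skylake nodes
cpu_power_5yr    = 70_000    # five-year power bill for the CPU cluster
rtx_server_cost  = 30_000    # one RTX Server with four RTX 8000s
rtx_power_5yr    = 10_000    # assumed five-year power bill for the RTX Server

power_saved = cpu_power_5yr - rtx_power_5yr    # $60,000 saved on power
spare = power_saved - rtx_server_cost          # buys the server, $30,000 left
print(power_saved, spare)                      # 60000 30000
```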
If you take a look at a major film, it costs something like $300 million to $350 million to produce, and the vast majority of that is post-production, which is otherwise known as rendering. It might take something along the lines of a year to a year and a half. If you could save even one month on what is otherwise a year-long project, the amount of money you could save is in the millions. This is one of the reasons why this industry is in such a hurry to find ways to accelerate the rendering process and the production process. We're working closely with Pixar, and they've been so excited about the technology - I can't wait to see the first major motion picture made by Pixar rendered completely on RTX.

The process of creating these movies is hard. It starts with modeling: you've got concept, you've got modeling, you've got texturing, and you've got rigging, which is basically putting the bones into the characters. Once you put the bones into a character, that character essentially becomes a robot: you can animate it, you can deform it, you can change the shape of the skin; you can of course do forward kinematics, you can do inverse kinematics, and as a result you can animate these characters. You have to light it, and then of course you have to render it, make it look totally perfect. And once you create the character, you have to composite it with a whole bunch of other characters, the scene, all the environments, and all the special effects - and all the special effects are done with physics simulation. It is so, so complicated. And as I just told you, a few shots may be assigned to one studio and a few shots to another, so multiple sites are all working on a movie at the same time.

There are over 200 animation studios in the world today. Because of the nature of how we enjoy entertainment, because so many cultures are now starting to adopt this form of media and entertainment, and because there are just so many different ways to enjoy content today, the number of studios has gone up significantly. The amount of pressure put on the studios is increasing as well: they would like to drive their costs down, they would like to increase their productivity. And one of the greatest things we want to do is find a way for them to work together. If you take a look at this map - it just represents a few of them; there are over 200 - there are sites all over the world, and yet one site may be working on modeling, maybe another site on the simulations, maybe another one on animation. They're all using different tools, some internal, some third-party, but it's got to come together in some kind of holistic way.

Could you imagine a world where there was no way to share documents? Because the content is too big, because the workflows are too complicated - and yet each document requires hundreds and hundreds of human hours. Well, for word documents we have Google Docs; for 3D content creation, we have nothing. We have nothing. And so we wanted to create a tool that made it possible for studios around the world to collaborate across the workflow - for the designers working in a single workflow, with all these different tools and all these different steps, to be able to collaborate. We've been working on this for some time, and I want to show you this great new technology from our company. It's called Omniverse.
Omniverse basically connects up all of the designers and studios. It works with every tool: we communicate with all the tools through their plugins, using USD and using MDL, and we only exchange the things that are dynamic. We put it into one world, this portal that everybody can see into, and as a result, irrespective of what tool or what part of the workflow you're working on, you will see one version of the final content you're creating, in its highest possible fidelity. You could be on a laptop; maybe your computer doesn't have ray tracing; maybe you're using a particular tool whose renderer is not quite up to the point of being able to do ray tracing - but all of you are working together in one workflow, and everything looks beautiful. Everybody has one common understanding of what everybody is working on at that moment in time. It's an open collaboration tool, it works with all of the major 3D tools out in the world, and we're getting great support - people are so excited about this. Let me now show it to you: ladies and gentlemen, the NVIDIA Omniverse.

Now, we're going to demonstrate this very quickly. This is our team back there, and let's pretend for a second that they're all working in different places. They're here today, but it doesn't matter: the Substance Painter work could be done in Montreal, the Unreal Engine work could be done here in California, and the Autodesk Maya work could be done somewhere in England. All these tools, all these studios, are global, as I mentioned, and they're working on different parts of the design. Notice: in the case of Maya we're doing the design; in the case of Unreal we're composing the environment; in the case of Substance we're painting that airplane. And these three designers don't see what the others are doing - they couldn't see what the others were doing, until Omniverse. These three tools are from three different third-party developers, but we're going to have them see each other in the same Omniverse world.

Okay, so what you're looking at over there is Omniverse. Now check it out: while one designer is changing the geometry, while another one is changing the scene, while another one is painting, it all shows up in one world. Can you guys see that? It's all showing up in one world: the geometry is changing, we can change the paint, we can create a different environment. This is Omniverse. Those are all the independent designers working separately, and this is rendered with ray tracing - it doesn't matter what hardware they each have - and as a result we have this beautiful world. Now, Omniverse can run on your local workstation if it has an RTX in it; you can share it with other people, stream it to other people in your workgroup; you can also put it in the data center, or in a cloud, and we can render all of it and stream it to you. It doesn't matter how you would like to enjoy it: Omniverse is one shared world. What do you guys think? Okay - thank you, guys. Let's come back to the slides.

The studios we've talked to are so blown away by this. This is something they've been looking for a long time, and finally we have it: essentially the Google Docs of 3D design, except it also works across heterogeneous tools whose data formats are all different. When they communicate with us and connect into Omniverse, this is what you get to see.
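For the technically curious: the interchange layer mentioned above is Pixar's USD. Here is a minimal sketch of creating and editing a USD stage with the open-source pxr Python module (pip install usd-core); the file name and prim names are illustrative placeholders, not anything from the demo.

```python
# A minimal sketch of the USD interchange layer that Omniverse builds on:
# create a stage, define prims, edit an attribute, and save. Every tool
# connected to the same stage sees the updated scene description.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("airplane.usda")     # a new USD scene file
UsdGeom.Xform.Define(stage, "/World")            # a root transform prim
fuselage = UsdGeom.Cube.Define(stage, "/World/Fuselage")
fuselage.GetSizeAttr().Set(2.0)                  # edit one attribute...
stage.GetRootLayer().Save()                      # ...and persist the change
```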
We have early access planned - come to the NVIDIA website for developers, it's under NVIDIA Omniverse, and let us know if you would like early access.

Rendering is data center graphics. Omniverse ultimately will be data center graphics. Another data center graphics is GeForce NOW. We've been working on GeForce NOW for a few years - in fact, about six years - and the reason is that the vast majority of the gamers in the world don't have access to our powerful graphics cards. We have some 200 million GeForce gamers in the world, and they're growing, but there's another billion PC gamers who don't have the necessary computer, who don't have a GeForce card, to be able to play games at the level that they should. So we decided several years ago that we would create a cloud gaming system - not a service, not a store, not the Netflix of gaming. It's an open platform, so all the developers get to keep all their economics. All it is is a PC in the cloud: think of it as a GeForce PC in the cloud. What one of our GeForce PCs can do for games, this can do. It's an open platform; it should be able to run everything. If it runs on a GeForce PC on your desk or your laptop, it should be able to run in the cloud. It is virtualized so that we can share it across a whole data center and be as efficient as possible, but the entire stack has to be exactly the same as the stack we have on PCs, because we want the software to just run - no porting, no onboarding, it just works - and be streamed to PCs, to the other billion PCs that we can't reach. That's what GeForce NOW is about.

There are five hundred plus games in there today - basically the games you love and enjoy on PC. If it's on Steam or the Epic store or Uplay, all these different game stores - if you've bought the game and you get onto GeForce NOW, you should be able to stream it to a Mac, a Chromebook, a low-end PC, a laptop, a leftover PC from years ago. You'll be able to stream basically very high quality games. Today GeForce NOW has over 500 games, we have 15 data centers, and about 300,000 players are using it all the time. We've been in beta for some time, and there are a million gamers on the waiting list - a million gamers on the waiting list - and we're trying not to tell too many people about it, because the waiting list is actually fairly large. So we're sitting here trying to improve the quality of the service, improve its reliability. It's a very complicated problem: we want to make sure that the latency is as short as possible, never any hitches, the sound quality is excellent, the interactivity with the mouse is great.

And what we've discovered is this: you could build the world's largest data center, and it won't make any difference - you've got to build a whole bunch of data centers all over the world. It turns out the number of gamers around the world is quite large, and they're spread out all over the world. So we need a system that lets us go well beyond these 15 data centers supporting 300,000 gamers, a way to reach all the edges of the world through partnerships. So we created this idea called the GeForce NOW Alliance. The GeForce NOW Alliance is basically us creating a server architecture, developing all the necessary software, and partnering with telcos around the world who can't wait to put the service on their 5G networks, their Wi-Fi networks, or their broadband networks.
They can't wait to put additional services on top of those networks, and so these alliances are all over the world - they could be in countries all over the place. Today we're announcing two big ones: the first two partners for the GeForce NOW Alliance are SoftBank in Japan and LG U+ in Korea, two of the major gaming markets. I want to thank you for your support. Our strategy is to set these alliances up in countries all over the world; they would have multiple data centers with GeForce NOW servers, and we will host the service and maintain the service for everybody.

The first step, therefore, is to create the server. We call it the RTX Server, and this RTX Server is super dense: 40 GPUs in 8U - 40 of our state-of-the-art GPUs in eight rack units. You can virtualize it, and as a result of virtualizing it, you can share it across 320 different concurrent users. It's optimized end to end, so you can use it for the entire cloud-gaming GeForce NOW service, you can use it for rendering, you can use it for Omniverse. But that alone is not enough, and it turns out the reason is that storage and networking is a really complicated problem. Once you get the data onto the server, we can stream it and we can compute - we understand that problem super well - but across the entire infrastructure, with hundreds of thousands of gamers all playing at the same time, you're essentially running a supercomputer in real time: the interconnects all become bottlenecks, everything becomes an issue. So we decided to create a pod - a pod that can scale up to 32 of these RTX Servers. Within 10 racks we can support 1,280 GPUs, all connected with high-speed Mellanox InfiniBand, and we can therefore support 10,000 concurrent gamers at the same time. And this is what it looks like - this is one pod. Our alliance partners would install these pods, and basically within a week they should be up and running and the service is ready to go. Whether it's the countries of Southeast Asia or Eastern Europe, where a big part of the market for gamers is emerging, we can set these up, work with the telcos as alliance partners, and bring the GeForce NOW service to all of them as quickly as possible. These servers can now support three basic use cases.

The first chapter ends here, and it's about the fact that graphics is, number one, going to go to the next level, and number two, also going into the data center. Data center graphics requires a brand new architecture, and this architecture already supports three use cases, with others to come that I'll share with you in the future. Our engineers have been working on something really cool, so let me play it for you now. Ladies and gentlemen, everything you see is in real time. [Music] [Applause]

What do you guys think? Let's have another round of applause for the engineers and the creative artists of NVIDIA. Very few technology companies get to sit at the intersection of art and science, and it's such a great pleasure to be here, to be able to focus on doing work like this and to bring joy to so many people. It's such a creative process, and it just blows my mind every single year when they keep raising it to another level. NVIDIA is the ILM of real-time computer graphics, and you can really see it here.
Chapter 2 is about AI and HPC. As I mentioned, today we're going to talk in three chapters; the first chapter was computer graphics, and now we're going to switch gears and talk about AI and HPC.

We've been talking about deep learning for some time. Deep learning is a new type of algorithm, a great breakthrough, part of machine learning, which is used for natural language understanding and computer vision - the things that largely built up AI. Underneath AI there's a whole bunch of other stuff, like robotics and such, but in the areas we've been talking about, this is the taxonomy, if you will. Data science is the fastest growing field of computer science today. It is the most sought-after job; it is the most oversubscribed course at leading universities - whether it's Berkeley or Stanford or CMU or NYU, it is just so oversubscribed. And the reason is this: because of several different factors, it has become the fourth pillar of the scientific method. There's the theoretical method, of course - the Einstein method, the thought experiment. There's the experimental method - Newtonian physics through experimentation. There are simulation methods - today's scientists use them for molecular dynamics and such. And now we have a data-driven method: data science.

It's made possible because of three factors. First, an enormous amount of data that we can now collect: sensors everywhere, digital everything, phones everywhere, cameras everywhere - we now have great sensors collecting a great amount of information, customers clicking on your websites, customers clicking on your apps; all of that creates data. Second, breakthroughs in machine learning algorithms, whether it's deep learning or the other machine learning approaches that have emerged recently. And lastly, computation - and we've been fortunate to be a big part of advancing and accelerating the computation that makes all of this possible. These three factors continuously feed on each other, and now data science is a pillar of the scientific method that allows us to solve problems that were previously impossible. We're solving problems that were just previously impossible.

Now, when we talk about data science - when we talk about deep learning, machine learning, and AI - I want to frame it for you. The pipeline, the workflow, starts with data. There's a first stage of ingesting and cleaning up data, called data analytics. In data analytics, your goal is to take data from a whole bunch of data lakes and join it to create what is called a data frame. That data frame is tabular data: the rows are all of the instances, if you will, and the columns are the features. The features could be where you live, what your preferences are for movies, the connections to all kinds of things - all of the features that you might want to learn from. That's called feature engineering. You take all of this data in, you do all of this munging and wrangling, and what comes out of it is a gigantic table, and that gigantic table could be anywhere from gigabytes to terabytes large. Could you imagine a spreadsheet that's terabytes large? First of all, it wouldn't load on a normal computer, not to mention what would you do with it at all. And so you do that data wrangling, you do all of that processing and analytics, and what comes out of it is engineered features - a large amount of data that is now in a data set you can use for predictive analytics, to learn from that data set.
And that is what we call AI. Some people call it AI, some people call it machine learning, some people call it deep learning, depending on the field of science you're in. If you're in image recognition, of course, you would just use camera inputs. If you're trying to understand something related to disease, the data could come from a whole lot of places, including genomics; it could be medical imaging; it could even be your family history. From that, you run it through frameworks, and those frameworks ultimately give you a model - a predictive model. That predictive model, once verified, allows you to predict the next outcome from new input: a new input comes in, and you can now predict the future. We call that prediction process inference. So: data analytics; machine learning, or AI, or deep learning; and then inference.

Okay, there are some words that we use, and for some of the people in the audience who are new to data science - this is such an important field, I think it's worthwhile for everybody to understand it - the words you hear are things like this. CSV: comma-separated values. Parquet: a file format that came out of the columnar database world - notice the features are in columns, which is a much faster way of reading them. HDFS, the Hadoop Distributed File System: the beginning of big data, the ability to distribute files all over a large data center and use it as a large compute engine. ETL: extract, transform, and load - basically the data analytics process. Pandas: a set of data analytics libraries, basically the spreadsheet of big data, the spreadsheet of data analytics and machine learning. Spark, for large data centers. Graph analytics. You hear TensorFlow, PyTorch, MXNet, scikit-learn, XGBoost - these are all frameworks for machine learning or deep learning. And then on the inference side you have TensorFlow Serving, you have ONNX, and you have SageMaker Neo - some of the most important, most popular large-scale inference services and backends. These are the words that are going to come up over and over again, and I just wanted to place them on a graph for you, so you have a feeling for how all the stuff fits together.

Now, our words - the words I use when I talk about these areas. cuIO is about vectorizing and massively parallelizing the input, the loading, the I/O - a really complicated problem; you take data from disk and put it into system memory, in the right format, and that format is Apache Arrow. cuDF is a data frame: it is largely compatible with pandas, except it's GPU-accelerated. cuGraph: graph analytics. cuDNN: basically NVIDIA's deep learning API, the one that got this whole revolution going. cuML: our family of machine learning algorithms - random forests, decision trees, k-nearest neighbors, k-means, regression algorithms. TensorRT: our optimizing compiler that takes the output of all this, which is a model, and compiles it down to a target, whether that's a little tiny computer, a giant supercomputer, or a cloud computer. And our inference server that runs on top of Kubernetes, called TensorRT Inference Server, allows us to run it in a hyperscale data center and makes it possible for millions of people to be doing inference at the same time. That's exhausting - it was exhausting to say, and it must be exhausting to listen to.
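A minimal sketch of that pipeline in code, using the RAPIDS libraries just named - cuDF for the data frame and cuML for the machine learning step. It assumes a RAPIDS installation, and the file name and column names are hypothetical placeholders.

```python
# A minimal sketch of the data analytics -> machine learning -> inference
# pipeline described above, assuming RAPIDS (cuDF + cuML) is installed.
# "customers.csv" and its columns are hypothetical placeholders.
import cudf
from cuml.ensemble import RandomForestClassifier

# Data analytics: ingest and feature-engineer on the GPU, pandas-style
df = cudf.read_csv("customers.csv")
df["spend_per_visit"] = df["spend"] / df["visits"]

X = df[["spend_per_visit", "age"]].astype("float32")
y = df["churned"].astype("int32")

# Machine learning: train a predictive model from the engineered features
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Inference: predict outcomes for new inputs
print(model.predict(X.head(10)))
```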
And if that wasn't hard enough as it is, this is what it looks like when you look at it as an ecosystem. It turns out that building a great chip is a nice beginning, but it's useless until the world's developers and users can take it - and that's why it's so important for us to work with the ecosystem. NVIDIA is an open-platform company. We create all these libraries in a way that is software-defined and integratable into modern systems. It runs on three computing platforms: personal workstations, servers, and cloud. This is the way computing is consumed today; if you're not in all of them, you're nowhere. Next is the system software stack: on top is CUDA, and the subset of CUDA for this chapter, AI and HPC, we call CUDA-X AI. It has data analytics, graph analytics, machine learning, deep learning for training, and deep learning for inference. And it has to go out into services, to the world, to the developers - in frameworks, in cloud services - and be deployed on cloud platforms. This is the ecosystem at large today.

It turns out we have a ton of announcements, and I just wanted to put them all on one slide. The first announcement is that we now automatically do the mixed precision necessary for our Tensor Core architecture, which is NVIDIA's new AI architecture for GPUs. Tensor Cores are now supported automatically in TensorFlow, in PyTorch, and in MXNet. It turns out that precision is a very complicated problem: there are range issues, there are precision issues. Sometimes you can do it in FP16, sometimes you can't; sometimes you can do it in INT8, sometimes you can't - you've just got to find out when you can and when you can't. That compilation problem is very complicated, especially since you have to wait a week and the answer comes back off by a little tiny bit - these convergence problems are just historically challenging for high-performance computing. Well, Tensor Cores are now enabled automatically for these frameworks, and you get the X-factor speedups automatically. You do nothing. That's announcement number one.
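A minimal sketch of what "automatic" looks like from the developer's side, using PyTorch's mixed-precision interface (a later, generic form of the kind of integration announced here; the model and data are placeholders): autocast runs each operation in FP16 where that's numerically safe, and a gradient scaler guards against FP16 underflow.

```python
# A minimal sketch of automatic mixed precision in PyTorch. autocast chooses
# FP16 per-op where it is numerically safe and keeps FP32 elsewhere; the
# GradScaler rescales the loss so small FP16 gradients don't underflow.
# Requires a CUDA GPU; the model and data below are toy placeholders.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

with torch.cuda.amp.autocast():            # mixed precision chosen per-op
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()              # scaled to avoid FP16 underflow
scaler.step(optimizer)
scaler.update()
```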
Announcement number two: Databricks. Databricks was started by Matei Zaharia - Matei, I love you. Matei, who created Spark back when he was at Berkeley, started a company called Databricks; they have a machine learning and AI platform up in the cloud that is really popular - a super successful company. They have now incorporated and accelerated RAPIDS - the data analytics and machine learning part of our framework, of our API - into their platform. Google Cloud now has RAPIDS acceleration. Microsoft today announced that they have RAPIDS acceleration for Azure Machine Learning. And next, Accenture: we announced that we're partnering together. They're one of the world's leading professional services companies, and Mike Sutcliff was so visionary to lean into data analytics and AI years ago - they're doing fantastically, and it's great to partner with them, because we now have RAPIDS and our CUDA-X API and platform integrated into their services, and they're seeing great results from it. They serve three-quarters of the world's top 500 companies, so it's really great to partner with Accenture. We're also announcing today that ONNX Runtime is integrated natively with NVIDIA TensorRT. So, all of that: today you can accelerate all of these frameworks, all of these AI production and development systems, so that your productivity can be as great as possible. I want to thank all of the engineers and all the partners for making this possible. Thank you.

The question is: what happens with all this stuff? You know, we look at these things, and they're CNNs, and the industry is benchmarking ResNet-50 - it's a little bit like benchmarking Quake, which is now at 974 frames per second. We're benchmarking ResNet-50, we're focused on CNNs, and yet all of this is ultimately about creating services that are incredible and do incredible things. One of the areas where we're trying to lead is this idea of a conversational AI agent: when you pick up your phone, or you're talking to a speaker, a microphone, somehow that AI service is super smart. It somehow figures out what you mean when you're talking to it. It's got to do the necessary noise processing and speech recognition; it has to understand your language and convert your words into the representation it wants to do processing in; it has to make a recommendation; maybe it has to do a search; maybe it has to answer a question, and that answer could come back with another question. Maybe you upload an image, and now it has to recognize the image and come back, maybe with an answer, maybe with another question. This conversational back-and-forth with an AI agent has to be fast, and it has to be really natural.

If you take a look at the number of neural networks involved, it's actually quite large. People think there's one AI somehow running in the cloud - it's just not true. It's like any other software program: these are all models sitting in containers, running all over the data center. At the end of each container you're sending an output through a REST API to another container, running on Kubernetes or something like that, on another server, and these container services are sending inputs and outputs to each other. It's just one program, one service, and it connects all of these different models. It is the reason why people say that east-west data traffic in a data center is exploding: machines are talking to machines a lot more than we're talking to the machines. All of these neural networks are talking to each other, across containers, across servers. Does that make sense? This is the future: these models are going to get chained together, and for conversational models it is very likely that latency will ultimately define the quality of service, and therefore we have to process it incredibly fast.

Here's the challenge: it's not one network. There are different network types, and different species of each, and every data center's got a different one, because they've all architected them differently. In some of them INT8 works; in some of them INT8 doesn't work and FP16's got to do it. FP16 always works for images, but sometimes you have to punt all the way back to FP32 - sometimes the whole network runs on FP32, because that's all the time you've got to optimize it; after optimization there's only so much precision you can afford to lose, and so you've got to mix precision. And sometimes you might even be able to use 4-bit integer. So this data center is heterogeneous in the types of models it runs, it has to be heterogeneous in the types of precision it runs, and it runs across a whole bunch of servers connected together. This is the near future of AI in the cloud.
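On the deployment side, here is a minimal sketch of the ONNX Runtime and TensorRT integration mentioned above, as seen through ONNX Runtime's Python API: you request the TensorRT execution provider and fall back to CUDA or CPU for operators it doesn't support. "model.onnx" and the input shape are hypothetical placeholders, and this requires an onnxruntime build with TensorRT support.

```python
# A minimal sketch of inference through ONNX Runtime with the TensorRT
# execution provider. "model.onnx" and the 1x3x224x224 input shape are
# hypothetical placeholders for whatever model you deploy.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider",   # fastest path where supported
               "CUDAExecutionProvider",       # GPU fallback
               "CPUExecutionProvider"],       # last resort
)
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: x})
print(outputs[0].shape)
```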
There's somebody that I've invited who's going to show you something that's really, really cool. Gohan, he's a good friend. Thank you very much for coming here; show us something amazing.

Cool. So what I'd like to show you today is the Bing app. It's running on my phone right here, and for this demo I'm just gonna be a regular Bing user; I'm just gonna use this app like a regular user would. I've been thinking about redecorating a couple of rooms in my house. I want to change the look and feel a little bit, freshen them up a little bit, and I've learned that one way I can do this is by changing the lighting in the rooms. But as a user I don't know much about lighting, so I'm gonna use the app to help me out here. The app has three icons, three buttons in the middle: there's a camera button on the left for search by image, there's a text button in the middle, and I'm gonna use the microphone button here on the right. I'm just gonna ask it about lighting in different rooms in the house. The first query goes like this: "What are the different types of lighting for the kitchen?" "According to Style at Home magazine, as far as general lighting in a kitchen, there are three main types: surface, recessed, and pendant fixtures."

Look at that, three things. That's amazing. What just happened? Of course, Gohan, you're gonna explain it more deeply, but for the audience, very quickly, what just happened is this: your service recognized your speech; it understood, figured out, what you were trying to say; it went to find the information; and then it presented it back to you in two ways. One way is to say it back to you, so it has to do speech synthesis, and the other way is that it had to compose this page to present the information to you. So look what just happened: speech went in, and what came out was visual and verbal. Several different models all happened, not to mention the search algorithm behind it all. Right, Gohan?

That's exactly right, and I just want to highlight exactly the components you said. This card up top, we call it the intelligent answers. It goes through the top documents that came back for that search query, reasons over them in real time in response to the query, and it finds the best passage, from all those documents, all those sentences, that answers the question that was asked. And within that passage it highlights in bold the specific phrase that we think is the answer. So that worked for the kitchen. As a user, now I want to learn about other rooms; maybe I'm looking at a living room, so let me try asking about a living room instead. "What are the different types of lighting for the living room?" "According to TheSpruce.com, living rooms require three types of lighting: ambient, task, and accent."

And Gohan, I gotta tell you, I just realized something as you asked: I'm missing one of those three lightings in my house. That's why my eyes are going bad. Our task lighting is in trouble; the ambient light and the accent lights are working great. Well, I actually have the opposite problem, so I'll find out more about accent lights in just a second. But the thing I wanted to highlight here is that it's a very consistent experience: again, for this query, that was speech in, the intelligent answers card gives me the answer, and the speech readout, the text-to-speech service, is reading out the speech.
But the other thing that I wanted to call out is that there's a little embedded image in the answer, and as a user I'm visually drawn to that. I want to learn more from images, and I actually don't have accent lights in this room, so I'm gonna learn more about accent lights. "Show me images of accent lighting for the living room." "Images for you." So here you see it has a whole bunch of images of living rooms with accent lights, and I like one of these; I'm gonna pick the first one, and I can actually look at it in more detail. Up top there, there's a button that we call the visual search button, and when I click on it, not only can I see the whole image, but I can actually select different parts of the image and find portions of the image that maybe are more interesting. In this case, well, that's cool, I'm just gonna center in on the lamp that I'm looking at. Not only does it find images that are similar to the lamp I've selected as I do that, but now there's this shopping tab, and as a user I can click on it and buy lamps that happen to look similar to the area that I selected in this random picture from the internet. That's pretty awesome. And that is object detection, also happening in real time; as I change the selection, this runs in real time. And to Jensen's earlier point, the latency is where these things are super critical: if this takes three seconds to refresh, no user is gonna use it. So our latency budgets for these models are super tight, and that is a very key aspect of the experience.

So, Jensen, what I hope I showed you is an experience that brings together four very different types of models: there was speech recognition, there was the intelligent answers, there was the text-to-speech readout, and there was object detection. But to me as the user, it was just one immersive, fluid experience. As a user I didn't have to worry about all the AI that happened in the back end, and that's, to me, the best kind of AI: it just works. But of course, in this room we're all technologists; we know how hard it is to get these things to work seamlessly. And for the Bing app, which by the way you can try, it's available on iOS and Android devices today, we relied on and use the Azure N-series VMs, which run on NVIDIA GPUs.

That's awesome, thank you. Ladies and gentlemen, Gohan. I like saying your name. All right, that is an example of extreme inferencing. That pipeline is complicated; it went through a lot of different models. And imagine hundreds of millions of people doing it this way.
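To see why the end-to-end latency budget, not any single model, defines the experience, here is a toy Python sketch of a chained pipeline. The stage names and the sleep times are invented stand-ins for the containers in the demo:

```python
import time

# Stub "model services" standing in for containerized models; each sleep
# simulates one model's inference latency.
def speech_recognition(audio): time.sleep(0.015); return "accent lighting"
def search_and_rank(query):    time.sleep(0.030); return ["doc1", "doc2"]
def answer_extraction(docs):   time.sleep(0.020); return "ambient, task, accent"
def text_to_speech(answer):    time.sleep(0.025); return b"<synthesized audio>"

def pipeline(audio, budget_s=0.300):
    start = time.perf_counter()
    answer = answer_extraction(search_and_rank(speech_recognition(audio)))
    speech = text_to_speech(answer)
    elapsed = time.perf_counter() - start
    # The user feels the sum of every hop, so the budget applies end to end.
    assert elapsed < budget_s, f"blew the latency budget: {elapsed:.3f}s"
    return answer, elapsed

print(pipeline(b"<microphone audio>"))
```

Every model added to the chain eats into the same budget, which is why each stage has to be inferenced as fast as possible.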
Well, we're working with the industry at large to accelerate inference and to make it possible to deploy these services. TensorRT has been downloaded so many times now: 300,000 times, six times growth in just one year. And some of the best, most popular services that you guys know of, and a whole bunch of them that I didn't list here, are now using and being accelerated by NVIDIA GPUs as a result of TensorRT: voice search, image search, recommendation systems, assistants, news feeds, translation, e-commerce. All of these things have modules of AI models in them, and they're accelerated by NVIDIA GPUs.

One of my favorite uses of inference is medical imaging. There are a hundred thousand radiologists in just the United States alone, and we have the best-trained radiologists in the world. They used to be able to look at a study and examine it for twenty minutes; now they barely have four minutes to study the same thing. The pressure on them is incredible. It is the largest operation in the hospital, and yet early detection is the best way to stop something before it gets worse. So the question is, how do we apply deep learning to enable all of these radiologists and augment them, so that when they're doing their work there's an assistant sitting next to them helping them along, and maybe at the end of the day doing QA? Meanwhile, we recognize that there is no way one institution, one group, can possibly train all of the neural networks for all of these rare diseases. Many of them happen very infrequently, and many of them have experts and specialists in the field.

So what we decided to do is this: instead of being the one company to solve it all, we would help them create tools and put them in the hands of radiologists all over the world. Give them AI tools, give them an AI infrastructure, and make it super easy for them to share work among themselves. But we've got to give them a starting point, so we started with pre-trained models. We have 13 pre-trained models up in NGC, and they're incredibly well done; we worked with radiologists in research hospitals, and we dedicated ourselves to making amazing pre-trained models. The second thing is a tool that is basically an AI that creates AIs: it assists you with the laborious work of annotation. And then we can do transfer learning: we fine-tune these models with the data that you provide. So you download a pre-trained model, you annotate your data, you fine-tune the model, we compile it into a model that you can use, and then we make it possible for you to deploy it easily. This entire process, end to end, is easy-peasy. Easy-peasy, that's very technical. And it's being used all over the world now. We're so happy that MGH, NIH, OSU, and DKFZ, some of the world's leading research hospitals, are now using it for assisted annotation, for the deployment of models, or for integrating it into their own tools. All of this is open sourced, and we'll make it available for researchers all over the world. We call this the Clara AI toolkit, and you can come and download it. All right, Dave, thank you.
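As a sketch of that download-annotate-fine-tune-deploy loop, here is a generic PyTorch version of the pattern. It is not the Clara Train API; the two-class head, the data tensors, and the file name are all invented for illustration:

```python
import torch
import torch.nn as nn
from torchvision import models

# 1) Start from a pre-trained model (standing in for an NGC pre-trained model).
model = models.resnet18(pretrained=True)

# 2) Freeze the backbone; only the new task head gets fine-tuned.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. finding vs. no finding

# 3) Fine-tune on your own annotated data (random tensors as a stand-in).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()
optimizer.step()

# 4) Export the fine-tuned model for deployment.
model.eval()
torch.onnx.export(model, images[:1], "finetuned.onnx")
```

The point of the workflow is that the hospital supplies only the annotations and the fine-tuning data; the pre-trained weights and the deployment plumbing come with the toolkit.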
Data science is the new HPC challenge. HPC is using computers, very large computers, to solve very difficult problems, because the data is so large and the computation necessary is so great. This is the new HPC challenge, and just to show you one of the examples, and there are so many examples during the talks at GTC, Walmart's gonna give a great talk. The world's largest company surely has some of the world's largest data. They have millions of SKUs that they're managing and forecasting simultaneously, and they have stores all over the place, and they have to find the exact right level of inventory for every single SKU, arriving at exactly the right store at exactly the right time. Every season is a little bit different; maybe there's a run on something special, like bananas. Different types of products have different seasons and demands, and they have to monitor and ingest all of that data and use it to train a machine learning model, so that they can predict the inventory faster. Because of our platform, they can now do it practically in real time. You should go listen to their talk, and there are so many other talks here related to using NVIDIA's machine learning platform to accelerate data analytics and data science.

Well, we have somebody here: Aaron Williams. Aaron, are you here? Right here. All right, Aaron, welcome to GTC. Thank you. Aaron has been working with Charter, one of the world's largest cable operators, to apply data science to solve their problem. So why don't we go to your demo; let's take a look at it. Yeah, sure. This is a really exciting real-world example of putting AI to work for data scientists. As Jensen was saying, we're working with a network operator that has 25 million subscribers connecting to half a million, sorry, five hundred thousand LTE towers and Wi-Fi networks. Those connections are the lifeblood of this business, and that data is super important. So what we're gonna show is how we can put AI to work to help the network operator make smarter decisions using these predictive algorithms.

Now, the decision, just to go to the end and work backwards: the decision they're trying to make is where to place Wi-Fi access points to offload the cellular traffic, so that they can provide the best possible quality of service and, of course, increase the capacity for their company. But you don't want to place them randomly everywhere. They're adding ten thousand of these Wi-Fi access points per month. That is a big decision; they've got to decide where to put them, and they can save a lot of time and money if they put them in the right place. So what we're gonna do is help them predict where to put those Wi-Fi access points. Okay, that's awesome.

The first step, of course, is that we have to take the data in; we have to do ETL, extract, transform, and load; we have to do basic data analytics. And this is a lot of data, Jensen: it's a terabyte of data per day that they've got coming in. Now, when you're talking about data at scales that big, you know what people usually do when a terabyte of data comes in every single day? Ignore it. That's right, because tomorrow it's gonna come back again, and you ignore that one too, because it'll come back next week. That's the beautiful thing about recurring data. But obviously they would like to use this data; it could really help their business. And this terabyte of data is not a really clean set of data, of course. It's coming in from multiple different data sources; they've made some acquisitions over the years, which means they've got goldmines of data that they need to combine into a single, well, the industry gives it a great name: it's called a data lake. Right, and the data lake's got all these different formats; it's all messy data, and some people call it dark data, because you're never gonna look at it again. To your point, what they were doing before was spending eight days to transform that data into a usable set, and this was all done manually: they had dozens of data scientists spending their time transforming this data to make it usable. So let me see if I understand: you've got a terabyte of data coming in every day, and it takes you eight days to process one terabyte of data. Yeah, to make it usable. So guess what, pretty soon you're gonna be infinitely behind. Guess how many days of data are not getting watched, right?
So here's what we've got. On the left-hand side you can see this sort of incoming data; this is the complexity of the data that they had coming into the system: dozens of different sources, all kinds of different formats. We're gonna use a product called Datalogue, a really awesome product for transforming that data into usable data. By the way, Datalogue is really fantastic; this is Tim's company, and I think Tim is watching. Tim, good job. Go ahead, Aaron. Yeah, they are great. So what we're gonna do first is define an ontology. This is basically just a knowledge graph that helps us transform from the data that we have to the data that we want, and so we've defined here the different data sets that we want. For each of these data sets, we can then use AI to transform the data we have into the data set that we want.

And what's really amazing about what Aaron just said is this. Just imagine you have data sets coming from 10 different places. It's the same data, it's about the same thing, but the formats are all different: the names they use, the labels they use, the columns they use, the names they use for the features, they're all just a little different, you know, maybe your name spelled just a little differently. If you and I were to look at the database, we would be able to tell that this column and this row are the same thing, but to a computer program they're basically completely different. What we need to do is have these AIs looking at all of this data as we're taking it in, and realizing that, you know what, "Jensen" and "Jen-Hsun" is the same person. It's the same person.

Let's go look at what it actually looks like when we train these models. This is an example of a model that we've trained for predicting these values. You can see the different values that were coming in on the left-hand side and the values that we were predicting up there, and you can see how successful this model is at matching the different kinds of data coming in with the data that we need in our ontology. You're off by two. Yeah, and this is interesting, because sometimes you'll confuse an IP address for a different kind of data; sometimes these things look the same. The good news is we've trained this model so well that it's 99.9% accurate, which is much better than investing eight days of data scientists' time to produce those same results. It would take days to do this by hand. That's right, and we want to automate it. Right. Not to mention just the computation of loading it and reformatting it, joining and grouping by this, and doing the SQL processing that you guys are familiar with: it takes so much energy to do. Eventually you have this thing called a data frame.
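The matching Jensen describes, recognizing that differently spelled columns and values mean the same thing, is what Datalogue's trained models do at scale. As a toy stand-in, here is a Python sketch that scores hypothetical column names from acquired systems against one field in the target ontology using plain string similarity:

```python
from difflib import SequenceMatcher

# Hypothetical column names from three acquired systems.
source_columns = ["subscriber_nm", "SubscriberName", "subs_name", "tower_id"]
target_field = "subscriber_name"  # the field the ontology wants

def similarity(a, b):
    # Ratio of matching characters between the two normalized names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for col in source_columns:
    score = similarity(col, target_field)
    verdict = "map automatically" if score > 0.6 else "flag for review"
    print(f"{col:16s} -> {target_field}: {score:.2f}  ({verdict})")
```

A production system learns the mappings from the data values as well as the names, which is how it reaches the 99.9% accuracy Aaron quotes; the principle, score candidate mappings and automate the confident ones, is the same.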
That data frame then comes into another amazing product. This amazing product is from a company called OmniSci, isn't that right? I've heard of those guys. Yeah, you and Todd started it, and Todd is out in the audience. Todd, good job, man, good job. So Todd realized the importance of GPU-accelerated data science, quite frankly, almost before everybody, practically, not just momentarily before me. And then he came and told me, and I realized, hold on a second, this is a high-performance computing problem, and so we've been working together for some time. OmniSci, you guys are doing amazing work. Now, here's the data frame.

Yeah, so now we've got all this clean data, and we want to actually put it to work, so we're gonna bring it into OmniSci. OmniSci is a SQL database that is built to run on GPUs, and it also has a visualization engine. Let's cut to the dashboard so we can see what that looks like. Now, this is 500 million rows of that network access data we were looking at before. You've got to let that moment sink in: you guys know what a spreadsheet with 500 million rows looks like? I've never seen one that big. This covers 215,000 access points across the entire US. On the left-hand side there, you can see this really nice heat map showing where those access points are and where they're being used the most. Let's zoom in to Ohio, actually, just because you can see how easy it is to zoom in and get a much clearer picture of a specific part of your data. So now we've zoomed in, and we're seeing the detail of what's happening in the state of Ohio. I'm gonna use the time chart here in the middle to zoom in, not just by location but now by time as well. See how, when we change the chart at the bottom there, we're now seeing hourly results at the top, and as we scrub over to the right, we're seeing all of the charts changing in real time to show us what happens at different time frames in that data set.

Now, Jensen, this is the cool part, because what's actually happening behind the scenes here is that we're running hundreds of SQL queries against that database. There is no way you could do this on CPUs; it's only possible because we've architected it to run on GPUs. It was like yesterday morning that SQL was a batch job. That's right. Those guys are now watching this in real time, and it is this craziness: no indexing, no aggregation, 500 million rows. Are you guys getting this? SQL used to be something you would run with Hadoop, and it would be a batch job. That's right. Now you're doing this interactively and visualizing it in real time. It's incredible.

Now, this is just the first step, though, Jensen, because right now we're seeing the historical data. Let's get to the predictions; let's get in our time machine and start to see what's happening in the future. We're gonna do that using a Jupyter notebook, so we'll switch over to the Jupyter notebook now, and this is an interface that all data scientists are familiar with: it's Python. In the first step there, on the left, we're gonna get a pointer to that data sitting in the OmniSci memory, in GPU memory. We're gonna use that pointer to do the feature engineering that you talked about before, and on the right-hand side we're gonna run XGBoost to do the training, to do our machine learning. A very familiar toolkit of tools here. That pointer is exactly a pandas-style interface, which is what you're looking for if you're a data scientist. Right. So these tools give us those predictions, and once we get those predictions, we're gonna push them back into OmniSci so that we can make a complete visualization of what that looks like.
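A minimal sketch of that notebook step, using cuDF (the RAPIDS GPU data frame with a pandas-style interface) and XGBoost; the columns and values are invented, and it assumes a CUDA GPU with RAPIDS installed:

```python
import cudf
import xgboost as xgb

# Toy stand-in for the access-point data already sitting in GPU memory.
gdf = cudf.DataFrame({
    "hour":        [0, 6, 12, 18] * 250,
    "subscribers": [10, 80, 300, 150] * 250,
    "usage_gb":    [1.0, 9.5, 42.0, 20.5] * 250,
})

# Feature engineering stays on the GPU, through a pandas-style interface.
gdf["per_user_gb"] = gdf["usage_gb"] / gdf["subscribers"]

# Recent XGBoost builds accept cuDF directly, so the data never leaves the GPU.
features = ["hour", "subscribers", "per_user_gb"]
dtrain = xgb.DMatrix(gdf[features], label=gdf["usage_gb"])

params = {"tree_method": "gpu_hist", "max_depth": 4, "objective": "reg:squarederror"}
model = xgb.train(params, dtrain, num_boost_round=50)

predictions = model.predict(dtrain)  # push these back into OmniSci to visualize
print(predictions[:4])
```

The round trip in the demo never leaves the GPU, which is what turns an hours-long query-train-predict loop into an interactive one.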
Did you guys just see this? The first step is data analytics, and within data analytics you can use OmniSci to visualize it: just take a look at the data so you can get a feeling for it. Then, once you do that and you've done the feature engineering, you take that data frame and you put it into XGBoost. XGBoost is your machine learning algorithm, and that machine learning algorithm takes the previously collected information and all of the features, and it learns a model that predicts the future from those features. Predict the future from those features: it's gonna come up with a new model. And let's not forget about RAPIDS. RAPIDS is an important piece of all this that ties it together. Right, so we're using that here, with these data frames that you talked about, to do this kind of prediction. Okay, we push the data back into OmniSci; let's go back to that dashboard so we can see what that looks like. You'll see on the time chart that now we have this dotted line that goes into the future, so as we scrub through, we see that same interactive experience of scrubbing through the data, and now we're seeing predicted results for where we think the usage is going to be across the state of Ohio. And you see the same sort of interface, the same interactivity, now using these predicted results coming from RAPIDS, using cuDF.

And guys, you just saw, in just a really quick moment here, predictive analytics in practice, from the beginning to the end, all of it accelerated. What Aaron was describing is called RAPIDS. RAPIDS is the machine learning framework, the data analytics, data science framework, that we open sourced; underneath it there's a whole bunch of engines, and those engines are basically CUDA-X. RAPIDS is now open sourced. And the important thing is, you see down here, this is where I grew up: Oneida, Kentucky. There's no traffic there at all. Here, somebody type in Oneida, Kentucky. Live. It's just Oneida, Kentucky, let's see. [Laughter] Just one access point, come on, give us one access point. That's it. That's where I grew up: one intersection, not kidding. All right, so everybody's still baffled; you're going, that can't be right. That is right, and that's why I came to the United States.

So here we saw the entire platform. How long did it take before, and how long does it take now? Right, so we were talking about eight days before, just to get to a point where we had some data we could do something interesting with, and then it was taking us hours to do queries after that. Now we're talking about four minutes to get the data the way we want it. That's near real time, and then what you're seeing here is absolutely interactive engagement with that data, so it's just a completely different paradigm of what's possible. So here we go: data scientists, the most sought-after professionals in the world today. They're sought after in every single industry and every single company, and then once companies get them, they make them sit for eight days while they work on data that is eight days old, seven days old. That makes no sense. What we want to do is accelerate their work: give them the instrument of their science, the tool of their science, so that they can do their life's work, and do it as quickly as possible. Ladies and gentlemen, Aaron Williams. Thanks, Todd, you guys got something good going.

Okay, so what we need to do is build high-performance computers for this whole new area called data science.
You know, design automation, computer-aided design, styling, media and entertainment, climate simulation, energy discovery, molecular dynamics: they all have high-performance computers, and they all have workstations that the engineers and scientists and researchers work on. Well, there's a whole new field now called data science. There are three million data scientists around the world. As I mentioned, every single class is oversubscribed; we taught a hundred thousand data scientists ourselves last year through our program, the Deep Learning Institute, and there are classes all over the web. This is an area that is hot and brimming with excitement, and the reason is that the three things I mentioned came together: the availability of data, the machine learning algorithms, and high-performance computing have made it possible to use this as the fourth pillar of scientific discovery.

We think there needs to be a new type of computer built, and so we decided that the workstation has to be re-engineered: a new type of workstation with very, very fast storage, very fast I/O, really fast computation, and very fast memory. This type of architecture, a workstation for data scientists, is really complicated to build. In fact, just building it, installing the software, and tuning the whole computer to deliver the performance is not easy. We're basically taking what is otherwise a high-performance-computing data center IT team, shrinking it into one box, and making it possible to ship these like appliances all over the world. We came to this idea because some of the world's leading researchers were building it themselves, and some of the companies at the forefront of AI were building these machines themselves, right in front of us, and they asked us, how do we make this easier? It came to us because customers were asking for it. So we decided to build the workstation for data scientists, and this is the performance. Look at this: this box, if you were to run it the old way, would be the blue bar, and you run it with one RTX 8000, or two RTX 8000s, and now you basically have performance similar to what Aaron was showing you. The industry is so excited about this because the demand is right there. All of the world's leading computer makers have joined us in this endeavor, and today we're announcing a brand new family of workstations. We call it the data science workstation, and it's going to be available from the world's leading computer makers: Dell, HP, Lenovo, and all of our partners. I want to thank you for joining us.

This is a good way to get going for one data scientist. However, many of the jobs are so big that it's impossible to fit them on one computer, and that's why we say that data science is the new HPC. Now, before I go into HPC, let me give you a taxonomy of how supercomputers and high-performance computers are created. This chart, by the way, is JHH mathematics; inside NVIDIA it's called CEO math. It is not accurate, but it's right. Are you guys following me? It's not accurate, because if you nitpick it, it's gonna be off, but if you study it, you'll go, darn it, I think he's dead on. Okay, it goes like this. Basically, high-performance computers are built for two fundamental applications.
On the one hand you want to build something with capacity, and on the other, capability: capacity on this side, capability on the vertical side. Capability means building the largest possible computer, so that you can run a simulation as fast as possible, or run the largest possible simulation you can imagine. A capability machine, designed the way we call supercomputers, is for very few jobs, very, very large ones, and you want to get them done as quickly as possible, or do the biggest one you can. The second is called a capacity machine. A capacity machine, otherwise known as hyperscale, uses cost-efficient computers, millions of them, and what you want to do is serve millions and hundreds of millions, ideally billions, of people with many small jobs. So the architecture you create for hyperscale and the architecture you create for supercomputers are not the same. One is called scale-out; the other is called scale-up. One has maybe a million nodes in a data center; the other has tens of thousands in a supercomputer. What I show here is the computation load: not the flops as in time, but flops as in instances. The number of instances of compute is, for example, a billion petaflops, where the 's' is units, not time, and the other axis is hundreds of millions of people concurrently. So there are data centers of two types, and both are high-performance-computing data centers: this is where supercomputers go, capability machines, scale-up architecture, and this is where hyperscale goes, capacity machines, what people call scale-out architecture. Are we okay? All right.

Here's where data science goes: right here. These are tough problems to solve. Engineers love problems here, and engineers love problems here; these ones in the middle are hard because they're not quite this or that. And the reason is that in the case of data science, notice, the amount of data is gigantic, and if you want to train a network or do data analytics on it, you do computation for days. Those are the characteristics of a supercomputer: a few people using a large cluster for days. On the other hand, data scientists number in the millions. We don't have millions of weather simulation experts, and we don't have millions of molecular dynamics scientists, but we have millions of data scientists. So all of a sudden, the concurrency of data science is large and the computation requirement is also large. Of course, not everybody is the same way, so we need multiple architectures to solve this.

This is where DGX goes. We created DGX, which basically takes a supercomputer and turns it into an appliance. We integrated everything: lots of GPUs in a scale-up architecture, 16 Voltas, 16 V100s, two petaflops of computing, half a terabyte of high-speed memory, essentially 16 terabytes per second of aggregate bandwidth, and eight Mellanox InfiniBand NICs to get the fastest possible access to the network and to storage. That is where DGX-2 goes. The next step of our journey is to accelerate hyperscale, to do essentially the same thing we did with DGX, but for hyperscale. The software stack is different, the architecture is different, the whole solution stack is different, the go-to-market is different; everything is different about it. It took two years longer to do this than that, and you'll see in a second why. And so this is called scale-out acceleration.
To accelerate scale-out, we're going to address the millions of data scientists and engineers in the bottom right of that center bubble, and the solution for that is this GPU we've been making called T4. T4 is our second-generation Tensor Core GPU. It's literally 70 watts, it's the size of a candy bar, and it fits into every one of the high-volume, most popular data center servers in the world: it can fit in a blade, it fits in a hyperscale server, it fits in an enterprise data center server. T4 gives you about 260 TOPS, so it's essentially a supercomputer, and it comes with Mellanox InfiniBand or Broadcom Ethernet NICs. In the end, the software stack is the complexity, and it basically looks like this. You'll remember, earlier, when I was talking about inference: one container's output goes through a REST API into another container, and into another container, and eventually it comes back out, having recognized your speech and answered your question. Those are all containers on Kubernetes. What you're seeing here is distributed computing: a whole bunch of users are using a server, containers are communicating with each other, data is going back and forth; over here, a whole bunch of servers, a whole cluster, works together as one compute engine, running one job.

Hadoop started it. They used it, of course, for crawling the internet and doing search. Hadoop is a disk-to-disk computation system, and this computation system uses commodity off-the-shelf servers. It was a brilliant strategy and a brilliant architecture: basically, you could take terabytes and terabytes of data, 100 terabytes, the internet, put it onto your disks, and stream it through the disks doing map and then reduce. Hadoop is basically the HDFS file system, the YARN distributed scheduling system, and the MapReduce compute engine, and it revolutionized big data. However, because most of the data sits on disk, the computation was slow. Then Spark came along, and Spark read everything into memory. Now, instead of small amounts of memory on all these servers, you have big amounts of memory, it loads everything into memory, and it can iterate on it in real time; for the first time, people were able to do interactive SQL processing in a data center. However, the story goes on: eventually Moore's law started to slow, the data kept getting bigger and bigger, and now we have to accelerate it. That's where RAPIDS comes in. RAPIDS is an effort that started about six years ago, and the industry has been working on it together. It comes in several layers. The first layer completely re-engineers the in-memory architecture: that's Apache Arrow. The second part is a scheduling system called Dask, and the third is the new compute engine, called RAPIDS. Aaron was talking about that; it's integrated into the work that they're doing. So there are these three stacks: Hadoop for disk-oriented, low-cost systems; Spark built on top of HDFS; and the GPU-accelerated version of it, now called RAPIDS. As you saw earlier, RAPIDS has seen great success: it has been adopted into Microsoft Azure, it has been adopted into Google Cloud, and it's being used all over the place. It's just fantastic. And this is how we're gonna accelerate your distributed computing capability.
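To make the scheduling layer concrete, here is a tiny Dask sketch (CPU Dask here for simplicity; RAPIDS pairs the same pattern with GPU workers and GPU data frames). It builds a task graph over many partitions and only executes when asked:

```python
from dask import datasets

# Synthetic partitioned dataframe: a month of per-second records split into
# day-sized partitions that the scheduler can farm out to workers.
df = datasets.timeseries(start="2019-01-01", end="2019-02-01")

# Nothing executes yet; Dask just records a task graph across the partitions.
result = df.groupby("name").x.mean()

# compute() runs the graph, scheduling partition-sized tasks onto workers.
print(result.compute().head())
```

Swap the workers for GPUs and the partitions for cuDF data frames, and you have the scale-out pattern RAPIDS uses underneath.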
The one thing that you'll notice about this is that between these containers there's a lot of traffic going back and forth: what the industry calls east-west networking traffic. The east-west networking traffic is going up exponentially. What the industry calls north-south traffic, basically data center to cloud, is not growing exponentially, and the reason is that the number of people in the world is not growing exponentially; it's growing, but not exponentially. The amount of traffic inside the data center, though, is growing exponentially. Second, when we create these large distributed computing systems, the broadcasting, the collecting, the reducing, all of those primitives necessary to do distributed computing, causes an enormous amount of traffic inside. So it turns out that in the future, the way you design a data center is going to change: instead of a whole bunch of compute nodes connected by networking, the networking and the compute will become one continuous computing fabric. The network is going to become really, really important.
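As a back-of-the-envelope illustration of why those collective primitives dominate east-west traffic, here is a toy Python model comparing a naive all-to-all gradient exchange with a ring all-reduce. The gradient size and node counts are invented:

```python
GRADIENT_MB = 500  # size of the model gradients exchanged every training step

def naive_mb(n):
    # Every node sends its full gradient to every other node.
    return n * (n - 1) * GRADIENT_MB

def ring_mb(n):
    # Ring all-reduce: each node sends 2*(n-1) chunks, each 1/n of the gradient.
    return n * 2 * (n - 1) * (GRADIENT_MB / n)

for n in (4, 16, 64, 256):
    print(f"{n:4d} nodes: naive {naive_mb(n) / 1e3:10.1f} GB/step, "
          f"ring {ring_mb(n) / 1e3:8.1f} GB/step")
```

Even the efficient ring algorithm moves the whole gradient across the wires roughly twice per node per step, all of it east-west, which is why the fabric, not the north-south pipe, becomes the bottleneck.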
And that's one of the reasons why we recently announced that we're acquiring Mellanox. Let me do this: let's introduce the CEO of Mellanox, Eyal Waldman. Please, ladies and gentlemen: a visionary, a giant in the industry, and a pretty big guy. Thank you for the introduction, Jensen. So, we've been working together for some time, and we've been working on supercomputers for some time; in fact, I think we've been working on it together for about a dozen years, and you've been working on it for almost 20 years. Yes. So we've been building supercomputers together. What are the trends that you're seeing in the world of high-performance computing? Right, so, like Jensen said, we're seeing great growth in data, exponential growth, and we're also starting to see the old program-centric data center changing into a data-centric data center. That basically means the data will flow and create the program, instead of us creating a program that uses the data; the data will start creating the program using the data itself. These are things we can work on together, and we actually have very synergistic architecture solutions for the future of the data center.

And so, if you take a look at our journey together, we started in supercomputing, and almost all the major supercomputers we worked on, you guys worked on as well, our engineers close, hand in hand. The reason is that when you have all these different compute nodes working together, the synchronization of the information, the sharing of the information into one large simulation, is very intensive. But we're seeing the same thing happening now in hyperscale data centers, and we're seeing the same thing happening in enterprises. What are you guys seeing, and what are the dynamics causing that, do you think? Right, so I think if you look at the big hyperscalers, one of their big advantages is the compute engine, the supercomputer they have in data centers worldwide, serving hundreds of millions of people simultaneously. What we help with is connecting that compute to compute, and compute to storage, in the most efficient way, with the lowest latency and the highest scalability, and this is why we increase the productivity and efficiency of those data centers significantly. Some of the things you showed here is that latency is one of the most important parameters for scalability and efficient productivity, and that's what we do best: we have the lowest-latency interconnect on the planet, both on InfiniBand and Ethernet, and we're improving this now with 200-gigabit HDR InfiniBand and also 200 and 400 Gigabit Ethernet, and we will continue to develop ever more synergistic products in the future.

Yeah, so between the NIC and the switch, the latency of your system is just really incredible. The other thing where you were well ahead of the time was the concept of CPU offload, with RDMA; we felt the same way, and of course we didn't call it CPU offload, we called it acceleration, but in a way you were a network accelerating company all along. Right, so we found out that running programs is great on the CPU, but doing very tedious I/O operations on the CPU is very inefficient, so we took this task on at Mellanox. We do it mainly on the endpoints, on the NICs, the ConnectX HCAs, with InfiniBand. Then we found out we can put computing inside the switch, and this is something we've done with NVIDIA, with SHARP: we have an offload machine, floating-point machines inside the switch, to increase the efficiency of artificial-intelligence programs in the data center. This is something we're working on together; I think you have very recent results, I don't know if you've shown them yet. We are seeing more and more offload we can take away from the CPU and GPU, into the network, and then synergize this into the whole data center solution for AI. Yeah, and that's our path forward. Now that CPU scaling has slowed down, you have to find a way to take as much workload off it as you can, and moving it into an accelerator is one thing, but moving it into the network is completely another thing, and we should do both. We will execute on that. Fantastic, thank you. Thank you for being on stage here. Thank you, Eyal. Ladies and gentlemen, Eyal Waldman, an incredible guy. We've got a bunch of Mellanox switches and NICs all over our products.

So now we've created these machines and these systems, and we've got to take them to market, and the market is fragmenting, in the sense that the number of customers using data analytics and data science is growing very rapidly. It's gone from research to the world's internet companies, what we call hyperscale companies, cloud service providers; it's moving very quickly into supercomputing, with physics-inspired neural networks in AI, and it's moving, of course, into enterprise. We're trying to take these various kinds of computers to the world, and for each one of these markets, each one of these segments, we have to use a different go-to-market. One of the ways we're going to market: we're taking the DGX, basically these AI supercomputing appliances, with all the storage and all the switching, which is really complicated, and bringing it to the enterprise. We have a suite of partners that we're working with, really great partners: DDN, Dell EMC, IBM, NetApp, Pure Storage. They know a ton about storage, and they happen to be working already with all of the people who use a lot of big data. So we're partnering with them, creating essentially a pod, and these pods are fully integrated, what the industry calls hyper-converged. You could take all of that and install it literally within a day. We come in, or somebody comes in; I don't come in, but somebody comes in.
I was almost going to take credit for it, but I don't think I should. And so you bring it in, and zero to AI is basically a day, a few hours. Really fantastic work with Arista and Cisco on switches, and Mellanox on switches, together. This represents a large part of the high-performance computing industry, and we now take these capabilities into the enterprise.

The second thing: the DGX pods I just mentioned come in at the upper-left-hand side of that bubble of data science, and now we want to take it to the bottom. Between the top and the bottom, the computation is three orders of magnitude apart, but the market size, the number of engineers, the number of data scientists, is also three orders of magnitude larger: instead of 1,000, now we're talking about a million; instead of thousands, we're talking about millions of people. So we're gonna need a large network of partners to take these architectures, which are relatively complicated and really only possible today inside the cloud service providers, out to the world's enterprises, so that they can set them up easily and run these workloads as easily as possible. We're announcing today that nearly all of the world's leading computer makers for enterprise have joined us to take this new architecture, this data science server, powered by T4s, the CUDA-X AI stack, and all of the machine learning frameworks I've already spoken about, to market. So, if I could ask you guys to congratulate them: thank you for joining our team.

Let me show you what these servers look like. Just now you saw what a workstation can do; now this is a gigantic data set, several hundred gigabytes large. End to end, it takes thirty-five minutes on a cluster of ten servers, and on a cluster of ten servers with one T4 inside each, it's almost zero. We basically take it from half an hour, end to end, to three minutes: barely enough time to get a cup of coffee. So in the future, all the data scientists in the world are going to be substantially less caffeinated, and we're going to get a lot more work done.

Here's an interesting one. This is MXNet training using the same distributed servers, and notice it starts to plateau. This is the problem of distributed computing, and this is the reason why scale-up is better than scale-out for some very large simulation jobs: as you add more and more servers, notice, the return on that investment starts to decline, because you're spending too much time communicating. So this is with the fastest Ethernet, and this is with the fastest Ethernet with RDMA, the technology that Mellanox invented, which is now an industry standard called RoCE. This is the reason why networking bandwidth is so important, why networking offload is important, and why software integration of the stack is so important. What I show you here looks simple, but the amount of software that goes into making all of this possible is really incredible.
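A toy Python model of that plateau, with invented numbers: per-step time is a fixed amount of compute plus a communication term that grows with the cluster, and shrinking the communication term (what RDMA and offload do) pushes the knee of the curve out:

```python
T_COMPUTE = 1.00  # seconds of useful math per training step on each server

def effective_speedup(nodes, comm_per_node):
    # Each added server contributes compute but also adds synchronization cost.
    step_time = T_COMPUTE + comm_per_node * nodes
    return nodes * T_COMPUTE / step_time

for nodes in (1, 4, 16, 64, 128):
    ethernet = effective_speedup(nodes, comm_per_node=0.020)  # plain fabric
    rdma = effective_speedup(nodes, comm_per_node=0.002)      # offloaded fabric
    print(f"{nodes:4d} servers: speedup {ethernet:6.1f} vs {rdma:6.1f} with RDMA")
```

Same servers, same GPUs; only the interconnect overhead changes, and the scaling curve changes shape.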
Well, that's enterprise. If we want to reach a lot of people, and we want to reach a lot of people fast, with the single largest compute engine on the planet, there's one way to do it, and it's the only way to do it, ladies and gentlemen: it's the Amazon way. So let me invite Matt Garman, a partner of ours; we've been working together for years. Matt, how's it going? Great to have you here. Thank you, it's great to be here.

We started working together seven years ago; you put a Fermi GPU in the cloud. We did, yes. What were you thinking? But anyway, Matt put a Fermi GPU in the cloud and opened it up as GPU-as-a-service, and that was the first time. And to be honest, when you first did it, I was kind of going, yeah, I don't know where this is gonna go, but I was enthusiastic; I'm happy to sell any GPUs. It was so cool to see. And then you were the first to launch Volta into the cloud, and then you were the first to launch an eight-GPU HGX-2 Volta instance into the cloud, and we've been working together ever since. We have. So it's great to have you here. You guys have been doing AI in the cloud; you've got this layer, a service called SageMaker, and it's a fantastic tool. I'm gonna hand it off to you; you've got a couple of slides, but can you tell the audience about some of the work that you guys are doing? Sure, yeah, happy to.

I've actually been at Amazon, at AWS, about 13 years, since we started AWS back in 2006, and over that time our goal has really been to develop the most reliable, fastest cloud for anybody out there. The way we think about it is that we want to deliver services across the world for everybody: compute services, storage services, data analytics services, and networking services, across 61 availability zones in 20 regions all around the world, and particularly ML services, which is a lot of what you've been talking about here today. When you think about ML services, one of the things that's really exciting is that machine learning is a great fit for the cloud. More machine learning is done in AWS, in the cloud, than anywhere, and one of the reasons is that a lot of our customers, a lot of the people here, are still trying to figure out how exactly to incorporate machine learning into their applications. They're still trying to figure out the best ways to do it, and iterating on that, and the cloud is a perfect fit for that. We have customers who come launch large clusters of P3 instances running Voltas, and they'll run their training applications: they spin them up, they test them, they see how they go; they spin up lots of servers so they can get the work done quickly, and then they shut it all down, and they don't pay for any of that infrastructure while they go look at their results. They iterate, they try some new algorithms, and then they spin it up again when they're ready. In fact, one of the services we've delivered, as you mentioned, is called SageMaker. SageMaker is an end-to-end, fully managed machine learning service that makes it easy for data scientists, developers, even machine learning experts, to easily and scalably launch their machine learning applications in the cloud, all on NVIDIA and EC2 technologies.

Now, who are some of the customers that use your stuff, and what do they use it for? Yeah, thank you. So, you know, at AWS and Amazon we always start with the customer; that's really how we think about it, and we have thousands and thousands and thousands of customers using ML in the cloud today.
I picked a couple of them here, some well-known customers, and I'll highlight a few of them for you. We have some customers that are doing traditional HPC. We have customers like Western Digital, who use P3s and Voltas to look at a wide range of factors, from material science properties to magnetic and heat flows, and all sorts of things, to really improve the quality of their disk drives. We've got a lot of customers using your stuff for seismic processing. We do; they're doing all sorts of things. If you look at a different industry, we have Celgene. Celgene is a biopharma company, recently bought by Bristol-Myers Squibb, and they use AI to look at a bunch of different drug designs to see what's going to happen. They used to have a cluster on premises that would run their applications, and it took them two months to run this complicated application; they moved to the cloud, scaled out, and now they're able to do it in six hours. Wow. Two months to six hours; when you say that out loud, it's amazing. It's actually shocking, yeah. And think about not just the cost they're saving but, to your point, the most precious resource: that data scientist's time, that ML scientist's time, and the iterations they're able to make are incredible for their business.

And then many of the top technology companies here in the Bay Area: Salesforce, using it in their Einstein Vision application, where their developers do image recognition for brands that are online; people like Airbnb, making it easier for their hosts to figure out how much they should charge for their property; or even Lyft. Lyft recently announced that they're all in on AWS; everything they're running is on AWS, for their 50 million riders per month, and they use AI and ML running on P3s and Voltas, together with SageMaker, to calculate everything from estimating fares, to better drop-off and pick-up locations, to fraud detection, in the cloud. Just a couple of examples. That's just amazing. And these are all kind of training applications; many of them. My favorite app, Yelp, exactly, working great. That's incredible.

So where does it go from here? Yes, so many of these customers are doing their big, huge training jobs in AWS, and I'm happy to announce that many of them are also doing, as you mentioned through that pipeline you had up here earlier, inference: they do their training, but they also then spread out the inference. Many customers tell us that actually 80 to 90 percent of the cost of machine learning at scale ends up being in inference. Yeah, this is the big market. So we're super excited to announce today a new instance coming to AWS soon: the G4 instance. It'll feature NVIDIA T4 processors, and it's really designed for machine learning inference. It's designed to help our customers shrink the time it takes to do inference at the edge, where that response time really matters, but also to really reduce the cost: they have to run fewer nodes. And in addition to machine learning, where this is a really great fit, you mentioned you can also do graphics processing on these, so we're really excited, and many customers are excited to do that as well. We have customers that are looking to do high-end video workstations in the cloud.
We have customers looking to do video transcoding and media processing, as well as video game streaming, all of it via the cloud on our new G4 instances. I've got two great stacks for you: one's called GeForce NOW, the other one's called Omniverse. Awesome. So you go build a whole bunch of cloud GPUs, and I'll be there. We will. Nothing got me more excited than when you said 61 availability zones. That's right, 61 availability zones, and we're getting to 61 countries too, so we're super excited. I want to thank you; hope you have a good night. And thank you for inviting us. Thank you, Matt, thanks for the partnership. Ladies and gentlemen, Matt Garman, Amazon AWS: the largest computer on the planet.

All right, chapter 3: robotics. Here we're gonna talk about robotics. The first part of robotics is Jetson. We created this little tiny computer called Jetson and put it out; it's based on Linux, and it runs the entire NVIDIA CUDA-X stack. The amazing thing is there are 200,000 developers across 2,000 companies building things everywhere: warehouse logistics robots, little delivery robots, agriculture robots, John Deere for example, retail robots and assistants, industrial robots. Robots everywhere, augmenting our capability, doing things that are hard for us to do, and this area is just rich with research. This is, of course, the ultimate AI. Today we're announcing a brand new robotics computer, and we're so proud of this one: it is the smallest computer our company has ever built. It's called the Jetson Nano. I have one here; I've been wearing it all day. [Applause] You don't get it. It took me days to get it: they kept putting the slide in front of me, and I'd go, come on, it's not quite right, his head's too big. Turns out it was my head. That's a terrible joke, you guys. Ladies and gentlemen, the Jetson Nano. Here's the amazing thing about this little tiny thing: it's $99, the whole computer. If you use a Raspberry Pi and you just don't have enough computing performance, then you get yourself one of these, and it runs the entire CUDA-X stack: it runs computer vision inference, it runs speech recognition, because it's architecturally compatible; our company is that way. So you've got rich software, and all the AIs that you've created that run on DGXs, when you compile them again, run on this.

We care so much about the robotics industry that we decided to create a whole set of tools for robotics, to foster the robotics ecosystem, and so today we're opening several things. The first is the Isaac robot engine, basically the entire stack on top of which to create robots. It comes with three reference robots: the Kaya robot, the Carter robot, and the Link robot, and one of them can use the little Jetson Nano, while another may use Xavier. We also created a robotics simulator, so that the robots can learn how to be robots inside this virtual-reality environment, and it kind of looks real to the robot, so that when the robot is done and we take the artificial intelligence into the real physical robot, it sees and perceives the world in the same way. The robotics loop basically has three things, and Gohan talked about it; everybody talked about it, and it's exactly the same here, however you think about it: one, perception; two, reasoning; and three, planning. Perception, reasoning, and planning. These robots are all doing that; the internet services, the conversational AI agents, are doing that; and the machine learning prediction systems are basically doing that. You're perceiving from all the data, you're reasoning about what to do, and you take action: you make a recommendation, or, in the physical world, what's otherwise known as planning.
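Here is a minimal skeleton of that perceive-reason-plan loop in Python. The sensor reading and the velocity commands are invented stand-ins, not the Isaac SDK API:

```python
import random

def perceive():
    # Stand-in for a sensor stack, e.g. a depth camera's nearest-obstacle range.
    return {"obstacle_cm": random.uniform(5.0, 200.0)}

def reason(world):
    # Decide what to do given the perceived world.
    return "avoid" if world["obstacle_cm"] < 30.0 else "advance"

def plan(goal):
    # Turn the decision into motion: (forward m/s, turn rad/s).
    return {"advance": (0.5, 0.0), "avoid": (0.0, 1.2)}[goal]

for _ in range(3):
    world = perceive()        # 1) perception
    goal = reason(world)      # 2) reasoning
    command = plan(goal)      # 3) planning / action
    print(world, goal, command)
```

A real robot runs this loop tens of times per second, with neural networks doing the perceiving and much richer planners doing the planning, but the shape of the loop is the same one described above.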
So we let the robot learn how to perceive the world, which means the world has to look right; then it reasons about what to do, based on what it's asked to do; and then it has to do the planning of the motion, the articulation, and do the work. We also wanted the machine learning algorithm, the AI algorithm, to learn by itself, because some of these programs are just impossible to write. How do you look at something and go pick it up, when the object is changing in shape, when it's a different shape all the time, when it's in a different position all the time? The program is not exactly the same every single time, but you would like to have an agent, an AI algorithm, that goes and picks it up by itself. So we created a gym where it can learn, through reinforcement learning, how to be a robot. We put all these things together, and hopefully the ecosystem, the community, can use this platform to create amazing robots for the future. You can go to developer.nvidia.com/isaac-sdk; it's open, so feel free to use it and give us feedback.

And let me show you Isaac. Here we go, and there are these reference robots. Here, come; just, yeah, just stay put; sit. All right, so this is Kaya, and it has a Jetson in it, and it's got rich sensors, in this case a depth camera; the larger robots have lidar, so they have wonderful sensors, and we can support very high-resolution sensors. If you wanted to make a little toy, you could, but if you wanted to make something that's actually real and does real work, you can, and they can be wonderful robots. The whole stack, from vision to speech, all of the AI that we've been talking about this whole time, is available on here. Okay, so, hey, good job, you guys. The guys made you a short movie; let's take a look at it. [Music] Ladies and gentlemen, Isaac robotics. Okay, good job, you guys.

Let me talk to you about one of the most important robots of all: your self-driving car. Everything that moves in the future will have robotics technology inside, will have some level of autonomous capability. Our company is deep in the middle of the autonomous vehicle revolution, but we're not building a self-driving car; we're creating a system, and the infrastructure, and the design capability necessary for the whole industry to build self-driving cars. These are basically the components of our DRIVE initiative. We start, of course, with Saturn, what we call DGX SATURNV, our own supercomputer, necessary for doing the deep learning and training these AIs. The output is a whole bunch of AIs that are then ensembled together to compose the three layers, the three groups of algorithms: perception, localization, and path planning. Some people combine localization and mapping, so: perception, localization and mapping, and planning. These three parts of the AI computing stack, the robotics stack, are inside the DRIVE platform, and all of those algorithms are ensembled together to essentially create what we call DRIVE AV. Constellation is our simulation platform. As an engineer, nothing is right unless you can simulate it: you want to be able to simulate the corner conditions, you want to simulate the rare conditions, you want to be able to regress and repeat old scenarios.
Let me show you Isaac. Here we go; these are the reference robots. Just stay put... sit. All right, so this is Kaya, and it has a Jetson in it. It's got rich sensors, in this case a depth camera; the larger robots have lidar. So they have wonderful sensors, and we can support very high-resolution sensors. If you wanted to make a little toy, you could; but if you wanted to make something that's actually real and does real work, you can, and they can be wonderful robots. The whole stack, from vision to speech, all of the AI that we've been talking about this whole time, is available on here. Okay. Hey, good job, you guys. The guys made you a short movie; let's take a look at it. [Music] Ladies and gentlemen, Isaac robotics. Good job, you guys.

Let me talk to you about one of the most important robots of all: your self-driving car. Everything that moves in the future will have robotics technology inside; it will have some level of autonomous capability. Our company is deep in the middle of the autonomous vehicle revolution, but we're not building a self-driving car; we're creating the system, the infrastructure, and the design capability necessary for the whole industry to build self-driving cars.

These are basically the components of our Drive initiative. We start, of course, with what we call DGX SATURNV, our own supercomputer, necessary for doing the deep learning and training the AIs. The output is a whole bunch of AIs that are then ensembled together to compose three layers, three groups of algorithms: perception, localization, and path planning. Some people combine localization and mapping, so: perception, localization and mapping, and planning. These three parts of the AI computing stack, the robotics stack, are inside the Drive platform, and all of those algorithms are ensembled together to create what we call Drive AV.

Constellation is our simulation platform. As an engineer, nothing is right unless you can simulate it. You want to be able to simulate the corner conditions, you want to simulate the rare conditions, and you want to be able to regress and repeat old scenarios.

This car not only drives by itself; it has to communicate with you. Just as in the conversational AI demo earlier, where the agent was communicating with the cloud, there has to be visual feedback as well as audio feedback. Drive IX is our intelligent user experience: it basically shows you what is in the mind of the self-driving car, and it communicates with you. The driving computer itself is scalable from Level 2 all the way to robotaxis. We have a resimulation effort going on, and I'm going to show you a video of that. And then, ultimately, we've got to drive the car; we're going to drive the car until it drives by itself and it's perfectly ready for production.

And then we open up the entire platform. People can use it at the computer level, at the middleware software level, or at the fully integrated application level. They can take advantage of our server systems for developing the AI, of the simulation systems, and of whatever infrastructure we've created. This is an open system. The future of autonomous vehicles has to be software-defined, and the reason is that, given how far we have to go and the diversity of systems we have to support, it is impossible to design a specific widget for one particular car. We have to be software-defined, it has to be open, and you have to take advantage of the entire ecosystem. I really appreciate all of the partners for joining us; thank you.

Today we're announcing our Release 9. We're still a year or so away from having a production car, but this is our Release 9: high-functioning Level 2. It gives you, basically, on-ramp to off-ramp. Last year I showed you how we drove 50 miles without touching the steering wheel. We now have surround perception, so we have the ability to do automatic lane changes. We localize to all of the world's major maps, and we do real-time mapping ourselves. The reason is that those maps cover about 80% of the world, but most of the routes we drive ourselves are in the remaining 20%, and those turn out to be the most frequent miles. So our car can map, fuse it all together, and turn it into our own personal HD map that we can then localize to. In the future we also integrate everything into an AR and VR system, which I'll show you in just a second; it's really quite incredible. And then we have integrated driver monitoring and voice recognition. Basically, your car becomes an AI. Guys, let's show it to them.

Last year I showed you end-to-end, 50 miles. This year I'm showing you map routing. Our car is doing dynamic mapping right now: you drive the routes, and then it fuses the routes together into a map. It doesn't record your driving; it's creating the map. Notice on the right-hand side it's creating a map, fusing together the routes that we drive. These days there are a lot of intersections, a lot of complicated intersections; we have to teach it the different contexts and where it should stop. Every intersection is a little bit different: counting on red lights alone is not good enough, counting on signs alone is not good enough, so we fuse everything together. And we have this great technology, and this is so great: we use radar to localize. Essentially the radar turns into a very coarse lidar, and it works even in rain and fog and at night, and it supplements our cameras.
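The "drive the routes, then fuse them into your own HD map" idea can be sketched crudely as vote-based fusion of pose traces into a grid. The cell size and vote threshold below are invented illustration parameters, not NVIDIA's mapping method.

```python
from collections import Counter

CELL = 0.5  # meters per grid cell (illustrative)

def to_cell(x, y):
    return (round(x / CELL), round(y / CELL))

def fuse_drives(drives, min_votes=2):
    """drives: one (x, y) pose trace per trip over the same streets.
    A cell counts as mapped road once enough independent drives hit it."""
    votes = Counter()
    for trace in drives:
        # count each cell once per drive so a single noisy trip can't dominate
        for cell in {to_cell(x, y) for x, y in trace}:
            votes[cell] += 1
    return {cell for cell, n in votes.items() if n >= min_votes}

# Two slightly different passes over the same street agree on the same cells:
drive_a = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)]
drive_b = [(0.1, 0.0), (1.1, 0.0), (2.1, 0.1)]
print(fuse_drives([drive_a, drive_b]))   # {(0, 0), (2, 0), (4, 0)}
```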
We simulate everything; this is a virtual-reality simulator. Here is automatic lane changing in a crowded environment, and this is a technology called Safety Force Field: apply the brake. [Music] And then it takes us home, back to headquarters: lots of intersections, lots of complicated corners. We drive it several times and fuse it together to reconstruct our own map, because the HD maps from the map providers don't go here. Okay, in the interest of time: thank you, guys.

Let me show you something that's really cool. The first part, which you know we do incredibly well, is perception. The second part we do incredibly well is localization. We've been building up the stack from the foundation up, and now we're going to introduce our path planning. Our path planning has several components. The first component is an ensemble of neural networks and different computer-vision approaches to estimate the paths that the car should take; we call that path perception, and it is really, really robust; it's fantastic. On top of that we have a prediction algorithm: detect the surroundings, predict their future paths, and estimate their speeds. Perceiving the surroundings and predicting their future is important to safe driving, and the reason is this: you want a computationally robust algorithm to ensure that, whatever you decide to do, you will do no harm in the future; that, assuming everybody around you is a well-behaved agent, you yourself will computationally not cause any harm. You are essentially in a safety cocoon.

We have a computational method that detects the surrounding cars, predicts their future paths and, knowing our own path, computationally avoids the surrounding traffic. We call it the Safety Force Field. It is the first of its kind: it is completely computational, and it has the ability to be computationally validated. We have researchers around the world, safety experts, looking at the algorithm, and we're getting great feedback. It's going to be an open system, so you can take the algorithm and implement it yourself. Let me show you how it works.

Here, we've predicted, based on the trajectory of the car ahead (which is not moving), that we should apply our brakes; using this algorithm you achieve what's called automatic emergency braking, automatically. This is intersection handling: we predict that a car is making a turn, so we have to apply our brakes, and we did it just in time. And this is intelligent steering: we estimate where everybody else is, and because we're blocked, we have to find the next closest route. That's part of the Safety Force Field too: it figures out computationally that there's a lane next to us that we can use. And this is congested traffic: using computational methods, we detect where all of the surrounding cars are. We want to change lanes, and it's okay to do so, because we predicted the velocity and trajectory of the cars next to us and behind us and concluded that if we change lanes, they will not hit us. If, as the computation is updated, we determine that a car would collide with us, of course we veer back into our own lane. So: Safety Force Field. It's computationally verified and it is simulated, and that's why simulation is so vital to us. We're super proud of this algorithm. It's now going to be in the open, you can read the white papers, and as we continue to progress we'll make it available to you.
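The published Safety Force Field white paper defines the actual math; as a heavily simplified stand-in, here is the shape of such a check: roll every actor forward under a simple motion model and commit to a maneuver only if it stays collision-free, otherwise stay in lane. The constant-velocity model and all thresholds below are assumptions for illustration only.

```python
SAFE_GAP = 2.0   # minimum separation in meters (illustrative)
HORIZON = 3.0    # seconds to look ahead
DT = 0.1         # prediction time step

def predict(pos, vel, t):
    """Constant-velocity prediction: where an actor is t seconds from now."""
    return (pos[0] + vel[0] * t, pos[1] + vel[1] * t)

def collides(ego_pos, ego_vel, others):
    """others: (position, velocity) pairs for the surrounding cars."""
    steps = int(HORIZON / DT) + 1
    for k in range(steps):
        t = k * DT
        ex, ey = predict(ego_pos, ego_vel, t)
        for pos, vel in others:
            ox, oy = predict(pos, vel, t)
            if abs(ex - ox) < SAFE_GAP and abs(ey - oy) < SAFE_GAP:
                return True
    return False

def choose_maneuver(ego_pos, lane_change_vel, others):
    """Commit to the lane change only if it is predicted to do no harm;
    otherwise veer back into (stay in) our own lane."""
    return "keep_lane" if collides(ego_pos, lane_change_vel, others) \
           else "change_lane"

# A car running alongside in the target lane makes the change unsafe:
others = [((0.0, 3.5), (20.0, 0.0))]
print(choose_maneuver((0.0, 0.0), (20.0, 1.2), others))  # keep_lane
```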
And so the third stage of really great self-driving cars is the path planning algorithms: accuracy, comfort, and of course safety. Simulation is vital to all of this, and today we're announcing that Constellation is available. We've been working on Constellation for some time. The architecture of the image generator is complicated, and the system is bit-accurate, meaning the hardware is basically a virtual car: you take the software, we throw it into our data center, and you essentially have a virtual car. It's like having a virtual fleet of autonomous vehicles in your data center. The workflow is in the cloud: you can program your workflow, which we're going to show you in just a second, and then you can, of course, stream it from the cloud. In the future, instead of having thousands of AV test cars, we're going to have thousands of these Constellation systems. They're going to be so much more programmable; we can create conditions that we otherwise can't. And we have our own system all racked up. Okay, now Mark is going to show it to you. Mark?

Okay, let me show you a little bit about how we deploy our system on Constellation. You're looking at Drive Sim, which is our simulation platform. The center four screens are our simulator: right, left, front, and rear views. (Thank you, Vanna.) Over on the left side is the perception: this is the Drive AV stack, taking the sensor input from the simulator, perceiving what's going on in the world, the lanes and the cars and everything in it, and giving control information back to the simulator. So the computers are driving the car; look, Ma, no hands. Over here on the right side is what we show the driver, the human in the car, so that the human is well aware that we're aware of what's happening in the environment. It's a confidence view: as a driver you can choose to see this augmented-reality view right here, or a virtual-reality view right there, and the ability to see this gives you so much confidence that the AV computer is recognizing and perceiving the right things and is about to do the right things.

Fantastic. So I want to show you how we actually use this, the workflow of using Drive Sim. If we can switch to the developer view: over here on the right side we're seeing yet another camera in the scene; this is a spectator camera that a developer would use. We're running the simulation the entire time that we're editing. We also have a Jupyter notebook Python interface to Drive Sim, so we can make some changes. Let's make some weather changes; go ahead: night time, go through a few of them, sunset, rain. It's so much easier to do this in virtual reality than in real life; we try to hope for rain, but it just doesn't rain for months. [Laughter] Okay, fantastic. So we've changed some environments in the world, and we've actually been adding traffic as we've been playing along, all while the simulation is running, all while we're running hardware in the loop: we're talking to the actual computer in the car, and the exact software that we would actually put into the car is running on Constellation. Bit-accurate, exactly the same. Perfect.

So let's make one more modification while we're here. All this traffic is essentially agents; they're running paths based on rules, but we can occupy the mind of one of these agents: take it over and control it.
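The notebook interface itself isn't shown in detail on stage, so here is a hypothetical mock of what that "edit the live scenario from Python while hardware-in-the-loop runs" workflow could look like. The Scenario class and every method name below are invented stand-ins, not NVIDIA's actual Drive Sim API.

```python
# Hypothetical API, not NVIDIA's SDK: a mock of a live simulation session
# being edited from a Jupyter notebook while the simulation keeps running.

class Scenario:
    def __init__(self):
        self.weather, self.time_of_day = "clear", "noon"
        self.traffic = []

    def set_weather(self, kind):
        # e.g. "rain": far easier to conjure here than to wait months for it
        self.weather = kind

    def set_time_of_day(self, tod):
        self.time_of_day = tod

    def spawn_traffic(self, n):
        start = len(self.traffic)
        self.traffic.extend(f"bot_{i}" for i in range(start, start + n))

    def possess(self, agent_id):
        """Take over one rule-following traffic bot so we can script it."""
        assert agent_id in self.traffic
        return agent_id

sim = Scenario()
sim.set_time_of_day("sunset")
sim.set_weather("rain")
sim.spawn_traffic(5)
bot = sim.possess("bot_3")   # now command it, e.g. a scripted lane change
```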
So we've grabbed one of the cars in the world; let's have it do a lane change. I can locally modify any one of these traffic bots; I can possess it and control it, and I can set up any scenario I want, interactively. Finally, though, the best way to test our car isn't to do all of this interactively. I really want randomized versions of the scenarios we just created, and I want to deploy them on a whole fleet, not of real cars but, in this case, of Constellation boxes in our data center. So let's do that; let's switch to the Constellation view. Here we go: we've got, in this case, twelve versions of that exact same setup, but now with randomized variables of weather, time of day, and the timing of the traffic, so that we can test every possible iteration of that scenario. That's awesome. Thanks a lot, Mark. Thank you very much. Ladies and gentlemen, Constellation.

Not only can you use it for simulation, you can use it for resimulation. Watch this; this is really cool. What's actually happening here is that we're taking a previous drive, taking the data, and pumping it into Constellation. We're sitting inside a test car, stationary, here at GTC, but as far as the car is concerned it's driving that route again; it's basically running a regression. The detection is working, the lane detections are working, and the recommended paths are all working, and whenever there are deviations from previous drives you can catch them. This is how you can do regression testing. Okay, good job, guys. Thank you, Constellation.
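One way to picture that resimulation-based regression testing, under an assumed record format and a made-up tolerance:

```python
# A sketch of the resimulation/regression idea: replay a recorded drive
# through the current software and flag where it deviates from what the
# car did before. The record format and tolerance are illustrative.

TOLERANCE = 0.5  # meters of allowed path deviation (made-up threshold)

def regress(recorded_frames, rerun_stack):
    """recorded_frames: list of (sensor_frame, expected_xy) pairs from a
    previous drive. rerun_stack: maps a sensor frame to the (x, y) path
    point the current build chooses. Returns the frames that regressed."""
    deviations = []
    for i, (frame, (ex, ey)) in enumerate(recorded_frames):
        ax, ay = rerun_stack(frame)
        err = ((ax - ex) ** 2 + (ay - ey) ** 2) ** 0.5
        if err > TOLERANCE:
            deviations.append((i, err))
    return deviations  # empty == the new build reproduces the old drive
```

A real pipeline would compare far richer outputs (detections, lanes, planned paths) than a single path point, but the pass/fail structure is the same idea.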
I have one more thing to tell you. It turns out that autonomous vehicles are one of the greatest challenges: safety is a great concern, the technology is really complicated, the software we have to develop is still quite significant, and it's a computing challenge, an artificial intelligence challenge, and a system integration challenge across the car. There are all kinds of challenges involved; this is really one of the world's great computational challenges. And the world's largest car company is making enormous endeavors in this area. Ladies and gentlemen, today we're announcing that Toyota, the world's largest car company, the largest transportation company, is partnering with us end to end: from deep learning systems, to simulation systems, to in-car computers, in collaboration on AI. Ladies and gentlemen, Toyota and the TRI-AD (Toyota Research Institute Advanced Development) team are partnering with NVIDIA to create the future of autonomous vehicles. This is so exciting, and this is how we can make a real difference in the future of transportation.

Let me quickly summarize the several things we announced today. We talked about accelerated computing and the path forward. Ladies and gentlemen, what does PRADA stand for? That's right: programmable acceleration of multiple domains with one architecture. That is what accelerated computing is about, and that's how we move forward; we work across the entire acceleration stack. The second thing: the future of graphics, the future of games, is unquestionably ray tracing. You're going to hear about ray tracing on and on and on this week; it turns out to be the Game Developers Conference, and all they're going to be talking about is ray tracing. We announced graphics in the data center: a very complicated server architecture with three stacks on it, for rendering, for Omniverse computer design, and for cloud gaming. On cloud gaming we have a platform we call the GeForce NOW Alliance, and we're partnering with SoftBank and LG U+ as our first partners. And I'm so excited that we're announcing Omniverse, something we've been working on for quite a long time; it's just really great technology.

Next, data science: it's the new HPC, the fourth pillar of the scientific method. This is our ecosystem approach, and today we're announcing two new computers designed for the future of data science, so that enterprises all over the world can access this technology: a data science workstation and a data science server, based on the T4. There's also our partnership with AWS, to take this entire stack, on top of the world's largest computers, to the world's data scientists. And of course our partnership with, and acquisition of, Mellanox, something we're super excited about, because the future of computing extends out of the computing node and into the networking fabric. And then, lastly, in robotics: Jetson Nano, the smallest, cutest little computer; I'll show it one more time for your enjoyment, it's right there. And Constellation, and our partnership with Toyota. Ladies and gentlemen, this is our GTC. I want to thank all of you for coming today. Have a great GTC! [Applause]
Info
Channel: NVIDIA
Views: 127,943
Rating: 4.7018781 out of 5
Keywords: NVIDIA, GTC 2019, GPU Technology Conference, AI, Artificial Intelligence, Jensen Huang
Id: Z2XlNfCtxwI
Length: 160min 45sec (9645 seconds)
Published: Tue Mar 19 2019