GTC 2017: NVIDIA GPU Technology Conference, Tesla V100 Volta

Captions
[Music] [Applause] [Applause] [Music] I am a visionary, exploring a universe of data to sharpen our view of the most distant galaxies, and studying black holes to prove Einstein's theory of gravitational waves. I am a healer, giving doctors the power to turn mountains of data into life-saving breakthroughs, identifying diseases like leukemia from a simple drop of blood, and finding new ways to bring care to the sick and the disabled in their homes. I am a navigator, mapping our world one millimeter at a time and making even the largest self-driving vehicles safer for the long haul. I am a creator, learning to paint from the masters and applying their styles to create original works of art. I am a teacher, analyzing half a million players every game to identify strengths and weaknesses, and a learner, discovering new strategies from complex games. [Music] Ladies and gentlemen, please welcome the founder and CEO of NVIDIA, Jensen Huang. [Music] Welcome to GTC 2017. Before I start, I just have to say: everything you've seen so far, and everything you're about to see, was created by NVIDIA's creative organization. We have just an amazing team of creatives at our company. As you know, we work at the intersection of art, science, and engineering, and this intersection is what made all of this possible. I love the work that they do: the craftsmanship, the dedication, the absolute focus on excellence, the creativity, the willingness to take a risk and put themselves out there. And not only that, it was all done by themselves. Even the voice-over was done by one of our young employees, Helen Broening. Her parents work at NVIDIA; we raised her, and that was her voice. I'm so happy about that; her parents are going to be really proud. I've got a lot I want to talk to you about today. We have laws of physics, laws of computing, we have special guests, we have artificial intelligence. I've got a lot to cover, so let's get
started. There are two dynamics happening in our industry at the same time, and I'm going to talk about both of them. This is the first one. For the last 30 years, we have benefited from one of the most powerful technological revolutions anybody has ever seen: the combination of two effects created what is known as Moore's law. One is architectural innovation, which makes microprocessors better and better, more and more performant, through architectural techniques. The goal is to find instruction-level parallelism, and it's magic. Just think about what they're trying to do: a program is basically a sequential list of instructions, performed one at a time, but somehow computer scientists have figured out ways to execute them in parallel. One way to do it is pipelining: starting the next instruction, the next step, before the first step is complete. Make the instructions wider; do all kinds of amazing things so that you can speculatively execute something, in the event that maybe you didn't have to change course. All kinds of amazing technology was created, caches got a lot bigger, and of course software techniques on top of it, optimizing compilers, made it possible for us to advance microprocessor architecture performance year in and year out, and we deployed all those transistors to good use. Now, the transistors that we added, more and more transistors, would not have been usable if it weren't for a second law, the law of Dennard scaling. Dennard scaling basically says that we can put more and more transistors in and reduce the voltage, so long as we continue to make the transistor smaller, reducing its capacitance. The combination of these two factors, more and more transistors, running faster and faster at lower and lower voltage, allowed us to continue to advance performance within some constant energy envelope. In the course of the last 30 years we've improved microprocessor performance by nearly a
million times. By nearly a million times! Nothing in society has improved by a million times, and everything in society has been made possible because of this fundamental advance. Then, in the last several years, it started to slow. Our ability to harvest parallelism out of sequential instructions started to diminish, and the number of transistors we had to add in order to squeeze out that little tiny bit of extra performance was simply too costly. On the other hand, we were reducing voltage and shrinking transistors, and we're now up against the laws of semiconductor physics; there's only so far we can push before Dennard scaling starts to fail on us. We now find ourselves at the end of two roads, and it has been incredibly well documented. In fact, for many of you who've been coming to GTC all these years, I think I spoke about it at the first GTC, and I speak about it at every GTC. It's the reason for our existence: recognizing that we need to find a path forward, life after Moore's law. John Hennessy recently talked about it; he called it the end of the road for general-purpose processors and the future of computing. Mark Horowitz, also a professor at Stanford, spent enormous amounts of time with his colleagues and basically plotted out every single major processor product and node in the last 30 to 40 years, and the results are actually quite amazing. The blue line basically shows that the end of Dennard scaling, compounded with the lack of productive architectural innovations, has led to the plateauing of processor performance. What used to grow at 50 percent per year, 50 percent per year compounded improvement, is now improving at 10 percent per year. Yet we can still manufacture more transistors; transistors are abundant, and in fact, if you look at that white line, it shows you how many transistors we have. That was the ultimate observation at the beginning of our company.
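The Dennard-scaling argument above can be sketched numerically. This is an illustrative back-of-the-envelope model with normalized, made-up numbers, not real process data; it only shows why shrinking dimensions, capacitance, and voltage together kept power density roughly constant while transistor counts grew.

```python
# Back-of-the-envelope Dennard scaling: dynamic power P = C * V^2 * f.
# One ideal process generation scales dimensions, capacitance C, and
# voltage V by k ~ 0.7, and raises frequency f by 1/k.

def dynamic_power(c, v, f):
    """Dynamic switching power of a transistor: P = C * V^2 * f."""
    return c * v * v * f

def scale_generation(c, v, f, k=0.7):
    """Apply one idealized Dennard-scaled process generation."""
    return c * k, v * k, f / k

c, v, f = 1.0, 1.0, 1.0
p0 = dynamic_power(c, v, f)            # baseline power, normalized to 1.0
c, v, f = scale_generation(c, v, f)
p1 = dynamic_power(c, v, f)            # per-transistor power drops to k^2 = 0.49

# Transistor area also shrinks by k^2, so power *density* stays constant.
# That is what let clocks and transistor counts rise inside a fixed energy
# envelope -- until voltage could no longer be lowered.
density_ratio = p1 / (0.7 ** 2)        # unchanged: 1.0
```

When voltage stops scaling, `p1` no longer falls with each generation, and the constant-energy envelope is lost, which is exactly the wall described here.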
That observation was the reason why accelerated computing works, and it's the reason why we introduced the concept of GPU computing. GPU computing does several things. The first thing it does is recognize that the microprocessor is incredibly good at sequential instructions, incredibly good at single-threaded operation, and that the craftsmanship, the innovation, and all the engineering that has gone into it over the course of the last 30 to 40 years wasn't going to be replaced. And we respect the other law of computing, Amdahl's law: if we accelerate the things that we can accelerate, the part that we cannot accelerate eventually becomes the problem. So we have to make sure that we honor that law as we change the architecture of computing. We did several things. The first thing we did was realize that there are some workloads inside some very important applications, and frankly these applications are the reasons why you are here: the algorithms of artists, of scientists, of engineers, of the explorers, the discoverers, the inventors, the da Vincis of our time, the Einsteins of our time. Their software includes some parallel-computing aspects, some parallel-processing aspects, and if we could figure out a way to offload them from the microprocessor, which is good at sequential processing, we could provide incredible speedups. So the first thing is to create a specialized, domain-specific accelerator that is a companion to the CPU: accelerated computing. The second thing we did was create an architecture, a platform, that we were willing to dedicate ourselves to in everything we did for the rest of our lives. We created an architecture we call CUDA, and it's named after an architecture that we created at the very beginning of our company, 25 years ago, called UDA, the Unified Driver Architecture. That architecture was extended for computing.
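Amdahl's law, invoked above, fits in one line: if a fraction p of a program's runtime can be accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p/s). A minimal sketch, with illustrative numbers:

```python
def amdahl_speedup(p, s):
    """Overall speedup when fraction p of the runtime is accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)

# Even a 1000x accelerator is capped by the part it cannot touch:
# with 95% of the work accelerated, the serial 5% dominates.
capped = amdahl_speedup(0.95, 1000.0)   # roughly 19.6x, not 1000x
```

This is why the CPU's single-thread strength still matters in accelerated computing: the unaccelerated fraction sets the ceiling.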
Starting ten years ago, we call it CUDA. This architecture is our computing architecture, and a computing architecture that you dedicate your lives to, that you continue to promote, to sustain, to improve, continues to add value, and eventually other people can benefit from it. It has to be special: it has to do something that general-purpose computing, that commodity or otherwise generally available computing, cannot do. It has to be something you dedicate yourself to. It has to be available everywhere: it can't just be available on a PC, it has to be available on a laptop, it has to be available in the cloud, it has to be available in embedded devices, it has to be available everywhere. It has to be thought about from top to bottom, in the sense that you have to have tools, you have to have middleware, because computer architects and computer scientists need all of that to be productive. What's really special about GPU-accelerated computing is that it took enormous amounts of effort to port, to refactor, the applications all of you had developed on top of microprocessors onto this new computing platform. It took time and it took specialized skills, and so we dedicated ourselves to having a team of computational mathematicians who can think across the entire stack, and we work with you, with the application makers, the algorithm developers, to find the match between the work that you want to do and the architecture we created. We worked at the architecture level, at the system level, at the system-software level, at the algorithm level, and then we worked at the application level. The reason is that if you want to overcome the limitations of Dennard scaling, you're going to have to do something pretty clever, and you have to think across every single possible layer of computing to find efficiencies, to get rid of waste, to do special and smart things. This way of doing computing,
top to bottom and bottom to top, dedicated to one single architecture over the course of the last ten years: the results have been phenomenal. If you look at the green line, that's basically the line that NVIDIA is tracking. Some people have described our progress as Moore's law squared, and the reason is that, first of all, you get a big speed-up over the natural microprocessor performance, and secondarily, it appears to be moving faster than the rate of increase of transistors. I think there's some logic to that, and the reason is exactly as I described: we thought across the entire stack. For many of you who have been coming here for close to ten years, I want to tell you how much I appreciate all of your support. We come here because the work that we do is impossible otherwise. The work that we do in creating virtual reality is impossible otherwise; the work that we do in computer graphics, the work that you do in fluid dynamics, in molecular dynamics, is impossible otherwise. There are several domains where we have found accelerated computing to be incredibly effective: of course graphics, physics, quantum mechanics, and a new field called deep learning. GTC has been growing fast, incredibly fast, since our very beginning; we have increased the number of attendees by a factor of three in five years. The only reason it hasn't grown faster is that computing is all over the world, so starting last year we've taken GTC on the road. Last year alone, over 20,000 people came to GTCs around the world, and this year we're going to take the show on the road again, so that we can make this computing platform available to developers, scientists, and researchers for your groundbreaking work all over the world. The number of GPU developers has
increased by a factor of ten in five years. It's actually amazing: five hundred thousand developers. CUDA is taught all over the world, textbooks are written all over the world, and when you look on LinkedIn you see CUDA all over the place. It's just fantastic. The number of people who now consider GPGPU, programming GPUs, programming CUDA, one of their specialties is really fantastic to see. And then there are over a million CUDA downloads: the CUDA driver, the CUDA SDK, has been downloaded over a million times. At GTC this year, every one of the top 15 technology companies in the world is here; 100 percent of the world's top 15 technology companies are here. Ten out of the world's top 10 car companies are here. Pfizer, Merck, Roche, GSK, and Lilly are here. Researchers from the world's top 100 national laboratories are here. There are 80 AI startups here, 25 VR startups, all kinds of robotics startups and ideas. GTC is where, if you will, the future is invented. GTC is where we create what other people would think of as science fiction, and speaking of science fiction, my first demonstration for you today is the Holodeck. As you know, we play at the intersection of virtual reality and artificial intelligence, and nothing exemplifies that intersection like the Holodeck does. The Holodeck is not only a place you go; it's a place we can share. So it has to obey the laws of physics, otherwise it wouldn't feel like a place; it has to be photorealistic; and it has to be someplace that we can share together. So I thought, gosh, what can we share together? I thought it would be great to invite a friend to join us from Sweden. He's actually a pretty special guy; his name is Christian Koenigsegg. Do you guys know Christian Koenigsegg? Christian Koenigsegg is a car maker, not a normal car maker but a hypercar maker, and that's a description of him, not the car. Come on, guys, help me out here. That was,
that was exactly right; that was incredibly fast thinking on my feet. There are many more jokes to come; some of them are just going to land right in here. It's incredible. So I invited Koenigsegg to come and show us something that most people have never seen: enjoy the Koenigsegg Regera. Ladies and gentlemen, let's go into the Holodeck. Hey guys. Hey, Christian, which one are you? Christian just stepped out; he'll be right back, he went to the bathroom while everybody was waiting. All right. So this is NVIDIA's Holodeck. We have people in it from all over the world, and the thing that's really, I think, incredibly cute is that they're all wearing name tags so they know who each other is. That's very funny. Okay, so why don't you guys give us a tour; let's see the brand-new Koenigsegg, shall we? You know how the Holodeck works: this is all completely real-time, just photorealistic graphics. This is why it's so fun to work at NVIDIA, man. Hey, so Christian, what do you think about the car? Tell us about the car, Christian. Yep? Hmm, no? What happened? Wow, thank you. [Music] Well, don't breathe like Darth Vader. Okay, so Christian, first of all, tell us about your car. This car is a hypercar, a V8 twin-turbo with three electric motors in it, right? So take it from there. Yeah, so this is our latest creation. It's a hybrid car with direct drive, no gears. We have a combustion engine with up to 1,200 horsepower and 680 horsepower of electric drive, so this car doesn't need any gears to go from 0 to 250 miles per hour in only 20 seconds. It's really one of the absolute fastest cars ever produced, and something I'm very proud of. Being in this environment is just amazing; trying out this system with you guys really changes the view of what's possible, of how to create the cars and showcase them during the building process. Just fantastic. Now Christian, you know the car isn't street legal in California,
even though it is street legal in the rest of the US. Yeah, yeah, whatever, whatever. So, as you know, I drove it anyway, and the thing that was really great is that whenever I get stopped, it's never by the police; it's usually by an old lady who would like to take a picture of her grandson in the car. But I had the perfect excuse: I was going to say that I was taking the car to the showroom, and apparently that was going to work, because you told me that would work and that it was okay to drive an illegal car. All right, so let's take a look at this. Everything here is carbon fiber. Let's go inside it; let's take a look, Amanda, let's take a look. And remember, everything obeys the laws of physics, so if she were to grab the steering wheel, her hands don't go through it. Wow, look at that. Look at that. Okay, all right, come on out, come on out. Don't you guys have an x-ray feature or something like that, so we could see through the car? I just want to make sure that the entire car design is there, that every single body part is in here. This is not just a video-game car; this is actually a computer-aided-design car. This database came directly from Christian, and now that we have his database we can 3D print it ourselves. Look at that. Isn't that amazing? But what if I want to see all the parts; I just want to do an inventory of the parts. Okay, you guys, thank you, thank you very much. All right, enough fun. Ladies and gentlemen, Christian Koenigsegg and the Holodeck. The Holodeck. It's just so amazing to be in these environments together with all your colleagues, and you're talking to each other, and you're pointing at the same thing, and because you can touch things, you can actually lift things up, and because you're in that environment you're superhuman: it reacts to physics, but you can lift up amazing things. And so the Holodeck is such a great
place. So the first dynamic is the emergence, the rise, of GPU computing. The second thing started happening several years ago, and we call it the second era, not just of processing but of computing altogether. As you know, when you're doing a search on Google, somehow it magically knows what kind of information you're interested in. When you're watching movies on Netflix, somehow it magically knows what other movies you would enjoy. And when you're shopping on Amazon, it's amazing that every single page is personalized for you, and that it knows, based on the type of shopping habits you have, what other things you might be interested in. None of those programs were written as a sequence of instructions, specifically, by engineers. All of that was made possible by machine learning: it's learning from all of your behavior and all of your interactions with that service, and over time it becomes more and more predictive; it is almost able to anticipate you. Machine learning is one of the most important computing revolutions ever. Whereas computer scientists used to specify every single instruction, a line at a time, now algorithms write algorithms, software writes software, computers are learning by themselves: machine learning, the era of machine learning. Pedro Domingos, the University of Washington professor, wrote a really elegant book called The Master Algorithm, and he describes five tribes in machine learning: the symbolists, people who use inverse deduction; the Bayesians, probabilistic inference; the analogizers; the evolutionists, the ones who believe in genetic programming. These approaches are all making enormous contributions in computer science, but one particular tribe, called the connectionists, has recently burst into the public consciousness. This particular approach, which is now called deep learning, is the
culmination of research breakthroughs from so many different labs: from Schmidhuber's AI lab, with its work on using GPUs for convolutional neural networks and the early uses of LSTM, long short-term memory, recurrent neural networks; to Yann LeCun's work with CNNs; to the groundbreaking invention of backpropagation by Geoff Hinton at the University of Toronto; the work that Fei-Fei Li has done on ImageNet and computer vision at Stanford; and of course the quite famous work by Andrew Ng on deep learning at Stanford as well. All of that work has come together into what you could call the Big Bang of deep learning, the Big Bang of modern AI. Fei-Fei gave a talk recently where she was talking about the search for intelligence, and she said that the Big Bang of AI was made possible by three fundamental ingredients. The breakthrough of deep learning was made possible by three things: of course, the culmination of all of those great ideas that came together into the deep learning algorithm, the deep learning approach; second, the availability of enormous amounts of data; and third, the discovery of using GPUs to accelerate deep learning, the training of deep learning, the development of the network, the model. That combination set off, in 2012, some of the most amazing progress in computer science. That Big Bang allows computers to magically look at an image, determine what the important features are, and learn those features hierarchically, from pixels to curves to objects: for example my face, my ear, my nose, my eye, eventually turned into a face, my face, that it learned hierarchically. It was able to represent knowledge, represent information, in this way, by extracting it out of raw data all by itself. It is able to look at a picture of me and recognize that it is me. It is not only robust, it is diverse: it can recognize me in the sun, in the dark, with a hat on, half of my face in shadow, maybe slightly turned away.
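The "pixels to curves" step of that hierarchy is just convolution: sliding a small filter over the image and recording where it responds. A minimal sketch in NumPy, with a hand-written edge filter standing in for a learned one (a real CNN learns many such filters and stacks layers of them):

```python
import numpy as np

def conv2d(img, kern):
    """Valid-mode 2-D cross-correlation, the core op of a CNN layer."""
    kh, kw = kern.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

# A vertical-edge kernel responds where intensity changes left to right.
img = np.zeros((5, 5))
img[:, 2:] = 1.0                    # toy image: dark left half, bright right half
edge = np.array([[-1.0, 1.0]])      # stand-in for a learned first-layer filter
resp = conv2d(img, edge)            # response map peaks exactly at the edge
```

Deeper layers apply the same operation to these response maps, which is how curves compose into parts and parts into faces.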
It can recognize other people; it can generalize. The ability of these networks to be robust, diverse, and general allows us to solve one of the great challenges of computer science up to this point, which is perception: sensing the real world, sensing raw data, whether it's visual or audio or otherwise. It could be vibration, it could be tremors, it could be temperature, it could be the data in your corporate storage. And all of a sudden, boom: by solving this problem, we went off to the races. Since 2012, one breakthrough after another has been made possible because of it. Self-driving cars. Baidu using computer vision to translate images to text. Google self-tagging all the photographs that you upload: you no longer have to tell it where a picture was taken, it figures it out, and you can ask it for all your pictures of beaches and it finds them for you. For the very first time, a deep learning network, trained by data rather than coded by engineers or computer scientists, was FDA-approved for medical imaging, cardiac medical imaging. And it just keeps on going. Recurrent neural networks: the ability of these networks to learn time-sequence information, so that they can understand sequences of text that turn into words, sequences of words that turn into paragraphs. All of a sudden we have speech recognition that is superhuman. We have the ability to look at a video and caption it automatically, so the software has learned what is in the video and what it means: captioning. Another architecture came along called reinforcement learning, where the network is given a value system and it tries and tries and tries again, exhaustively, until it figures out how to improve itself toward that value system. Reinforcement learning is how we learn to do just about everything. As a result, one network, called AlphaGo, was able to beat the world champion at Go, a feat that nobody thought would be possible for another twenty years.
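The try-and-try-again loop described here can be shown in a few lines of tabular Q-learning, the simplest value-based form of reinforcement learning. The environment below is a made-up five-state corridor (nothing to do with AlphaGo, which combines deep networks with tree search); the point is only the update rule Q(s,a) ← Q(s,a) + α(r + γ·max Q(s') − Q(s,a)):

```python
import random
import numpy as np

# Five-state corridor: the agent starts at state 0, reward +1 is given only
# on reaching state 4. Actions: 0 = step left, 1 = step right. All names
# and parameters are illustrative.

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = np.zeros((5, 2))                    # Q-value table: states x actions
    for _ in range(episodes):
        s = 0
        while s != 4:
            # Epsilon-greedy: mostly exploit, occasionally explore.
            a = rng.randrange(2) if rng.random() < eps else int(q[s].argmax())
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == 4 else 0.0
            # Temporal-difference update toward r + gamma * max Q(s2).
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
            s = s2
    return q

q = train()
policy = [int(q[s].argmax()) for s in range(4)]   # greedy policy per state
```

After training, the greedy policy steps right in every state, even though the agent was never told the goal, only rewarded for stumbling into it.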
Robots are learning how to translate computer vision, sight, into kinematics: hand-eye coordination. And it keeps on going. Now we have unsupervised learning; we have the ability to use autoencoders to enhance images, to fill in the missing spots. And then a breakthrough came along in the last year or so. I guess Ian Goodfellow's paper was 2014, but in the last couple of years it's really taken off: adversarial networks, training two networks at the same time, where one network's job is to fool the second network, and the second network's job is to not be fooled. It's a little bit like one network learning how to be Picasso, generating images and paintings in the style of Picasso, while the second network is learning how to discriminate whether it is truly Picasso or not. When you're done training, what you end up with is a network that is able to draw like Picasso, and another network that is able to recognize images and paintings at a level of discrimination unheard of: generative adversarial networks. As a result, all of these new ideas for generation come along. We can create things like style transfer, generating voice, the ability to fill in empty or missing spots in photographs, and natural-language translation, to go from one language to another, to learn a pair of languages and then transfer that learning to other pairs of languages: you learn how to translate from German to Spanish, and all of a sudden you're able to translate from English to Spanish equally well. Zero-shot learning, transfer learning. These just spotlight a few of the examples. The number of papers in deep learning is just absolutely explosive; there's no way to keep up, and it's literally everywhere in the world. That Big Bang, that second dynamic of computing, the Big Bang of deep learning, has brought us the automation of automation.
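The two-player game just described can be reduced to its smallest runnable form. Here the "Picasso" generator is a single learned shift G(z) = z + θ trying to make its samples look like data drawn from N(3, 1), and the "critic" discriminator is one logistic unit D(x) = σ(wx + b). Every choice here (architecture, learning rate, step count) is a toy assumption; real GANs put deep networks on both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

theta, w, b = 0.0, 0.1, 0.0        # generator shift, discriminator params
lr = 0.1
for step in range(2000):
    real = rng.normal(3.0, 1.0, 64)           # "real" data ~ N(3, 1)
    fake = rng.normal(0.0, 1.0, 64) + theta   # generated samples G(z)
    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    gr = sigmoid(w * real + b) - 1.0          # d(loss)/d(logit) on real batch
    gf = sigmoid(w * fake + b)                # d(loss)/d(logit) on fake batch
    w -= lr * (np.mean(gr * real) + np.mean(gf * fake))
    b -= lr * (np.mean(gr) + np.mean(gf))
    # Generator step (non-saturating loss): push D(fake) -> 1 by moving theta.
    theta -= lr * np.mean((sigmoid(w * fake + b) - 1.0) * w)
```

By the end of training, θ has moved toward 3, meaning the generated distribution overlaps the real one and the critic can no longer tell them apart: exactly the equilibrium the Picasso analogy describes.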
Computers that program computers: artificial intelligence. Let me give you an example of something we've been working on; we do a fair amount of deep learning research in our company. What's actually happening here is that this network in the middle, and this is just an illustration of the network, is called an autoencoder. We're asking this network: if we gave it a distorted, noisy image, it has to learn how to generate the beautiful image from that noisy image. To do that, it has to figure out how to recognize the important features and eventually generate them automatically. One of the areas we've applied this to is ray tracing. Ray tracing, as you know, is computationally incredibly intensive: following photons around as we try to generate an image is very expensive. So one of the things we decided to do is ask: what happens if we teach a network to fill in the spots that we haven't rendered yet? To generate some of it, and to use artificial intelligence to decide what to fill it in with. So let's take a look at that. This is our ray tracer with deep learning. On the left is without deep learning; on the right is with deep learning. Notice how noisy the left side remains for some time. But deep learning figured it out: based on the surrounding things it has already rendered, and based on recognizing what objects look like, what paint looks like, what glass looks like, things it has learned, it selected the right pixels, the right colors, to fill in, all by itself. As a result, you're able to take this noisy image and turn it into a beautiful image. Isn't that amazing? The implication is actually quite amazing: we can now take distorted input from the sky, from the internet, and somehow have a network running there that regenerates what the image is likely to be.
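The noisy-to-clean mapping described above can be demonstrated in its most stripped-down form: instead of a deep autoencoder, fit a linear map from noisy signals to clean ones by least squares on synthetic data. This captures only the idea of "learn the reconstruction from example pairs"; it is not NVIDIA's actual denoiser, and all the data here is made up (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, (1000, 8))            # training "signals"
noisy = clean + rng.normal(0.0, 0.5, clean.shape)  # corrupted versions

# Least-squares fit of a linear denoiser: W = argmin ||noisy @ W - clean||^2.
# A real denoising autoencoder learns a deep nonlinear version of this map.
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)

# Held-out pairs: the learned map should reduce reconstruction error.
test_clean = rng.normal(0.0, 1.0, (200, 8))
test_noisy = test_clean + rng.normal(0.0, 0.5, test_clean.shape)
denoised = test_noisy @ W

err_before = np.mean((test_noisy - test_clean) ** 2)
err_after = np.mean((denoised - test_clean) ** 2)
```

The learned map shrinks the noise because it was fit to examples of what clean signals look like, which is precisely what the ray-tracing denoiser does at vastly larger scale.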
Okay: autoencoders, using deep learning for computer graphics. That's great, thanks guys. Now, this particular image, let's take a look at this. This is the full scene. Let me not forget this; this is now the full scene. Let's do the ultimate stunt: let's go outside. And just like that, it fills it in, and it recognizes that it's a reflection of trees and renders a reflection of trees. Okay, fantastic, thank you: using deep learning for ray tracing. The AI revolution is well on its way. It started with the Big Bang in 2012, but since then it has grown incredibly; there are just amazing things happening. If you take a look at the major conferences, NIPS, ICML, CVPR, and ICLR, attendance has doubled in two years, and it's only doubled in two years because of the limitations of physical space. The number of students who want to learn deep learning has grown 10x in two years on Udacity. The most popular course at Stanford is CS 229, introduction to machine learning. The most popular course is not being a movie critic anymore; it's machine learning. And in fact, I understand it's not just engineers and computer scientists: it's psychologists, it's biologists, it's oceanographers, it's basically everybody. Deep learning has democratized computing. Not everybody knows how to program, but everybody has data, everybody has their own data, and they can use that data, use the experience of their domain, the experience of their career, the experience of their professions, and teach a computer how to automate their work. Teaching computers is something that I think everybody can do. We have democratized computing. And lastly, the number of startups has been explosive. Of course, we build great GPUs, great systems and system software, and all the middleware that goes with it: the invention of cuDNN, so that we
containerized, turned into an easy-to-use library, the really complicated numerics and mathematical processing of all the layers: the convolution layers, the activation layers, the pooling layers, all of those complicated layers. That has been completely revolutionary, and we've kept on going: there are all kinds of libraries that we've created for the deep learning SDK, available to framework designers and deep learning engineers all over the world. We work with every single one of the framework providers in the world; we have engineers working with each and every one of them, so that we can integrate, optimize, and make as wonderful and as productive as possible these complicated frameworks, which are basically high-performance-computing software stacks, to run on NVIDIA GPUs. Every single framework on the planet supports CUDA; every single framework on the planet supports NVIDIA GPUs, and it doesn't matter which one you use. Some of them have their own special characteristics: some are better for research, some are better for production, some are better for the cloud, some are better for the enterprise, some are better for the highest possible performance, some are better for flexibility. There are so many different frameworks, and we support them all. We also work with the system companies to make sure that you can access a high-performance computer however you like: whether you would like to build it yourself, by going to the store and buying a GeForce Titan X, or buy a fully integrated server from one of the world's large OEMs, our partners HP, Dell, IBM, Cisco, and Lenovo, or provision it in the cloud. One of the best ways to enjoy deep learning is for someone else to build this incredibly complicated supercomputer on your behalf, and so we've worked with all of the cloud providers, and at this point essentially every cloud company in the world has NVIDIA GPUs provisioned in the cloud.
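For concreteness, here is what two of the layer types mentioned above, an activation (ReLU) and 2x2 max pooling, actually compute, in plain NumPy. Libraries like cuDNN provide heavily optimized GPU kernels for these operations; the underlying math is this simple (the feature-map values are made up):

```python
import numpy as np

def relu(x):
    """Activation layer: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def maxpool2x2(x):
    """Pooling layer: 2x2 max pooling with stride 2."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]                    # trim odd edges
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[ 1., -2.,  3.,  0.],
               [ 0.,  5., -1.,  2.],
               [-3.,  1.,  4., -4.],
               [ 2.,  0.,  0.,  1.]])   # a toy 4x4 feature map
out = maxpool2x2(relu(fm))              # 2x2 summary: [[5, 3], [2, 4]]
```

Stacking convolution, activation, and pooling layers like these, thousands of times per image, is the workload those GPU kernels accelerate.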
And so, with this strategy, we have accelerated the capability of deep learning, made it compatible with literally every single framework on the planet, and made it available to you however you would like to access it. But we didn't stop there. There are several things we're doing, because we realize what an important computing revolution this is, and that we can't just make the computers for it: we have to understand how deep learning works and how AI will impact society. NVIDIA Research is doing basic research in deep learning. We also do applied research; I showed you one example, and we're doing applied research in a whole bunch of areas, some of which I'll talk about today. We also partner with the world's top AI laboratories, whether the University of Toronto, or Stanford, or Berkeley, or Oxford, or Harvard, or MIT, or Tsinghua University, or the University of Tokyo. There are some twenty-odd universities we work with around the world, where the greatest and brightest minds in artificial intelligence are supported and working directly with us.

One of the most important programs we have in the company is called Inception, and many of you in the audience are part of this program. There are 1,300 startups we're working with today that are focused on deep learning. You need early access to technology, you need access to resources, you need access maybe to expertise, you need access to market exposure, sometimes you need access to funding. We have access, and we provide all of that as part of Inception. 1,300 companies, seemingly out of nowhere; this program is literally 18 months old. There are all kinds of great companies here: Deep Genomics in healthcare; Zebra in healthcare, in medical imaging; we have financial services, fintech; we have retail; and there are autonomous machines, amazing, cool autonomous machines. We talk about self-driving cars a lot: there's Zoox and the self-driving taxi that they're building; there's Drive.ai, who is building a software stack on top of DRIVE PX; and Blue River is using autonomous machines to make it easier for farmers to fertilize their fields. One particular company I want to mention that's really, really cool, and it's a company that is in fact not working on deep learning itself, but the technology they create is so vital to almost everybody working in big data, is a company called MapD. Todd Mostak's company was the world's first to create basically a database engine on top of GPUs, and he's been working on this for quite a few years. They just recently open-sourced MapD, and you should all take a look at it. It's just completely amazing to be able to access databases so large completely in memory, and to be able to interact with them, create graphs out of them, query them, visualize them, all in real time. Completely revolutionary stuff. So: 1,300 startups in Inception, and we're just delighted and really, really proud of all of them. Let's give them a round of applause. [Applause]

Deep learning in the enterprise. SAP is one of the world's largest enterprise software companies, and recently they reached out to us and wanted to partner with us on deep learning. Our two engineering teams have been working together on a new product that's just super, super cool. It's called Brand Impact, and the way it works is this: advertisers spend some sixty billion dollars a year in video advertising their brands and their products, but they really have no idea how effective those ads are. So let me play this video for you, and you'll get a sense for it. SAP Brand Impact is a fully automated and scalable video analytics service for brands, media agencies and media production companies. Detected brand assets are framed in the video and correspond to the lines in the panel below, giving a live interactive review of the original video footage overlaid with detections. In the summary view you get reports
on brand exposure, duration, and size on screen; in the detailed view you get an indication of each brand's screen coverage and frequency of exposure over time. The SAP Brand Impact solution, accelerated by NVIDIA, enables customers to measure the impact of brand exposure on their business performance. It makes so much sense, so much sense, for SAP to be working on deep learning, and the reason is this: some eighty percent of the world's commerce flows through the SAP ERP system, and almost ninety percent of the world's largest enterprises have their databases within the SAP ERP system. They're sitting on a pile of data; companies who use SAP are sitting on a pile of data. If we could figure out a way to use AI to harvest it, to find insight in that dark matter, it would be incredibly valuable. That's one of the reasons we partnered up with SAP, and this is the first result of it; you'll see a lot more to come.

So many different startups are emerging all over the world, the number of applications is exploding, the number of industries being touched by this is exploding, and simultaneously, the complexity of the models is exploding. This is Microsoft's ResNet, the groundbreaking network that achieved superhuman levels of image recognition: seven exaflops of operations. Seven exaflops. [Applause] Just to give you a sense: if you took all of the world's fastest supercomputers, all top 500 of them, put them all together, and could somehow cause them to operate together for one second, that is one exaflops. So this network, this model that Microsoft created called ResNet, really, really deep, the deepest network at the time, would take seven seconds. Seven seconds for every single supercomputer on the planet, all the ones in the United States, all the ones in China, all the ones in Europe, all of them ganged together: seven seconds to process this network. Okay? That's how much computation is inside. Very few of us think about numbers like that every day. These are big numbers, a whole bunch of numbers, and that is what a program looks like in the future.

Well, the programs are getting larger and larger. Baidu has a model that's 20 exaflops large, with 300 million parameters inside it; it's called Deep Speech 2. And Google's recent neural machine translator, which does multi-language translation, requires 105 exaflops. The amount of computation necessary is just incredible; in fact, it would take approximately one CPU-only server two years to run through this network one time. This is the ultimate high-performance computing problem, and that's one of the reasons we have to continue to push the living daylights out of computing.

Ladies and gentlemen, I would like to introduce you now to the next chapter of computing: the Tesla V100. This is made on TSMC 12-nanometer FinFET. And I'm just getting a little exercise up here. 12-nanometer FinFET. The part that is really shocking is that this is at the reticle limit. Reticle limit basically means it is at the limit of photolithography: you can't make a chip any bigger than this, because the transistors would fall on the ground. Every single transistor that it is possible to make by today's physics was crammed into this processor: 21 billion transistors, and almost a hundred billion of these little connectors, a hundred billion vias, to make one chip work. The number of good chips per 12-inch wafer I would characterize as: unlikely. So the fact that this is manufacturable at all is just an incredible feat. 800 square millimeters: if you have an Apple Watch on your wrist, the die size is approximately like that, so take a look at your Apple Watch and it gives you a feeling for it. 5,000 processor cores in here; seven and a half teraflops of 64-bit floating point; 15 teraflops of 32-bit floating point.
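The exaflops figures quoted a moment ago are easier to feel with some quick arithmetic. One assumption below is not from the talk: the roughly 300 gigaflops for a single CPU-only server is an illustrative guess, chosen because it makes the quoted "two years" come out right.

```python
# Back-of-the-envelope check on the exaflops figures quoted above.
EXA = 1e18
SECONDS_PER_YEAR = 365 * 24 * 3600

resnet_flops     = 7 * EXA    # Microsoft ResNet, as quoted
deep_speech2     = 20 * EXA   # Baidu Deep Speech 2, as quoted
top500_combined  = 1 * EXA    # "all top-500 machines running for one second"
cpu_server_flops = 3e11       # assumed ~300 GFLOPS for one CPU-only server

# ResNet on every supercomputer on Earth at once:
print(resnet_flops / top500_combined, "seconds")  # 7.0 seconds

# Deep Speech 2 on a single CPU-only server:
years = deep_speech2 / cpu_server_flops / SECONDS_PER_YEAR
print(round(years, 1), "years")  # 2.1 years, matching "about two years"
```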
And a brand new type of processor, a brand new type of processor called the Tensor Core, which results in a hundred and twenty teraflops of tensor operations. A hundred and twenty teraflops. Unbelievable. [Applause] The R&D budget was approximately three billion dollars, and this is the first one, so if anyone would like to buy this, it's three billion dollars; I'll just take that and put it in my pocket.

The memory system in our architecture is quite unique. If you take a look at the way most processors are organized, the register files are very small, the caches are very big, and the DRAM is quite large. In our case, the register file is huge: 20 megabytes of register file, so that the memory is very, very close to the processors, and that's one of the reasons the throughput is so high. 16 megabytes of cache. And we're utilizing the state of the art, the fastest memory the world can make today, made by Samsung. Our partnership with them is terrific; the two engineering teams have been working so closely together, pushing the limits of how fast we can drive memory, and we've been able to achieve 900 gigabytes per second. It is just so fast. And lastly, the second-generation NVLink gives us 300 gigabytes per second, approximately 10 times the fastest PCI Express in the world today. Ladies and gentlemen: the Tesla V100.

Volta has a new instruction inside called the Tensor Core. It's a new CUDA tensor operation instruction that is both an instruction as well as a data format. It's a 4-by-4 matrix, and one of the most important primitives of deep learning: A times B plus C, on matrices. A times B plus C, on matrices. The input is a 4-by-4 matrix A in 16-bit floating point, times B in 16-bit floating point, plus C, and we're trying to do that as fast as possible. This is the way Pascal did it, and it did it incredibly fast at the time; the reason it was so fast is that every single row is multiplied by every single column.
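The primitive just described, D = A x B + C on 4-by-4 FP16 matrices, can be sketched in numpy. This is only a behavioral model: on Volta it is a single hardware operation, and the accumulation happens at higher precision than the FP16 inputs, which float32 stands in for here.

```python
import numpy as np

def tensor_core_op(A, B, C):
    """One Tensor Core primitive, behaviorally: D = A @ B + C on 4x4
    matrices. Inputs are rounded to FP16; the multiply-accumulate runs
    at wider precision (modeled as float32)."""
    A16 = A.astype(np.float16)
    B16 = B.astype(np.float16)
    return A16.astype(np.float32) @ B16.astype(np.float32) + C.astype(np.float32)

A = np.ones((4, 4))
B = np.ones((4, 4))
C = np.zeros((4, 4))
D = tensor_core_op(A, B, C)
print(D)  # every entry is 4.0: a 4-term dot product of ones, plus C = 0
```

On real hardware this maps onto CUDA's warp matrix (WMMA) operations rather than anything written per element like this.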
Then, when you're done, it accumulates: it adds it all the way vertically into the green output results. And it does this incredibly fast, because Pascal has thousands of processors, and Pascal is doing this thousands of times at the same time. That's the reason Pascal was so fast. However, we felt that just wasn't fast enough. What we should do is do it in parallel, and in parallel. And that is what the Volta Tensor Core does: it literally does the four-by-four multiply plus C all at the same time, and it dumps it into the result. Twenty times increased throughput. Really crazy stuff. The net result is that although Pascal, the P100, was the most advanced processor the world had ever built, one year later, one year later, Volta is one and a half times the floating-point performance in general-purpose computing, twelve times the tensor operations compared to Pascal for deep learning training, and six times for inferencing. I'm going to come back to inferencing in a little bit. For all of you who are not familiar: training the network is the first step, and it's very computationally intensive. The second step, also computationally intensive, though not as intensive, is inferencing: the production, the application of the network.

Well, that's Volta; that's the V100. Let's go through a couple of quick demos. You guys know that it's a GPU, so although I haven't talked much about graphics, it is able to do graphics. Ten days ago, ten days ago, I reached out to Tabata-san. Tabata-san is the head of the studio at Square Enix; this is, as you know, the 30th year of Square, and they're well known around the world for the incredible cinematic production value of their films and video games. With this generation, with Final Fantasy XV, Tabata-san's vision was to unify the pipeline, to unify the workflow, to unify the graphics engine of cinematic film and real-time computer graphics, with a vision that someday cinematic film and computer graphics will have essentially the same visual quality. So ten days ago I reached out to Tabata-san and asked if he could do something for all of you, and he just jumped on the opportunity. First of all, he apologized; he said, look, I just don't have enough time to really do anything at the level that Square would like to. But he dedicated his engineers, and they worked around the clock; they basically haven't slept for 10 days. So let's take a look at the great gift that Tabata-san and the Square Enix guys have made for us.

Look at the leather: simulated leather. The amount of geometry in here, the lighting system, the soft shadows. [Music] The character: the metal looks like metal; you can almost touch the leather. That's a good leather jacket, guys. That's a good leather jacket. I just noticed that. I'm going to have to call Tabata-san. Okay, good job. Amazing. He also sent us what he believes video games will look like someday in the near future, and this is it. Ladies and gentlemen, a quick trailer. [Music] Kingsglaive. That's how I want my video games to look. This engine is called the Luminous Engine, and it incorporates a hundred percent of the physics processing that we've developed in GameWorks, and all the particle systems. You saw the explosions, the fire, all of the destruction, so beautiful, all made possible because of NVIDIA's physics engine.

Well, let's look at something new. Several years ago we announced Kepler, which was a groundbreaking GPU: our first double-precision GPU. It is the GPU that ended up in our nation's fastest supercomputer, the Oak Ridge Titan, and it is the namesake of the GeForce Titan X: as in the fastest supercomputer in our nation, and the fastest supercomputer you can build for yourself, the Titan X. Several years ago we
demonstrated Kepler simulating the future of our world. People always look at the past of our world, by looking at all the images from the sky, and we can learn about how our universe was created and formed; but very few people really think about how our universe is going to turn out. Well, in order to figure out how our universe is going to turn out, we have astrophysicists in our company. As I mentioned earlier, one of the things we're super proud of is that we have computer vision experts, we have astrophysicists, we have quantum chemists, we have molecular biologists; we have people who are experts in these computational sciences, so that we can work with all of you to advance your work. And so, ladies and gentlemen, Stephen Jones is going to show us what the galaxy looks like billions of years from now.

It may look like a piece of artwork, but that is actually a live simulation we're showing: an N-body simulation code, courtesy of Jeroen Bedorf from Simon Portegies Zwart's group at Leiden Observatory. This is a simulation of the Andromeda galaxy on the right-hand side. Hey, Steven, wait, wait: I need your demo to last less than a billion years. All right, I'll get going; I've got a few billion years to get through, so we'll start. We're following Andromeda here, and it's flying in towards the Milky Way. In five billion years it's going to make a close pass: it's going to swing past us, and it's going to start stripping stars from the Milky Way and from Andromeda, but gravity is going to inevitably take over. We're running this on Volta, and we can simulate a hundred million bodies per second. You can see the bar structure of the Milky Way right there, right next to us, as Andromeda comes plunging back for its second pass towards the core, and then the cores get much closer, and you start seeing stars being slung off to all sides.
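What's running in the demo is an N-body gravity simulation. As a toy illustration of what each step computes (direct O(N^2) pairwise summation here, whereas codes like the one in the demo use tree algorithms to reach millions of bodies), one integration step might look like this:

```python
import numpy as np

def nbody_step(pos, vel, mass, dt, G=1.0, eps=1e-3):
    """One step of a direct-summation N-body integrator.
    eps is a softening length that avoids the singularity when two
    bodies get very close (and zeroes out self-interaction cleanly)."""
    # Pairwise displacement vectors r_ij = pos_j - pos_i, shape (N, N, 3)
    diff = pos[None, :, :] - pos[:, None, :]
    dist3 = (np.sum(diff ** 2, axis=-1) + eps ** 2) ** 1.5
    # Acceleration on body i: G * sum_j m_j * r_ij / |r_ij|^3
    acc = G * np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# Two equal masses attract each other symmetrically.
pos = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
vel = np.zeros((2, 3))
mass = np.ones(2)
pos, vel = nbody_step(pos, vel, mass, dt=0.01)
```

The GPU version wins because every pairwise force is independent, which is exactly the kind of massively parallel arithmetic thousands of cores are built for.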
One of the amazing things is that you can look out into the universe and see galaxies colliding in just this way, and you see the same kinds of structures: stars being thrown off in waves as the cores orbit each other and then finally merge. At about 5.3 billion years (you can see the timer in the corner there) the merger finally happens. This is the moment when all the stars get thrown away, and we're left with one single giant galaxy, the fusion of the two, with a whole pile of stars, probably including our Sun, unfortunately, just flying out into the universe. All right, thank you.

So the amazing thing about that demonstration, and this is why I go to the next point, is that five years ago we showed it to you on Kepler, versus today: basically seven to eight times the performance improvement in five years. Seven to eight times improvement in five years. During that same time, the microprocessor has improved in performance by about 50 percent. Okay? Fifty percent. Another way to think about it: seven to eight times in five years is basically about seventy percent per year. The benefits of accelerated computing. And one of the really important things, only at NVIDIA, is what our engineers love so much: watching galaxies make love. That's science porn right there, brother. Thank you, Steven. Thanks, everybody; that was incredible.

Okay, now let's talk about deep learning. Recently Cornell published a paper that was really amazing. You guys know that it's now possible to take a piece of art and learn the style, learn the style of Picasso or Monet or Van Gogh, and apply it to a photograph, and it turns your photograph into a Monet. It's called style transfer. Well, the Cornell team, Dr. Bala's team, realized that what's left behind is artistic, meaning it's distorted: the photograph no longer retains its original fidelity; it doesn't look like what it used to look like. The buildings don't look like buildings anymore; they kind of look like buildings. It applies art to cats and dogs and grass. So what they would like to do is photographic style transfer: learn the style from one photograph, and apply it to another photograph. Learn the style of one photograph, apply it to another photograph. That's basically the way it works. Well, you know what, I'm going to kick it off. Julie, demarre la demo, s'il te plait ("start the demo, please"). Did I say it right? I said it in French so that Julie could hear it; she's in France, and she's going to come on. Okay, all right, so, go. Do I have to click? All right. Julie, allez-y. Super. Julie is one of our deep learning computer scientists, and she speaks French. One photograph, the first photograph: we learned the style of that one. Okay, next, please, keep on going. And we're going to try to apply it to this one. Now, the thing the AI has to understand is the structure as well as the style. It has to understand the structure as well as the style, because it needs to apply the right style to the right areas of the photograph. It needs to understand that a building is a building, a cat is a cat, water is water, a walkway is a walkway, the clouds are clouds. And when it applies it, it generates it: it's drawing the pixels one at a time, regenerating the photograph in this new style. When the Cornell engineers wrote the paper, they used the Titan X to do this, and it took three to four minutes to process this image. This is now on Volta; go ahead, hit start. And so it starts: this artificial intelligence network is trying to draw this image, generating the image from scratch.
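Under the hood, classic style transfer of this kind optimizes a generated image so that the channel-correlation (Gram matrix) statistics of its CNN features match those of the style image; the Cornell photo-realistic work adds a photorealism constraint on top of that. A toy numpy sketch of just the style loss, with random arrays standing in for real CNN feature maps:

```python
import numpy as np

def gram(features):
    """Gram matrix of a feature map (channels x height x width):
    channel-to-channel correlations, which capture 'style' while
    discarding spatial layout."""
    c, h, w = features.shape
    F = features.reshape(c, h * w)
    return F @ F.T / (c * h * w)

def style_loss(gen_feats, style_feats):
    """Mean squared difference between Gram matrices; classic style
    transfer minimizes this plus a content term."""
    return np.mean((gram(gen_feats) - gram(style_feats)) ** 2)

rng = np.random.default_rng(0)
style = rng.standard_normal((8, 16, 16))      # stand-in CNN feature maps
generated = rng.standard_normal((8, 16, 16))

print(style_loss(style, style))        # 0.0: identical styles match exactly
print(style_loss(generated, style) > 0)  # True: different styles have a cost
```

The actual optimization iterates thousands of gradient steps through a deep network, which is why moving from a Titan X to Volta turns minutes into seconds.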
It took this style and this image, and it said: I want to recreate something that's photographic. And notice the beach still looks like a beach, the clouds still look like clouds, and somehow the style of that image has now been applied to this image. Pretty great. Thank you. Oh, wow, inserting that French into my mind just made me lose my track. We're in trouble. Okay. So, deep learning style transfer: what was possible on a Titan X in several minutes is now possible in a few seconds, and you can now get a sense for what deep learning can do. What the artificial intelligence network is able to do is generate an image based on what you teach it. Now, it didn't just learn from these two images: it had to learn structure from lots and lots of images; it had to understand what the important features are from lots and lots of images. And after it's done learning all of that, you can give it two new images and say: I want you to take the structure of this image and apply the style of that image. And it just does it all by itself.

The performance of deep learning is everything. Caffe2 is a framework that we worked on with the team at Facebook; recently they announced that Caffe2 is going to be Volta-ready. It's really delightful to work with them, and our engineers are working really closely together; these frameworks are incredibly complicated. We also worked with Microsoft on their Cognitive Toolkit, and one of the things that's really great about it is that it's able to scale incredibly well. Let me show you the numbers. If you look at Caffe2: eight Keplers, eight supercomputing GPUs, were able to train this network in forty-something hours. Forty-something hours: basically almost two days. Last year, with Pascal, you could get a DGX-1 box and train that same network basically within a day. And now, with Volta, you can train that network in a shift. The productivity of these engineers is so vital, and the reason is that there are so few of these amazing deep learning scientists, and their time is scarce; their time determines the productivity of deep learning. The magic of deep learning is so great that everybody wants to jump on it and get products to market, and so the pressure on all of these engineers and scientists to deliver is incredible. That's one of the reasons there are so many acquisitions happening all over the world. And when you finally get those engineers, you want to make sure they have the best possible technology, the most productive environment to develop their networks on. So the difference between having to wait two days versus literally one shift is groundbreaking.

But you want more than that. Microsoft's Cognitive Toolkit, which used to be called CNTK, can now do multi-node training of the ResNet-50 network, which is a really gigantic network, and it integrates our SDK called NCCL, the NVIDIA Collective Communications Library, basically allowing all of the GPUs to work together as one big farm. They are able to scale 64 Voltas together and turn what used to be days, then hours, then a shift, down to basically a couple of hours. A couple of hours. With 64 Voltas behind that deep learning scientist, iteration can now happen much, much more quickly. One of the newest frameworks is called MXNet: incredibly popular, it's come out of nowhere and it's growing like a weed. Everybody loves it because it's so scalable, it's flexible, and we're working with Amazon to enhance it for Volta. They recently benchmarked it on Volta using an LSTM, a network for time-sequence learning, and were able to improve performance dramatically over the previous generation; on one GPU node alone, the network could be trained in MXNet in just several hours.
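The multi-GPU scaling just described depends on an all-reduce: every worker contributes its locally computed gradients and gets back the average, so all copies of the model stay in sync. NCCL does this with ring algorithms over NVLink without gathering everything in one place; the following is only a behavioral sketch in plain numpy, with lists standing in for GPUs:

```python
import numpy as np

def allreduce_mean(worker_grads):
    """What an all-reduce accomplishes, in spirit: every worker ends up
    holding the same averaged gradient. (NCCL achieves the same result
    with a bandwidth-optimal ring exchange across GPUs.)"""
    mean = np.mean(worker_grads, axis=0)
    return [mean.copy() for _ in worker_grads]

# Four simulated GPUs, each with gradients from its own mini-batch shard.
rng = np.random.default_rng(42)
grads = [rng.standard_normal(6) for _ in range(4)]
synced = allreduce_mean(grads)

# Every worker now holds the identical averaged gradient.
print(all(np.allclose(synced[0], g) for g in synced))  # True
```

Because the averaged gradient is mathematically the same as the gradient of the combined batch, adding workers scales the effective batch throughput, which is how 64 GPUs turn a shift into a couple of hours.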
To come and share some of their insight about the work they're doing in artificial intelligence, we have a special guest with us today: Matt Wood, the general manager of artificial intelligence at Amazon. We're so happy to have you; come on, join me on stage. Hey, Matt. [Music] Good morning, welcome. How are you doing?

So, first of all, I think everybody knows about Amazon's artificial intelligence effort now. Even though you guys have been working on it for so long, people really learned about it because of Echo and Alexa.

Yeah, we have been working with machine learning and deep learning for over 20 years at Amazon, and it's become one of the arrows in our quiver right across the organization: from fulfillment all the way through to what we're doing in taking this magical technology and giving it to all developers through AWS, through to defining entirely new categories of products and experiences like Echo, Echo Look and Amazon Go.

You know, one of my first recollections of Amazon was when Jeff said that he was going to have millions of books in the store. I still remember the traditional book companies saying: there's no way you could have two million books in a store, because who's going to go through two million books? Well, the reason they said that is because they didn't understand machine learning.

That's right. One of our very, very early usages of machine learning was in driving discovery and search on our retail site. The famous "customers who bought this also bought" is all driven through machine learning, and it's now used extensively across the site for search, discovery, summarization, you name it. If you're shopping on amazon.com, you're interacting in some way with machine and deep learning systems under the hood.

Now, Amazon has multiple pillars of strategy. There's a pillar that has to do with robotics, there's the work you guys do with Echo and Alexa, and then there's the new pillar you introduced, I guess, about last year: taking all of this technology you guys have invented and putting it up on the cloud, so that every developer, every startup company in the world, every enterprise in the world, can benefit from these pre-trained networks of yours and create applications based on them.

Yeah, that's right. The original goal with Amazon Web Services, the very, very first business case, was to be able to take technology that was only within reach of a very small number of very well-funded organizations and put it within reach of any developer, anywhere. You can see that that's been borne out, with millions of developers active on the platform every single month. And we're taking the exact same approach we took with compute and enterprise data warehousing and applying it to deep learning and artificial intelligence: computer vision models, speech systems, speech recognition, natural language understanding, and doing a lot of work in driving Apache MXNet forward for all developers. We've been able to work together, our two teams, to optimize MXNet for Volta, and we couldn't be more excited: we've seen amazing performance improvements, both in training and in inference. We're really excited to be a launch partner when Volta becomes available, and we'll make Volta available as the foundation for our next general-purpose GPU instance at launch.

That's amazing. That's awesome. Thank you very much; I really appreciate the support. Everybody in the audience probably wants to know: what's the funniest question that Alexa gets?

You know, it's almost always from kids, by the way. I bought a new Echo Dot (you can order it just by asking Alexa for it), and when it arrived, I took it out of the box, and my son was with me, and he said: hey Dad, can I hold her? Not "can I hold it": "can I hold her." The affinity that customers have for Alexa is really incredible, and it's growing every day. We have a remarkable ecosystem, both of devices from Amazon and of third parties integrating Alexa into their own cars and refrigerators, and more than 10,000 Alexa skills available on the platform today. There's a bustling ecosystem, which I encourage everyone to experiment with.

So, one more question that I'm dying to know about. Amazon today has the world's largest-scale GPU cloud, and I remember Andy telling me that when you guys rolled out the GPU into the cloud, it was the fastest-growing instance Amazon ever had. My question is: how did you guys know that people wanted a GPU in the cloud, and why did you do it in the first place?

Yeah, it really came from customer feedback. About 90 percent of our roadmap at AWS is driven directly from what customers ask us for, and they were really asking for NVIDIA chips available with utility pricing and availability, so we made that available. We've been working together for years, and our most recent instance, the P2, is just growing like wildfire. It's being used extensively for deep learning in virtually every vertical we see: medical imaging, even in regulated workloads (your example earlier, the FDA-approved, regulated workload of deep learning for medical imaging), all the way through to the best-performing autonomous driving simulation, which is from a startup called TuSimple. They do everything from real-time per-pixel object segmentation to centimeter-accurate positioning of the car in three-dimensional space, all up and running on P2, using MXNet, on AWS today.

That's amazing. Thank you very much, Matt; thanks for the tremendous work. Congratulations. Amazing work at Amazon, revolutionizing computing as we know it, creating the future with cloud computing.

The first thing I want to show is our brand new DGX-1V. This has become the essential instrument of deep learning research. The workloads
that deep learning scientists run are so heavy, and building a supercomputer or high-performance computing cluster takes a great deal of time; integrating all the software into it to make it perform takes a great deal of time; and finding space for it takes a great deal of time. Many of them either aren't quite ready for the cloud, or would like to have the ability to burst into the cloud. And so we created for developers the DGX-1: a supercomputing appliance dedicated to AI. With Volta, it has almost a petaflop. I wish, I wish I just had 40 teraflops more, because it would have been wonderful to say a petaflop: 960 tensor teraflops, with eight GPUs inside. It can now take what used to take eight days on a Titan X to train. Yeah: the DGX-1V. Is that beautiful? The DGX-1V, with Tesla V100s. And call your operator, because we're ready to take orders. You can order one today for $149,000. Come to nvidia.com, have your credit card ready, and we'll deliver it in Q3. Now, only for you, only for you in the room: because Volta is not quite ready to ship (it'll ship very soon; it'll ship in Q3, and DGXs ship in Q4 from OEMs all over the world), anybody who places orders today, starting today, gets a free upgrade to Volta when it arrives in Q3. DGX-1V, $149,000. It replaces 400 servers, or approximately a couple of million dollars' worth of equipment, and it comes out of the box: plug it in, and it goes to work.

Well, we've been asked so many times: wouldn't it be nice? I don't have a data center, and I don't have cooling; I'm a startup and I've got 10 engineers, and what we need right now is to get working on deep learning. Could you please make us a small version of the DGX? And so we thought, well, that's an interesting idea, and we prototyped up a few inside the company. Of course, putting that much computing power next to an engineer, you really, really have to keep it quiet. So we liquid-cooled it. We liquid-cooled it, and it's whisper quiet; you can't hear it at all. And so we created it, and we can't make enough of it. Every single deep learning engineer in our company has either a DGX Station or a DGX-1, or both. Every single one of the engineers. This has been just an amazing, amazing success inside the company, and so I decided we would make it available to deep learning engineers all over the world. Ladies and gentlemen, the personal DGX: it's called the DGX Station. For those deep learning engineers, Tesla gets used in a DGX Station and in a DGX-1 as a personal AI supercomputer.

The other way we use it, of course, is by putting it in the cloud, and one of the ways we use it is by putting it into a public cloud. When you put a server into a public cloud, GPUs are used in a whole lot of different ways. They might be used to train certain types of networks, and different types of networks like different types of computer configurations; they might be used to run molecular dynamics; they might be used to run, as Matt was saying earlier, segmentation, or maybe even a simulation, or computer graphics in the cloud. So we need this computer to be very adaptable. One of the things we did was partner with Microsoft to create the industry's first industry-standard hyperscale cloud graphics accelerator. Notice there's a computer: there's a 1U computer underneath the server, and there are these four cables that come out from the base computer into the HGX-1. These cables are basically PCI Express, and this allows us to bring all kinds of different-sized servers to market. And of course, we still have the ability to virtualize the GPUs, so that many instances can run on one GPU. This particular box is intended for the public cloud, so that the versatility of the server can expand the reach of the customer base: whether you're using it for deep learning, or using it for graphics, or using it for CUDA HPC computation, HGX is really ideal. I'd love to have Jason Zander, who is the vice president
of Azure, come and join me. We've been partnered together for such a long time working on this. It's great to have you, Jason. — Thank you, Jensen, appreciate it, happy to be here. — You know, first of all, I'm really grateful that you're here. As you guys know, right now, simultaneously, Satya Nadella and Harry Shum — the CEO and the CTO of Microsoft — are up in Redmond doing almost exactly the same thing we are, at their Build conference, so they're all at Build. And so I'm really grateful that you took the time to come down here and spend it with us. We've been working with Microsoft for, gosh, 25 years. — Yep. — And we have all kinds of developments going on. Recently we've been working on, of course, advancing deep learning for internal research, and it's very, very amazing. First of all, Harry Shum and Xuedong Huang are some of the world's pioneering AI researchers in speech, and the work that you guys did recently to achieve superhuman levels in speech recognition was a complete groundbreaking endeavor. Not only that, you guys were able to create natural language translation in real time — Cortana apparently can now understand 40 different languages. And of course you guys did great groundbreaking work with ResNet, developing a very, very deep network that was very powerful in achieving superhuman levels. This AI laboratory inside Microsoft uses a framework — back in the good old days they called it CNTK, and it has now become the Cognitive Toolkit. Tell me about the work that you guys do there, and how you're going to expose that to the rest of the world so that everybody else can have the benefits. — Yeah, thank you. AI is a clear part of what we're trying to do. We want to infuse AI across our entire platform, so we start off with the platform, the partnerships that we have, cognitive services, and frameworks like the Cognitive Toolkit, which
used to be CNTK — but also into our applications, because we want users to be able to take advantage of that as well, and make sure that's available to them. We've been doing AI for a good 20 years now, if you think back to things like search relevancy, all the way up to things like the Kinect sensors on the Xbox, and most recently HoloLens. In those environments, you mentioned the real-time speech translation — the Skype real-time translator is one of the most sophisticated language deep neural nets that's out there. It can take hundreds of GPUs, and we can spend an entire day running this thing through, but it's really cool, because I can talk to someone in English and they can hear it in Chinese and go back and forth, and that's in lots of languages. You can't do that without the power. — A universal translator is going to be awesome, I hear. — Yeah. — And so I could just imagine — you guys are working on HoloLens, and HoloLens, as you and I know, does real-time 3D reconstruction: it recognizes the 3D environment, and it augments computer graphics and registers it perfectly inside reality — it augments reality. And now, if you supplement that with the Cortana natural language translator, you're able to walk up to anybody from another country, and you could talk in your language, they hear it in their language, and the two of you get to have a perfect conversation. — It's really cool. I mean, the idea that I can trap photons and do things with them before they hit my eyeball is pretty awesome. So that's exactly another great place where we're able to leverage the work that we've done together, because doing the AI and algorithm validation on the back end for the HoloLens is also something we can do inside the cloud. Combine that with real-time translation, like you said, and I can have this really cool interaction. And you can't do it without the deep partnership that we've had. — And already, so far, nobody would have guessed that Microsoft would be one of the
fastest-growing cloud providers on the planet. In the last several years your Azure business has just been on fire, and everybody is behind your business, pushing Microsoft Azure into the world. And so tell me about some of the efforts there, and talk to me about some of the collaboration that we have and how we can bring GPU computing to the world. — Yeah, it's really exciting times for us. You know, we're on our second generation of GPUs inside of the fleet. We were the first to bring in the M60s and the K80s, we just announced the P40s, and the P100s are coming — and then of course we really, really love the Volta work that's coming out; we want to make sure that goes up there as well. Look, you know, my job is to make sure that both my internal folks inside Microsoft and the developers that use the Azure cloud are able to do the training examples we just mentioned. Every time you come up with a new system, they want it tomorrow, and so our ability to come in and make sure that it's part of the training regime that we've got makes things go faster, like your benchmarks show. But then we also want to make sure that it's available to all of you, and we've done additional things — for example, our cognitive services. We've actually had those up for about three years now; we've been doing AI, speech, and text, and these APIs are part of the services on top of the Azure cloud, and we're bringing even more of those forward. In fact, we just announced the Azure Batch AI Training service that's out there. You know, our history is a lot of developer stuff, right? We want to democratize that, and so we want data scientists and, basically, developers that are working on models to concentrate on the models and the unique IP and value they add, and not on the plumbing. So you're going to see more and more of that coming through the
system. We think it will help accelerate some of the scenarios that you're showing. — Yeah, it's really super exciting. People are going to be super excited about the work that we're doing together. You guys announced yesterday the largest installation of GPU cloud that Azure has ever announced, and so I'm super excited about that. Thanks for your support, and thanks for your support of Volta. — Thank you so much. — All right, thanks, Jason. [Applause] [Music] Jason Zander, corporate vice president, Microsoft. Let me talk about inferencing. So now we've created this network, and it's taken hours and hours of deep learning training on DGX-1, or in the Amazon cloud, or in the Azure cloud with all these GPUs. Now that you have this network, this network is ready to be deployed — and that network is still very computationally intensive, and we need to figure out a way to make that network run as fast as possible. There are two things that we are doing with Volta that are really, really special. The first, of course, is the tensor core that I mentioned earlier: it increases the throughput of training by a factor of 12, but it increases the throughput of inferencing by a factor of 6, and I'm going to show you the benefit of that. But the second thing is this: whereas the frameworks are used for training the network, when you're done with it, it creates a graph, and that graph needs to be optimized and compiled for the processor that you're using — it needs to run in real time — and it comes into our software in the form of graphs. And so the first thing that we have to do is ingest from each one of the frameworks; the second thing we have to do is compile it; and the third is optimize it for each one of the targets. Each one of our GPUs has slightly different architectures, they have slightly different numerics precision, and we have to take advantage of each one of our GPUs. We call it TensorRT, the
tensor runtime. TensorRT does graph optimization for vertical and horizontal layer fusion. So if you take a look at this, this is basically a neural network: there's the convolution layer, the bias layer, the rectified linear unit layer — basically the activation layer. This is one particular network, a typical network — I believe it's AlexNet. What we do with it is this: the first thing that we do is combine mathematics that otherwise would have to be done in sequence into one big blob, and so we turn that ReLU, bias, and 1x1 convolution into one fused 1x1 convolution-bias-ReLU, and that mathematical block is replaced. So by analyzing the graph, we can figure out which of the mathematical operations we could fuse together and replace with something much more efficient. The second thing we can do is recognize when different mathematical blocks share the same inputs — they have different outputs, but they share the same inputs — and that again is achieved through graph analytics: we can walk through the graph, we can analyze the graph to recognize these different opportunities, and simply merge and share them. And then the third thing, of course, is to compile down to the precision of the target GPU. Now let me show you the inferencing performance. Broadwell is the fastest CPU of today; Skylake is the next-generation CPU — we haven't had a chance to benchmark it yet, so we're giving it the full credit of what's possible. Kepler is the GPU that we announced five years ago. The y-axis is images per second — how many inferences it can do in one second. Now, P100 is able to do 600 images per second. There are two important numbers: not only is the throughput important, but the latency — how long it took you to do it, not how many you can do, but how long it took you to do it — the latency is equally important.
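The vertical layer fusion described a moment ago can be sketched with toy numbers. This is only an illustration of the idea — TensorRT fuses GPU kernels, not Python functions — but it shows why sequential conv, bias, and ReLU can be collapsed into one operation with identical results:

```python
# Toy illustration of vertical layer fusion: conv(1x1) -> bias -> ReLU
# collapsed into a single fused operation. All numbers are made up.

def conv1x1(x, w):
    # a 1x1 convolution over channels is just a weighted sum per output channel
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def bias(x, b):
    return [xi + bi for xi, bi in zip(x, b)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def fused_conv_bias_relu(x, w, b):
    # the three layers above in one pass: this is what graph-level fusion
    # buys you — fewer kernel launches and fewer memory round-trips
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

x = [1.0, -2.0, 3.0]                       # one "pixel", three input channels
w = [[0.5, 1.0, -0.5], [1.0, 0.0, 1.0]]    # two output channels
b = [0.1, 10.0]

sequential = relu(bias(conv1x1(x, w), b))  # three separate ops
fused = fused_conv_bias_relu(x, w, b)      # one fused op
print(sequential, fused)                   # identical results
```

The fused path computes the same values while touching memory once, which is the efficiency the graph optimizer is after.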
And the reason for that is: if you were talking to a neural net — if you were talking to Cortana — and you asked Cortana a question, you would like it to respond very quickly, in just a few milliseconds. And so the number of people who can speak to Cortana at the same time in the cloud is important, because that has something to do with the capacity of the data center and the cost of the data center. The second thing is that the latency of that performance is equally important, if not more important, and the reason for that is because that has something to do with quality of service. Okay, so the purple line is latency, the green line is throughput. P100 has the benefit of 600 images per second, and Skylake — next-generation Skylake — is about 300, and both of them can do it in about 10 milliseconds. And this is what Volta looks like. That is what we call a little bit faster. And so Volta is really groundbreaking work: not only is Volta incredibly good at training, it is also incredibly good at inferencing, for the very first time. We've not focused on inferencing in the past, and the reason for that is because the number of networks being created was still rather limited. But now, Internet service providers and cloud service providers and startup companies and enterprises all over the world are starting to move deep learning into production. They need an inferencing pipeline, and so Volta and TensorRT are ideal for inferencing. The way that we deploy it into a server is also unique, and the reason for that is because of the scale-out servers — gosh, Sandy, why do you keep standing over there? Why do I keep going over there? Sandy is our stage manager; she is fantastic. Ladies and gentlemen, this is Tesla for hyperscale scale-out. Look how small it is — look at this thing, it's like a CD case, but more beautiful, and it's gold. It might actually be real gold. Okay, so Tesla V100, and this is what we call the PCI Express FHHL — the sexiest name ever: full height, half length. I couldn't rename it; it
was too late — the industry took it and ran with it. Full height, half length: try to say that without smiling. Okay, so this is it, and it runs at 150 watts — 150 watts — and it fits into these commodity inferencing servers. And here's the case I want to make: this is the reason why people are talking about accelerators for data centers. I'd like to make the case for accelerators, and here's how it works. This is what 500 nodes looks like — earlier I said 400 nodes; this is what 500 nodes looks like. Five hundred nodes is basically this entire row of servers. Okay, an entire row of servers, 500 nodes. Well, basically, these 500 nodes can support 300,000 inferences per second — 300,000 seven-millisecond inferences per second — which means if 300,000 people were connected to this data center and did a query, asked something, or looked for something, and those inference networks were activated, we could support 300,000 people on this row. Three hundred thousand inferences basically translates to about a thousand CPUs, because as you saw earlier, the state-of-the-art CPU — the one that hasn't even been announced yet — does three hundred inferences per second. And a thousand CPUs basically translates into 500 nodes, because each one of those nodes has two CPUs inside. If we add all of that together, at three thousand dollars per node — and remember, you have to buy the node, there's the interconnect, and then there's all of the power that goes along with it; power and cooling represent about forty percent of a data center, but let's ignore all of that for a second — 500 nodes translates to basically a million and a half dollars, not including all the cables and all the power delivery and the cooling, etc. And they consume 500 watts each, so 250,000 watts — 250 kilowatts.
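The rack arithmetic above can be checked in a few lines, using the round stage numbers (300 inferences per second per CPU, two CPUs, roughly $3,000 and 500 W per node — keynote figures, not measured data):

```python
# Back-of-envelope data center math from the talk; all inputs are the
# round numbers used on stage, not benchmarked values.

inferences_needed = 300_000      # 7 ms inferences per second for the row
cpu_inferences_per_sec = 300     # state-of-the-art CPU from the slide
cpus_per_node = 2

cpus = inferences_needed // cpu_inferences_per_sec   # -> 1,000 CPUs
nodes = cpus // cpus_per_node                        # -> 500 nodes

cost_usd = nodes * 3_000         # ~$3,000 per node (cables/cooling excluded)
power_kw = nodes * 500 / 1000    # ~500 W per node, in kilowatts

print(nodes, cost_usd, power_kw)
```

This reproduces the numbers in the talk: 500 nodes, about $1.5 million, and 250 kW before power and cooling overhead.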
Well, with Tesla, if we use a relatively conservative number of about a 15x reduction, that basically translates to 33 nodes — that translates to 33 nodes. That's the savings: instead of 500 nodes that occupy an entire row, you can replace them with 33 nodes. Or you could increase the throughput of your data center by 15 times and not have to build more data centers as AI workloads flood into hyperscale data centers. This is one of the most important reasons why people ask us about FPGAs and accelerators and so on and so forth, and we decided: why not make Volta the best inferencing machine that can possibly be made? And as a result, the results are incredible amounts of savings. Okay, so: Tesla Volta for inferencing. [Applause] Let me show you one more thing that's really super important. We've been working on all of these stacks, and the stacks are so complicated: you've got a whole bunch of GPUs, a whole bunch of drivers, a whole bunch of systems, a whole bunch of middleware, all these different numerics, all these different frameworks — and there are so many frameworks, so many GPUs, so many versions of software that the ability for the industry to maintain all of those complicated stacks of software — arguably the most complex stack of software the world's ever seen — is really, really difficult. In fact, when you read the forums, it is such a pain, and for many deep learning engineers it takes anywhere from a solid day, to a couple of weeks, to sometimes never, to build a computer that can do deep learning. And so what we decided to do was take this incredibly complicated stack and containerize it. We containerized it, and we dedicated ourselves to containerizing every single framework and every single version of software that we know. And then once we containerize it, we're going to create a cloud registry for it — okay, we're going to create a cloud registry for it; we take all these containers and we create a cloud registry for them. And here's how it
works. So whenever you're ready — you have a Titan X, you're one of the several hundred thousand deep learning engineers in the world that have Titan Xs, and you want to build your own deep learning machine — you simply go to a website, type in your email address, register, and you download the container of your choice. You download the container of your choice; that entire stack of software is fully optimized, it's fully integrated, it's containerized, it's virtualized. We download that into your machine and you start doing deep learning, basically, in a few minutes. There's no configuration, there's no building, there's no worrying about different versions — it's configured basically in just a few minutes. It uses a container system called nvidia-docker, and we create a registry for it, and we support every single framework. Then the next thing is, once you start to use the platform, you start to realize: gosh, I sure would love to have a lot more performance, and nothing would give me more joy than to tap into the ten thousand GPUs that are in the cloud. With just a click, you create an instance up there, we download the container up there, and we burst your workload into the cloud. Okay? So this is really the first hybrid deep learning cloud computing platform, and what we provide is just the registry — we provide the registry, the cloud computing platforms provide all of the cloud computing infrastructure, and we maintain the software and optimize the software for as long as we shall live. Let's take a look at it. — Yeah, so the NVIDIA GPU Cloud provides multiple interfaces for developers and users to run deep learning. I'm going to focus today on the web application running in a browser. So we log in, and this is where you create your deep learning job, and there are just three steps. First, we select where we're going to run our deep learning workload — we call this the accelerated computing environment. You can pick cloud, or your
own DGX-1 cluster, or a DGX Station, or a Titan PC. Today we're going to focus on cloud, and you can see the options you have here include Microsoft Azure, Amazon AWS, and the NVIDIA SATURNV — this is the DGX supercomputer we built for internal development. I'm going to select an 8-GPU node, because I'm going to run a fairly heavy ResNet training session. In the second step, you attach a data set — or multiple data sets — to your job. Here you get a choice between the existing data sets that you've already uploaded, or you can create and upload a new data set. I'm going to go with the ImageNet data set, and this particular data set has both training and validation samples in it, so I only need one data set for this run. Next, we select the framework and the container. As Jensen was just describing, all of the different frameworks are provided fully optimized, and optimized for scale-out to multi-GPU — so ideal for our 8-GPU run. Here I'm going to take PyTorch, and I'm going to take the latest release. You'll notice the numbered releases: we optimize and update each of these frameworks every month. So at this point my job spec is complete, and I'm ready to go, but I want you to notice that we echo out the command line that's equivalent to this web application run, so that a developer can copy and paste it into a script and run it from one of our other interfaces, such as our command-line interface. So now I start the job, and that takes us to the dashboard. This is the control center for the NVIDIA GPU Cloud; it covers all the accelerated computing environments that you've been running in, and you see the job at the top is the one we just created — the ImageNet 256 — it's already running, and you can click on any job to look at telemetry. There's a job here on four GPUs I started a little while ago — it's also ResNet on PyTorch — and you can see we're keeping these GPUs running flat out as we run this training session. — That's great, thank you. Good job, Phil. And that's it — that's the GPU cloud, the NVIDIA GPU Cloud.
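The "echoed command line" idea in the demo — a web-form job spec that maps one-to-one onto a copy-pasteable command — can be sketched like this. The command name and flag names below are hypothetical, invented for illustration; they are not the real NGC interface:

```python
# Hypothetical sketch of turning a web-form job spec into an equivalent
# command line, as in the NGC demo. "ngc-run" and its flags are made up.

job = {
    "ace": "nvidia-saturnv",   # accelerated computing environment (step 1)
    "gpus": 8,
    "dataset": "imagenet",     # attached data set (step 2)
    "framework": "pytorch",    # framework container (step 3)
    "tag": "17.05",            # monthly numbered release, per the demo
}

def to_command_line(job):
    # serialize the spec so a developer can paste it into a script
    return ("ngc-run --ace {ace} --gpus {gpus} --dataset {dataset} "
            "--image {framework}:{tag}").format(**job)

cmd = to_command_line(job)
print(cmd)
```

The design point is that the GUI and the CLI share one job description, so anything built interactively can be automated later.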
The NVIDIA GPU Cloud is a containerized system, a registry in the cloud; it supports every single framework, and the thing that we will do is support these frameworks — every single one of the versions, and every single one of the permutations of them, on every single one of our GPUs — for as long as we shall live. And that's something we do for accelerated computing in each one of the markets that we serve. It's going to be available in beta in July: the NVIDIA GPU Cloud platform. Okay. [Applause] For all of you who are anxious to burst into the cloud, this is the way to do it. All right, so let me summarize quickly on accelerated computing. If you take a look at some of the results that I've shown you, it is very, very clear that as Moore's law comes to an end, accelerated computing is really a wonderful path forward — and it's a wonderful path forward because of the architecture, because of the system software stack, and because of the domains that we've selected, and the fact that we work across each one of the applications, the middleware, and the architecture on an iterative basis. Over time you can see that the results continue to compound: on the left is AMBER performance, molecular dynamics; on the right is GoogLeNet, deep learning. In both of those examples we continue to advance. I will now talk about AI at the edge — I've got some delightful things I want to share with you, and I'm going to go quickly. You guys know that we're really dedicated to the automotive industry, and the reason for that is we believe that everything that's going to move someday will be augmented by autonomy. Everything that moves someday is going to be augmented by autonomy. We're enjoying, as you know, the Amazon effect: every single day we buy things — don't forget, we used to drive to the store to pick things up, but now we expect the things to come to us. The number of truck drivers, the number of transportation professionals, can't keep up with the Amazon effect, and
so we have to find a way to automate as much of that entire path as possible — augment it, take pressure off of the drivers. The number of truck drivers, for example, in many countries is short by about 50 percent, and so we need to find a way to automate some of this so that we can keep up with the Amazon effect. Not to mention, society could be much, much more beautiful, and our environments could be much more beautiful, if we didn't have so many parked cars. Very few of us remember that for every car that we own, there are actually three parking spaces created for us — not even including the one we already have. And so there are hundreds of millions of parking spots in America and only 250 million cars, and it makes a lot of sense, because we never know when we'll need one. Okay, so I think transportation will be augmented by autonomy in the near future. We've created a platform called NVIDIA Drive, and basically it's a roadmap — it's an architecture that spans Level 2 to Level 5, from augmented driving all the way up to completely driverless systems. We created a full stack: we dedicated ourselves to go solve the self-driving-car problem and create the software stack, but to open up the software stack — to understand the problem deeply, but to open it up and keep it open — so that hundreds of people can use it. We now have 200 developers around the world using Drive PX: they're startups, or shuttle companies, or car companies, or trucking companies. We have wonderful partners: Bosch, the largest tier one in the world; ZF, the largest trucking tier one in the world; PACCAR, the trucking company; and all the startup companies that I mentioned earlier today. It's all possible because the Drive platform is open — and it's open on so many different levels; you can intercept it at any level that you desire. Let me show you a couple of videos of what it can do, and then I've got a big announcement I'd like to share with
you. And so the first thing I'm going to show you is mapping-to-driving: basically, how the car figures out where it is in the world, localizes within it, detects everything that's around it, and drives. The second thing that I want to show you is how we use AI not just for driving but also to be, essentially, a virtual copilot. And third — this is a phrase that Gill Pratt created that I really, really love — it's called a guardian angel: even when the car is not driving for you, it should be watching out for you. And so the AI is on all the time; even if it's not in autonomous vehicle mode, it should be watching out for you. So let me show you a couple of videos. The first one is mapping-to-driving. In this particular case, the first thing that's happening is we're using the lidar — we have a lidar mapping car — and we're trying to detect everything that's around us, whether it's the lanes or the vertical poles. We're trying to create, essentially, an HD map of the structures in the world that we can use to localize ourselves later. Okay, so we detect all the road features and we construct the HD map. There are amazing companies who are doing this — we create the middleware — and of course TomTom does this, HERE does this; we partner with mapping companies all over the world. And now we put that into our car, and it's in autopilot mode: notice it's detecting the cars around it, it's detecting the lane you're taking, all the signs — and now you have confidence that the vehicle is driving safely on the road. Okay, and the next one is the copilot. This is our own Janine. — "I can now drive you to work based on mapping previous drives. Shall I engage autopilot?" — "Oh, sure." — "Autopilot engaged. Driving to work." — Janine is in our marketing department, and she's also one of our test drivers. She's very, very brave — and
beautiful and brave. All right, so she has the benefit of a copilot there, and basically what's happening is this: of course we're not going to have the whole world mapped — we're going to map the world as fast as we can — and wherever it is mapped, the car should know, and it just lets her know: hey, gosh, I could drive the car for you now, if she'd like it to take over. And next, let me show you the guardian angel. — "No traffic. Free to go." — "Quick — cross-traffic danger!" — "Maybe not. All clear." — Disaster averted. Okay, and so this is possible because the car sees everything around it, all the time. Well, we have partners — as I mentioned, 200 partners around the world: startups, shuttle companies — there's even Airbus, creating a self-flying plane that they would like to make available in the year 2020. It's a vehicle that takes off by itself, it occupies the space of two parking spots, it's for two passengers, and it will go 70 miles. I can't wait for this — this is incredible. And so Airbus is creating an autonomous airplane; there's a Dutch company using our platform for autonomous ships; there are companies using our platform for autonomous pizza delivery. And so keeping this open — creating an open-platform AI computer that everybody can use — has proven to be incredible. Well, today I'm just incredibly honored to announce that Toyota has selected NVIDIA Drive PX for their autonomous vehicles. [Applause] As you know, Toyota is one of the largest companies in the world — I believe the ninth-largest company in the world. They have 350,000 employees. This is a company that is legend in so many different ways: so much of modern management has come from Toyota. Kaizen — continuous improvement. Genchi genbutsu — I bet you guys didn't know about that one: go to the source, see it for yourself, before you come up with a solution. Just-in-time manufacturing, the pull model of manufacturing, not
push. So many modern management systems were invented by this company — it's an incredible company, and they dedicate themselves to the highest level of safety. So they're working with us, and the two engineering teams are working incredibly hard together to create their autonomous vehicle, and we'd like to finish it and put it on the road in the next few years. We also have a mutual goal — a combined goal — of achieving zero fatalities someday. And so this is going to be the architecture for their future production cars. I'm super excited about that. Ladies and gentlemen: Toyota on Drive PX. The processor inside this Toyota machine will be Xavier. One of the things that's really cool about this is — a lot of people are surprised: how is it possible you guys could put thirty trillion deep learning operations in 30 watts? Well, the reason is this — and this is a question that has come up a lot recently about accelerators and CPUs and FPGAs. Basically, the way you think about these processors: the CPU and the FPGA are general-purpose; they can do almost anything. You can make a CPU run anything — you can write any software for it. You could use an FPGA to make a video decoder or an Ethernet chip; you can make it do almost anything. The flexibility of these two architectures comes at the cost of efficiency. The next two processors are CUDA GPUs — one is Pascal, one is Volta — and then the last one, the one on the bottom, is called the DLA, the deep learning accelerator — what some people would call, for example, a TPU. These are specialized in their domains, and so therefore they tend to be much more energy-efficient — much, much more performant within any constraint. I've already talked about how CUDA has been improved with the tensor core, and so Pascal's level of deep learning capability, although great, is now enhanced by Volta. You could take that even further: you could create a custom ASIC — what we call the DLA — and you can
improve the energy efficiency by another third. Okay, you can improve it by another third. And so this basically gives you a landscape of all the different processors in the world. For our data center products, we need it to be flexible — we need to be able to run almost any network that comes our way, and so the flexibility of CUDA is too valuable for general-purpose computing platforms, for data center platforms. However, in the case of Xavier, we've done something very special: we've created a processor out of these three. In order to drive, you still need to localize, you need to reason about where you're going, you need to plan — and the planning process is highly parallel and requires 32-bit floating-point processing, and so we decided that we would use CUDA for that. We also know that there are all kinds of interesting networks that we are about to deploy — not just computer vision object detection networks, but much more sophisticated networks for the future — and we need programmability for that. And so Xavier includes the CPU's single-threaded performance capability, CUDA's flexible parallel acceleration capability, and it also includes a deep learning accelerator that provides for specialized functions in computer vision. Well, the thing that's really great about this is that now we have an architecture that is programmable, super energy-efficient, and robust for all kinds of networks that come our way, and it can run the entire software stack of self-driving cars. And when we were thinking about building this, we realized that there are so many companies in the world who would value having the ability to create a deep learning accelerator. We understand the space so well because we understand the entire pipeline — from the beginning of the creation of the network all the way to deploying it into any environment — and we have the software stack across the board.
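One loose way to picture the heterogeneous idea — a fixed-function engine for the common case, programmable engines as fallback — is a routing table from workload type to engine. This is a toy model for intuition, not Xavier's actual scheduler:

```python
# Toy model of routing workloads across a heterogeneous SoC: CPU for
# single-threaded code, CUDA for general parallel work, DLA for
# fixed-function deep learning inference. The table is illustrative only.

ENGINES = {
    "single_thread": "cpu",    # planning / control logic
    "parallel_fp32": "cuda",   # localization, 32-bit parallel math
    "cnn_inference": "dla",    # specialized, most energy-efficient
}

def route(workload_kind):
    # fall back to the programmable CUDA cores for anything unrecognized —
    # the flexibility argument for keeping CUDA alongside the fixed DLA
    return ENGINES.get(workload_kind, "cuda")

print(route("cnn_inference"), route("some_future_network"))
```

The fallback line is the design point: known, stable workloads earn a specialized engine, while everything new lands on the programmable one.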
And so what we decided to do was this — we realized: gosh, it's so incredibly hard to put all that stuff together; why don't we accelerate the adoption, democratize deep learning, lower the barrier of entry, for every single one of the trillion devices in the world that someday will use deep learning? We're not going to build them all — we're not going to build them all — however, we would like to see adoption accelerate for all of them. And so, ladies and gentlemen, today we're going to open-source the Xavier DLA. [Applause] The best engineers in the world are working on this deep learning accelerator. We're going to take this accelerator — what some people call a TPU — and we're going to open-source it, and you can continue to follow our instruction set, or you could decide to change it. If you adopt ours, we'll continue to support it with software; if you decide to change it, it's no big deal either. Okay? It's completely open-source. We're going to do early access in July and general access in September. Our goal is to proliferate robots. Robots are the ultimate version of artificial intelligence. The robot is interesting, and it's going to revolutionize a whole slew of new industries, from manufacturing to healthcare. We know that robotic surgery is now able to perform surgeries that we simply couldn't imagine. And don't forget that in the future we're going to have cybernetics: we're going to have robots that are connected to parts of our body, we're going to have little tiny robots that take care of various tasks. Robots, unfortunately, are incredibly hard to do, and the reason for that is because a robot has to sense the world — which we know how to do now — it has to learn from it and plan and take action, but it has to interact with the world. In the case of a self-driving car, our fundamental goal is collision avoidance; that is a specific objective. In the case of a robot, collision is essential: your goal is to connect, your goal is to collide, and how you collide is incredibly difficult to do. There are so many degrees of freedom in all of
the joints and fingers and elbows and wrists and torsos of a kinematic object that training it is incredibly difficult. Recently, though, some breakthroughs have happened. This is ADA at the Berkeley AI laboratory, and they taught ADA how to play hockey. That right there is probably one of the world's finest AI researchers, and that's what he does for a living: that's ADA learning, repeatedly, using reinforcement learning, how to play hockey. Now, it turns out that hockey is not too bad; still, convincing people to do that for every robot you train is probably difficult. And remember, that's just hockey. What if we wanted to lift a car? What if we wanted to open a door? What if we wanted to cooperate with a doctor to do surgery? There is no way to have a robot learn all of that repeatedly in the physical world. So the answer is this: it turns out we need to create an alternate universe. This alternate universe has to obey the laws of physics, in the sense that there's collision detection and it obeys gravity, if you choose. It has to be visually photorealistic; it has to look like the world, or these robots can't learn within it. It has to sound like the world and behave like the world. And for a robot to learn inside this alternate universe, we have to be inside it as well, so that we can teach it using imitation learning and all kinds of new deep learning techniques. So it has to be a virtual environment that we can be part of, too. This alternate universe, visually real and physically real, has to have one additional characteristic: it should not follow the laws of time. The one thing we expect of it is to be hyper-fast; we need it to train at warp speed, because there's no reason
for us to have to wait around for ADA to learn how to play hockey in the physical world in real time. And so we created a new tool, a new world, a new simulator we call Isaac. It's named after the two Isaacs: Isaac Newton of physics and Isaac Asimov of AI. We brought the two Isaacs together into this new world we call Isaac. Isaac takes environments and robots as input, so we can put the virtual sensors, virtual actuators, and virtual effectors of the robot into it, and it's connected to OpenAI Gym, where all the reinforcement learning and all the interesting algorithms can be applied. We simulate in this environment and run it on top of NVIDIA GPUs, and inside that computer is a virtual brain. When we're done, we literally take that virtual brain and put it into a real robot, and this robot wakes up almost as if it was born knowing this world. Then the last little bit of domain adaptation is done in the physical world. So when a robot turns on, it has already been pre-trained, if you will, and that pre-training happens in the Isaac world. We as a company have the unique capability to bring all of this together: we know how to do physics in super real time, we know how to do amazing computer graphics, and we know how to do AI. So we decided to integrate it all into one thing we call Isaac. Ladies and gentlemen, let's take a look. Notice that when Mike moves the puck or moves the net, Isaac somehow figures it out. Now, this all looks wonderful, and it's all done in computer graphics. One of the things we can do, of course, is replicate a whole bunch of Isaacs and have them all learn. Then we take the smartest one, take the brain out of that smartest one, and put it into everybody else. Then we say, okay, now start again. We figure out which one is the smartest, and we take that brain and put it into everybody else, and we
say, start again. And so, as a result, we can accelerate the time it takes to learn. Look at all these Isaacs; they're all trying to figure it out. So that's great. Let's take a look at some clips from when they were learning. [Music] It's got to figure out where the puck is; we're not cueing it. It's got to see the puck and figure it out; it's got to figure out how hard to hit it and where to hit it. [Applause] [Music] That's incredible. Now, ladies and gentlemen: Isaac. Look, if it can learn how to play hockey, it can surely learn how to play golf, so here's Isaac learning how to putt. Let's do it: there's Isaac reading the green. Now remember, no programming was done. Isaac just sat there and tried and tried and tried until it figured out how to play. Okay, not strong enough. Oh, no way! All right, I was going to say, let's give Isaac something hard to do; let's move up here. All right, good job. Thank you very much, ladies and gentlemen: Isaac. The next time you see Isaac, I hope Isaac is standing in the middle of a sand bunker, and we'll teach it how to chip out and see if we can sink it. Okay. So Isaac is a robotics simulator for the next stage of AI, where you have to sense the world, learn from what you sense, plan what you're going to do, take action, sense the results of your action, and go around that loop. We created a simulator that makes it possible for robots to learn inside this virtual world, this alternate universe. Hopefully, as we put this out there, all kinds of robots will be created, and they'll use reinforcement learning, transfer learning, imitation learning, and other forms of learning in order to program these very
sophisticated robots. That's it; that's what we talked about today. We talked about two things. We talked about the fact that accelerated computing is really coming to the fore: this is the rise of accelerated computing. Our GPU computing platform has been adopted by high-performance computing scientists all over the world, and recently, with the advent of deep learning and the breakthroughs there, we're now seeing a new era in computing we call the era of AI. Volta is the next generation, the next giant leap into that new world. We also introduced several things with Volta. One is a brand-new instruction set we call the Tensor Core. Second is a full compiler and optimizer for inferencing we call TensorRT. Between Tensor Core and TensorRT, not only do we accelerate deep learning, but we also make it possible for hyperscale data centers to deploy deep learning broadly without having to build more and more data centers; we can save fifteen times the cost of deploying deep learning into the world. We introduced the DGX-1 and the DGX Station, both of which are open for orders now. We also have the support of every single cloud computing partner in the world, so you can find NVIDIA GPUs with all of our software on Alibaba and Amazon and Baidu and Facebook and Google and Microsoft and Tencent; every single cloud will have this one NVIDIA architecture. One of the things we've done is introduce the idea of a containerized software stack that allows you to use one software stack, without re-optimizing, from on-premises to cloud. This way, when you're ready to burst into the cloud, that software stack is already in the cloud registry, fully optimized, and you don't have to reconfigure it. We also announced that we would like to expand the reach of deep learning, to proliferate this capability, and to democratize the ability of every single IoT device around the planet, the trillions of them that are going to be able to use deep learning in
the future, to access a world-class design. We're going to open-source the Xavier DLA design, and we're going to continue to advance it and improve it, so the people who use it will continue to gain the benefits. We also announced two other things. Toyota has announced that their future autonomous vehicles will be created on the NVIDIA DRIVE PX platform. And lastly, we announced the first alternate-universe virtual robot simulator, so that we can hopefully work together to bring the future of artificial intelligence and robotics to the world. That's all I have today. Thank you guys very much for coming. Have a great GTC; it's great seeing all of you. [Music]
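The replicate-many-Isaacs, copy-the-smartest-brain loop the keynote describes is essentially population-based training. Here is a minimal, self-contained sketch of that selection loop; it is a toy illustration, not Isaac or OpenAI Gym code, and the one-parameter "brain", the `evaluate` reward, and the `train_population` helper are all made-up stand-ins for a policy network being scored in a physics simulator.

```python
import random

def evaluate(brain, target=3.0):
    """Toy reward: stands in for one simulated episode (higher is better)."""
    return -(brain - target) ** 2

def train_population(n_agents=8, generations=60, step=0.5, seed=0):
    """Each generation, every agent perturbs its brain and is scored in the
    (toy) simulator; the smartest brain is then copied into everybody else
    and the loop starts again, as described in the talk."""
    rng = random.Random(seed)
    # Every agent starts with its own randomly initialized brain.
    brains = [rng.uniform(-10.0, 10.0) for _ in range(n_agents)]
    for _ in range(generations):
        # Exploration: each agent tries a slightly different brain.
        candidates = [b + rng.gauss(0.0, step) for b in brains]
        scores = [evaluate(b) for b in candidates]
        # Selection: take the smartest one and copy it into everyone.
        best = candidates[scores.index(max(scores))]
        brains = [best] * n_agents
    return brains[0]

# After enough generations the population's shared brain sits near the
# optimum of the toy reward (target = 3.0).
final_brain = train_population()
```

A real system would perturb millions of network weights and average each brain's score over many simulated episodes, but the selection step, copying the smartest brain into everybody and starting again, is the same loop.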
Info
Channel: Dark Sorrow
Views: 62,800
Rating: 4.5452633 out of 5
Keywords: GTC 2017 Nvidia, Nvidia volta, Nvidia GTC 2017 volta, Nvidia GTC 2017 tesla, GTC 2017: Nvidia gpu technology conference, GTC Nvidia conference
Id: -JvNAzj0iKk
Length: 134min 44sec (8084 seconds)
Published: Wed May 10 2017