SC19: NVIDIA CEO Jensen Huang on the Expanding Universe of HPC

Captions
I am AI. I am a creator, freeing our imaginations and breathing life into our wildest dreams. I am a guardian, keeping us safe on our way home and wherever our curiosity takes us. I am a visionary, anticipating the needs of others and simplifying our busy lives. I am a protector, keeping our most magnificent creatures out of harm's way and helping our heroes make it home safely. I am a healer, decoding the secrets from within, providing precision when every second counts. I am an innovator, finding smarter answers to complex tasks, working in harmony to lighten the load, and driving perfection in everything we create. I am even the composer of the music you are hearing, brought to life by NVIDIA deep learning and brilliant minds everywhere. [Music]

Ladies and gentlemen, please welcome NVIDIA founder and CEO Jensen Huang. [Applause]

Welcome to Supercomputing 2019. Our universe is expanding at the speed of light. [Music] Several of you noticed the four notes; what's the excuse for the rest of you? [Applause] That is practically our national anthem.

The HPC universe is expanding in every single direction at the same time. When we first came to this conference, the HPC world consisted of supercomputing and higher education, both doing science, and the only industry really involved was oil and gas. Today high performance computing reaches out to internet companies developing AI: conversational AI that helps recommend information, music, videos, groceries, and products, because there is otherwise too much information on the internet for you to search. It's used by self-driving car companies developing the AI for their cars as well as inside the car itself, and for all of you who got here with Uber or Lyft, it was the system that helped them orchestrate drivers and riders. HPC is literally everywhere today: it's in supercomputing centers, it's in the cloud, it's at the edge. One of the most important and most exciting developments we're seeing in high performance computing today is that even the computational methods are changing. It started with first-principles mathematical simulation approaches; now, more and more, you're seeing data-driven approaches and the fusion of the two. All of these changes are happening at exactly the same time, and I'm going to try to touch on all of it today, so I'd better get going.

NVIDIA's purpose is to advance this form of computing we call accelerated computing. We are an accelerated computing platform company, and we apply this platform, this incredible capability, in three basic fields: computer graphics, simulation of physics, and artificial intelligence. In all of those fields the one commonality is simulation: we're simulating light in the world, we're simulating physics, we're simulating human intelligence. Accelerated computing has made so much progress in the decade and a half since we first proposed the idea of using it for general-purpose computing, and the progress is really quite amazing. But it all starts with computer graphics. This is an area that is still progressing at enormous rates, and we're making enormous contributions. Last year we introduced a brand new method of computer graphics we call RTX. RTX is the world's first real-time ray tracer, something that, frankly, we didn't expect to do for another ten years: an algorithm discovered by one of our researchers some 35 years ago that we finally made real-time this last year. Let me show it to you.

This is RTX. This is a beautiful Lamborghini. Basically what we're doing is simulating a light ray as it bounces around the environment. As it bounces around and intersects with geometry, depending on the geometry it reflects or refracts; it might get absorbed; and depending on the material, its subatomic structure or photoelectric properties, it may become quite reflective or it could scatter in a whole lot of different directions. All of these properties are now simulated in real time. We also do this in the physically based, correct way, so that it conserves energy, and as a result the computer graphics look photoreal. Everything you're seeing here, the shadows, the reflections, the lighting, the ambient occlusion (that little crease of dark shadow you see between geometries), the global illumination, light bouncing around and eventually reaching those vents near the back of the car: all of these effects are now computed in real time, and as a result things look photorealistic.
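What Jensen describes here, rays that bounce, reflect, refract, or scatter depending on the material and are accumulated in an energy-conserving way, is the core idea of ray and path tracing. As a rough illustration only (not NVIDIA's RTX implementation), here is a minimal single-sphere diffuse bounce in Python with NumPy; the scene, albedo, and function names are invented for the sketch.

```python
# Minimal, illustrative path-tracing step (not RTX): trace a ray, and on a hit
# scatter it diffusely with an energy-conserving albedo. Pure NumPy, CPU only.
import numpy as np

rng = np.random.default_rng(0)

def hit_sphere(origin, direction, center, radius):
    """Return distance along the ray to a sphere hit, or None."""
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - np.sqrt(disc)
    return t if t > 1e-4 else None

def sample_hemisphere(normal):
    """Pick a random bounce direction in the hemisphere around the surface normal."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    return v if np.dot(v, normal) > 0 else -v

def trace(origin, direction, depth=0, albedo=0.7, max_depth=4):
    """Radiance along a ray: bounce diffusely until the ray escapes to the 'sky'."""
    if depth >= max_depth:
        return 0.0
    center = np.array([0.0, 0.0, -3.0])
    t = hit_sphere(origin, direction, center, radius=1.0)
    if t is None:
        return 1.0                      # escaped: hit a uniform white sky light
    hit = origin + t * direction
    normal = (hit - center) / np.linalg.norm(hit - center)
    bounce = sample_hemisphere(normal)
    # Energy conservation: the surface returns at most `albedo` of what it receives.
    return albedo * trace(hit, bounce, depth + 1)

# Average many stochastic samples for a single camera ray (one "pixel").
ray_o, ray_d = np.zeros(3), np.array([0.0, 0.0, -1.0])
print(np.mean([trace(ray_o, ray_d) for _ in range(256)]))
```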
Are we moving it around a bit? Are you guys moving it? All right, look at that, all completely in real time: the reflections off the car paint (we're simulating the car paint like it's car paint), the material on the ground, the cement, all the subtle shadows. The lights are what we call area lights, which are really hard to do because light is coming from every single direction, so the umbras and penumbras of the shadows come together in a really soft and delightful way. Lighting and shadows all behave the way you would expect, with natural lighting and the natural reflection of the car paint off the ground. Isn't that beautiful? What do you guys think? [Applause] What you're looking at right here is running on one GeForce RTX graphics card. This is one GPU. We're now able to do real-time ray tracing on one GPU, something that would have taken an entire cluster of GPUs just a couple of years ago, or entire rooms of CPU servers, to render photorealistically like this. Okay, that's fantastic. Thank you very much, good job. [Applause] RTX. Gosh, it's just beautiful; you could sit here and stare at it for a while.

Computer graphics is still the driving force of a lot of things we do. It gives us incredible joy, and it was the drive toward photorealistic computer graphics that ultimately led the GPU to become the most powerful processor in the world, and then led us to this conference and to working on high performance computing. We did a lot in high performance computing last year and made a lot of progress this year: forty-plus new supercomputers on the TOP500 are GPU-powered by NVIDIA, and we're super proud of that. But what we're incredibly proud of is that we introduced the world's first two AI supercomputers. It was a big surprise to everybody we were working with. Our logic was that not all simulations need to be physically informed; sometimes, in fact, we have so many different physics models cobbled together into a mesoscale simulator that we might be able to learn from data to predict the future. We might be able to use AI fused with first-principles computational methods to accelerate simulations in a way that otherwise would have been impossible. And so we introduced the world's first AI supercomputers, and the results were absolutely amazing right off the bat. Summit was able to achieve a high-fidelity climate simulation result: they learned how to detect extreme weather patterns using a neural network, and it was the largest run of a training model ever done, exceeding an exaflop. We were able to make TensorFlow distribute across a very large number of GPUs and train a neural network to detect extreme weather patterns. By doing so, we should in the future be able to notice extreme weather patterns developing and provide early warnings to get people out of the way, or alert businesses and ships to avoid them, and save people's lives.
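The climate result above hinged on distributing training across a very large number of GPUs. The talk doesn't show how that is wired up, but the general data-parallel pattern looks roughly like the sketch below using PyTorch DistributedDataParallel; the toy model, random data, and launcher environment variables are assumptions for illustration, not the actual Summit code.

```python
# Illustrative data-parallel training skeleton (not the Summit climate code).
# Assumes it is launched with one process per GPU, e.g. via `torchrun`.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL handles the gradient all-reduce
    local_rank = int(os.environ["LOCAL_RANK"])       # set by the launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(                     # stand-in for a real segmentation net
        torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(16, 2, 3, padding=1),
    ).cuda()
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(10):                           # toy loop over random "climate tiles"
        x = torch.randn(8, 3, 64, 64, device="cuda")
        y = torch.randint(0, 2, (8, 64, 64), device="cuda")
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                              # gradients are averaged across all ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```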
This one is incredible. The folks at Oak Ridge and the VA were trying to figure out why only 10% of the people who use opioid drugs become addicted. They took hundreds of millions of examples of genetic information, compared them, and looked for patterns; hundreds of trillions of comparisons were made, and as a result they were able to see which genetic patterns, out of our roughly three billion human genome base pairs, contribute to addiction.

This one is interesting. It used a learner called MENNDL, an evolutionary architecture-search model, which allowed them to discover and design a new model that could detect cancer on biopsy slides. And here's the amazing thing: 20 million people are diagnosed with cancer each year. These biopsy slides are a hundred thousand pixels by a hundred thousand pixels, much larger than any photograph you would take, and they're coming off the scanners at about two per minute. However, using state-of-the-art neural networks, it would take about half an hour to detect whether a slide shows cancer or not, and each biopsy has ten of these slides, so each year the United States would have to process about 200 million slides. If we didn't figure out a way to make a neural network detect the cancer accurately and quickly, the ability to process these slides would be limited, and 200 million slides would put great pressure on the entire healthcare system. So they used this MENNDL learner to explore a large space of different models, with a whole bunch of models being trained at the same time, and they discovered one that was both accurate and fast: a model 16 times faster than today's state of the art.
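A slide that is 100,000 by 100,000 pixels cannot be fed to a network whole; in practice it is cut into tiles and the per-tile predictions are aggregated into a heat map. The sketch below shows only that tiling loop; the tile size and the placeholder classify_tile call are assumptions for illustration, not the Oak Ridge pipeline.

```python
# Illustrative sliding-window inference over a gigapixel pathology slide.
# `classify_tile` stands in for whatever trained network scores one tile.
import numpy as np

TILE = 1024          # assumed tile edge in pixels
STRIDE = 1024        # non-overlapping tiles for simplicity

def classify_tile(tile: np.ndarray) -> float:
    """Placeholder: return a tumor probability for one tile."""
    return float(tile.mean() > 0.5)    # dummy score so the sketch runs

def score_slide(slide: np.ndarray) -> np.ndarray:
    """Return a coarse probability map with one score per tile."""
    h, w = slide.shape[:2]
    rows, cols = h // STRIDE, w // STRIDE
    heatmap = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            tile = slide[i * STRIDE:i * STRIDE + TILE, j * STRIDE:j * STRIDE + TILE]
            heatmap[i, j] = classify_tile(tile)
    return heatmap

# A real slide would be ~100,000 x 100,000 px; use a small random stand-in here.
slide = np.random.rand(4096, 4096)
print(score_slide(slide).shape)        # (4, 4) tile-level tumor scores
```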
This one is at the Hanford Site, where work on the Manhattan Project began. When production of weapons-grade plutonium there stopped in the late '80s, it left tens of millions of gallons of radioactive waste underground, and about a hundred square miles of surface water was contaminated. The cleanup crew is trying to figure out whether their contamination-containment process is effective, so they drilled about a thousand wells in this hundred-square-mile area, with Geiger counters down there trying to determine whether the radioactivity is subsiding or spreading. They came to the conclusion that, to properly detect whether the radioactivity is subsiding or spreading, they would need over a million of these wells. So they concluded the way to do that was to use artificial intelligence: informed by the physics they know, then trained and tested with the ground truth collected from those thousand wells, to create a new type of neural network they call physics-informed GANs, which produce data that is plausible. This is one of the really amazing things about the GAN work done recently: using conditional GANs and the information you do have, which could be physics or some other prior knowledge, the model can learn in unsupervised ways to imagine plausible examples of data. They were able to generate data and essentially simulate what they would have detected from a million different sensors when it was in fact only collected from a thousand.

All of these examples were computation done at the petascale level. This is the type of science we would have had to wait several more years to accomplish, and now, using artificial intelligence and this new type of computation we call Tensor Cores, we're able to achieve petascale science today.

The most important achievement of our company, the one that benefits all of you, is a combination of full-stack optimization. Most people think the work we do starts with the GPU, and it does; coming up with a great processor is the beginning of the journey of accelerated computing. But it takes a full stack, and it takes a full village, the entire ecosystem, including all of you, to help us accelerate science. Our stack basically looks like this: we have CUDA, the architecture of our general-purpose computing GPU; CUDA-X, a collection of libraries for the different domains we're trying to accelerate; and on top, application-specific and domain-specific frameworks used by our partners, customers, and developers to create their applications. This last year, here is a small sampling of the releases we've had. CUDA 10 now supports Arm and is interoperable with graphics, so that's a great release. NCCL now has the ability to scale up to 24,000 GPUs. DALI now supports TensorFlow 2.0. TensorRT 6.0, our sixth generation, is our optimizing neural-network computational-graph tool and runtime; TensorRT now supports RNNs and also the conversational AI, natural-language-understanding model we call BERT. cuDNN 7.6, the latest release, can handle BERT models as well as dynamic shapes. cuBLAS is our matrix-operations library; cuSOLVER is for dense and sparse solvers; cuTENSOR is for tensor linear algebra, accelerated by our Tensor Cores. Spark XGBoost is the gradient-boosted-tree machine learning model. RAPIDS is our open-source data science suite, from dataframe processing to machine learning to graph analytics. OptiX is our path tracer and IndeX our 3D volumetric renderer. Look at the release numbers: 6.0, 10.2, 2.1, 7.0. We're dedicated to accelerating all these different fields of science and all these different types of applications for as long as we shall live.
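As one concrete (and hypothetical) example of the kind of workflow these libraries accelerate, here is a minimal GPU gradient-boosted-tree training sketch using RAPIDS cuDF together with XGBoost. The file name and columns are invented, and the exact GPU tree_method spelling depends on the XGBoost version, so treat this as a sketch rather than a recipe.

```python
# Minimal sketch: load a table with cuDF and train a gradient-boosted-tree
# model on the GPU with XGBoost. Requires a CUDA GPU plus RAPIDS and XGBoost.
import cudf
import xgboost as xgb

# Hypothetical tabular dataset with a binary "label" column.
df = cudf.read_csv("events.csv")
X = df.drop(columns=["label"])
y = df["label"]

dtrain = xgb.DMatrix(X, label=y)                 # XGBoost accepts cuDF input
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",                   # GPU histogram algorithm (older API spelling)
    "max_depth": 6,
}
model = xgb.train(params, dtrain, num_boost_round=100)
print(model.eval(dtrain))
```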
And the thing that is really cool: take a look at this suite of applications, molecular dynamics, Chroma, quantum chemistry, TensorFlow, fluid dynamics, on one particular platform, one CPU and four Volta V100 GPUs. Since 2017, without changing the hardware at all, release after release after release, as we continuously and relentlessly optimized the stack, from the libraries to the solvers to the kernels all the way up to the applications, working with each and every one of you, the applications got faster and faster. Almost 27 hours just two years ago is now 10 hours. The simple way to think about this is that you either saved three times the money for doing your computation, or you're now able to do three times as big a science. This is the ultimate benefit of accelerated computing: we can work and innovate at every single layer of the stack so that we can ultimately accelerate your science. This is not possible without all of you jumping in and working with us side by side, and as a result, look at the incredible achievement: with software alone we're able to move faster than Moore's law. I want to thank you all for that. This is probably the single best way to understand our company. We dedicate ourselves to this; I'm super excited and super proud of this, and of all our computational mathematicians, architects, and system software engineers who work with all of you. This is ultimately our best scorecard: the progress we make in accelerating science year after year after year. And it benefits the entire installed base, because CUDA runs on NVIDIA GPUs across the board, whether it's in the cloud, in your supercomputing centers and data centers, in your PC, or in your laptop. Anything with an NVIDIA GPU in it, whether it's GeForce, Quadro, or Tesla, it doesn't really matter; it could even be an embedded system we call Jetson. All these benefits accrue to the entire installed base; we just make the installed base better and better all the time.

High performance computing is expanding in every single direction. It started with simulation, of course, and what I just mentioned was largely about simulation, and every time we come to Supercomputing we talk about more flops and faster simulators, and those are all fantastic things. However, most of you know that all of a sudden the world has changed tremendously, and the reason is that the simulations we do are now generating so much data that one of the greatest challenges is analyzing the results you get. More and more we hear of people who say they are limiting the size of their experiments and simulations because they simply don't have the ability to analyze them anyway. Simulation analytics: a simulation that generates 150 to 200 terabytes of data now has to go through the network, go through storage, and somehow be read out; networking becomes a gigantic challenge, and storage becomes a gigantic challenge. One of the most exciting developments we're seeing right now is putting intelligence and computation at the edge, so that we can have all kinds of rich sensors; remote sensing is going to become software-defined, and the type of remote sensing we can do in the future is just incredible. So all of a sudden high performance computing is moving in every single direction, and even simulation itself, as I mentioned, is changing from computational methods to also data-driven methods. All of these areas are undergoing incredible change: artificial intelligence coming into this industry and helping science be bigger and better, edge computing, and some incredible developments in cloud computing for HPC.
The single most pervasive instruction set in the world, Arm, with something like over a hundred billion devices, is now ready to come into HPC, and several dynamics make it very important. Data analytics: it's like using a spreadsheet on a hundred terabytes; where is that needle in the haystack? Data analytics is one of the greatest challenges in supercomputing today, and all of it results in understanding I/O at a much deeper level; storage is now the limiter in so many different fields of computation. We see that at NVIDIA: we have three of the world's top-500 supercomputers running continuously at our company, and we're using them to train neural networks for computer graphics and imaging, basic research, self-driving cars, and robotics, working with all of you using deep learning to advance science. Those supercomputers are running full out, and in every single case storage is the limiter; just reading data in and out of those systems has become a great challenge. Extreme I/O is now going to be one of the greatest areas of innovation, and we have to put a lot of energy into making it better. So I'm going to touch on each of these six things and tell you some of what we're doing.

AI. I've condensed everything about AI and deep learning into two recent developments, two points. The first point, of course, is AlexNet in 2012. I still remember coming to Supercomputing shortly after that and showing you some of our early work with deep learning. The incredible breakthrough of CNNs is that a deep neural network has the ability to learn important features in a hierarchical way. Each of the layers is differentiable, so you can teach it with stochastic gradient descent, and the front part of the CNN uses convolution so it can find important features irrespective of their orientation, whether they're facing a slightly different way or their scale is a little different; it can identify those important features and learn them over time from a lot of data. The CNN was a really great breakthrough, and as a result some of the challenges we've always had with computer vision, those impossible applications to write, we're now able to write. AlexNet was a breakthrough, and computer vision has since achieved superhuman levels. Since AlexNet and the CNN, the exploration of this entire space has been absolutely amazing, and the kinds of things we're now able to do with deep learning computer vision are, for all of us, incredibly delightful. Object recognition, detection, and classification are all at superhuman levels; segmentation is at superhuman levels. We can now use 2D to extract 3D geometry; we can of course use the technology to enable robots and self-driving cars; it can even imagine images from a particular domain it has learned about. For example, you can give it a segmentation map of pixels, as you see in the upper left and upper right there, and with image generation, one of our really great works we call GauGAN, it comes out with a beautiful painting that it learned from a whole bunch of other images. It can learn pose, not just 2D pose but 3D pose, where all of the points are in space. And so AlexNet started the computer vision revolution.
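For readers who have never written one, the kind of network described here, stacked convolutions learning features hierarchically with every layer differentiable so the whole thing trains by stochastic gradient descent, looks in miniature like the toy PyTorch classifier below; it is an illustration, not AlexNet itself.

```python
# A toy convolutional classifier: convolution layers learn local, roughly
# translation-invariant features; everything is differentiable, so SGD can
# train it end to end. This is an illustration, not AlexNet.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)   # assumes 32x32 input images

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 32, 32)                  # a fake batch of 32x32 images
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
loss.backward()                                # gradients flow through every layer
opt.step()
print(float(loss))
```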
If you go back and think about the industries this is going to impact, whether video analytics, transportation, or robotics, this one innovation will likely result in trillions of dollars of industrial impact over time. The largest industries we know of today, whether manufacturing or transportation, are going to be affected because of that one breakthrough. It happened because a neural network was lying dormant, if you will, waiting for technology to happen, waiting for us to come along, and then all of a sudden one day the confluence arrived: the internet was there, there was a large collection of labeled images, but most importantly the computation finally arrived that made it possible for AlexNet to be trained. That was the Big Bang of modern AI, and everything after that is a little bit of history.

For the last seven years we've been pursuing computer vision, and meanwhile developing this new area called natural language understanding. If computer vision is our method of encoding our understanding of the world, of its unstructured data, then the next breakthrough is the encoding of human knowledge. Language is the encoding of human knowledge; it's the H.264 of knowledge. All of the knowledge we've amassed over time is encoded in the language we all share. Well, about a year ago Google announced a paper and a model called BERT: Bidirectional Encoder Representations from Transformers. A transformer is a model that learns language, but not in a sequential way, not one letter and then one word after another; it learns the whole sequence simultaneously using a concept called an attention model, and it can learn the structure of a sentence in both directions, because otherwise it's hard to capture meaning. BERT went off and inspired a whole bunch of new natural language understanding models, and now NLU has achieved superhuman levels. Whereas AlexNet and recent computer vision algorithms can detect, recognize, and classify images at superhuman levels, we now have natural language understanding models that take a test called GLUE, a whole battery of tasks including reading comprehension, and achieve superhuman levels as well. The implication of this is tremendous, utterly tremendous. All of a sudden there is the ability to answer questions, the ability to translate, to have a conversation where the computer understands what you mean and what your intention was; search is transformed, recommendations are transformed, and the way we interact with the computer is going to change in a very profound way. I can't wait for BERT and derivatives like it to eventually help us understand and decode the human genome and all of its variations, permutations, and mutations; it's just around the corner. These two breakthroughs in combination have implications in all of the fields we know; how we do science in the future is going to change. I can't wait until something can write me a summary of any particular field I'm interested in, summarized in a way I can understand, so I can dig in further if necessary. How it could help healthcare would be incredible, just presenting doctors with the latest breakthroughs in medical research so they don't have to comb through all of the latest journals. Incredible advances in AI.
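The attention mechanism at the heart of BERT-style transformers is compact enough to show directly. Below is a minimal single-head scaled dot-product attention in NumPy; the toy sentence length, dimensions, and random projections are illustrative only.

```python
# Single-head scaled dot-product attention: every token attends to every other
# token in both directions at once, which is what lets a transformer read a
# whole sentence simultaneously rather than word by word.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns the attended values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity of every token to every token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d = 5, 8                        # a 5-token toy "sentence"
x = rng.normal(size=(seq_len, d))        # token embeddings (stand-ins)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                         # (5, 8): one contextualized vector per token
```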
We're really proud to have been in the right place at the right time, and the work we did, the relentless pursuit of more performance at all times, irrespective of any particular reason: our company is driven to advance performance for all the fields of science, relentlessly. We're just trying to make it faster all the time. In fact, if you take a look at our progress over the last five years, it's really quite amazing. Five years ago, when I was here talking to you, the GPU we were shipping at the time was called Kepler. It's probably one of the most defining GPUs in high performance computing; it really ushered in the era of accelerated computing for our company. A K80 server training ResNet-50 would have taken 600 hours. Six hundred hours would have been impossible for Alex Krizhevsky; it's a rather long time to train a single model, because there are so many iterations and experiments you have to go through, and to train this one model just one time, sweeping through the various hyperparameters and learning from the data, took 600 hours. We now do it in two. When you compound all of that, it's roughly doubling every single year, essentially 32x over five years. This is moving at super-Moore's-law rates because of this relentless pursuit, and because of the programmability of our architecture and the work we do with the entire ecosystem, so that every model that wants to be trained can be trained on our GPU, and therefore can be inferenced on our GPU. We've been able to achieve excellent performance in the industry's benchmark, MLPerf: we were number one in training twice in a row, and the inference benchmark just came out and we led that as well; the fastest, the best platform for deep learning.

But what's really amazing is the pace at which this is moving. This is Ilya's work at OpenAI; he's been tracking the amount of computation necessary to train state-of-the-art models over time. Several things are happening at the same time. The first, of course, is that the models are getting bigger, and if the models are getting bigger, the amount of data you need to train them has to be proportionally larger, because otherwise they would be underfit. When these models are pursuing the theoretical limits of the state of the art, you're going to try a whole lot of experiments, because just like any other engineering or scientific endeavor, you're never exactly sure it's going to work. So when you think about the size of the model, the amount of data you have to use to train it, and the complexity of the task it has to learn, the amount of computation is skyrocketing: you see it doubling every three and a half months. The net result, in combination with us moving at super-Moore's-law rates, doubling every year while the demand is still doubling every three months or so, is that people are building larger and larger machines. My sense is that this is a trend that's going to continue: we're going to see larger and larger systems with more and more capability.
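The two growth rates quoted here compound very differently, which is easy to check with a few lines of arithmetic: platform performance doubling every year gives about 32x in five years, while training demand doubling every three and a half months grows by roughly an order of magnitude per year.

```python
# Compounding the two growth rates mentioned in the talk.
years = 5
platform_growth = 2 ** years                     # ~2x per year -> 32x over five years
demand_per_year = 2 ** (12 / 3.5)                # demand doubling every 3.5 months
print(platform_growth)                           # 32
print(round(demand_per_year, 1))                 # ~10.8x per year in training compute demand
print(round(demand_per_year ** years))           # ~145,000x over the same five years
```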
And AI is going to drive our relentless pursuit of more performance. Everything has to improve: we have to improve our processors, our system architecture, and its scalability; most importantly we have to improve the software stack so that the utilization of the computation we provide is as high as possible; and then of course we have to create new algorithms and new models that are easier to train. All of this is driving our AI computation relentlessly.

AI is also not just training. If you take a look at the work we're doing in AI, we do an enormous amount of work in training, and we created this incredible appliance we call DGX-2, with 16 GPUs in it, that so many companies are using for training. But once you train the models, maybe you want to run them in the cloud, or maybe you take the model out to the edge; we call that system EGX, and this is an area going through extraordinary excitement and development. Some people call it the intelligent edge, some call it edge computing; it's just the edge. And then of course there are autonomous machines, AIs that interact with us in the world: some of them are driving, some of them are delivering groceries in the last mile. Autonomous machines is an area going through a lot of exciting development. We have four basic platforms: DGX for training, HGX for the cloud and hyperscale, EGX for the edge, and AGX for autonomous machines.

Let me share with you some examples of how people are using AI in high performance computing. They come in all kinds of categories. Some people are using AI informed by first-principles physics: the first-principles physics equations are actually embedded in the neural network, so all of the knowledge we've gathered that led to those equations is now embedded in the neural network and gives it a gigantic head start; physics-informed neural networks. Some people use enormous simulators with extremely high precision to train a neural network; these large simulators take an enormous amount of time to run, and the larger the scale, the more precise the simulation, so if they can accelerate the simulator, then even though approximation is involved, using the neural network the overall simulation can end up more precise. Their mesoscale simulators are used to train neural networks that then predict outcomes at a much higher pace. We see people using AI networks to study the output of simulations. We see people using AI networks to guide the simulation, to focus in on where the most interesting activity is and zoom in on that particular part of the simulation so we don't waste a whole lot of computation on things that don't matter. It's used to guide experimentation; maybe it's used to steer a fusion reactor to keep it under control. I think the folks at Princeton created a network called FRNN, a fusion recurrent neural network. And this one is really interesting: it's a molecular dynamics simulator, but the quantum chemistry simulator, the DFT simulator, using the Schrödinger equation, computes the energy potential of each of the molecules, and then that neural network is placed inside the molecular dynamics simulator to predict the energy potential while Newtonian physics simulates the rest. As a result, they've been able to increase the performance of their simulator by some five orders of magnitude and still get the benefits of what would otherwise have required Schrödinger-equation simulators, otherwise known as DFT.
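The pattern described here, a network trained against expensive Schrödinger-equation (DFT) calculations standing in for the energy and force evaluation inside an otherwise Newtonian molecular dynamics loop, can be sketched as follows. The learned_forces function is a toy pairwise stand-in so the example runs, not an actual trained potential.

```python
# Velocity-Verlet MD loop where the force evaluation is delegated to a learned
# potential. `learned_forces` is a toy stand-in; in the real workflow it would
# be a neural network trained on DFT (Schrodinger-equation) data.
import numpy as np

def learned_forces(positions: np.ndarray) -> np.ndarray:
    """Placeholder for the NN potential: soft pairwise repulsion."""
    diff = positions[:, None, :] - positions[None, :, :]            # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1) + np.eye(len(positions))   # avoid self-division
    return (diff / dist[..., None] ** 3).sum(axis=1)

def velocity_verlet(positions, velocities, dt=1e-3, steps=100, mass=1.0):
    forces = learned_forces(positions)
    for _ in range(steps):
        positions = positions + velocities * dt + 0.5 * forces / mass * dt ** 2
        new_forces = learned_forces(positions)      # the expensive call the NN replaces
        velocities = velocities + 0.5 * (forces + new_forces) / mass * dt
        forces = new_forces
    return positions, velocities

rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(8, 3)), np.zeros((8, 3))
pos, vel = velocity_verlet(pos, vel)
print(pos.shape, vel.shape)
```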
AI for science: we use AI for science, and we use AI to design our own systems, so we're going to show you an example of one. This is in the category of physics-informed neural networks; we call it SimNet, and Chris Lamb is going to come up to tell you what we do. Why don't you describe, first of all, the method we use for the neural network and how we train it, and then give an example of it in action.

Sure thing. As Jensen mentioned, we have a physics-informed neural network where we've embedded the partial differential equations that govern, in this case, coupled fluid transport and heat flow in a heat sink model that's meant to cool a chip. (Can you turn up his mic a bit?) We've embedded the partial differential equations that govern the physics into the loss function used to train a neural network on this specific problem. So instead of providing data examples to train on, what we've actually provided is a sampling of the geometry and the fluid, as well as parameters such as the pressure or the temperature of the chip, and what the neural network is doing is mapping the boundary conditions onto a set of plausible physical outcomes by complying with the partial differential equations. Ultimately what we're trying to do here is take a series of about 2,500 simulations that would take weeks with a traditional solver on a cluster and speed it up to interactive rates. In this case we're encoding variations of the geometry of the heat sink, changing the various fin heights, and we want to find the optimal configuration with the lowest chip temperature, under the constraint that the pressure drop from the fan cannot exceed a certain threshold, in this case 15 pascals. By creating a surrogate model, training this neural network, we can then run inference very fast, about once a second on a DGX-2 system, and within a matter of several hours come out with an entire design space, represented here in the window on the left, showing the correlation between temperature and pressure for the various configurations.

Now Chris, in this case this is really not an approximation per se; as a matter of fact, a deep neural network is a universal function approximator. That's right. So in a lot of ways, the partial differential equation that would have gone into the rest of the simulator has now been learned into the neural network. That's right; we're very certain, because of the way this was trained with the partial differential equations, that it accurately complies with the laws of physics for every solution it outputs. You verified against many different actual simulators that this is basically just as accurate as a traditional simulation. And so by going through this extra step of taking your simulation equations and encoding them into a neural network, that extra step does take some time; you have to train it. Yes. But once you do that, this new model is incredibly fast.
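The loss construction Chris describes, penalizing the residual of the governing equations at sampled points instead of fitting to data, can be illustrated with a much simpler equation. The sketch below trains a small network to satisfy u''(x) = -sin(x) with zero boundary conditions, using automatic differentiation to form the residual; it is a generic physics-informed-network illustration in PyTorch, not SimNet itself.

```python
# Generic physics-informed-network sketch (not SimNet): instead of fitting data,
# the loss penalizes the residual of an equation, here u''(x) = -sin(x) on [0, pi]
# with u(0) = u(pi) = 0 (exact solution: u = sin(x)).
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    # Sample collocation points in the domain, like sampling the geometry in the talk.
    x = torch.rand(128, 1) * torch.pi
    x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + torch.sin(x)                           # how badly the equation is violated
    bc = torch.tensor([[0.0], [torch.pi]])
    loss = (residual ** 2).mean() + (net(bc) ** 2).mean()   # equation term + boundary term
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(net(torch.tensor([[torch.pi / 2]]))))           # should approach sin(pi/2) = 1
```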
And as a result we can simulate much larger systems. With the simulators we used in the past, we would take a small section of the system, simulate that, and use it to approximate the behavior of the rest; people do that in crash simulators, in wind-tunnel simulations, in all kinds of large physics simulations. This basic method of using physics-informed neural networks has real potential: taking that extra step of training the neural network model allows us to simulate much, much larger models, and quite frankly, as a result, more accurately. Absolutely; it's applicable to very broad areas of physics. We just chose this as an interesting example because we have hands-on experience and we do it every day at NVIDIA. Well, we chose this one because we actually need it, which is a good reason to do it. And this is beautiful. One of the benefits of being able to evaluate different... guys, everybody, this is done in real time; this is not a video. This is probably the first time you've ever seen fluid simulation done in real time, with the computer graphics generated in real time; everything is done completely in real time, no video involved.

In this example we actually found a very interesting outcome, which is that a fully rectangular heat sink wasn't the optimal design. There's actually a slight peak, like a peaked roof, that turned out to be just a little more optimal than something completely square, and that kind of intuitively makes sense, because the hottest portion of the heat sink is at the center of the chip, and there you're putting the surface area close to where the heat is. You're trying to balance having as much surface area as you can, so the heat can transfer through those plates, against the fact that too many plates create too much back pressure, so the airflow doesn't flow naturally through it, and if those fins are not designed properly they create turbulence in the back. Absolutely. So we've integrated this into this visualization and simulation tool, and we're using it now at NVIDIA. That's really fantastic.

In the future we're going to design everything in computers, and then we'll take the blueprint and give it to a robot that learned how to build these things inside a computer, and it will build our whole systems for us. That's the future. By the way, that's a real design of our system, and it's beautiful too. We take the original design of our circuit boards and the geometry of all the various components, we have it all in our database, and then we can simulate our systems: for functionality, for performance, for mechanical and structural rigidity and integrity, and of course for thermal performance. In the future we'll also simulate acoustic performance, because depending on the type of systems we create, all of that needs to happen, and it has to happen in harmony. And the next step is to do all of this immersed in fluid, because all of our computers in the future will be liquid-cooled. So this is really cool. Good job, Chris. [Applause]
Okay, streaming AI. There are some amazing instruments being built: the Square Kilometre Array, these incredible telescopes looking at the sky, satellites, sensors all over the world measuring temperature, pressure, and vibration. In the future there will be trillions of sensors all over the world; there will be billions and billions of cameras sprinkled everywhere. We'll be looking at continuous data about the universe and the planet, continuously; we'll have just an infinite amount of data, and it will be coming at us in real time. In some applications, the latency within which we have to perform our observation, perception, sensing, and classification means it has to be done in real time. It could be lidar information, it could be radar information, in the future it could be radio-wave information; it could be a whole bunch of cars traveling through a city, and somehow we have to infer something about them in real time so that we can take the necessary action or alert the right people. Maybe we're using all of this to improve our signal integrity and signal fidelity while dramatically reducing the energy used. Maybe in the future our 5G and 6G radios are beamforming in real time, the heuristics are gone, it's completely based on AI, and it deals with the cars coming through, the people walking around, the connections that are there, and the geometry of the buildings around you, and it beamforms accordingly. We do this for lidar, we do this for square-kilometre arrays; we look at different parts of the sky only if we have to, and maybe one of these days, based on some neutrino detection, the telescope sweeps in a different direction because an extraordinary event just happened in the universe. All of this streaming information, streaming data, and remote sensing at an incredible level is going to enable all kinds of new applications, whether in the transportation business, the manufacturing business, or even the retail industry; the number of high performance computing applications that result is quite extraordinary.

If you go back and think about all the work we've done over the years, most of it contributed to the basic advance of knowledge, which is so important, but its impact on industries tends to be indirect by several levels: the knowledge we accumulated through science and discovery ultimately led to advancements in industries, but rarely has high performance computing directly impacted industry, until now. As I mentioned earlier, with the exception of oil and gas and seismic processing, high performance computing really hasn't touched most other industries. Finally, because of the work we've done together, and particularly because of AI, we're going to be able to put these remote sensing systems all over the world performing all kinds of different tasks. This is a new type of high performance computing, and I think in the future you're going to find supercomputers used to train and develop the models, but then we need systems at the edge. They're at the edge because that's where the action is, that's where the data is, and because you can't afford to stream that much data over the internet, or any network, back to the supercomputing centers for processing. You want to do the filtering right there at the edge, so that what comes back is only the relevant, meaningful, useful information you would like to collect and store.
Remote sensing is going to become software-defined, and this is an area that I think will go through an enormous transformation. Over the last several years we created a whole new type of computer we call EGX. The platform looks the same, and it basically is the same, but it's fundamentally different in many ways. First of all, this box is likely to be sitting in some strange location, a location you'd rather never visit again: it could be underwater, it could be on an iceberg, it could be on top of a hill, it could be in far remote regions, in deserts. It has to be tamper-proof, so it has to protect and secure data at rest as well as in motion. It has to be managed from afar and orchestrated independently, without anyone sitting next to it. You'll be sitting at your supercomputing data center, you create your model, and you should be able to orchestrate, send it out, and update the models using Kubernetes. All of the models will be encoded and encrypted, all of the data will be secure, and every single time it communicates with you it has to attest that it's secure. This particular high performance computing system is going to be placed far away from supercomputing centers, remotely managed, and it has to be secure. So we call it the EGX supercomputing platform, based completely on Kubernetes and highly secure.

The applications for it are really quite phenomenal. We've created an application stack on top we call DeepStream and Metropolis; basically it's a streaming AI application framework. It's essentially like a self-driving car, except with no wheels: sensor information is coming in in real time, streaming into this box as fast as you'd like it to stream in, with as many NICs and as much computation as you'd like inside. They could be big boxes, small boxes, or little tiny boxes like the little Jetson computer I mentioned earlier, and these boxes can sit far away because they're completely secure.

One of the early applications of this is a fantastic one: it's used by the USPS, the world's largest logistics operation. They process 500 million pieces of mail a day. Some of it is handwritten, some of the handwriting could be better, some of it is just wrong, and some of the packages contain things we'd rather not have mailed, so they need an AI system to look at and process all of these incoming pieces of mail, 500 million a day: the perfect application for a streaming AI computer. These computers are going to be perfectly secure, they'll be in hundreds of sites, and they'll be managed from one place. Whenever there are new models, maybe new threats, they can be updated; the system never goes down. This is one of those computers you can update without it ever going down: new AI models are deployed to it, it keeps running the old model until the new model is installed, and then it switches over.
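Underneath a framework like DeepStream, the shape of a streaming AI application is a loop: frames arrive continuously, a model scores them on the spot, and only the distilled results leave the box. The sketch below is a generic illustration of that loop; the frame source and detector are placeholders, and this is not the DeepStream or Metropolis API.

```python
# Generic edge streaming-inference loop: ingest frames, run a model on each,
# and forward only the compact results. Placeholders stand in for the real
# sensor source and detector; this is not the DeepStream/Metropolis API.
import time
import numpy as np

def frame_source():
    """Placeholder for a camera/sensor stream: yields frames forever."""
    while True:
        yield np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)

def detect(frame: np.ndarray) -> dict:
    """Placeholder detector: in practice a GPU model would run here."""
    return {"mean_brightness": float(frame.mean()), "timestamp": time.time()}

def publish(result: dict) -> None:
    """Send only the distilled result upstream, not the raw frame."""
    print(result)

def run(max_frames: int = 5) -> None:
    for i, frame in enumerate(frame_source()):
        if i >= max_frames:
            break
        publish(detect(frame))        # keep the heavy raw data at the edge

run()
```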
There are all kinds of new applications we can imagine; I've already spoken about a couple of them. You could use one of these systems to manage a whole factory of robotic arms: instead of putting cameras and sensors on all the robotic arms, we could put sensors all over the factory, and those sensors would allow the cages to be taken off so the robots could essentially work among us. Based on what other robots are doing, where people are moving around, and where the delivery bots are, these robots could be orchestrated and managed by one supercomputer sitting inside that manufacturing facility. There's an incredible number of applications, all based on high performance computing and processing AI in real time.

This is really interesting; this is fun to watch. This is how mail gets routed; it's a miracle that it gets there. If we all realized what it took for mail to get to us, we would have conceived of email a long time ago.

We announced at Mobile World Congress that this particular system is also ideal for the future 5G edge. Not only are we going to put applications on 5G networks close to the edge, so they can be processed a lot faster and data doesn't have to travel long distances over the internet, or because data privacy is so important that you can't afford to have it transported over the internet and off your facility; those kinds of applications are perfect for this platform, in addition to running the whole 5G stack. These last couple of years we've been working on accelerating the 5G RAN: just as we do signal processing for deep learning, now we're doing signal processing for 5G radios, and we can scale to the highest performance. Because it's in the data center and the performance is so high and the latency so low, it's possible to have far fewer systems, because the workload can move from data center to data center. So we announced a partnership with Ericsson. Microsoft, as you know, is going all in on the intelligent edge; they're seeing, just as we are, an explosive number of applications in warehouses, logistics, and retail stores. Retail is a 30-trillion-dollar industry; if it has the opportunity to improve its efficiency by just a couple of percent, the benefit to that industry and to the cost of living is incredible. We see the same opportunities they do, and we partnered together to bring the intelligent edge to the world.

Let me change gears now and talk about something else. As you guys know, building supercomputers is hard. I wish we had a stop-motion camera on the building of Summit; it would have been incredible. I'm sure Jeff Nichols has it. Building supercomputers is super hard and takes a long time: several years in the planning, almost a year just building, and then standing the system up, bringing it up, getting it tuned. It's incredibly hard. Researchers don't have the ability to go and invest in building their own supercomputer; not only is it expensive, it's hard. So the cloud seems like a perfect place to do it. We took a couple of the example applications I showed you earlier, the ones we benchmark all the time, and we put them in the cloud. This is one of the clouds, on the CPU instance that's recommended for high performance computing, and running all of these applications in aggregate takes about 48 hours and 152 dollars.
Now, frankly, that doesn't sound so bad. However, as you know, most of your jobs, most of your simulations, take a lot longer than that, and taking a lot longer with a lot more CPUs, this bill can really rack up, and rack up fast. The answer, of course, is to accelerate it. We've been putting GPUs up in the cloud for a couple or three years now, and the acceleration is unquestionable. (Please put your walkie-talkie on silent. It's okay, we're good, we're among friends.) The acceleration is unquestionable; we know we can accelerate it. However, the price per hour of a GPU is a lot higher than the price per hour of a CPU, and as a result many tend to use CPUs. Recently that trend has started to change, and the reason is that all scientists know it's not about price per hour, it's price per science; it's not cost per hour, it's cost per job. So the question is: what would happen if I ran this exact same application on our GPUs? This is what it looks like on one GPU instance: 48 hours becomes 6, and it costs you 18 dollars. Even though the dollars per hour look higher, it finishes in an eighth of the time. And even when you use multiple GPUs on one of the most powerful, and therefore most expensive, instances in the cloud, it is actually the most affordable. Surprisingly; though to you and me it's not surprising at all. The best way to reduce cost is to get your job done and get off; getting your job done and getting off is really the best way to keep your cost down in the cloud, and people are starting to discover that.
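Taking the talk's own figures, roughly 48 hours and 152 dollars on the recommended CPU instance versus about 6 hours and 18 dollars on a single-GPU instance, the price-per-science point is plain arithmetic:

```python
# "Price per science": compare the talk's CPU and single-GPU cloud figures.
cpu_hours, cpu_cost = 48, 152
gpu_hours, gpu_cost = 6, 18
print(cpu_hours / gpu_hours)          # 8x faster time to result
print(round(cpu_cost / gpu_cost, 1))  # ~8.4x cheaper for the same set of jobs
```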
Well, one particular group of scientists discovered it recently, and they did something extraordinary. They're validating results and trying to improve the detection of neutrinos. You may know there's an incredible neutrino detector called IceCube at the South Pole. Cosmic rays were discovered about a hundred years ago, and we can finally detect the neutrinos associated with them. They arrive very rarely, but they are high energy and extremely light, with almost no mass, and because they have almost no mass they don't interact with anything: they don't collide with anything, they fly right through everything. They interact with the universe so lightly that they can even arrive ahead of the light from an event, because light interacts with the matter it passes through and the neutrino barely does. Detecting a neutrino might give us early warning of something happening far out, so that we can focus our instruments to go discover it. So we created a neutrino detector called IceCube at the South Pole, deep in the ice: a cubic kilometre of detectors, one kilometre on a side and a kilometre down. When a neutrino strikes a nucleus in that crystal-clear ice beneath the South Pole, it emits light, and these light sensors collect and detect it. They detect tens of neutrinos a year; it's not that many. What they want to do is improve the quality of their detectors, so they want to understand, when a neutrino strikes a nucleus and emits that little pulse of blue light as it travels through the ice, the transport properties involved, so they can build better detectors.

So they launched a gigantic simulation, and this gigantic simulation was launched on 52,000 NVIDIA GPUs. Somehow they found the perfect time around the world when 52,000 NVIDIA GPUs were available in the cloud, and they hit Enter. The person who hit Enter has to be either Frank or Igor; Frank and Igor are here, actually. Where are you guys? Frank, Igor, good job. Whoever hit Enter launched the largest single GPU-accelerated application ever: it launched on 52,000 GPUs, it ran in every single country and every single cloud, and my understanding is the cost was somewhere between fifty thousand and two hundred thousand dollars, on Frank's Visa card. Now here's the amazing thing: in aggregate that's 350 petaflops of FP32 on those 52,000 GPUs, which is almost the computation of Summit. Does anybody know, is Summit something like 20 or 30 megawatts? Around 20 megawatts or so. So you guys hit Enter, you kicked off 52,000 GPU servers, I don't know how many megawatts were cranked up at that moment, for an hour or so of simulation, and then you had to start up all these jobs, retire all these jobs, and get all the data back out; altogether it was about two hours. An incredible simulation.

Now here's something else that's interesting. It turns out there are a whole bunch of V100s in the world, and P100s and P40s; we've had several generations of them. This chart is not the number of GPUs, it's the events processed by GPU type, and as you'd imagine, since the V100 is so much faster than the P100, even though the numbers of GPUs could be similar, the number of events it processed was a lot greater. I think Frank was telling one of our guys that if you did it by cost, the V100 is potentially the cheapest, which kind of makes sense: the faster you get something done, the cheaper it is. I've been known to pass along a bit of wisdom, see: the more you buy, the more you save. And it is wisdom indeed. So for all the researchers who would like to conserve whatever research grant you have, find yourself the best GPU before you hit Enter.
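The headline figures of the IceCube run imply a few back-of-the-envelope numbers, derived only from what is stated in the talk (52,000 GPUs, roughly 350 FP32 petaflops in aggregate, about two hours end to end, and a cost somewhere between 50,000 and 200,000 dollars):

```python
# Back-of-the-envelope numbers implied by the IceCube cloud-burst figures.
gpus = 52_000
aggregate_pflops = 350
hours = 2
cost_low, cost_high = 50_000, 200_000

print(round(aggregate_pflops * 1e3 / gpus, 1))       # ~6.7 TFLOPS FP32 per GPU on average
gpu_hours = gpus * hours
print(round(cost_low / gpu_hours, 2),                # ~$0.48 ...
      round(cost_high / gpu_hours, 2))               # ... to ~$1.92 per GPU-hour
```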
So that was a cloud simulation. The HPC cloud is starting to change the world of high-performance computing; you almost wonder what we're going to do with the TOP500, because much of TOP500-class high-performance computing is going to be done in the cloud in the future, and we really need a way to account for that. More and more researchers are using the cloud because it's simply easier: easier than building a supercomputing cluster and easier than managing one. And now that we have the most advanced GPUs in the cloud, it is also very cost-effective. So it's both convenient and cost-effective, and you can conserve your grants for hiring researchers.

Today we're announcing that we're partnering with Microsoft to put large-scale, and hopefully larger and larger scale, supercomputers in the cloud. High-performance computing starts with the processing nodes, one GPU per node or many GPUs per node, connected by extremely low-latency links, in this case Mellanox's incredible InfiniBand. All of these nodes are connected with InfiniBand, and in aggregate it turns into a really fantastic supercomputer. The thing that's really great is our registry, NGC, the NVIDIA GPU Cloud, where we store the latest, updated, and optimized software stacks I mentioned earlier in the talk, whether it's deep learning, data analytics, machine learning, molecular dynamics, quantum chemistry, fluid dynamics, image processing, or volumetric rendering. All of it is optimized and updated all the time and stored in NGC. You open up an instance, grab one of those stacks in a container, launch it on Azure, and you're doing science. It's really quite fantastic. I want to thank Microsoft for partnering with us on this and putting a supercomputer in the hands of every researcher in the world. Thank you.
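The NGC workflow just described can be sketched roughly as follows, assuming a GPU instance with Docker 19.03+ and the NVIDIA container toolkit installed. The image tag is illustrative; the current tags live in the NGC catalog.

```python
# Rough sketch: pull a prebuilt, GPU-optimized container from NGC and run it.
import subprocess

image = "nvcr.io/nvidia/tensorflow:19.10-py3"   # illustrative NGC image tag
subprocess.run(["docker", "pull", image], check=True)
subprocess.run(
    ["docker", "run", "--gpus", "all", "--rm", image,
     "python", "-c", "import tensorflow as tf; print(tf.__version__)"],
    check=True,
)
```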
(I drank a can of Diet Mountain Dew before the talk, and every three words I feel like... is there a graceful way of saying it? Sorry. Whoever gives the next talk: don't drink Mountain Dew.)

All right, let me change gears and go from the world's largest computers to the most energy-efficient ones. You know that Arm is the most pervasive CPU ISA the world has ever known; it's in a hundred billion plus computing devices, and very few question that it will cross the trillion-device mark within the next several years. With IoT and smart sensors appearing everywhere, Arm is going to continue to grow, and this industry has pursued advancing Arm into all kinds of configurations of high-performance computing. People do that because the CPU is completely open. We use Arm ourselves, because there are certain types of computers we want to build. For example, we built Xavier, the world's first robotics processor: its configuration, the way it communicates, the real-time nature of its sensors, and the proportioning of single-threaded performance, AI computation, and parallel computation with CUDA were all so different that we needed the ability to configure our own computer. It turns out a lot of people need the same thing. Increasingly, people realize that high-performance computing is the engine of the next industrial revolution. It sounds a little cliché, even as I say it, but it's completely true: we automated power in the first eras of the industrial revolution, and this time we're automating automation. The ability to put AI everywhere is truly an extraordinary event, and countries recognize this, so many nations around the world are investing in their own high-performance computing infrastructure, for example EuroHPC, and the folks in Japan are building their own supercomputers for this very reason: they need to advance supercomputing in the way they see the world emerging.

There are many different types of computers being built: some at the edge, some in hyperscale clouds; some high-performance computers are designed for very fast I/O and very fast storage, some are designed to be incredibly secure. Everybody has a different motivation. There used to be one type of supercomputer; now there are many, as the universe of HPC expands in every direction, and the ability to take a simple ISA with the pervasiveness of Arm and configure all kinds of different computers from it is quite powerful. We see all kinds; these are just some of the block diagrams you may have seen: the folks at Ampere call theirs eMAG, optimized for hyperscale and storage; Amazon calls theirs Graviton, for hyperscale and SmartNICs; Marvell's ThunderX2 targets hyperscale, HPC, and storage; Fujitsu has a really incredible processor called the A64FX for supercomputing; and Huawei recently announced a really great processor, the Kunpeng 920, for big data analytics and the edge. There are so many configurations, and everybody optimizes for different things: the amount of I/O is different, the amount of cache is different, the number of cores is different; some have giant cores, some have a whole bunch of smaller ones.

Over the years people have asked us to please bring our CUDA GPUs to this ecosystem, and several months ago we announced that we would. We've been working on cultivating and developing the CUDA ecosystem for Arm, and it turns out the lifting is not terrible, because all of the applications are open source and we work with the whole ecosystem. The engineers who worked on CUDA, one of whom, Chris Lamb, is right here, have done such a great job that porting and cultivating the ecosystem for Arm has been really fantastic. We had good friends at Oak Ridge start working on it right away, on a system based on a ThunderX2 and basically one Volta, and some of the things they said were fantastic; they wrote a paper on it: the stack is really solid, straight out of the box, on par with POWER and x86. That's from Jack Wells. And Satoshi Matsuoka, a good friend who has provided a lot of guidance over the years for the whole HPC industry, spoke about a new wave of converged HPC and AI workloads in Japan and the importance of the Arm work being done there for the national HPC efforts. The speed-ups are all fantastic. It may have looked like an experiment, but it wasn't: our company is fully dedicated to this, and we have a bunch of people working on it. So today we're announcing our first reference platform, NVIDIA HPC for Arm. Let me show it to you. Ladies and gentlemen: four GPUs, and they sit in that chassis up there. This is the configuration, and we made it so that anybody's CPUs can be connected to it.
These are the two CPU boxes. Inside are Marvell ThunderX2s, really fantastic CPUs with great I/O, and single-threaded performance approximately that of a high-end modern Xeon. There are two of them in here and two down here, connected through an external PCI Express cable, so whether it's Ampere or Fujitsu or anybody else, this is a really straightforward way to get connected, and each pair of CPUs connects through it to four Volta GPUs. The connection looks a little like that, and this is our first development system for Arm HPC. [Applause] It's lighter than it used to be; I think we're making our systems lighter and lighter. Must be those simulations. This is what the box looks like, and as you know we design everything digitally, so it's easy to make it translucent; that's the beauty of doing everything in digital. I just asked to see what it looks like inside, and it looks like this: it has four Mellanox NICs, ConnectX-5s today, with ConnectX-6s coming next, incredibly high performance. We've been working with the industry, with all of you, and the industry has really been fantastic; everybody is jumping on. We already have some 30 applications, from molecular dynamics to quantum chemistry to imaging, including things I demonstrated earlier: TensorFlow, so Arm now has AI; RELION, the cryo-electron microscopy package; and all of your favorite programming tools, including PGI. The list of CPU partners is fantastic and growing, and I think this is going to be a great ecosystem. Basically everything that runs in high-performance computing should run on any GPU and any CPU; it's an open system, all of those applications are open source, and they can be ported from platform to platform. So we now work with IBM, we work with Intel CPUs, we work with AMD CPUs, and we work with Arm CPUs. That's our reference platform.
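Since the same open-source stacks are being ported, the same CUDA-accelerated Python code should run unchanged on a node like this. A minimal sanity check, assuming a CuPy build exists for the aarch64 platform, might look like this:

```python
# Check that a CUDA GPU is visible from Python on an aarch64 host and run a
# small matmul on it; the code is identical to what you would run on x86.
import platform
import cupy as cp

print("host architecture:", platform.machine())            # e.g. "aarch64"
print("CUDA devices visible:", cp.cuda.runtime.getDeviceCount())

x = cp.random.random((4096, 4096)).astype(cp.float32)
y = x @ x.T                                                 # matmul runs on the GPU
print("GEMM checksum:", float(y.sum()))
```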
Now, it's always good to see it work. Watching most of these applications run is a bit like watching grass grow; they take a long time, as they should. So we chose one that is fun to watch: VMD. VMD was developed by a gentleman named John Stone, and I like to call him the great John Stone. He was probably the first scientific-code CUDA developer; VMD was ported to CUDA a very long time ago, with incredible speed-ups. When you see him you'll be amazed: he looks exactly the same; he was a child then and he's a child now. So when we came up with this new computer and needed someone to try it, hey, let's give it to Mikey: we called John Stone and asked, could you take this out for a whirl? He said, sure, why not. He's only had it for a few days, so let's see what he came up with. All right: the great John Stone.

[John Stone] "This is the molecular visualization tool I developed at the University of Illinois, and it's running on an Arm machine in Santa Clara, California. You're seeing a fully featured version of the program; it has all the same features it does on all the other hardware platforms we support. Our NIH-funded research center develops these research tools, and we want to make them available on all the hardware platforms the research community uses. So you're seeing this running on a ThunderX2-based machine with two Tesla V100s, showing live interactive ray tracing driven by Vishal here; the ray tracing is compressed in real time, streamed over the internet, and displayed here in Denver. What we're showing is a large photosynthetic organism that lives at the very bottom of ponds, where there is not very much light, and it does photosynthesis. These little green ring-like structures are chlorophylls: they capture photons of light in the early stage of a long sequence of operations that converts that light into chemical energy in the form of ATP, the fuel of all cell life on Earth. In terms of photosynthetic machines this is incredible: simple as it is compared with plants, its energy return on investment is four times better than the best engineered thing humans have come up with. We and our collaborators all over the world are interested in studying how these things work. Every single thing you see there has been simulated on a supercomputer, both in isolation and now in totality; this aggregates to hundreds of millions of node-hours over decades, with many, many PhDs and researchers all over the world working on it. So it's really cool to see this: you're looking at an atomic-detail structure, like a little Swiss watch, amazingly sophisticated, and here you are, seeing it run on an Arm machine."

John, nice work. Ladies and gentlemen, I don't think anybody has ever heard what is otherwise known as pond scum described with so much enthusiasm. Ladies and gentlemen, John Stone. [Applause] Next time you're swimming in a pond and step on that gooey, slippery stuff on the bottom, you'll think of John Stone; I'm certain of it. Every supercomputing scientist in the world, when they think pond scum: John Stone. I love you, man. Incredible work.

All right, let me talk to you about I/O. RDMA was really pioneered and brought into industrial use by Mellanox; the ability to bypass the kernel as computers communicate with each other in a large distributed system put InfiniBand on the map, and as a result it is a fantastic way to run large-scale distributed computing applications and simulations. The challenge is this: we started adding GPUs to these systems. What I'm showing here is basically a DGX-2 node. It has 16 GPUs, each with about one terabyte per second of memory bandwidth, so you have 16 terabytes per second of aggregate memory bandwidth driving half a terabyte of memory, and you can move a lot of data into the DGX-2 for data analytics, deep learning, or scientific computing.
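A rough bandwidth budget for the numbers just quoted, to make the bottleneck concrete:

```python
# Bandwidth budget for a DGX-2 class node, using the figures from the talk.
gpus, hbm_tb_per_s = 16, 1.0            # ~1 TB/s of HBM bandwidth per GPU
nics, nic_gbit_per_s = 8, 100           # 8 x 100 Gb/s InfiniBand NICs

print("aggregate GPU memory bandwidth:", gpus * hbm_tb_per_s, "TB/s")   # 16 TB/s
node_io_gbyte_per_s = nics * nic_gbit_per_s / 8                         # bits -> bytes
print("aggregate node I/O:", node_io_gbyte_per_s, "GB/s")               # 100 GB/s
# 100 GB/s is on the order of a CPU socket's memory bandwidth, which is why
# bouncing every transfer through the CPU can roughly halve the usable rate --
# the motivation for the GPUDirect RDMA and GPUDirect Storage paths described next.
```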
However, if you connect multiple DGX systems together to run a large-scale simulation or a training job, the communication between them, collecting intermediate partial results, reducing them, broadcasting them back out, and synchronizing with the other nodes, becomes a significant overhead. The reason is that each node now computes so fast, and communicates internally over NVLink so quickly, that all of a sudden the communication among systems becomes the bottleneck. So we created GPUDirect RDMA, which many of you have already used, and as part of it the library we call NCCL, which uses CUDA to perform RDMA between systems. Working with Mellanox, we have the ability to transfer information from node to node without ever touching the CPU. That matters because if you sent the data through the CPU and back, you would cut the bandwidth at least in half, not to mention that the bandwidth we're talking about now is a hundred gigabytes per second: eight NICs on a DGX-2 is basically 800 gigabits per second, or a hundred gigabytes per second, which is of the order of the memory bandwidth of a CPU. Those CPUs would slow the transfer down. With GPUDirect RDMA we transfer data directly from node to node, leaving the CPU free to orchestrate and run the application, and most importantly letting us sustain very high transfer speeds.

The challenge now is not only networking; it's storage too. The amount of data we're generating in these large supercomputers is extraordinary. The way it works today with traditional storage, you're streaming storage at 50 gigabytes per second across eight NICs, and the data goes to the CPU and then comes back down to the GPUs. We've invented a technology called GPUDirect Storage, where storage can stream a hundred gigabytes per second through those eight NICs directly into the GPUs. The combination of these two technologies, plus many other libraries we're putting into the I/O, multi-node multi-GPU networking, and storage parts of the stack, we're now calling NVIDIA Magnum IO. This is an area that's going to be rich with innovation; we're going to put a lot of energy into helping you move data around the system, in and out of storage, and between nodes, and we're working with the ecosystem, storage vendors, system makers, NIC providers, Mellanox and others, to optimize the entire path. Now, what does it feel like in the end? Magnum IO is one of the biggest things we've done this past year, because almost nothing else we've done, collectively as an industry, would have sped something up by a factor of two, three, or four. Magnum IO does that; it gives us a many-X speed-up, and some of the things you'll be able to do with it are really quite remarkable.
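The collective pattern described above, reducing partial results across GPUs and nodes, is exactly what NCCL accelerates. A minimal sketch using PyTorch's NCCL backend is below; launch one process per GPU, for example with `torchrun --nproc_per_node=8 allreduce_sketch.py`. The script name and tensor sizes are illustrative.

```python
# All-reduce of partial results across GPUs/nodes via the NCCL backend.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")              # NCCL provides the GPU transport
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Each rank contributes a partial result; all_reduce sums them in place on every
# rank -- over NVLink inside a node and, with GPUDirect RDMA, over InfiniBand
# between nodes, without staging the data through CPU memory.
partial = torch.full((1024, 1024), float(rank), device="cuda")
dist.all_reduce(partial, op=dist.ReduceOp.SUM)
print(f"rank {rank}: element value after all_reduce = {partial[0, 0].item()}")

dist.destroy_process_group()
```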
You know we've been working on this thing called RAPIDS. It's one of the greatest things we've done and I'm so proud of it; I spoke about it at length last year. Data analytics and data processing are key parts of supercomputing going forward: you can't just produce the simulation results, you have to study them, and getting all of that data off disk and into the system for processing has become a gigantic bottleneck. There are several parts to RAPIDS, and all of it is open source. First, it's built on top of Apache Arrow, so data, whether CSV or Parquet, can be read into a columnar, vectorized format that CUDA can read lightning fast. Second, Dask, which is also open source, lets us schedule work across multiple GPUs; scheduling a large cluster of GPUs is a great breakthrough that Dask provides. On top of that we built RAPIDS, which has several components. The first is I/O ingest, which we call cuIO (not shown here). Once you've ingested the data, you move into the dataframe, cuDF, which is basically pandas: change a couple of lines of Python code and all of a sudden it runs on CUDA, on our GPUs, lightning fast. cuML is basically scikit-learn, and it's compatible with XGBoost: again, a couple of lines of change in your Python code and boom, it's accelerated. And lastly, we recently introduced cuGraph, which is generating a lot of excitement: the ability to process and analyze gigantic graphs and fly through them. These stacks are the world's first accelerated data science stack, and we see speed-ups of up to a hundred times; I saw a speed-up the other day of seven thousand times. The reason is that most data analytics frameworks were developed when data was small: you prototype on small problems, then you take that framework into high-performance computing, and all of a sudden you have ten terabytes and it comes to a crawl. So the ability to rethink the whole data analytics pipeline is really vital. We're seeing great adoption all over the world, 150,000 downloads already, and so many people are contributing; I'm really delighted to see it. So the accelerated data analytics stack is there. What we need on top of it is Magnum IO, so we can move data in and out of these systems. Sometimes the data is so big it doesn't fit in a DGX-2's half terabyte, and we need more than that, so of course we connect more of them together, and Magnum IO allows us to do that as well.
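A minimal sketch of the RAPIDS flow described above: the pandas-like and scikit-learn-like calls move to the GPU with little more than the imports changing. The column names and the tiny inline dataset are illustrative.

```python
# pandas-style dataframe work and a scikit-learn-style fit, on the GPU.
import cudf
from cuml.cluster import KMeans

df = cudf.DataFrame({
    "x": [0.10, 0.20, 0.15, 5.00, 5.10, 4.90],
    "y": [0.00, 0.10, 0.05, 5.20, 5.00, 5.10],
})
df["r2"] = df["x"] ** 2 + df["y"] ** 2        # same dataframe idioms as pandas

model = KMeans(n_clusters=2).fit(df[["x", "y"]])
print(model.cluster_centers_)
# For data that doesn't fit on one GPU, dask_cudf partitions the same
# operations across many GPUs and, with Dask, across many nodes.
```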
Here are some examples of what you can do. The folks at Pangeo have a reader, Xarray, that reads volumetric data: the world's high-precision weather data. They want to import all of it so they can understand climate change, and when they import it, it's around a hundred terabytes of volumetric data. Once they move it in, they want to run analytics on it, maybe simple things like the average change in temperature of a particular cube of the Earth over the last three decades, maybe some convolutions, and the Pangeo Xarray pipeline has been sped up incredibly. We also took the whole stack out for a drive and benchmarked it on a TPC-H-style workload, basically taking two 10-terabyte data sets and finding the relationships between them, and without Magnum IO's GPUDirect Storage versus with it, it comes to about a 20x speed-up. And here is a structural-biology example: it takes all the time steps that came out of a simulation and performs a dissimilarity analysis, building the dissimilarity matrix from frame to frame to frame, so you can discover where the molecules change the least; where they change the least is probably where the system has found its minimum-energy state and finally become stable. That analysis was sped up tremendously as well.
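A small sketch of that frame-to-frame dissimilarity idea, written with CuPy. A real pipeline would load actual trajectory frames (and typically superimpose them before comparing); random data stands in here, and the sizes are illustrative.

```python
# Pairwise frame dissimilarity on the GPU; find the least-changing step.
import cupy as cp

n_frames, n_atoms = 512, 10_000
frames = cp.random.random((n_frames, n_atoms, 3)).astype(cp.float32)

flat = frames.reshape(n_frames, -1)                       # (frames, 3 * atoms)
sq = (flat * flat).sum(axis=1)
# Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
dissim = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
cp.maximum(dissim, 0.0, out=dissim)                       # clamp small negatives

# The stretch where consecutive frames differ least is a candidate for the
# stable, minimum-energy region described in the talk.
step_change = cp.sqrt(cp.diagonal(dissim, offset=1))
print("least-changing step:", int(cp.argmin(step_change)))
```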
Now I want to show you something that was simply impossible to do in real time until now. In this case you have a DGX-2, the Magnum IO layer, the Pangeo Xarray reader on top of that, and then RAPIDS. Here you have a DGX-2, Magnum IO, CUDA, and then RAPIDS. And here again you have a DGX-2, Magnum IO, and then VMD, the structural-biology analysis tool. What I'm about to show you is a gigantic simulation: I created a simulation file a hundred and fifty terabytes large. Once the simulation is done, we want to understand what we simulated, and in this case the only way to do it is to visualize it. We're going to visualize a hundred and fifty terabytes of data and fly through it. A hundred and fifty terabytes is something along the lines of 25,000 DVDs: we're going to take 25,000 DVDs, pile them all into a box, and fly through every single scene, random access. This next feat is not something you can try at home; you just can't. And I'd like to introduce a space engineer among us. Won't you introduce yourself? "Sure, my name is Ashley Morrison, and I'm an aerospace engineer from NASA Langley Research Center." Ashley is trying to figure out how to design a lander to send us to Mars. It takes about two years to get there, you were telling me, and six people have to live in what appears to be a condominium as they land. That's just the last part; before that they're in something more like a cruise ship, with discos and movie theaters, because these six people are going to live in this thing for two years, and when they get there they still want to be sane enough to want to land. So anyway, Ashley is working on this. Tell us about this adventure; actually, you know what, let's show it to them first. Ladies and gentlemen, the Mars lander. [Music]

Now Ashley, first of all: when we saw the Curiosity rover land on Mars, there wasn't a ball of fire. There were parachutes, and it landed gracefully. And now we're going to put six people in this thing and send them to the surface of Mars in a ball of fire. Tell us why we have to do that. "Absolutely. To send humans, and we're very fragile, very delicate creatures who need a lot of things to keep us alive, you need a lot more mass on the surface of Mars. You mentioned the Curiosity rover: it's the size of a compact car, and that's about as big as you can do with no fireball; parachutes were deployed at supersonic conditions to touch down on the surface. For payloads this size, the vehicle is now the size of a two-story house, more than 16 meters in diameter at its largest dimension. It's absolutely massive. You're going from roughly 12,000 miles per hour to zero, at a very precisely targeted spot on the surface, in less than seven minutes, and to do that you can't use parachutes anymore. That's where the ability to simulate the physics of what will be a new deceleration technology, your fireball, really comes into play. That's the technology change, the paradigm shift." Does anybody know how long it takes a car to stop from 200 miles an hour? It takes some time. And here we're at twelve thousand miles per hour, flying through an atmosphere about one percent as dense as Earth's. At that speed you're not deploying parachutes, not for a vehicle this size; I don't think the math works. So you've got retro-propulsion, you're firing these engines, and in about six minutes, I think you were telling me, you will have traveled another few thousand kilometers, basically the entire width of the United States, inside the atmosphere, going from 12,000 miles per hour down to zero, and you have to stick the landing. You hope you stick the landing. So let me see if I understand the mission. We're going to send six people on a two-year journey; they're going to watch every single movie that has ever been made. They get there, they're flying at this rock at 12,000 miles per hour, and a ball of fire consumes them, and then somehow somebody, her name is Ashley, who three decades earlier did some math that said it's going to be fine, lands this thing. You don't even need a joystick, because you're not flying anything by hand at 12,000 miles an hour. Okay, that sounds right. So that's the mission. [Applause]
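A back-of-envelope on the entry numbers quoted above, roughly 12,000 mph to zero in under seven minutes with no parachute:

```python
# Average deceleration implied by the quoted entry profile.
MPH_TO_MS = 0.44704
v0 = 12_000 * MPH_TO_MS          # ~5,364 m/s at atmospheric entry
t = 7 * 60                       # seconds, the upper bound quoted

avg_decel = v0 / t               # ~12.8 m/s^2
print(f"average deceleration ~ {avg_decel:.1f} m/s^2 ({avg_decel / 9.81:.1f} g)")
# The average works out to only ~1.3 g, but the braking is far from uniform:
# most of it happens over a short stretch of denser atmosphere and during the
# final retro-propulsion burn, so peak loads are considerably higher.
```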
Okay. Before all of this, we made the DGX-2 and Magnum IO and ported IndeX to run on the whole thing. IndeX is our distributed, multi-node, GPU-accelerated volumetric renderer, which is a very long description. Several years ago, when they finished explaining to me that we needed something like this, "Jensen, what we need is a distributed, GPU-accelerated, scalable volumetric renderer," I said, I don't understand a word you just said. But this is why we did it: all of these simulations are creating terabytes, tens and hundreds of terabytes, of data that get put into storage, and we want to stream that data out of storage and fly through all of it in real time so that we can analyze it. So we needed to create a computer with the ability to do that, and we had to create the I/O stack that can move data in and out of storage, streaming it directly into the computer while we visualize it or run analytics on it in real time. The idea is that we will essentially have a supercomputing analytics instrument that sits next to a supercomputer: we do the simulation on the supercomputer, and we do the analytics on the supercomputing analytics instrument. That's what the DGX-2 is about. We've been building this for some time, all the pieces have finally come together, and we showed you a few examples: the miracle of landing on Mars, the Pangeo Xarray reader, using OptiX to visualize molecules in real time, essentially providing a computational microscope, using TensorRT and AI to run inference in real time against an enormous database of pre-simulated weather patterns to discover extreme weather, and data analytics at a pace and a speed nobody has ever seen before. This is the benefit of accelerated computing, and we've finally been able to put it all together.

So this is what we talked about today. The HPC universe really is expanding, at an incredible rate, in every single direction: Arm CPUs that will now have the entire capability of everything we've ever done in high-performance computing; edge computing that can now run AI and make all of our sensors software-defined; the DGX-2, with the right stack on top of it, becoming a computational instrument; the partnership with Microsoft to put world-class supercomputers in the cloud; and one of the best things we've done, NVIDIA Magnum IO, which tackles one of the giant bottlenecks of networking and storage. All of that software is made available to you, fully optimized, and we're going to keep working on it for as long as we shall live, in my case quite a long time; I'm talking about decades here. I'll still be working on this, still accelerating software, when Ashley sees the first person land on Mars in the mid-2030s. This is our grandest piece of work and something I'm super proud of, and the world of high-performance computing will never be the same. I want to thank all of you for your partnership and collaboration over the years. Thank you, and have a great Supercomputing. [Applause]
Info
Channel: NVIDIA
Views: 34,572
Rating: 4.7209301 out of 5
Keywords: NVIDIA, Jensen Huang, HPC, AI, Supercomputing, NVIDIA DGX SuperPOD, NGC, Artificial Intelligence
Id: 69nEEpdEJzU
Length: 111min 36sec (6696 seconds)
Published: Mon Nov 18 2019