NVIDIA: Deep Learning - Extracting Maximum Knowledge from Big Data Using Big Compute

Video Statistics and Information

Captions
So, our next speaker is the global head of the financial services business unit and business development, but he has a PhD in device physics — so you know where you are in the technology when you need a PhD in device physics to sell the stuff. His name is Andy Steinbach and he's with Nvidia, and he's here to talk to you about AI. He has a long history in device physics at several different semiconductor companies, and he's helped start up machine learning practices at other companies. Please join me in welcoming Andy Steinbach of Nvidia.

Can everybody hear me? Thank you for the very kind introduction, Frank. It's great to be here — being a device physicist is, in fact, a special point of pride. I want to talk today about artificial intelligence and deep learning specifically, talk about some of the diverse applications, and then talk at the end about the compute implications. This is no small task, because the recent explosion in deep learning and artificial intelligence is not being compared with decade-level secular trends — it's not "the new mobile" or "the new internet" or "the new PC." It's being compared with no less than the fourth industrial revolution. Artificial intelligence is being compared with the rise of electronics, the rise of electricity, the rise of the steam engine — and I guess you could go back to the Bronze Age. This is a bold claim, and it's fair to ask: is this hype, is it hyperbole? Well, the proof is in the pudding, so I'll let you answer that for yourself, but I hope to convince you that it is indeed that revolutionary. And please note that these are not decade-level trends; they're 50-year trends, 100-year trends. What's interesting is that I claim we're probably something like 50 years into the artificial intelligence trend already. It's quietly been happening as we've been building up our compute power and computer architectures, but most importantly there have been 50 years of hard work since the 1950s, 1960s, and 1970s on algorithms — both machine learning and something called neural networks.

What finally happened in the last five years is that it exploded, because deep neural networks in particular finally worked, and they worked in a spectacular way. 2012 was sort of the magic year — what we call the Big Bang at Nvidia. What happened in 2012 was that algorithms trained on very large data started to work; I'll talk about that in the next slide. There was a massive burst of research, there was development of special frameworks — almost like compilers — just to make it easier to do quick research in deep neural networks, early adopters like Google and startups exploded (that's an old number; there's probably 50 billion dollars or more in funding now), and finally Fortune 500 companies are adopting it. So you ignore this trend at your peril. What happened was that deep neural networks are particularly good at solving problems with unstructured data, and researchers started smashing all the records in all the benchmarks in artificial intelligence that had been built up and had sort of asymptoted, if you know what I mean, for 20, 30, 40 years. The big thing that happened in 2012, in the upper left, was that at the annual computer vision benchmark contest, a deep neural network trained with GPUs on very large image data smashed all the machine learning records — it jumped up by that discontinuous jump of the green line you see there, ten percentage points — and it kept rocketing upward.
It kept climbing until it achieved better-than-human performance. The same thing happened within the last three or four years in speech recognition. Hand-crafted machine learning algorithms for speech recognition have a 30-40 year history, and as soon as deep neural networks popped up, research at places like Microsoft, Google, and Baidu — lower is better on this graph in the upper right — very quickly beat the existing, incumbent records and models. Things that were very difficult to do, like robotics with human-like dexterity, worked without being hand-coded for specific tasks: these are robots that were trained at Google just like a child learns — the camera in the upper part of the image is just watching; if it drops the object, it tries a different way — and it worked. And then finally, in 2015, not even two years ago, a crowning glory: Google programmed a deep neural network, using something called deep reinforcement learning, so powerful that it beat the world's top-rated grandmaster at the game of strategy called Go. This was in 2015, and Go is a much more difficult game than chess — you can't just compute the combinations; it's a game of strategy and intuition. This isn't just learning to walk, or learning to speak, or learning to say "that's a cat and that's a dog" like a three-year-old would. This is taking ostensibly the world's hardest game of strategy, and a machine algorithm beat the world's best human player. That's incredible. There will be more examples as we go through this, but hopefully we're on the road to understanding and getting convinced why this is so revolutionary.

At Nvidia we make parallel processors — I'll explain a little bit later — and we seized on this opportunity with an autonomous driving division. I want to show two videos to make artificial intelligence, and deep learning in particular, a little more personal, so that you can sort of touch it. The first thing we did when we started our autonomous driving division was to say: maybe I shouldn't try to make a car drive itself like a person right off the bat. Instead I'll use these image recognition techniques and offer an SDK — a software developer kit — and the hardware, so the car manufacturers can make autopilots. If you're texting and you don't see the car stopped in front of you, it will brake; it will brake if you miss something; it won't let you change lanes if there's a car in your blind spot. Well, what would you need to do that? This is a deep neural network trained to operate and detect images at 60 frames a second. It can detect cars, it can detect pedestrians and stop signs, and it has to do it in all kinds of weather. It can semantically segment the image and determine the objects — the purple is drivable surface. It can draw boxes around cars, so it knows what cars and vehicles are around it. It can segment lanes, and it can use that, and check the blind spot, to change lanes, for example. And it can categorize the drivable surface: there in the middle, the white lines are off the road, the yellow lines are things you really shouldn't hit, and the red lines are things you really, really shouldn't hit. That's the Stanford mall in Palo Alto, by the way, which is funny. You can put all that together, and that's fantastic — this all happened in the last five years; 2012 was hour zero; it's amazing. But what you would have to do with that — most of us are engineers — is probably wrap it in some if-then-else code saying: if the sensors tell you there's a car ahead and your velocity says you should be stopping but you're not, stop the car; and if you try to change lanes and there's someone in the blind spot, don't let it happen.
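To make the perception piece a bit more concrete, here is a minimal, hedged sketch of detecting cars and pedestrians in a single frame. It uses a generic pretrained COCO detector from torchvision rather than the NVIDIA network described in the talk, and the `weights="DEFAULT"` argument assumes a reasonably recent torchvision; everything here is an illustrative stand-in, not the DriveWorks SDK.

```python
# Minimal sketch of the perception idea above: run a generic pretrained
# object detector on one camera frame and keep the confident boxes.
# This is an off-the-shelf torchvision model trained on COCO, NOT the
# NVIDIA network from the talk.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)            # stand-in for a normalized RGB camera frame
with torch.no_grad():
    detections = model([frame])[0]          # dict with "boxes", "labels", "scores"

keep = detections["scores"] > 0.5           # keep confident detections only
for box, label in zip(detections["boxes"][keep], detections["labels"][keep]):
    # label is a COCO class index (e.g. 1 = person, 3 = car, 13 = stop sign);
    # a real autopilot would feed these boxes into rules or a planner.
    print(label.item(), box.tolist())
```

A production perception stack also segments lanes and drivable surface and runs many times per second on dedicated hardware; this sketch only shows the box-and-score output that the if-then-else wrapper above would consume.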
But the next phase is much harder: how would you make an autonomous driving car? What you do is make the algorithm behave like — mimic — an ensemble of good drivers. I emphasize an ensemble of good drivers, not an average driver, which today, with everything distracting us, would be pretty bad. That's the whole point of what we're trying to do. What's fascinating is that the car — this is an Nvidia test autonomous driving car using our hardware and software developer kit — learns like your son or daughter would at sixteen. It starts in a parking lot, and it's pretty bad. [Music] It gets confident in the parking lot, and mom or dad — or the driving instructor — says, let's go out on the road. Sort of halting at first, nervous, not confident, but pretty soon you're navigating challenges with confidence as you learn from more examples. And pretty soon the autonomous driving car is able to navigate things we never taught it. We never trained this car to navigate a roadblock — a construction roadblock diversion. We never said you can go off the road. We didn't say "if off-road, then... else if the angle is this and the grass is mushy..." It did it on its own. In the algorithm world, that ability to generalize — it's called generalization — to solve new problems that you've never seen, is the hallmark of intelligence. This is really rather remarkable, and obviously the industry has to regulate it and make sure it's safe, but it's pretty interesting.

Now let's shift gears for a second. Most of us here are engineers or scientists; we probably started as at least some form of a nerd or a geek — I still am, and proudly so. One of the things that's interesting — some of you may know a lot about this, but I always like to have a mental model or a paradigm for how something fits together — is that you're going to see examples where some are things humans do, like driving a car or robotics, and some are things humans do terribly, like assessing a hundred million different loans and predicting which ones are going to default; that's predictive analytics. Is there a way we can think about this that puts it all in one nice bucket? There is. I'm grossly oversimplifying, but hey: artificial intelligence is really just fitting functions. There's some data, and we're going to make an algorithm that looks for a pattern and tries to fit a function to it. I bet you can see a pattern — there it is. So there's an input x, and we're trying to fit a function y = f(x) to this data — the function that fits it best by some kind of metric. You can think of this as a cause and an effect — ah, it's getting interesting — an example and an outcome, an input and an output. And you're telling me: hey Andy, why are you telling me about sixth, seventh, eighth grade algebra? Well, what if x weren't just a single dimension? What if it were millions of dimensions — every single piece of data in your data lake and your siloed structured databases or data warehouses? What if you were mixing structured and unstructured data? And what if y were your most important business challenge: what is the chance this loan will default, is this credit card fraud, does this patient have cancer, is this jet engine okay for the next flight or does it need maintenance early? Well, it's really getting interesting now. So you can think of this as function fitting.
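As a toy illustration of "AI is just fitting y = f(x)," here is a minimal sketch that fits a small neural network to noisy samples of a nonlinear one-dimensional function. The data, the library choice (scikit-learn), and the network size are illustrative assumptions, not anything from the talk.

```python
# Toy version of "fit y = f(x)": a small neural network regressor fit to
# noisy samples of a nonlinear function. Purely illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 1))              # input x (1-D here; millions of dims in practice)
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)   # unknown nonlinear "cause -> effect" plus noise

# Two hidden layers of 64 units: a (very) small deep model.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(X, y)

x_new = np.array([[1.0], [2.5]])
print(model.predict(x_new))   # predicted y = f(x) for unseen inputs
```

The same mechanics scale, at least conceptually, to the business questions above: x becomes every field in the data lake and y becomes "will this loan default."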
But it's fitting very complicated, nonlinear functions in an incredibly huge number of dimensions. That's what machine learning historically has done pretty well for maybe the last few decades. What deep learning does is deal with nonlinear interactions and functions much better than ordinary machine learning, which tends toward things like linear regression — I'm not doing it justice, there are more advanced algorithms — but this is incredibly complicated function fitting, and that's why it can generalize so well and not overfit, because when you have complicated models, overfitting is a big problem. I think of this as a question-and-answer pair: you're sifting through all the data in your data lake or database or in the real world, and you're trying to synthesize an answer. The standard way of doing artificial intelligence — mostly what's in use today in production — is called supervised learning: you have many, many examples of the behavior you want to synthesize, like people driving, or "these are images, and this is a cat and this is a dog." That's supervised learning: you have labels, or a target, that you're trying to emulate, like driving the car — hopefully from a good driver.

You can see that if you can do this as I'm showing, it's really profound — it's a revolution in the scientific method. At the risk of condescending to the entire room, because we're all scientists or engineers: scientific theory, at least the way I see it, is that you typically guess a model — some people guessed a differential equation or a model — and then you prove it; that's the middle box. You validate it by fitting data, maybe tune a few parameters, and after a lot of trial and error you stamp that theory as working in some regime, and then you can predict things. That's electrical engineering and mechanical engineering and physics and chemistry — and then you get into softer sciences where suddenly you can't do that. It works incredibly well: we have these chips with billions of transistors, and it's amazing. But we don't have that for predicting credit card fraud, or driving a car, or understanding whether some packets coming into your network are a cyberattack, because there are no laws of physics of a cyberattack or of credit card fraud — there are only patterns of past behavior. So AI turns this around. It's a new kind of scientific method — data science. It doesn't start with guessing the right model and then validating it; it starts with the data first. The data in our data lakes, the data in our databases, the data in our fabs, or whatever — it takes that data and builds a model around it that fits it. The revolution in deep learning is that it can build incredibly complicated and incredibly accurate models. You fit the parameters — there's a big risk of overfitting, but there are ways to avoid that — and then you have a prediction. So you have this thing that can predict things we could never predict before — it's like electrical engineering or physics (sorry, I'm biased in my sciences), but for all these other things we could never really have a good theory of, and you can get 90% accuracy, 99% accuracy.
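Since "there are ways to avoid that" is doing a lot of work in that sentence about overfitting, here is one of the simplest such ways, sketched as a continuation of the toy regression above: hold out a validation set the model never trains on and compare scores. The data and model sizes are again illustrative assumptions.

```python
# A common guard against overfitting: hold out data the model never trains
# on and compare scores. A large gap (train >> validation) is the classic
# overfitting signature. Reuses the toy setup from the sketch above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

print("train R^2:     ", model.score(X_train, y_train))
print("validation R^2:", model.score(X_val, y_val))   # should stay close to the train score
```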
So contrast that — this is predictive analytics — with data analytics. No offense, because a lot of us do data analytics and spend a lot of time on it, but it takes historical data, for example in our data lakes, and typically does backward-looking analysis: queries, averaging, and you get insights into what happened, past tense. AI gives you predictions about the future — what will happen next — and that's a profound difference. The two are complementary, by the way, because we've spent the last 10 or 20 years collecting all this data, doing some stuff with it, and wondering when we're ever going to finally monetize our Hadoop data lake, for example. And finally we can, because we can sift through it with AI algorithms and answer our most challenging business questions.

I'll go through this slide quickly because I'm probably running a little slow, but it's really a new paradigm for big data. Many of you will recognize, at the top, a little cartoon of data analytics: a typical data lake environment. This is a lambda architecture, where streaming data comes in and is stored in some kind of data warehouse or Hadoop infrastructure; you might have batch data analytics jobs that calculate the closing of your quarter, or all kinds of statistical reports, data analytics reports, and business intelligence reports; you may have a speed layer, like Spark or something proprietary, that lets you serve up applications that query the most recent events the batch job might miss; and you answer all kinds of business questions. But artificial intelligence is this sort of virtuous cycle — the circle on the bottom — where you take that big data, you build a powerful model, you deploy that model, and then, as new data streams in at the edge, you keep learning and it just gets better and better. Every one of those models is like a little law of physics — a Navier-Stokes equation for mechanical engineering, but for credit card fraud, or for what the probability is, or what your speed bins will be for this high-k metal gate transistor with this lot of wafers halfway through the run card — things like that. It's really remarkable. The interesting thing with this model is that now you have these little artificial brains, these little artificial intelligences, and you can push the intelligence out to the edge of the network. It doesn't have to be in your data center: it's in your car, it's in an airplane, it's on an oil rig in the North Pacific where you'd rather not have any people, or in a mine in Africa, because it's a dangerous environment and wouldn't it be great if you didn't have to risk people's lives to be there. So you push that intelligence out to the edge, and you even take data in at the edge, learn incrementally at the edge, and send that incremental learning back to the data center to update your models.

Well, let's shift gears again and talk a little bit about why these networks are so special. I'll go through this quickly because it's easy to get into the weeds, but with machine learning, the name of the game for the last 10, 20, 30 years has been that you handcraft features. A lot of us are familiar with Fourier transforms: you have a complicated time-domain signal, you take a Fourier transform, out pop two frequencies, and you say, oh, that's easy, I can get rid of the noise with a high-pass or low-pass filter. You transformed to a domain where the signal looked very simple. Handcrafted feature engineering is that process of finding that Fourier transform — or whatever it is — that transforms you into a space where your data makes sense, and then you kind of just separate it with lines between the classes. That's the learning process, and the learning — some machine learning enthusiasts will not like me for this — is relatively simple.
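To ground the Fourier-transform example, here is a minimal sketch of classic handcrafted feature engineering: two classes of noisy signals become separable by a single handcrafted feature, the dominant frequency from an FFT, after which "learning" really is just drawing a line. The signal frequencies and the threshold are invented for illustration.

```python
# Classic handcrafted feature engineering: transform to a domain (frequency)
# where a simple rule separates the classes. Frequencies/threshold are made up.
import numpy as np

rng = np.random.default_rng(0)
fs, n = 100.0, 256                      # sample rate (Hz) and samples per signal
t = np.arange(n) / fs

def make_signal(freq_hz):
    return np.sin(2 * np.pi * freq_hz * t) + 0.5 * rng.normal(size=n)

def dominant_freq(x):
    spectrum = np.abs(np.fft.rfft(x))            # handcrafted feature: FFT magnitude...
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]    # ...reduced to its peak frequency (skip DC)

class_a = [make_signal(5.0) for _ in range(20)]    # "class A" signals near 5 Hz
class_b = [make_signal(20.0) for _ in range(20)]   # "class B" signals near 20 Hz

# After the transform, separating the classes is just a threshold (a line).
threshold = 12.0
for x in class_a + class_b:
    label = "A" if dominant_freq(x) < threshold else "B"
    print(label, round(dominant_freq(x), 1))
```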
That's absolutely not fair, of course: machine learning stands on a pedestal of fifty years of magnificent Bayesian probability theory, and the frameworks behind deep learning and machine learning are more similar than they are different. But the one thing machine learning doesn't do well is that you need to apply a lot of domain expertise to make these handcrafted features. One of the big things with deep learning is that it uses incredibly complicated models, with many more parameters, to find the features themselves. This is a network trained on faces — and whenever you train a convolutional image network like this on any objects, the first layer comes out this way. In this cartoon, these are weights and biases; some of you may have seen this before, and if I get into explaining it I'll go too much into the weeds, but you're basically training the network — training the numerical weights of these sets of connections. The first layer of images — those filters to the right of that woman's face — are called Gabor filters; they're line and edge filters. The observation is that you can hierarchically construct any feature you can draw out of these simple basis functions. So for faces, you take these lines and edges, you construct parts of a face, and then an entire face. These are hierarchical feature sets, and a lot of problems in the world are hierarchical: language — characters, words, sentences, paragraphs, novels, Shakespeare; the human body and medicine — atoms, proteins, molecules, subcellular structure, cellular structure, organs, the functioning organism. Again and again, there are plenty of problems like that. What deep learning can do, given enough data, is automatically find these ways to factorize — find features you would not have discovered yourself because there are just too many combinations. The traditional machine learning approach would take an image of a cat and basically say: let me construct an ear detector — a cat has triangle ears and a round head and a chubby body, so let me make a chubby-body detector. The network would just take this first layer and say: well, here's my ear detector, it's two 45-degree lines and a flat line, and that looks like a cat's ear if I scale it right, and there are some other attributes like color or texture — it's fuzzy — and so you can build a cat that way. And so you can solve for things that you never would have solved for by hand.
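Here is a minimal sketch of the kind of convolutional stack being described: a few convolution layers whose learned filters play the role of those edge detectors and part detectors. The layer sizes are arbitrary choices for illustration, not the architecture from the slide.

```python
# A tiny convolutional network: early conv layers tend to learn edge/line
# (Gabor-like) filters, later layers compose them into parts and objects.
# Layer sizes here are arbitrary illustrative choices.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer: learns edge-like filters
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combines edges into simple parts
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # combines parts into object-level features
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)
classifier = nn.Linear(64, 2)                     # e.g. cat vs. not-cat

x = torch.rand(1, 3, 64, 64)                      # one dummy RGB image
logits = classifier(features(x).flatten(1))
print(logits.shape)                               # torch.Size([1, 2])
```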
The other thing deep neural networks can do better than machine learning — again, I'm taking a little poetic license here — is solve highly nonlinear problems. This is actually a fairly simple neural network problem, but Google wanted to minimize the power used in their huge data centers, with all their servers — you can imagine, it's massive. On this curve, the red is the actual power usage efficiency: the difference between the power you actually need to run the equipment and the power at the trunk line; when it says 1.1, that means 10 percent was wasted. It fluctuates as servers go on and off, as the temperature of the chillers changes, as your data center gets different airflows and temperatures. They have thousands of sensors that give the state of the air, the chillers, the temperature, the air circulation, the workloads on all the different servers and their types and configurations — literally whether they're running UNIX or some other operating system — and the curve you get is incredibly complicated. But guess what: if you take, say, a thousand variables, and you feed in the power usage efficiency you actually had — so this is a supervised case, because you have examples of the answer — and you do that with several years of data (ideally your configuration of servers should stay the same), you can predict that behavior. Now I feed in the current values of my variables, or what I might like them to be, and it predicts the power usage efficiency. Look at that nonlinear curve — it looks like a forex currency trading chart or something — but it can predict it, and that's remarkable. So the ability to map incredibly complicated nonlinear behavior is a big thing.

Just to geek out for one more slide, and then I'll transition into some applications: the thing that's so amazing about deep neural networks — and the reason some of the pioneers in the field never gave up over twenty, thirty, forty years; they knew there was something special — is this. If this is a neural network and you input an image, think about how many images there are. In a megapixel image with just 256 gray levels, there are 256 to the millionth power unique images. There are about 10 to the 78 atoms in the universe — versus 256 to the millionth power. And somehow, when I write an image detection algorithm that says "that's a cat" or "that's a pedestrian," I have to take an input that could be any of those 256-to-the-millionth-power images and find a way to sift through it, to find the tiny, infinitesimal subset of that space of images that looks like cats or dogs or pedestrians. Whenever you have an algorithm or function with a lot of inputs — or you're trying to solve a differential equation in a space with a lot of inputs, or a complicated integrated circuit — there's an explosion in complexity with the number of variables. In machine learning it's on page 7 of every textbook; it's called the curse of dimensionality, and it means that as you increase the number of variables n, the number of states you have to sift through explodes exponentially — it goes like e to the n, or something like that. But guess what: as I step through this network, trying to identify a cat, suppose I rule out only a factor of two of the states at every layer — every time I step through the network I eliminate a factor of two more. That's a decaying exponential, like e to the minus n. So the network is searching through this space with exponential efficiency to find the one, or the small number, of right answers. The only trick is that this is in prediction mode — can you train it? How do you train these networks? Some of them have millions or even billions of parameters, so the number of combinations is even bigger than that crazy number I gave you a minute ago. It turns out — and this is one of the magics of deep neural networks — that it's an optimization problem. You tell it what the answer is, and it adjusts all those millions of parameters to get a little closer to the right answer, and then it's just an iterative process: get a wrong answer, correct all the weights, with a lot of computing power. It's an optimization problem. See that dish-shaped surface in the upper right? The point at the bottom — the minimum — is the answer.
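Here is a minimal sketch of that iterative "wrong answer, nudge the weights" loop — plain gradient descent on a tiny network, with made-up data. It is meant only to show the shape of the optimization; real training differs mainly in scale.

```python
# The training loop in miniature: forward pass, measure the error (loss),
# backpropagate gradients, nudge every weight a little toward the minimum,
# repeat. Data and network size are made up for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 10)                        # 256 examples, 10 input features
y = (X.sum(dim=1, keepdim=True) > 5).float()   # made-up binary target

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)   # how wrong are we right now?
    loss.backward()             # backpropagation: gradient of the loss w.r.t. every weight
    optimizer.step()            # move downhill on the dish-shaped loss surface
    if step % 500 == 0:
        print(step, loss.item())
```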
If it makes you feel better, you can flip it around and pretend we're climbing Mount Everest, but it's an optimization problem where you're trying to find the minimum or the maximum. And it looks simple, right? This minimization algorithm is called backpropagation, by the way; it was introduced in a famous paper back in the 1980s. It takes massive compute, and luckily it's amenable to parallel compute. So how do we do all these amazing applications — computer vision, speech recognition, natural language processing, which we'll talk about in a second? Do we code these algorithms, saying if I see this, then do this, then do that? No — there's no if-then-else, rule-based programming. We take a network, we feed it a bunch of data, and we do this optimization thing where we optimize the weights, and at the end, if it converges to an accuracy we're happy with, we say we're done and we can use it. There has been a set of developments since that magic year of 2012 where deep learning frameworks were built by a lot of companies, and luckily for all of us they're all open source — it's really an open-source ethos or philosophy: TensorFlow, MXNet, Caffe, Torch, Theano — I'm sure you've heard of some of them. What they do is let you very efficiently construct these networks and train them, and you can train them on different types of compute. But what you'll find is that if you have very much data at all, you want to use parallel computing; otherwise it will take an unreasonable amount of time, and that's where GPUs come in — I'll talk about that at the end.

I am badly in need of speeding up, so very quickly: you realize that this goes beyond a one-trick pony. It's not just speech, it's not just computer vision. Using these frameworks you can plug a vision network into a language network, and you can say: wouldn't it be great not just to say that the image on the lower left is a cat, but that it's a black cat sitting on top of a suitcase? Well, guess what — they took a trained image network that could say "that's a cat, that's a suitcase," they put in human-annotated, labeled images, they trained it, and the computer generated those captions; I don't think you can tell the difference from the human ones. Now here's the other beauty of it: these frameworks allow cross-pollination across different verticals. A lot of these things are research — this paper on the left was from Stanford — and a lot of researchers post their code on GitHub, open source. A medical researcher took that code, in these deep learning frameworks, and said: if you can do that, I'll bet I can replicate or imitate the diagnosis a pathologist or radiologist would have given of these chest X-rays, screening for lung cancer. And it worked. So these frameworks allow you to take an example that's very close to what you want to do — often in another domain — and very quickly get something working, at least for research; getting it to production, of course, is a whole other challenge. I'm going to speed up, but it's really revolutionizing medicine. This was actually a fairly simple paper from a while ago, but sometimes patients come into an emergency room and they look fine, and then they crash, and no one can figure out what's wrong with them. It turns out that by putting in just the half-dozen basic things they monitor you with in an emergency room, like pulse ox, you can predict the probability of them crashing and dying.
And if you can do that and get a warning, you can figure out what to do and what the problem is. It's not too far a step from that to say: what if we took all of a patient's medical history, their analytics, their family history, the history of similar patients, and we did this supervised learning where we said these are the outcomes those patients had, and this patient looks a lot like those other ones? Then a doctor could get something right there in the office to say to a patient: hey, you've got a 70% chance of congestive heart failure in the next year and a 91% chance in the next three years — you'd better do something about it. This kind of thing is absolutely revolutionizing medicine, and it's really revolutionizing all kinds of industries and domains.

I'm doing a lot of work in finance, so here's an example from Stanford. They took a hundred and twenty million loans spanning 20 years — something like three and a half billion monthly loan data points: this is what we know about this person and this mortgage loan, this is whether they defaulted, this is whether they were one or two or three months late — and they trained a network. You want to predict delinquency, or prepayment — that you're going to pay the loan off and get a different one. On this graph, the axes are the actual observed number of prepayments versus the number predicted by the model, and the black 45-degree line is ground truth — a perfect model. The blue deep learning model, you can see, is just hugging the ground truth; the red is a more basic machine learning model, a logistic regression, and you can see it's not nearly as accurate. That is cash money for companies whose core competence is understanding the risk of lending. Insurance companies are all over this, because what is an insurance company's cost of goods sold? It's guessing how much they're going to pay out, statistically, to their customers. We call that underwriting, but it's predictive analytics. For example, if you're an insurance company and you want to issue crop insurance, you can take satellite data — of course, this is the field and these are the crops — but more importantly you can take historical crop yields and weather patterns, and by regressing — by using supervised learning with the labeled historical crop yield data — you can build a model that predicts how much you would expect to pay out on a given crop insurance policy. Insurance is very elastic, so if you can lower your cost by 10 percent and still hit your target margin, you might get 50 percent more business in a certain segment. So this is absolutely revolutionizing all kinds of industries.

I'll go through this one quickly: quant investing. Investment banks and hedge funds are going crazy. They build things called factor models, which try to find what short- and long-term factors predict the return of a security, and now the whole world is your factor model. You can take external data, open data from the internet, things that might be hidden; you can do speech recognition on earnings calls and get the answer in a second, not the next day. There are companies selling proprietary data feeds that map the earth every 15 minutes at sub-meter resolution — which means you can track, in principle, the location of every tanker and every ship on the planet — and other companies are doing all kinds of things with receipts.
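Circling back to the loan example for a moment: the comparison on that slide — a deep model hugging the 45-degree line while a logistic regression drifts off it — can be sketched in a few lines on synthetic data. Everything below (the fake features, the gradient-boosted model standing in for the deep network, the metric) is an illustrative assumption, not the Stanford study.

```python
# Sketch of the loan-default comparison: a linear model (logistic regression)
# versus a nonlinear model on the same synthetic "loan" data. The data and
# models are stand-ins, not the 120-million-loan study from the talk.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           weights=[0.95, 0.05], random_state=0)   # ~5% "defaults"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
nonlinear = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

for name, m in [("logistic regression", linear), ("gradient boosting", nonlinear)]:
    p = m.predict_proba(X_te)[:, 1]                 # predicted default probability
    print(name, "AUC:", round(roc_auc_score(y_te, p), 3))
```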
So you can predict behavior that correlates to the stock returns of, for example, retail businesses — and that's a huge thing. In the semiconductor industry, it's obvious that you can do things like defect detection, again using image processing. But in your fab, you could take the outcomes of lots going through the line — I know from my electrical test that I have failure data, defects, speed bins, and parametric data for transistors or for memory — and what if, at every step in the run card, I could throw in all my information: all my metrology, all my defect inspection, all the sensor data from all my CVD and PVD process tools, my lithography, all of that? You could predict, at every point in the run card, more than you knew before: what is my yield going to be on this lot, on this wafer? Could I do something to correct it parametrically? What kind of speed bins am I going to have in four months when this high-speed logic comes off the production line? It's really revolutionizing things like this, and people are gearing up to do these applications.

There is another kind of learning where you don't have labels, called unsupervised learning. Here's a forty-second video: this is a bunch of images, but we didn't tell the deep learning algorithm what the images represent — there are no labels, no cat or dog. It's clustering images that it thinks are somehow similar. It doesn't understand them, but it says these different clusters are somehow similar, and we're manipulating it in a visualization tool. So the question is: did it do its job? Did it find things that were similar? Let's look at this cluster — oh, it's military aircraft. It didn't know that. That's called unsupervised learning. You can also do that semantically with language: you can take documents and it will learn how different words cluster, and without labeling their meaning it will learn relationships. You can look into these clusters just like in that last example — this was a deep neural network, a recurrent neural network (RNN), trained on documents — and look: it clustered countries, Switzerland, Mexico, Brazil; it figured out that states in the US were somehow similar. But it can do better than that: it can figure out semantic relationships and meaning. This blew people away when it happened — I love this. You can do something called an embedding: it embeds the words in a lower-dimensional vector space, and you can do vector addition on semantic meanings. Take the high-dimensional vectors that represent king, queen, man, and woman: in a vector sense, just like the linear algebra we all studied, king minus man plus woman equals queen. And this just came out of it — the model learned that on its own. People were blown away when they saw it, and there are so many things like that.
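The king − man + woman ≈ queen arithmetic is easy to try with any set of pretrained word vectors. A hedged sketch, assuming the gensim library and its downloadable GloVe vectors (a one-time download), is below.

```python
# Vector arithmetic on word embeddings: king - man + woman ~= queen.
# Assumes the gensim library and its downloadable pretrained GloVe vectors
# (loading triggers a one-time download of the model).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # 50-dimensional GloVe word vectors

result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)   # "queen" is typically the top hit
```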
Let's see — I'm late, so I'm going to skip this example on unsupervised learning and shift to the very last few slides of my talk, which are on the implications for computer architecture. I'm sorry, I'm running late, so I'll try to go through this quickly so we don't push back the next speaker. Big data requires massive compute. "Small data" now, with the customers I work with, is a few terabytes; dozens and hundreds of terabytes, going to petabytes, is the kind of data we're going to see in the future in our data lakes. And the challenge is that training these massive models on that much data requires, literally, supercomputer levels of power. The problem is that Moore's Law, as we all know, is rolling over now — but we also know that, starting back at 65 nanometers, you couldn't crank up the clock speed indefinitely, so single-threaded performance started rolling over (that's the lower blue curve) way before Moore's Law started flattening out. What that implies is that you have to go to a parallel computer architecture. I had a nice video that I'm going to skip for time, but an NVIDIA GPU typically has three thousand to five thousand parallel cores, and — not for every operation, but for numerically intensive operations — those five thousand cores are something on the order of a hundred times faster than a CPU, which might have 10, 20, 50 parallel cores if you have a dual-socket machine. Let me introduce Leonardo: he is going to paint a picture for you the way a CPU might do it, as a series of discrete actions performed sequentially, one after the other — let me speed it up — ladies and gentlemen, Leonardo. [Applause] What's happening is that you have this potential with graphics processing units, which came from the world of video; we've repurposed them to be general computing machines — that's what we do at Nvidia — and we have a language called CUDA that allows the deep learning framework programmers to easily leverage these parallel computing cores. The good news is that even though Moore's Law is flattening, and single-threaded performance has already flattened, we've still got a lot of juice left to squeeze out of this lemon to make some sweet lemonade. You can see that CPU performance is scaling very weakly now, both in compute power and memory bandwidth; the green line is GPU compute power in flops, and memory bandwidth. If you had tried to do something like deep learning, machine learning, or complicated data analytics before, you would typically scale out to a large cluster — maybe something like a Spark cluster. But nowadays, people using these deep neural networks don't necessarily want to scale out to a large cluster; they can scale up to a dense GPU node — a server that might have eight GPUs in it. Because it doesn't have to talk over the network — it has high-speed interconnects within the same box — you can use those 5,000 cores on every GPU and they can communicate efficiently, and what you get is quite remarkable. So we built — and I guess this is the sales pitch of this talk — a purpose-built supercomputer. It's a 3U server with eight of our latest GPUs, and this box, called the DGX-1, is faster than 128 Knights Landing servers when you benchmark it on image-recognition deep neural networks and other neural network benchmarks. And beyond raw supercomputer power, it has a fully integrated software stack where all of the deep learning frameworks are containerized, and you can download them from a software registry — so everything you need, as an enterprise, for your data science team to be productive is in that one box. That 3U server is more powerful than the most powerful supercomputer in the world was in 2004 — something called the Earth Simulator, which had its own building and was a two-billion-dollar supercomputer, number one in the world. This thing is more powerful, and it's a 3U server. So it's really remarkable, the power that you get, and this is exactly what's needed to train these deep neural networks.
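As a tiny, hedged illustration of why those thousands of cores matter for the matrix-heavy math inside neural network training, here is a sketch that times the same large matrix multiplication on the CPU and, if one is available, on a CUDA GPU via PyTorch. The matrix size and whatever speedup you observe depend entirely on your hardware.

```python
# Time the same large matrix multiply on CPU and (if present) on a CUDA GPU.
# Matrix multiplies are the numerically intensive core of neural network
# training, which is why the many-core GPU wins. Sizes are arbitrary.
import time
import torch

def time_matmul(device, n=4096):
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()          # make sure timing isn't skewed by async launch
    start = time.time()
    c = a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return time.time() - start

print("cpu :", time_matmul("cpu"))
if torch.cuda.is_available():
    print("cuda:", time_matmul("cuda"))
else:
    print("no CUDA device found; running on CPU only")
```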
I'll make this the last slide, because I do apologize, I'm running late. We asked ourselves: if this one box is equal to the most powerful supercomputer of 2004 and to 128 state-of-the-art data center servers, what would happen if you plugged 128 of these boxes together, if you really needed power to crank through deep learning? Well, we did it, because we're Nvidia and we have them. We plugged them together with high-performance-computing fiber optics, and within each box there are very high-speed links called NVLink that are faster than PCI Express. You immediately get — I hope I'm not misquoting this — the 26th or 28th most powerful supercomputer in the world on the Top500 list; it was number one on the Green500 list, and we're using it for deep learning research. There's a deep learning framework just for cancer that mixes HPC, molecular dynamics, and deep learning. And by the way, if you only plugged 13 of these DGX boxes together, you'd already get onto the Top500 list. So we have the compute power, we have the data in our Hadoop data lakes and our databases, and we have the algorithms — and the next 50 years are going to be about this. I hope that was interesting, and thank you very much for your time. Thank you so much.
Info
Channel: The Artificial Intelligence Channel
Views: 3,816
Rating: 4.8421054 out of 5
Keywords: singularity, transhumanism, ai, artificial intelligence, deep learning, machine learning, immortality, anti aging, deepmind, robots, robotics, self-driving cars, autonomous cars, Tesla motors
Id: 9iODaocJjxI
Length: 45min 33sec (2733 seconds)
Published: Wed Oct 11 2017