MIT 6.S191 | Deep Learning New Frontiers

Captions
Hi everyone, and welcome to lecture six of MIT 6.S191! This is one of my absolute favorite lectures in the course. We're going to focus on some of the limitations of deep learning algorithms, as well as some of the emerging new research frontiers in this field. Before we dive into the technical content, there are a few course-related and logistical announcements I'd like to make.

The first is that our course has a tradition of designing and delivering t-shirts to participating students, and this year we're continuing that tradition. To that end, there's a sign-up sheet on Canvas for all students where you can indicate your interest in receiving a t-shirt; once you fill it out with the necessary information, we'll ensure a t-shirt is delivered to you by the appropriate means as soon as possible. If, after the class, the Canvas site has closed and you can't access the sign-up form, please feel free to send us an email and we'll find a way to get the t-shirt to you.

To take a step back and give an overview of the course schedule, where we've been and where we're going: following this lecture on limitations and new frontiers, the final software lab on reinforcement learning is due tomorrow. We'll then have two really exciting hot-topic spotlight lectures with brand-new content, followed by a series of four guest lectures. You'll have time over the rest of this week to continue working on your final projects, and the class will conclude on Friday with the student final project presentations and proposal competition, as well as our award ceremony.

Speaking of final projects, let's get into the details. For those of you taking the course for credit, you have two options to fulfill your grade. The first is a project proposal, where you'll work in a group of up to four to develop a new and novel deep learning idea or application. We realize that two weeks is a very short amount of time to come up with and implement a project, so we'll certainly take this into consideration in the judging. On Friday, January 29th, you'll give a brief three-minute presentation on your proposal to a group of judges, who will then award the final prizes. As far as logistics and timelines: you'll need to indicate your interest in presenting by this Wednesday at midnight Eastern time, and you'll need to submit the slide for your presentation by midnight Eastern time on Thursday. Instructions for the project proposal and submission of these requirements are on the course syllabus and on the Canvas site. Our top winners will be awarded prizes including NVIDIA GPUs and Google Homes. The key point I'd like to make about the final proposal presentations is that, in order to participate and be eligible for a prize, synchronous attendance is required: you or your group will need to be present on Friday, January 29th, from 1 to 3 p.m. Eastern time, to participate in the final proposal competition. The second option for fulfilling the credit requirement is to write a one-page review of a deep learning paper, with the evaluation based on the completeness and clarity of your review. This is due by Thursday at midnight Eastern time, and further information and instructions are also available on Canvas.

After this lecture, we'll have a series of two really exciting hot-topic spotlight talks, focusing on two rapidly emerging and developing areas within deep learning research. The first will highlight a series of approaches called evidential deep learning, which seeks to develop algorithms that can actually learn and estimate the uncertainties of neural networks, and the second
spotlight talk will focus on machine learning bias and fairness. There we'll discuss some of the dangers of deploying biased algorithms in society, along with emerging strategies to actually mitigate these unwanted biases. That will be followed by a series of really exciting guest lectures from leading researchers in industry and academia, covering a diversity of topics: everything from AI in healthcare, to document analysis for business applications, to computer vision. We highly encourage you to join synchronously for these lectures if you can, on January 27th and January 28th from 1 to 3 p.m. Eastern. They'll highlight very exciting topics, and they may extend a bit into the designated software lab time so that we can have a live Q&A with our fantastic guest speakers.

All right, that concludes the logistical and course-related announcements; let's dive into the fun stuff and the technical content of this lecture.

So far in taking 6.S191, I hope you've gotten a sense of how deep learning has revolutionized, and is revolutionizing, so many different research areas and fields: from advances in autonomous vehicles, to medicine and healthcare, to reinforcement learning, generative modeling, robotics, and a variety of other applications from natural language processing to finance and security. Alongside understanding the tremendous application utility and power of deep learning, I hope you've also established a concrete understanding of how these algorithms actually work and how, specifically, they have enabled these advances.

Taking a step back at the types of algorithms and models we've been considering: we've primarily dealt with systems that take data as input, in the form of signals, images, or other sensory data, and move forward to produce a decision as the output.
This output can be a prediction or a detection, or it can be an action, as in the case of reinforcement learning. We've also considered the inverse problem, as in generative modeling, where we can actually train neural networks to produce new data instances. In both of these paradigms, we can really think of neural networks as very powerful function approximators. This relates back to a long-standing theorem in the theory of neural networks: the universal approximation theorem. It was presented in 1989 and generated quite a stir in the community. What the universal approximation theorem states is that a neural network with a single hidden layer is sufficient to approximate any arbitrary continuous function to any arbitrary precision.

All it requires is a single layer. In this class we've primarily dealt with deep models, where we stack multiple hidden layers on top of each other, but this theorem completely ignores that fact and says we only need one layer: as long as we can reduce our problem to a set of inputs and a set of outputs, there must exist a neural network that can solve it. That's a really powerful and really big statement, but if you consider it closely, there are a couple of caveats to be aware of. The first is that the theorem makes no guarantees on the number of hidden units, the size of the layer, that will be required to solve such a problem. It also leaves open the question of how we could actually go about training such a model, finding the weights to support that architecture; it makes no claims about that, it only proves that one such network exists. As we know, finding these weights with gradient descent is highly non-trivial, due to the very non-convex nature of the optimization problem.
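Stated slightly more formally (this is an informal paraphrase, not the theorem's exact 1989 wording): for any continuous function f on a compact set K and any tolerance ε, there is some wide-enough single-hidden-layer network that matches f to within ε everywhere on K:

```latex
% Universal approximation theorem (informal paraphrase):
% for every eps > 0 there exist some number of hidden units N,
% weights alpha_i, w_i, b_i, and a nonlinearity sigma, such that
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,
  \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
```

Note that the theorem asserts only that such an N and such weights exist; it says nothing about how large N must be or how to find the weights.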
The other critical caveat is that the theorem places no guarantees on how well the resulting model would actually generalize to other tasks. Indeed, I think the universal approximation theorem points to a broader issue: the possible effects of overhype in artificial intelligence. As a community, as students invested in advancing the state of this field, we need to be really careful in how we consider, market, and advertise these algorithms. While the universal approximation theorem was able to generate a lot of excitement, it also provided a false sense of hope to the community at the time: that neural networks could be used to solve any problem. As you can imagine, this overhype is very dangerous, and it has been tied to the two historic AI winters, in which research in artificial intelligence, and neural networks more specifically, slowed down very significantly. I think we're still in a phase of explosive growth, which is why, for the rest of this lecture, I want to focus on some of the limitations of the algorithms we've learned about, and then extend beyond them to consider new research frontiers.

All right, so first the limitations. One of my favorite, and I think one of the most powerful, examples of a potential danger and limitation of deep neural networks comes from the paper "Understanding deep learning requires rethinking generalization." What the authors did in this paper was a very simple experiment. They took images from the ImageNet dataset, each associated with a particular class label, and for every image in the dataset, not per class, but per individual image, they flipped a k-sided die, where k was the number of possible classes they were considering, and they used that flip of the die to randomly
assign a brand-new label to that particular image. These new labels were therefore completely random with respect to what was actually present in the image. A remapping could be visualized as shown here; note that these two instances of dogs have been mapped to different classes altogether, so we're completely randomizing our labels. They then took this scrambled data and tried to fit a deep neural network to it, applying varying degrees of randomization, from the original data with the untouched class labels to the completely randomized data. As you might expect, the model's accuracy on an independent test set progressively tended to zero as the randomness in the data increased. But what was really interesting was what they observed when they looked at performance on the training set: no matter how much they randomized the labels, the model was able to get close to 100% accuracy on the training data. In a very similar way to the statement of the universal approximation theorem, this highlights the idea that deep neural networks can perfectly fit any function, even one driven by entirely random labels.
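The label-scrambling step of that experiment is easy to sketch. Here is a minimal version of the "k-sided die" relabeling; the dataset size and class count are made up for illustration, and no actual images are involved:

```python
import numpy as np

rng = np.random.default_rng(0)

k = 10                      # number of classes ("sides of the die")
n = 10_000                  # number of examples in our toy dataset
true_labels = rng.integers(0, k, size=n)

# Roll a k-sided die independently for EVERY example: the new label
# is completely random with respect to the example's content.
random_labels = rng.integers(0, k, size=n)

# Sanity check: the scrambled labels agree with the originals only
# at the chance rate of roughly 1/k.
agreement = (true_labels == random_labels).mean()
print(f"agreement with true labels: {agreement:.3f}  (chance = {1/k})")
```

The paper's observation is that a large enough network still reaches near-perfect *training* accuracy on data scrambled this way, while its test accuracy falls to this chance level, pure memorization rather than generalization.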
To really drive this point home, I think the best way to understand neural networks is as very, very good function approximators; all the universal approximation theorem states is that neural networks are very good at this. So suppose we have some data points, and we use a neural network to learn a function that approximates this data, based on a maximum likelihood estimate of the distribution of that data. If we give the model a new data point, shown here in purple, we can expect our neural network to predict a maximum likelihood estimate for that point, and that estimate will probably lie along this function. But what happens if I extend beyond this in-distribution region to out-of-domain regions? There are really no guarantees on what the data looks like in those regions, and therefore we can't make any statements about how our model will behave or perform there. This is one of the greatest limitations of modern deep neural networks. So there's a revision to the statement about neural networks being really excellent function approximators: they're excellent function approximators when they have training data. This raises the question of what happens in out-of-distribution regions, where the network has not seen training examples before. How do we know when our network doesn't know, when it is not confident in the predictions it's making?

Building off this idea, there can be a conception, amplified and inflated by the media, that deep learning is basically alchemy: a magic, be-all-end-all solution that can be applied to any problem. Its power really does seem awesome, and I'm almost certain that was a draw for you to take this course. But with that belief comes the idea that you can take some set of training data, apply some network architecture, turn the crank on your learning algorithm, and spit out excellent results. That's simply not how deep learning works. Your model is only going to be as good as your data, and as the adage in the community goes: if you put garbage in, you're going to get garbage out.
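The earlier point about in-distribution versus out-of-distribution behavior can be made concrete with a quick numerical sketch. Here a least-squares polynomial stands in for a neural network (it is not one, but it is also a flexible function approximator fit to data), and all the numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data only on [-1, 1]: a noisy sine wave.
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=50)

# Fit a flexible approximator (a degree-9 polynomial, standing in
# for a neural network) to the training data.
coeffs = np.polyfit(x_train, y_train, deg=9)

# In distribution: the fit tracks the underlying function well.
x_in = np.linspace(-1, 1, 200)
in_err = np.max(np.abs(np.polyval(coeffs, x_in) - np.sin(2 * np.pi * x_in)))
print(f"worst in-domain error: {in_err:.2f}")

# Out of distribution (x = 3): nothing constrained the model here,
# so the prediction can be wildly wrong -- and the model itself
# gives no signal that it is guessing.
print(f"prediction at x = 3: {np.polyval(coeffs, 3.0):.1f}")
```

The model is accurate where it saw data and unboundedly wrong just outside that region, which is exactly the "excellent approximators *when they have training data*" caveat.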
An example that I think really highlights this limitation, and emphasizes just how much these neural network systems depend on the data they're trained with, is the one I'm going to show you now. Say we have an image of a dog, and we pass it into a CNN-based architecture trained to take a black-and-white image and colorize it. Take a close look at what happened to this image of a dog when it was passed through such a model: under the nose of the dog there's a pinkish region in its fur, which probably doesn't make much sense for a natural dog. Why could our model be producing this result? Consider the data that may have been used to train the network: amongst the thousands upon thousands of dog images used to train such a model, it's very likely that many of them showed dogs sticking their tongues out, because that's what dogs do. So the CNN may have learned to map the region under a dog's mouth to pink as the most likely color. When it then saw a dog with its mouth closed, without its tongue out, its built-up representation still mapped that region to pink. What this highlights is that deep learning models build up representations based on the data they have seen, and I think this is a really critical point: as you go out from this course interested in applying deep learning to applications and problems of your own, your model will only ever be as good as your data.

This also raises the question of how neural networks handle data instances they have not encountered before, and I think this is highlighted very potently by an infamous and tragic example from a couple of years ago, in which a car from Tesla that was
operating autonomously crashed, killing the driver. It turned out that the driver killed in that crash had actually reported multiple instances in the weeks leading up to it in which the car swiveled toward the exact same barrier into which it crashed. Why could it have been doing that? It turned out that the images representative of the data on which the car's autonomous system was trained, images from that region of the freeway, lacked recent construction that had altered the appearance of that barrier. Before it crashed, the car encountered a data instance that was effectively out of distribution: it had only seen a particular style of barrier in that location, and it did not know how to handle the new situation, tragically causing the crash. In this instance, a neural network failure mode resulted in the loss of human life. These sorts of failure modes point to, and motivate, the need for systematic ways to understand when the predictions of deep learning models cannot be trusted; in other words, when a model is uncertain in its predictions. This is a very exciting and important topic of research in deep learning, and it's going to be the focus of our first spotlight talk. This notion of uncertainty is especially important for the deployment of deep learning systems in what I like to think of as safety-critical applications: things like autonomous driving, medicine, and facial recognition. As these algorithms interface more and more with human life, we really need principled ways to ensure their robustness. Uncertainty metrics are also very useful in cases where we have to rely on datasets that are imbalanced or have a lot of noise present in them, and
we'll consider these different use cases further in the spotlight lecture. As preparation for tomorrow's spotlight lecture, I'd like to give a bit of an overview of what uncertainties we need, and can talk about, when considering deep learning algorithms. Consider a classification problem in which we build a neural network that models probabilities over a fixed set of classes. In this case we're training the network on images of cats and images of dogs, and it outputs whether a new image contains a cat or a dog; keep in mind that the probabilities of cat and dog have to sum to one. So what happens when we train our model, and then test it on an image that contains both a cat and a dog? The network still has to output class probabilities that sum to one, but in truth the image has both a cat and a dog. This is an instance of what we can think of as noise, or stochasticity, present in the data: if we trained the model on images of cats alone or dogs alone, a new instance containing both a dog and a cat is noisy with respect to what the model has seen before. Uncertainty metrics can help us assess this statistical noise inherent in the data, and it is called data uncertainty, or aleatoric uncertainty.

Now let's consider another case: take the same cat-dog classifier and input an image of a horse. Again, the output probabilities have to sum to one, but even if the network predicts that this image most likely contains a dog, we would expect that it should really not be very confident in that prediction.
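This closed-world problem is easy to see numerically: a softmax output sums to one by construction, so even a nonsense input receives a confident-looking probability distribution over {cat, dog}. A small sketch with made-up logits (hypothetical final-layer values, not from any real model):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical final-layer logits for the two classes [cat, dog].
logits_dog_photo   = np.array([0.5, 3.0])   # a clear dog
logits_horse_photo = np.array([1.2, 1.9])   # an out-of-distribution input

for name, z in [("dog photo", logits_dog_photo),
                ("horse photo", logits_horse_photo)]:
    p = softmax(z)
    # The probabilities always sum to 1, whatever the input was --
    # the network has no built-in way to answer "neither class".
    print(f"{name}: P(cat)={p[0]:.2f}, P(dog)={p[1]:.2f}, sum={p.sum():.2f}")
```

The horse photo still gets a full probability mass split between cat and dog; the softmax alone cannot express "I haven't seen anything like this."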
This is an instance in which our model is being tested on an image that is totally out of distribution, an image of a horse, and we would therefore expect it to not be very confident in its prediction. This is a different type of uncertainty from data uncertainty: it's called model, or epistemic, uncertainty, and it reflects how confident the model is in a given prediction. It is very important for understanding how well neural networks generalize to out-of-distribution regions and how they can report on their performance in those regions. In the spotlight lecture you'll really take a deep dive into these ideas of uncertainty estimation and explore some emerging approaches to actually learn neural network uncertainties directly.

The third failure mode I'd like to consider is one that I think is super fun and also, in a way, kind of scary: the idea of adversarial examples. The idea here is that we take some input, for example this image of a temple, which a standard CNN trained on a set of images classifies as a temple with 97% probability. We then apply a particular perturbation to that image to generate what we call an adversarial example, such that if we now feed the perturbed example to that same CNN, it no longer recognizes the image as a temple; instead, it incorrectly classifies it as an ostrich, which is kind of mind-boggling. So what was it about this perturbation that achieved this complete adversarial attack? What is the perturbation doing? Remember that when we train neural networks using gradient descent, our task is to take some objective J and optimize it given a set of weights W, an input x, and a label y. In doing a gradient descent update, what we're asking is: how does a small change in the weights decrease the loss? Specifically, how can we perturb the weights in order to minimize the objective? To do so, we train the network with a fixed image x and a true
label y, and perturb only the weights to minimize the loss. With adversarial attacks, we're now asking: how can we modify the input image in order to increase the error in the network's prediction? That is, we try to perturb the input x in some way such that, with the set of weights W and the true label y held fixed, we increase the loss function, to basically trip the network up and make it make a mistake. This idea of adversarial perturbation was recently extended by a group here at MIT, which devised an algorithm that could synthesize examples that remain adversarial over a whole set of transformations, like rotations or color changes. They were able to synthesize 2D adversarial attacks that were quite robust to these types of transformations, and, what was really cool, they took this a step further, beyond 2D images, to actually synthesize physical 3D objects that could be used to fool neural networks. This was the first demonstration of adversarial examples that exist in the real, physical world: the turtles shown here, 3D-printed to be adversarial, were incorrectly classified as rifles when images of those real physical objects were taken and fed into a classifier. This raises a lot of interesting questions about how we can guarantee the robustness and safety of deep learning algorithms against such adversarial attacks, which could be used maliciously to perturb systems that depend on deep learning.
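The "perturb the input to increase the loss" recipe described above is essentially the fast gradient sign method (FGSM). Here is a minimal sketch on a toy linear "network," where the gradient with respect to the input has a closed form; the weights and input are random stand-ins, not a real trained CNN:

```python
import numpy as np

def loss(w, x, y):
    # Logistic loss of a linear classifier, for a label y in {-1, +1}.
    return np.log1p(np.exp(-y * (w @ x)))

def grad_x(w, x, y):
    # Closed-form gradient of the loss with respect to the INPUT x.
    s = 1.0 / (1.0 + np.exp(y * (w @ x)))   # = sigmoid(-y * w.x)
    return -y * s * w

rng = np.random.default_rng(0)
w = rng.normal(size=20)   # stands in for a trained model's (fixed) weights
x = rng.normal(size=20)   # a clean input
y = 1.0                   # its true label

# FGSM: nudge every input dimension a small step eps in the direction
# that INCREASES the loss -- the opposite of a training update, and
# applied to the input x rather than the weights w.
eps = 0.1
x_adv = x + eps * np.sign(grad_x(w, x, y))

print(f"clean loss: {loss(w, x, y):.4f}  adversarial loss: {loss(w, x_adv, y):.4f}")
```

With a deep network the same input gradient is obtained by backpropagating to the input instead of to the weights; for this linear model the adversarial loss is guaranteed to be strictly larger than the clean loss.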
The final limitation I'd like to introduce in this lecture, though certainly not the final limitation of deep learning overall, is that of algorithmic bias. This is an issue that, deservedly, has gotten a lot of attention recently, and it will also be the focus of our second hot-topic lecture. The idea of algorithmic bias is centered on the fact that neural network models, and AI systems more broadly, are very susceptible to significant biases resulting from the way they're built, the way they're trained, and the data they're trained on, and, critically, that these biases can lead to very real, detrimental societal consequences. We'll discuss this issue in tomorrow's spotlight talk, which should be very exciting. These are just some of the many limitations of neural networks; this is certainly not an exhaustive list, and I'm very excited to re-emphasize that we're going to focus on two of these limitations, uncertainty and algorithmic bias, in our next two spotlight talks.

All right, for the remainder of this lecture I want to focus on some of the really exciting new frontiers of deep learning that are targeted at tackling some of these limitations: specifically, the problem of neural networks being treated as black-box systems that lack domain knowledge, structure, and prior knowledge; and, finally, the broader question of how we actually design neural networks from scratch. Does it require expert knowledge, and what can be done to create more generalizable pipelines for machine learning more broadly?

The first new frontier we'll delve into is how we can encode structure and domain knowledge into deep learning architectures. To take a step back, we've actually already seen an example of this in our study of convolutional neural networks. CNNs were inspired by the way visual processing is thought to work in the brain, and they were introduced to try to capture spatial dependencies in data. The key idea that enabled this was the convolution operation: we saw and discussed how we can use convolution to extract local features present in the data, and how we can apply different sets
of filters to detect different features while maintaining spatial invariance across spatial data. This is a key example of how the structure of a problem, image data being defined spatially, inspired and led to an advance in encoding structure into a neural network architecture, tuning that architecture specifically for a class of problems of interest.

Moving beyond image data or sequence data, the truth is that all around us there are datasets and problems with irregular structures. In particular, the paradigm of graphs and networks is one with a very high degree of rich structural information, information that is likely very important to the problem being considered, but it's not immediately clear how to build a neural network architecture well suited to operate on data represented as a graph. So what types of data lead naturally to a representation as a graph? One that we're all immersed in and familiar with is social networks. Beyond this, you can think of state machines, which define transitions between the different states of a system; patterns of human mobility and transportation; chemical molecules, where the individual atoms are nodes in the graph connected by the bonds between them; and biological networks. The commonality across all these instances, and the appeal of graphs as a structure more broadly, is driven by the appreciation that there are so many real-world data examples and applications with structure that can't be readily captured by a simpler encoding like an image or a temporal sequence. So we're going to talk a little bit about graphs as a structure that can provide a new, non-standard encoding for a series of
problems. To see how we can do this, and to build up that understanding, let's go back to a network architecture we've seen before. We're familiar with the CNN, and as I hope you know by now, in CNNs we have a convolutional kernel, and the convolution operation in CNN layers works by sliding this rectangular kernel over the input image so that the kernel can pick up on what is inside it; this operation is driven by the element-wise multiplication and addition we reviewed previously. Stepping through it: the convolutional kernel effectively slides across the image, applying its filter, its set of weights, repeatedly and repeatedly across the entirety of the image. The idea behind CNNs is that, by designing these filters according to particular sets of weights, we can pick up on different types of features present in the data.

Graph convolutional networks operate using a very similar idea, but now, instead of operating on a 2D image, the network operates on data represented as a graph, where the graph is defined by nodes, shown here as circles, and edges, shown here as lines, and those edges define relationships between the nodes. The idea of how we can extract information from this graph is very similar in principle to what we saw with CNNs. We again take a kernel, which is just a weight matrix, but rather than sliding it across the 2D matrix representation of an image, the kernel travels around to the different nodes in the graph. As it does so, it looks at the local neighborhood of each node and picks up on features relevant to that node's local connectivity within the graph. This is the graph convolution operation, where the network learns to
define the weights associated with the filter so as to capture the edge dependencies present in the graph. Stepping through it: the weight kernel goes around to the different nodes and looks at each node's immediate neighbors; the graph convolution operator associates weights with each of the edges present and applies those weights across the graph; the kernel then moves on to the next node in the graph, extracting information about its local connectivity; and so on across all the different nodes. The key, as we continue this operation, is that the local information is aggregated, and the neural network learns a function that encodes that local information into a higher-level representation.

So that's a very brief and, hopefully, intuitive introduction to graph convolutional neural networks and how they operate in principle. It's a really exciting network architecture that has been enabling enormously powerful advances in a variety of scientific domains. For example, in the chemical sciences and molecular discovery, there is a class of graph neural networks called message passing networks that has been very successfully deployed on two-dimensional, graph-based representations of chemical structures; these message passing networks build up a learned representation of the atoms, chemical bonds, and relationships present in a chemical structure. These same graph neural networks were very recently applied to discover a novel antibiotic, a novel drug effective at killing resistant bacteria in animal models of bacterial infection. I think this is an extremely exciting avenue for research, as we start to see these deep learning systems and neural network architectures being applied within the biomedical domain.
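The aggregate-your-neighborhood operation described above can be sketched as a single graph-convolution layer. This follows a common symmetric-normalization formulation; the graph and feature sizes here are invented for illustration:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: aggregate each node's neighborhood
    (including itself) with a single shared weight matrix W."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)       # symmetric degree normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(0.0, A_norm @ X @ W)  # ReLU non-linearity

# A tiny 4-node path graph (edges 0-1, 1-2, 2-3) as an adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))   # 5 input features per node
W = rng.normal(size=(5, 2))   # the shared "kernel": 5 -> 2 features

H = gcn_layer(A, X, W)
print(H.shape)                # one 2-dim embedding per node: prints (4, 2)
```

Note the locality this buys: a node's output depends only on its own features and its neighbors', so changing node 3's features leaves nodes 0 and 1 untouched; stacking layers grows the receptive field hop by hop, just as stacking convolutions grows it pixel by pixel.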
Another recent and very exciting application area is in mobility and in traffic prediction. Here we can take streets and break them up to represent them as nodes, modeling the intersections and regions of the street network by a graph, where the nodes and edges define the network of connectivity. What teams have done is to build up this graph neural network representation to learn how to predict traffic patterns across road systems, and in fact this modeling can result in improvements in how well estimated times of arrival can be predicted in interfaces like Google Maps. Another very recent and highly relevant example of graph neural networks is in forecasting the spread of the COVID-19 disease. There have been groups that have looked into incorporating both geographic data, information about where a person lives and is located and who they may be connected to, as well as temporal data, information about that individual's movement and trajectory over time, and using this as the input to graph neural networks. Because of the spatial and temporal components of this data, the graph neural networks have been integrated with temporal embedding components, such that they can learn to forecast the spread of COVID-19 based not only on spatial geographic connections and proximities but also on temporal patterns. Another class of data that we may encounter is three-dimensional data: three-dimensional sets of points, which are often referred to as point clouds. This is another domain in which the same idea of graph neural networks is enabling a lot of powerful advances. To appreciate this, you will first have to understand what exactly these three-dimensional datasets look like. These point clouds are effectively unordered sets of data points in space, a cloud of points where there's some underlying spatial dependence between the points.
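Concretely, a point cloud is just an N-by-3 array of coordinates, and its spatial dependence can be made explicit by connecting each point to its nearest neighbors, which is the kind of graph construction that lets graph networks operate on point clouds. The sketch below is a toy version of that idea; the `knn_graph` name and the use of plain Euclidean distance are my own illustrative choices.

```python
import numpy as np

def knn_graph(points, k):
    """Build a graph over an unordered point cloud by connecting each
    point to its k nearest neighbors in 3D space."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)   # pairwise distances
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]  # k closest points per node
    A = np.zeros((len(points), len(points)))
    for i, js in enumerate(nbrs):
        A[i, js] = 1.0                      # edge from point i to neighbor j
    return A

np.random.seed(1)
pts = np.random.rand(10, 3)  # a toy cloud of 10 points in 3D
A = knn_graph(pts, k=3)
print(A.sum(axis=1))         # every point has exactly 3 outgoing edges
```

Because the neighborhoods depend only on geometry, not on the order in which the points are stored, a graph network applied on top of this adjacency is invariant to permutations of the point list.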
So you can imagine having these sorts of point-based representations of the three-dimensional structure of an object, and then training a neural network on these data to do many of the same types of tasks and problems that we saw in our computer vision lecture: classification, taking a point cloud and identifying it as a particular object, or segmentation, taking a point cloud and segmenting out the instances of that point cloud that belong to particular objects or particular content types. What we can do is extend graph convolutional networks to be able to operate on point clouds. The way that's done, which I think is super awesome, is by taking a point cloud, expanding it out, and dynamically computing a graph using the meshes inherent in the point cloud. This example is shown with this structure of a rabbit, where we're starting here from the point cloud, expanding out, and then defining the local connectivity across this 3D mesh. We can then apply graph convolutional networks in a way that maintains invariance to the order of points in 3D space while still capturing the local geometries of such a data system.

All right, so hopefully that gives you a sense of the different ways we can start to think about encoding structure into neural network architectures, moving beyond the architectures that we saw in the first five lectures. For the second new frontier that I'd like to focus on and discuss in the remainder of this talk, it's this idea of how we can learn to learn. I think this is a very powerful and thought-provoking domain within deep learning research, and it spawns some interesting questions about how far and how deep we can push the capabilities of machine learning and AI systems.

The motivation behind this field of what is now called automated machine learning, or AutoML, is the fact that standard deep neural network architectures are optimized for performance on a single task.
In order to build a new model, we require domain expertise, expert knowledge, to define a new architecture that's going to be very well suited for that particular task. The idea behind automated machine learning is: can we go beyond this robust tuning of a particular architecture for a single task, and build broader algorithms that can actually learn what the best models are to use to solve a given problem? What we mean by the best model is that its architecture is optimal for that problem, and that the hyperparameters associated with that architecture, like the number of layers it has and the number of neurons per layer, are also optimized, with this whole system built up and learned via an algorithm. This is the idea of AutoML, which stands for automated machine learning. The original AutoML work used a framework based on reinforcement learning, in which a neural network referred to as a controller, in this case a recurrent neural network, proposes a sample model architecture, what's called the child architecture, defined by a set of hyperparameters. That resulting architecture can then be trained and evaluated for its performance on a particular task of interest. The feedback on the performance of that child network is then used as the reward in this reinforcement learning framework, to inform the controller on how to improve its network proposals for the next round of optimization. This cyclic process is repeated thousands upon thousands of times, generating new architectures, testing them, and giving that feedback to the controller to build and learn from. Eventually, the controller is going to tend towards assigning high probabilities to hyperparameters and regions of the
architecture search space that achieve higher accuracies on the problem of interest, and towards assigning low probability to those areas of the search space that perform poorly. So how does this controller agent actually work? At the broad view, at the macro scale, it's an RNN-based architecture where, at each step, each iteration of this pipeline, the controller model samples a brand new network architecture, and the controller network is specifically optimized to predict the hyperparameters associated with that spawned child network. For example, we can consider the optimization of a particular layer: that optimization involves prediction of the hyperparameters associated with that layer, such as, for a convolutional layer, the size of the filter, the length of the stride, and so on and so forth. Then the resulting child network, spawned and defined by these predicted hyperparameters, is trained and tested, such that after evaluation we can take the resulting accuracy and update the recurrent neural network controller based on how well the child network performed on our task. That RNN controller can then learn to create an even better model. This fits very nicely into the reinforcement learning framework, where the agent, our controller network, is rewarded and updated based on the performance of the child networks that it spawns. This idea has now been extended to a number of different domains. For example, recently in the context of image recognition, the same principle of a controller network that spawns a child network, which is then trained and evaluated to improve the controller, was used to design an optimized neural network for the task of image recognition. This paradigm of designing an architecture can be thought of as neural architecture search.
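The propose-train-evaluate-update loop described above can be sketched as follows. This is a heavily simplified stand-in: the search space, the function names, and the use of random sampling in place of a trained RNN controller and a real training run are all illustrative assumptions.

```python
import random

# Hypothetical search space for one convolutional layer's hyperparameters.
SEARCH_SPACE = {
    "filter_size": [3, 5, 7],
    "stride": [1, 2],
    "num_filters": [16, 32, 64],
}

def propose_child():
    """Stand-in for the RNN controller: sample one value per choice.
    A real controller conditions each prediction on its previous ones
    and is itself updated from the reward signal."""
    return {name: random.choice(opts) for name, opts in SEARCH_SPACE.items()}

def train_and_evaluate(arch):
    """Placeholder reward: a real system trains the child network
    defined by `arch` and returns its validation accuracy."""
    return random.random()

random.seed(0)
best_arch, best_reward = None, -1.0
for step in range(100):                 # thousands of rounds in practice
    child = propose_child()             # controller proposes a child network
    reward = train_and_evaluate(child)  # feedback that updates the controller
    if reward > best_reward:
        best_arch, best_reward = child, reward
print(best_arch)
```

Swapping the random sampler for a policy trained on the reward stream is what turns this brute-force search into the reinforcement learning formulation used in the original AutoML work.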
In this work, the controller system was used to construct and design convolutional layers that were used in an overall architecture tested on image recognition tasks. This diagram here on the left depicts what that learned architecture of a convolutional cell, in a convolutional layer, actually looked like. What was really remarkable about this work was the results they found when they evaluated the performance of these neural-network-designed neural networks; I know that's kind of a mouthful, but let's consider those results. First, here in black, I'm showing the accuracy of the state-of-the-art human-designed convolutional models on an image recognition task, and as you can appreciate, the accuracy, shown on the y-axis, scales with the number of parameters, in millions, shown on the x-axis. What was striking was what they found when they compared the performance of these human-designed models to the models spawned and returned by the AutoML algorithm, shown here in red: these neurally designed architectures achieved superior accuracy compared to the human-designed systems, with relatively fewer parameters. This idea of using machine learning, using deep learning, to learn more general systems or more general paradigms for predictive modeling and decision making is a very, very powerful one. Most recently, there has been a lot of emerging interest in moving beyond AutoML and neural architecture search to what we can think of more broadly as AutoAI: an automated, complete pipeline for designing and deploying machine learning and AI models, which starts from data curation and pre-processing, goes to model selection and design, and finally to deployment. The idea here is that perhaps we can build a generalizable pipeline that can facilitate, automate, and accelerate all the steps of this process. I think this idea spawns a very, very thought-provoking point, which is: can we build AI systems
that are capable of generating new neural networks designed for specific tasks, where the higher-order AI system that's built is learning beyond any one specific task? Not only does this reduce the need for us as experienced engineers to hand-design and optimize these networks, it also makes these deep learning algorithms more accessible. More broadly, we start to get at this consideration of what it means to be creative, and what it means to be intelligent. When Alexander introduced this course, he spoke a little bit about his thoughts on what intelligence means: the ability to take information and use it to inform a future decision. As humans, our learning pipeline is definitely not restricted to optimization for a very specific task; our ability to learn and solve problems impacts our ability to learn completely separate problems, and improves our analytical abilities. The models and neural network algorithms that exist today are certainly not able to extend to this point and capture this phenomenon of generalizability. I think that in order to reach the point of true artificial intelligence, we need to be considerate of what that true generalizability and problem-solving capability means, and I encourage you to think about this point: to think about how AutoML, AutoAI, and deep learning more broadly fall into this broader picture of the intersection and interface between artificial and human intelligence. So I'm going to leave you with that as a point of reflection at this point in the course and beyond. With that, I'm going to close this lecture and remind you that we're going to have a software lab and office hour session. We're going to be focusing on providing support for you to finish the final lab on reinforcement learning, but you're always welcome to come discuss with us, ask your questions, and discuss with your classmates and teammates. For that, we
encourage you to come to the class Gather Town, and I hope to see you there. Thank you so much!
Info
Channel: Alexander Amini
Views: 48,269
Keywords: deep learning, mit, artificial intelligence, neural networks, machine learning, 6s191, 6.s191, mit deep learning, ava soleimany, soleimany, alexander amini, amini, lecture 2, tensorflow, computer vision, deep mind, openai, introduction, deeplearning, ai, tensorflow tutorial, what is deep learning, deep learning basics, deep learning python, bayesian deep learning, evidential deep learning, deep evidential regression, adversarial attacks, graph neural networks, automl, generalization
Id: -boCMDouF2g
Length: 50min 46sec (3046 seconds)
Published: Fri Mar 12 2021