S3 E2 Stanford Prof Chelsea Finn: How to build AI that can keep up with an always-changing world

Captions
[Music]

Pieter: Our guest today is Chelsea Finn. Chelsea is one of the world's leading researchers in artificial intelligence and robotics. After completing her undergraduate degree at MIT, she did her PhD at Berkeley, where she was co-advised by Sergey Levine and myself. From there she spent a year at Google and became a professor at Stanford. Her PhD research made pioneering contributions in robot learning, meta-learning, and few-shot learning, and won the highly prestigious ACM Doctoral Dissertation Award. As a professor at Stanford she continues to lead the charge in artificial intelligence and robotics research, and she has won many awards, including the MIT TR35 Innovator Award, the IEEE Robotics and Automation Society Early Career Award, the ONR Young Investigator Award, faculty awards from a wide range of companies including Microsoft, Intel, and Samsung, as well as many best paper awards. Aside from being one of my own favorite researchers, it turns out Chelsea is also a listeners' favorite: we have been asked many times to try to get her on the podcast.

Before diving into today's conversation, I'd like to thank our podcast sponsors, Index Ventures and Weights & Biases. Index Ventures is a venture capital firm that invests in exceptional entrepreneurs across all stages, from seed to IPO. With offices in San Francisco, New York, and London, the firm backs founders across a variety of verticals, including artificial intelligence, SaaS, fintech, security, gaming, and consumer. On a personal note, Index is an investor in Covariant, and I couldn't recommend them any higher. Weights & Biases is an MLOps platform that helps you train better models faster, with experiment tracking, model and dataset versioning, and model management. They are used by OpenAI, NVIDIA, and almost every lab releasing a large model. In fact, many if not all of my students at Berkeley and colleagues at Covariant are big users of Weights & Biases.

Chelsea, so great to have you here with us. Welcome to the show.

Chelsea: Thanks, Pieter. Happy to be here.

Pieter: So glad to have you on. Well, Chelsea, AI has made a lot of progress over the past decade. Training large neural networks on large amounts of data has enabled unprecedented capabilities in visual recognition, speech recognition, natural language understanding, even decision making: beating the world champion at Go, playing video games, and even robots learning some basic skills. What can AI not do yet today?

Chelsea: That's a great question. There has been tremendous progress in machine learning and AI in some of the applications you mentioned, but there are also still a lot of real-world use cases where it's challenging to successfully deploy machine learning systems. One of the reasons for this is what's called distribution shift: when the data that the system sees when it is deployed is a little bit different from the data that it was trained on. For example, take a robot that was trained to do a very simple task, like picking up a cup in a lab environment. If it's then asked to pick up a cup in a slightly different environment, or to pick up a slightly different cup, that's a little bit different from what it was trained on, and it won't naturally generalize to the new situation. Another example that we can probably all relate to is spam filters in our email inboxes. The emails that we get from spammers change over time, and spammers often actively try to evade these spam filters that are trained with machine learning. As a result, the data changes between when the filter was trained and when it's actually being used, and it doesn't perform as well as we would hope.
Pieter: This distribution shift seems like it would be almost everywhere; pretty much nothing is completely stationary in the real world.

Chelsea: Yeah, absolutely. First, the real world is changing all the time, and second, we also often want to deploy systems in different parts of the world or in different situations. As a result, this kind of distribution shift comes up all the time in different real-world applications, and unfortunately we don't currently have great tools for tackling it. I think one of the most common tools in industry is just to keep training on the most recent data you've seen, but that ends up being fairly laborious and requires labels for the most recent data. Some of my research tries to address these challenges by improving the generalization of these systems: allowing systems to generalize to new parts of the distribution by being more robust immediately, or making it possible for these systems to be very quickly adapted to new parts of the distribution, either adapting on the fly themselves or giving practitioners tools for editing the behavior of these models.

Pieter: Got it. Now when you say you're trying to address this in your research by making the models more robust or allowing for editing, can you dive a bit deeper on that? It sounds like it goes well beyond just continuing to train; you're proposing some fundamental change to the models here.

Chelsea: There are a few different techniques that we've been exploring, and some of them are targeted at certain kinds of distribution shifts. One specific kind of distribution shift that comes up all the time is when models latch onto spurious features. An example of this: say you're trying to classify objects in an image. Oftentimes the object is highly correlated with the background. You might be trying to classify between dogs and wolves, for example, and dogs will more often be on grass or something like that, whereas wolves might more often be on snow backgrounds. When we train a model on this kind of data, it will actually pay attention to the background of the image in order to make the classification, rather than paying attention to the actual object that you're trying to classify. So we've developed techniques that try to encourage the model to pay attention to the correct features of the image and not rely on these spurious attributes. That's one set of approaches. The second set of approaches we've been looking at is allowing models to adapt on the fly more easily, and there's a whole range of approaches there that I could talk quite a bit about; some of them are more specific to robotics, and some are more specific to general machine learning systems.
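As an aside, here is a minimal sketch of one common recipe for the spurious-features problem just described, for the case where group labels such as the background type are available: upweight the worst-performing group, in the spirit of group distributionally robust optimization (group DRO). This is an illustrative sketch under those assumptions, not necessarily the exact technique from Chelsea's lab, and the function names are placeholders.

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, labels, groups, n_groups, temperature=1.0):
    """Upweight the worst-performing group (e.g., 'wolf on grass')
    so the model cannot win by exploiting the background."""
    losses = F.cross_entropy(logits, labels, reduction="none")
    group_losses = torch.stack([
        losses[groups == g].mean() if (groups == g).any()
        else torch.tensor(0.0, device=losses.device)
        for g in range(n_groups)
    ])
    # Soft weighting toward the worst group; a hard max is the DRO limit.
    weights = torch.softmax(group_losses / temperature, dim=0)
    return (weights.detach() * group_losses).sum()
```

With groups defined as (class, background) pairs, a classifier that leans on snow versus grass keeps paying for it on the rare "dog on snow" and "wolf on grass" groups, so the background shortcut stops being a winning strategy.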
Pieter: Let's start with the more general machine learning ones. How does it work to tackle this problem?

Chelsea: Even within general machine learning systems there are lots of different approaches, but one that I'll talk about is making it possible for a practitioner to quickly edit a model. One classic example that we've been bringing up is: if you ask a language system who the prime minister of the UK is, and it was trained just a few years ago, it would probably tell you the answer is Theresa May. That's actually a few prime ministers ago and quite out of date. We'd like to be able to make a targeted edit to this model without having to fine-tune or retrain the entire model, because a lot of these models are quite large, and fine-tuning or retraining them is, first, computationally expensive, and second, somewhat of a blunt tool: it will update all of the weights of the model, and it might update the model in ways you don't want, because there are other parts of the model that are doing just great. So we've been developing techniques that allow practitioners to make a very targeted edit: we basically train what we call a model editor that can edit a very specific function of the model without touching its other functions.

Pieter: That sounds pretty interesting. Can you elaborate a little? What kind of data do you need, and what happens to the model?

Chelsea: Absolutely. We've been taking a meta-learning approach to this. We collect a small dataset, much smaller than the dataset the original model was trained on, that gives examples of how we want to edit the model. For example, in the prime minister setting, we may want to tell it that if I ask you who the prime minister of the UK is and tell you to edit that answer in a specific way, then you should also edit the behavior on the question of who the PM of the United Kingdom is, or other versions of that, like "Is Theresa May the PM of the UK?" Once you have a dataset with these examples of edits, and also examples that aren't related to each edit, then you can train this small model editor to make targeted edits to the model. That model editor can take a couple of different forms. One approach we've taken is to have the editor directly change the weights of the model with a rank-one update to each weight matrix. The rank-one structure means we don't have to output something as large as an entire weight matrix of the neural network; we can output something much smaller and expand it. Another approach we've been taking is more non-parametric: we store the set of edits that we want to apply to a model, and when we're given a new input, we try to judge whether or not that input is in scope for any of the edits we want to apply. If it is, then we have a separate model that takes care of that new input, and if it's not in scope, then we just query the original model.
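A minimal sketch of the first form, the rank-one weight edit: the editor only has to produce two vectors per layer instead of a full weight matrix. The small editor network below is a hypothetical stand-in; in published editors of this flavor (e.g., MEND) the factors are derived from the gradient of the edit example, and the details here are purely illustrative.

```python
import torch
import torch.nn as nn

def apply_rank_one_edit(layer: nn.Linear, u: torch.Tensor, v: torch.Tensor, scale=1.0):
    """Apply W <- W + scale * u v^T to one linear layer.
    u has shape [out_features], v has shape [in_features], so the
    editor outputs out+in numbers instead of out*in."""
    with torch.no_grad():
        layer.weight.add_(scale * torch.outer(u, v))

class RankOneEditor(nn.Module):
    """Hypothetical editor: maps some representation of the edit
    example (e.g., gradient statistics) to the low-rank factors."""
    def __init__(self, in_features, out_features, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features + out_features, hidden), nn.ReLU(),
            nn.Linear(hidden, in_features + out_features),
        )
        self.out_features = out_features

    def forward(self, grad_info):
        uv = self.net(grad_info)
        return uv[..., :self.out_features], uv[..., self.out_features:]
```

Because the update is low-rank and aimed at one layer, the rest of the model's behavior is left largely untouched, which is exactly the point of a targeted edit.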
Pieter: Interesting. Now when you're considering things in scope or out of scope, it seems like that itself would have to keep changing over time too. Is there any way to somehow automate that process? Because maybe next year something else has to go out of scope and something new has to come into scope.

Chelsea: Predicting whether an edit and an input are within the same scope is actually a fairly simple problem; it's just a binary classification problem: are these within the same scope or not? And that's the beauty of that second approach: you just need to maintain this memory of edits that you want applied to the model, and that is fairly simple to do. You can simply add things to that memory, or remove things from that memory if they're no longer applicable as things change over time. The other thing you could do, and we haven't done this ourselves, but it's something we've thought about, is that if your bank of edits has grown too large, you could distill it into the original model with a fine-tuning-like approach, and then flush that memory and start over with a new set of edits. In this process, when you want to make an edit, the edit is really fast: you just need to add something to that memory bank of edits. The scope classifier that tells whether or not something is in scope, and the other model that handles and applies the edits, only need to be trained once for all of the edits you might want to apply.

Pieter: Do you think you could automate some of this? Let's say you mine the internet for novel data, things that are timestamped later than the data your model was trained on, and say that everything from a later timestamp from a reliable source, if it contradicts something the model already thinks or knows, should be allowed to overrule it. It's kind of a different kind of training, where the later things somehow get to overrule the earlier things.

Chelsea: Yeah, absolutely. One thing that we're currently working on is essentially allowing one of these models to try to read the news, like you're mentioning: read things that are more recent than the data it was trained on, try to figure out what has happened and what contradicts its current beliefs, and then automatically generate the edits that should be applied to the model and update the model accordingly. That's something we're working on right now. I don't know of any approaches that can really handle that problem effectively yet, but I think it's a really exciting direction, and something that should be possible, because people can update their knowledge by reading the news as well.
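A minimal sketch of the non-parametric route just described: a memory of edits, a binary scope classifier, and a router that sends in-scope queries to a small edit-handling model and everything else to the frozen base model. This mirrors the spirit of memory-based editors (e.g., SERAC), but the interfaces below are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class EditRouter(nn.Module):
    """Route queries: if one falls in scope of a stored edit, a small
    edit-handling model answers; otherwise the frozen base model does."""
    def __init__(self, base_model, edit_model, scope_classifier):
        super().__init__()
        self.base_model = base_model      # frozen original model
        self.edit_model = edit_model      # produces the edited behavior
        self.scope = scope_classifier     # binary: same scope or not?
        self.memory = []                  # stored edit representations

    def add_edit(self, edit_repr):
        # Adding (or later deleting) an edit is just a list operation,
        # so keeping the edit set current over time stays cheap.
        self.memory.append(edit_repr)

    def forward(self, query_repr, query):
        for edit_repr in self.memory:
            logit = self.scope(torch.cat([query_repr, edit_repr], dim=-1))
            if torch.sigmoid(logit).item() > 0.5:   # in scope of this edit
                return self.edit_model(query, edit_repr)
        return self.base_model(query)
```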
Pieter: Very interesting. Now earlier you said that there are some methods for dealing with distribution shift that are more applicable to general machine learning, and others that are more specific to robotics. What do you do in robotics that's different?

Chelsea: First, this kind of editing, while we've been focusing on language models in that work, could in principle be applied to really any sort of machine learning system, including robotics applications. But in robotics I think there is something special that makes the problem easier in some ways. Robotics is an extremely hard problem, but one thing that's a little different from standard machine learning is that the system is actually interacting with the world, and it's making multiple decisions, not just one decision. Because of that, it can collect a little bit of its own data, both during training of course, but also during testing, and it can affect the observations and the data that it sees. So in robotic settings, one of the things we've been looking at is whether the robot can adapt to a new situation on the fly. That might mean it attempts the task once and doesn't quite succeed, but if it fails at the task, that doesn't mean it's done. From there it can retry the task and try a new strategy on the fly. This is something people do all the time: if you're trying to put a key into the lock to open your front door, it might be a little tricky to get the key in on the very first try, but that's no big deal; you can just try again with a slightly different strategy, maybe push slightly harder or orient the key slightly differently. That means these systems, in principle, don't have to be quite as robust when they're making decisions, if they can adapt on the fly. So we've been developing approaches in robotics that allow the robot to try again and to try slightly different strategies. Maybe a priori it learns a couple of different strategies for the task, not just one, and second, we actually allow it to adapt, which could mean it takes a couple of different trials. Although, more realistically, we want these robots to adapt on the fly autonomously, and that means they actually need to figure out how to retry, and how to retry from the state they ended up in after they failed. So that introduces somewhat of a challenge as well.

Pieter: It seems like if a robot were to fail, as you said, it will introduce a challenge, because it's now in a state it might never have encountered during training. During training maybe it mastered how to do this, and now it has to deal with an even bigger challenge: it's in very unknown territory that it has to recover from to then do another attempt. How do you deal with that?

Chelsea: This is actually a really huge challenge that I think is underappreciated in a lot of robotics and machine learning. As a specific example, say you have a legged robot, and maybe it was trained to navigate a building or a certain scenario to get to some goal or to deliver something, and then something has changed: maybe there's an obstacle in the way, or something on the ground that it didn't see before. At test time, in that new situation, maybe it flips over, or ends up on its side, or ends up in a position it has never seen before. This is really challenging, because if it had never seen itself flipped over, for example, then it doesn't have any data that tells it what to do when it's flipped over. It actually needs to learn, collect data on its own, and explore in the new environment in order to figure out what to do in that situation. So first, it's a really hard problem, but second, it's something we've started to make some strides toward. First, at test time you shouldn't just be running a policy, just executing what you learned during training; you need to actually be learning and updating the neural network policy as you go. And second, you may also need to be doing some exploration: if you're in a very new state, you may need to not just act greedily with respect to the policy you're updating, but also take somewhat exploratory actions. One key insight we've seen so far is to take exploratory actions not just in general, not just random actions, but exploratory actions toward the data the robot has seen so far, trying to get back to the data it's seen. That seems to work a lot better than just trying to explore in general, because then you'll hopefully move toward the data distribution that you know well, and from there you should be able to complete the task and continue to make more progress.
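A toy sketch of that last idea, biasing exploration back toward familiar states by shaping the reward with distance to the nearest previously seen state. The 0.1 coefficient and the nearest-neighbor distance are illustrative choices, not the actual method.

```python
import numpy as np

def shaped_reward(state, task_reward, seen_states):
    """Reward = task reward + bonus for staying near previously seen
    data, so exploration after a failure drifts back toward the
    distribution the policy was trained on, rather than off into
    totally unfamiliar states."""
    dists = np.linalg.norm(seen_states - state, axis=1)
    familiarity_bonus = -dists.min()   # closer to known data = better
    return task_reward + 0.1 * familiarity_bonus
```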
Pieter: I was watching a talk you gave about that just yesterday, and I thought it was really intriguing. There's a lot of work where human data is used as a prior to stay close to, to encourage more interesting behaviors of the agents as they explore the world. But it seems in this work you're actually looking at the agent's own past experiences, and somehow saying that staying close to your own past experiences should be good as you encounter new situations. What's the intuition that it would also apply there?

Chelsea: The agent's own prior experience, if it has learned the task before in the previous situation, is the part of the data distribution where it knows how to act, where it knows how to succeed. So staying close to the data distribution is generally a good idea, because that's exactly where it was trained. This is a principle we've been using for this sort of adaptation, but it's actually also a more general principle. There's a whole subfield called offline reinforcement learning, where you want to be able to train a robot from an offline dataset, and generally the key principle there is also to stay within the data distribution that the robot saw during training. By doing so, these algorithms are much more successful at learning from offline data without any sort of online interaction. I think it's just generally a pretty successful principle for robotics.
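The stay-in-distribution principle of offline RL can be sketched in a single policy loss: maximize the learned Q-value, but penalize actions that stray from the ones in the dataset. This mirrors behavior-regularized methods such as TD3+BC; it's an illustrative sketch, not the specific algorithm discussed here, and `policy` and `critic` are placeholder callables.

```python
import torch

def policy_loss(policy, critic, batch, bc_weight=2.5):
    """Maximize Q while penalizing actions far from those in the
    offline dataset: the 'stay close to the data' principle."""
    states, dataset_actions = batch["states"], batch["actions"]
    actions = policy(states)
    q_term = -critic(states, actions).mean()             # act greedily...
    bc_term = ((actions - dataset_actions) ** 2).mean()  # ...but near the data
    return q_term + bc_weight * bc_term
```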
Pieter: Maybe taking one step back: we want to go beyond what's seen in the training distribution, ideally generalize beyond it. But fundamentally, I'm curious: do you think we can even hope to generalize beyond the training distribution? In the sense that, how can we learn about things that we haven't seen? Where does the juice come from, if you want to generalize beyond the training distribution?

Chelsea: It's a great question. First, I'll say that there are some kinds of distribution shifts that are just impossible to handle, especially if the system is only given one opportunity, one chance, to do it, or if you're deploying a robot in a new scenario and it's a really safety-critical scenario where a mistake causes a catastrophic failure. There are definitely scenarios where it's simply impossible, and I think part of the research challenge is figuring out which scenarios are impossible and which are scenarios where we can make headway. So that's one comment. Another thing worth noting is that in order to handle these kinds of settings, we do need to go beyond some of the standard assumptions in machine learning. The really common assumption in machine learning is that the data is independently and identically distributed between training and testing, and that we have samples from that distribution. A lot of the work we've been doing on making models more robust introduces some additional assumptions. For example, maybe you have a medical imaging system and you want to deploy it in a new hospital that it has never seen before, and in your training data you actually know which hospital each part of the data distribution came from. In that scenario, you can leverage the hospital labels to make your model more robust to handling a new hospital, for example by training the model to be invariant to whether data came from hospital A or hospital B. So we introduce these new assumptions that allow us to improve generalization in very specific circumstances. Now, even in that circumstance, if you go to a new hospital, it could still be impossible, because the new hospital might do things in a completely different way from all the previous hospitals. But there is something we can say about generalization: if the hospital isn't too different, if it comes from the same meta-distribution of hospitals as what we saw during training, then we may actually be able to generalize beyond the training distribution. So: first, there are some things that are impossible; second, there are additional assumptions we can introduce; and lastly, like we've been talking about, I don't think we have to constrain ourselves to the train-test paradigm that's common in a lot of machine learning. If we continue to train during testing, as a robot experiences more data, or as we get a little bit of unlabeled data from a distribution, for example, then we may be able to get beyond the fact that we're seeing stuff that's completely new.
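One illustrative way to use domain labels like the hospital IDs above is to penalize how unevenly the model performs across domains, in the spirit of variance-based invariance penalties such as V-REx. Again a sketch under those assumptions, not the specific technique referenced; it assumes at least two hospitals appear in each batch.

```python
import torch
import torch.nn.functional as F

def invariance_loss(logits, labels, hospital_ids, penalty=1.0):
    """Mean loss plus the variance of per-hospital losses: a model
    that exploits hospital-specific quirks has uneven per-hospital
    losses and gets penalized for it."""
    losses = F.cross_entropy(logits, labels, reduction="none")
    per_hospital = torch.stack([
        losses[hospital_ids == h].mean() for h in hospital_ids.unique()
    ])
    return per_hospital.mean() + penalty * per_hospital.var()
```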
Pieter: Very interesting. Recently you coined this term, I hadn't seen it before: single-life reinforcement learning. What is that?

Chelsea: This is actually very similar to what we were talking about before. Essentially, we've been thinking about how, if we want a system to adapt in a new environment at test time, and it has some previous experience, we want the robot to be able to learn on the fly in the new situation, and to do so without interventions: without any sort of reset of the environment, without a human necessarily giving help to the robot if it gets stuck or something like that. We want the robot to be able to do the task in a single life, in a single episode. The other thing that's different about this is that typically in reinforcement learning there's this trial-and-error process: the agent or robot will attempt the task once, it will then get reset to some situation, then attempt it again, and it will repeatedly try to perform the task over and over again in order to eventually get a policy that's very successful. The outcome of that process is a policy that can do the task reliably, and this makes a lot of sense in, say, a factory setting where you want to do the task over and over again. However, there are a lot of scenarios where you don't need a policy that can do it over and over again; you just want the robot to do the task once. Maybe it has a lot of experience doing a certain task, for example navigating a building in order to find something in the building, and maybe you want it to find something new, or maybe something has changed about the building. You don't need to repeatedly run this reinforcement learning process again; you just need it to go into the building in this new scenario and retrieve the object once. So this is what we're calling single-life reinforcement learning: the robot has one life, one episode, in order to do the task once, and its goal is to do it successfully, and ideally as quickly as possible.

Pieter: So it's a single life, but am I understanding correctly that single life refers in some sense to the evaluation time, and before that it can do a lot of training to be ready for that single-life, single-episode evaluation run?

Chelsea: Sort of. It does have to do with evaluation. In the problem setting as we defined it, the robot gets some prior data, some prior experience, some offline previous data; it doesn't get to specifically go collect that experience. And I think the other crucial thing is that at test time it's in a new situation that it hasn't seen before, so it's going to have to learn. It's not just deploying a policy at test time in this one life; it actually needs to continue reinforcement learning throughout this single episode or single life. And this is pretty different from the standard reinforcement learning setting.
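Schematically, the setting looks something like the loop below: one continuing episode, no resets, with the policy updating online from prior data plus whatever the single life provides. All names here are placeholders for whatever environment, policy, and update rule you plug in.

```python
def single_life_rl(env, policy, update, prior_data, max_steps=100_000):
    """One continuing trial: no resets, no human interventions.
    The agent must finish the task once, learning as it goes from
    its prior data plus everything it experiences in this one life."""
    buffer = list(prior_data)              # offline experience from training
    obs = env.reset()                      # the single life starts here
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs))
        policy = update(policy, buffer)    # keep learning during the life
        if done:                           # task completed: the life ends
            break
        obs = next_obs
    return policy
```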
Pieter: You've done a lot of the leading work in deep reinforcement learning over the past several years, and in reinforcement learning the typical formulation is that an agent is asked to optimize a reward function: the more reward the agent collects, the better its performance. For example, the score in a game, or task completion for a robot. Then I was watching your talk at Carnegie Mellon from just a couple of weeks ago, and I found you saying, "I actually don't like reward functions." What do you want instead?

Chelsea: I think that in the real world, reward functions are extremely unrealistic. The real world doesn't just tell you whether you're doing well or not. For example, it doesn't tell me if I'm doing good research, and it doesn't tell a college student whether or not they're doing a good job. Maybe grades are a proxy for whether the student is doing a good job, but they really aren't a great indicator of whether they're, I don't know, happy and handling everything that's coming at them. And if we want to train a robot to do something, to pour water for example, there's nothing that tells it whether it's doing a good job of that either. So I think reward functions are not something that comes naturally from the world; they're something the reinforcement learning problem statement assumes. I don't think I fully know what the alternative should be. I think that task specification, and generally communicating goals to robots, is a really underrated problem in reinforcement learning and robotics, because it's not even clear how it should be formulated. If I were to use a reward function, I think sparse 0/1 reward functions, you succeeded or you didn't succeed, are probably the best type, because they're much less ambiguous, fairly unambiguous compared to something that tries to evaluate a more fine-grained notion of progress. I also think that ultimately it would be awesome if we could tell robots in natural language, or with gestures, how we would like them to complete certain tasks. In the long run, I think those natural forms of communication are the ideal way to communicate goals and tasks to robots.

Pieter: And in fact you've done some work on that, where agents learn to understand what it means to open or close a door. Can you say a bit about how that works?

Chelsea: We have a few different things on this, but we basically have some work where, if the robot has collected some data, you can use crowdsourcing to label the data: what was the robot doing, or if you were to give the robot an instruction for this behavior, what would that instruction be? I think crowdsourcing is a promising way to collect annotations for this, because it's pretty scalable. From there, once you have those annotations, you can learn something that maps from a video to those annotations, and you can use that to either learn some form of reward function, or to simply maximize the probability that the label will correspond to the desired goal or task. So this can essentially be a way to derive reward functions without assuming they're just given to you, and once you have that, you can use your favorite reinforcement learning or planning algorithms to actually optimize for the behavior that you want.

Pieter: It seems that if we could have such a fully general reward function, trained on a very large amount of data, then reward functions could actually become pretty useful.

Chelsea: Absolutely, I think so. One of the huge challenges with reinforcement learning that makes this a little difficult is that the algorithms are going to be optimizing against the reward function, trying to find the parts of the space where the reward function you just learned will say "yes, success." This means that if there are any holes, any parts of the data that the reward function hasn't been trained on, the optimizer can often find them. That can make things tricky, because maybe the reward function is trained on a lot of data, but it's not trained on all possible behaviors the robot might try in order to trick it into thinking it did the correct task. So these reward functions need to be quite robust. That said, there are also tools for trying to prevent this sort of exploitation of the reward function, like using ensembles to estimate uncertainty or something like that.
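A minimal sketch of those last two ideas together: derive a reward from success labels by training classifiers on annotated trajectories, and blunt reward exploitation by penalizing disagreement across an ensemble. The architecture and penalty are illustrative placeholders, not the specific models discussed.

```python
import torch
import torch.nn as nn

class LearnedReward(nn.Module):
    """Ensemble of success classifiers trained on crowd-labeled
    trajectories; reward is mean predicted success minus a penalty
    for ensemble disagreement, to blunt reward exploitation on
    states the classifiers never saw."""
    def __init__(self, obs_dim, n_models=5, hidden=128):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_models)
        ])

    def forward(self, obs, uncertainty_penalty=1.0):
        preds = torch.stack([torch.sigmoid(m(obs)) for m in self.models])
        return preds.mean(0) - uncertainty_penalty * preds.std(0)
```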
Pieter: I find this direction really interesting, and I'm also pretty hopeful about it, because, for example, in generative models for image generation, GANs have been very successful, and they follow a bit of a similar two-player game: there's a generative model and a discriminator, and the discriminator corresponds to the reward. In fact, you wrote one of the early papers making the analogy between reinforcement learning and what happens in GANs. And it seems like once you have enough coverage, though maybe that's the challenge here, once you have enough coverage, the discriminator might be pretty good.

Chelsea: I think these sorts of adversarial, two-player optimizations are quite a promising approach for this, and we're using those kinds of approaches in our research as well. But as you said, covering the entire space could possibly be intractable. There are just so many possible behaviors out there. There are obviously all the behaviors that humans do, but even that is only a very narrow subset of the space, because humans do things with a very specific purpose, and when a robot is trying to learn something, it will be waving its arms around in the air randomly, or doing things that are just very strange, very different from what humans would ever do. As a result, covering that entire space can be very difficult. This is also a reason why exploration is very difficult: if you try to explore every possible behavior that's out there in the world, you may never actually end up on the one behavior, the one sequence of actions, that will lead you toward the right thing. So I do think these adversarial optimizations are promising, but I also think it's worthwhile to investigate approaches that don't require you to cover the entire space. That's something I always keep in mind when developing these kinds of approaches.

Pieter: One thing I've been wondering recently, and I'm curious about your thoughts on this: is it possible we're just not training long enough? The reason I'm asking is that if I look at our three-month-old son, I mean, he's not doing the most directed things in the world, and he's already three months in. Granted, he doesn't have to, because we'll provide food and everything else for him. But it still seems like very few people have the patience to have their robot run for three months and more, still see essentially nothing all that meaningful happen, and just keep going.

Chelsea: Yeah, absolutely. I have a lot of thoughts on that. First, part of it is that we do need to have a lot of patience. I also think that a lot of reinforcement learning algorithms need a lot of supervision. Reinforcement learning offers this promise of allowing robots to autonomously learn things, but in practice, oftentimes a person needs to be there to reset the environment back to where things were, or if the robot knocked an object off the table, a human needs to come and pick it up. Arguably some of these things are things that parents do for kids as well, so some of it is somewhat unavoidable. But my guess would be that the typical robot learning algorithm often needs interventions every, I don't know, every ten seconds, or if you're lucky, every minute, and I think that's a lot more frequent than what parents do, if we're to continue the analogy. So first, we need more autonomous systems, so that we can run them for longer, and this is something we've been working on a lot: improving the autonomy of the robot learning process. And the second thing is, I don't think we need to rely fully on online data collection. The robot learning community has for a while been stuck in this place where for every project we do, we start from scratch. We don't reuse any prior data; we collect a new dataset for that project on that particular setup and learn on that dataset. From this standpoint, I think this is really problematic, because you're never going to accumulate three months of data within the context of a single project. If you have enough resources and enough robots, maybe you can parallelize it and do that to some extent, but in general it's going to be pretty impractical. And if the computer vision community had to re-collect ImageNet for every paper they wrote, I think it wouldn't have made nearly as much progress as it actually has. So I think it's really important to be storing data, having algorithms that leverage previously collected data and build upon it, having that offline data grow over time, and ideally having people share it across institutions as well.
Pieter: And actually, you have collected several of those larger datasets yourself, in collaboration with other institutions. Maybe you can list them out.

Chelsea: We've had some initial attempts at this. One of our first attempts, or I guess my very first attempt, was when I was doing an internship at Google, and I collected some data of a robot randomly pushing objects around in a gray bin. I put the data out there, and it was used a tiny bit for some video prediction benchmarking. After that, we expanded the data and collected some at Berkeley, with Frederik Ebert, on the Sawyer robots. Both of these datasets were pretty different from a lot of the existing datasets, because they had lots of objects and lots of diverse interactions, and they were quite large in size, but they were still only on a single robot, usually in a single environment. Then from there, I had just moved to Stanford, so we collected data at Stanford, we collected data at Berkeley, we collaborated with some folks at UPenn, and we also combined it with the Google robot data. So we had data from at least four different institutions, across seven different robot platforms, and that was our first attempt at really trying to share data across institutions. People have been using that data a little bit, but one of the challenges we realized there was that it was all fairly random interaction data, which is fairly hard to learn from; you couldn't just use imitation learning with that data. So then, more recently, we collected what we call the Bridge dataset, where we collected data on these low-cost arms. It was all demonstration data, so it's all really high-quality data, and it spanned multiple different environments, although it was all collected at Berkeley; this was in collaboration with Sergey at Berkeley. That one, I think, has made significant strides since RoboNet. And then more recently, I've been thinking that we really need a larger effort to get a much larger set of environments. The Bridge dataset had on the order of ten environments, and if someone else wants to use it at a different place, in a new environment, generalizing from ten environments to an eleventh environment is going to be pretty hard. I think we need something more like thousands of environments. So one thing we're working on right now is that we've formed a coalition of researchers at several different institutions, and we want to have people take robots home and collect data in something closer to a thousand different environments.
Pieter: Can people who are listening to this contact you and volunteer to take a robot home and collect some data?

Chelsea: Possibly. If they're in the area of one of the institutions that we're collaborating with, then certainly.

Pieter: Very cool. Now switching gears for a moment: the model-agnostic meta-learning, or MAML, paper is one of the most highly cited papers in artificial intelligence. It's very rare to have such a successful paper, even for the most accomplished researchers like you. Do you recall how that paper came about, and at the time, did you think it would be such a big paper?

Chelsea: It came about through conversations with Sergey, and I don't fully remember all of the conversations. I certainly was frustrated that we were training robots from scratch for every task, and we thought that one way around that is, instead of training them from scratch, maybe we could have some set of pre-trained weights and be able to quickly adapt those weights. We had a number of conversations around the idea, and I think Sergey was the one who proposed the initial technique for avoiding training from scratch, actually in the context of Q-learning, which is actually one of the scenarios where MAML turned out to be really terrible. We figured out a really simple scenario that we could test it on, which was this sinusoid regression problem, a 1D regression problem, really, really simple. I was working on something else at the time, so I didn't really want to spend that much time on it, but I spent about a day coding it up and running it, and it seemed to work on the first try. That was a good sign; whenever anything works in research on the first try, that's usually a really good sign. Because it worked on the first try, I decided to pursue it further, tried it on Omniglot and other few-shot learning problems, and then I also spent a lot of time trying to extend it to some reinforcement learning problems. Regarding your second question: at the time I was definitely very excited about it, more excited than about some of my other projects, so I could tell it was something larger than my other projects, and I could tell it was something that seemed to work really well, because it worked on the first try. I don't think I necessarily knew how big it would be, and there have certainly been projects since then that I've been really excited about as well, although they haven't yet had that much time to have impact. So I could tell it was going to be a successful paper, and it was something I was really excited about, but I certainly didn't foresee people using it for all the different applications that I've seen people use it for.
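For readers who want the gist of MAML concretely, here is a compact sketch on that same 1D sinusoid problem: an inner loop takes one gradient step on a task's support set, and the outer loop trains the initial weights so that the adapted weights fit the task's query set. The network size, learning rates, and task distribution here are illustrative choices, not the exact configuration from the paper.

```python
import torch
import torch.nn.functional as F

def net(params, x):
    # Tiny MLP applied with explicit weights, so we can differentiate
    # through the inner-loop adaptation step (second-order MAML).
    w1, b1, w2, b2 = params
    return F.linear(torch.relu(F.linear(x, w1, b1)), w2, b2)

def sample_sine_task():
    # A "task" is a sinusoid with random amplitude and phase.
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.1416
    def sample(batch=10):
        x = torch.rand(batch, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return sample

params = [torch.randn(40, 1) * 0.1, torch.zeros(40),
          torch.randn(1, 40) * 0.1, torch.zeros(1)]
for p in params:
    p.requires_grad_()
meta_opt = torch.optim.Adam(params, lr=1e-3)

for step in range(10_000):
    meta_loss = 0.0
    for _ in range(4):                         # meta-batch of tasks
        task = sample_sine_task()
        x_sup, y_sup = task()                  # support set
        x_qry, y_qry = task()                  # query set, same task
        # Inner loop: one SGD step on the support set.
        grads = torch.autograd.grad(
            F.mse_loss(net(params, x_sup), y_sup), params, create_graph=True)
        adapted = [p - 0.01 * g for p, g in zip(params, grads)]
        # Outer objective: the *adapted* weights should fit the query set.
        meta_loss = meta_loss + F.mse_loss(net(adapted, x_qry), y_qry)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

The `create_graph=True` flag is what makes this meta-learning rather than ordinary fine-tuning: the outer gradient flows back through the inner update, so the initial weights are trained to be one gradient step away from solving any sinusoid.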
Pieter: I'm curious: it's been several years since the paper was published, and it was one of the early meta-learning papers, meta-learning being learning to learn. How do you personally see the trajectory of meta-learning, from the MAML paper to where it is today, and maybe where it will go in the future?

Chelsea: It's been a pretty interesting trajectory. I also teach a course on meta-learning, and I try to keep that up to date every year. I think in meta-learning there are really three broad classes of methods that have been quite successful. The first is black-box methods; this includes some very early work by Adam Santoro et al., the RL² paper that Rocky Duan was the first author of and that you co-authored, and Jane Wang's paper applying RNNs to meta-learning. The second class of approaches is the optimization-based ones that MAML is a part of. And the third class is things like prototypical networks and matching networks, which are more non-parametric in nature; they try to compare examples. Those are, I think, three of the most impactful classes of work. Since then, there have certainly been a ton of works that have tried to make each of these classes of approaches better in different ways, and seeing all of those works has been quite interesting. There have also been works analyzing the use of meta-learning in different applications. We've seen that meta-learning may not be that important in few-shot image classification settings, because there it's really a lot about feature learning and less about actually learning to optimize. We've also seen a lot of different applications. One application that I've heard people being excited about, although I don't know that much about it, is drug discovery, where you want to train on different possible drug candidates and adapt with a very small amount of data for a new drug candidate. We've also been using it in education domains, where you want to adapt to new curricula, new instructors, or new exams. And then I think the most recent wave of few-shot meta-learning, the hot topic, is in the context of large language models and in-context learning. In some ways this is a revival of the black-box approaches, where you have a neural network learning from a few examples. The thing that's been different and new and exciting about this approach is that the few-shot learning seems to emerge without explicitly training for it: you essentially train on unstructured data, like Wikipedia, and then it seems like the model can do few-shot learning without the training data being explicitly set up that way. There's also been some work analyzing why that happens and the scenarios in which it happens. So that's a rough trajectory of where I've seen things. There's also some pretty exciting work on learning optimizers as well, trying to learn optimizers that are very general and can outperform Adam or something like that.
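As a sketch of that third, non-parametric class: prototypical networks classify a query by distance to per-class "prototypes", the means of each class's embedded support examples. The snippet assumes the examples have already been embedded by some encoder, which is left out here.

```python
import torch

def prototypical_logits(support, support_labels, queries, n_classes):
    # Average each class's embedded support examples into a prototype,
    # then score queries by negative distance: nearest prototype wins.
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])
    return -torch.cdist(queries, prototypes)
```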
Pieter: I like the way you describe the evolution. I remember at some point I gave a talk about meta-learning, and Jürgen Schmidhuber was in the audience, and he essentially called me out; he likes calling people out. He said, look, if it's only one level of learning to learn, it's not really meta-learning; it needs to keep recursing, it needs to learn to learn to learn, and learn to learn to learn to learn, and keep repeating that recursion. I personally haven't really gotten any handle on how to do that, and I haven't seen as much work in that space either, compared to what you're describing. I'm curious about your thoughts. Have you seen anything on that front?

Chelsea: First, when you brought up Jürgen Schmidhuber, I thought you were going to mention the work prior to the evolution that I described. Of course, I should mention that prior to these three classes of modern approaches, there was also a lot of work in the late '80s and early '90s on meta-learning that introduced some somewhat similar ideas, although it was really exploring the ideas rather than showing some of the exciting results that we've seen with the more modern approaches. But specifically for your question on this more recursive form of meta-learning: I don't think I agree that it has to be a requirement for these algorithms; I might refer to that more as meta-meta-learning and meta-meta-meta-learning. I haven't seen anything on that front that has been particularly exciting, although I'll mention a couple of things. One is that there was a work, I believe by Luke Metz at Google, where they learned an optimizer and then showed that the learned optimizer could be used to optimize itself, which was kind of cool and a little bit mind-twisting. The second thing I'll mention is that I sort of think meta-learning can be viewed as learning at different timescales: the few-shot adaptation process is learning over a really short period, and the meta-learning process is at a slower timescale, where you're actually trying to learn priors about the world. From that perspective, I could certainly imagine something that learns at more than two timescales, at different intermediate timescales as well. But I'm also not sure there's a lot of practical value in going even more meta, and I certainly haven't directly observed practical value from this sort of thing myself.

Pieter: I'm curious, if we look ahead: what's your vision for what will happen in the next, let's say, five to ten years in artificial intelligence?

Chelsea: It's a good question. One trend, of course, has been that we've seen larger and larger models; we've seen people train more and more general models, especially large language models, generative models, and vision-language models, and I think that has actually been a very exciting trend. For example, the GPT-3 paper was exciting for me because it showed something that was far more general than a lot of the other machine learning systems we've seen, which are pretty specialized for the tasks they were trained on. So I expect that trend to somewhat continue: trying to build larger models and seeing what they're capable of. I also suspect that data plays a really massive role in these systems, so I suspect, or at least hope, that we'll see techniques and tools for understanding how the data affects the system, tools for curating datasets, and tools for understanding different parts of a dataset. And in robotics, I think we're actually in a regime, and this relates to one of your earlier questions, where we're quite data-limited right now.
I think we do need much more data in robotics; we're not training on nearly enough data, because first, we're generally training on datasets that are smaller than NLP and computer vision datasets, and second, I think robotics is actually a lot harder than a lot of the classic NLP and computer vision problems. It requires a lot of precision, and it's a very high-dimensional system, so I think we're going to need more data than what we need in those other settings. So in robotics, I hope to see a push toward larger datasets that are open-sourced and released, and a push toward improving the generalization of robots by leveraging those datasets as well.

Pieter: That definitely resonates. Now I'm curious: where did you grow up, what kept you busy as a little kid, and how did that all get you into artificial intelligence?

Chelsea: I grew up in California, about an hour east of where I am right now. I really love California; it's a place where you can do lots of outdoors things all year round, and the weather is great. As a kid, when I was younger, I really liked doing jigsaw puzzles, and getting better at solving them. I also did a lot of different things. I played trumpet, starting in fifth grade, and was in the wind ensemble and the marching band in high school; I really liked making music. I also did some sports: I played some soccer, and I did a fair amount of swimming, and I swam competitively in high school as well. And then, in fifth grade and in middle school, I was on a Lego robotics team, which was about building robots out of Legos to do a set of challenges within a very short time period; I think it was actually a two-and-a-half-minute period that you got to have the robot do all these sorts of different things. It was my first exposure to debugging. I distinctly remember a moment in fifth grade where I went up to the coach of the team and told her, "It's not working, what should I do?" And she told me that I should figure it out. I had this aha moment of: I don't know what the solution is, but instead of just asking someone for the answer, I can actually try to think through it and figure it out. So that was a fun experience as well. At the time I didn't really know that robotics would be a big part of my career, but it was something I enjoyed doing. Apparently I also bossed around a lot of the guys on the team; I think there were five guys on the team, and me. The coaches never told me this when I was there, but apparently the parents would all call me "the general," because I would tell everyone else what to do on the team. Since then, I've tried to adapt my leadership style to be a little more compassionate. So those are some of the things I did when I was growing up.

Pieter: I'm curious what triggered the shift. Right now, obviously, you're very focused on AI. Where do you think that transition happened, from the very wide range of activities you described to really zoning in on AI as such a big focus for you?

Chelsea: I started learning how to code maybe in eighth grade, and I really enjoyed that. I knew that I wanted to go into engineering, because both my parents were in engineering, and I got the sense that engineering was all about solving problems and puzzles, and I knew that I liked trying to solve problems and puzzles.
When it came to college, I basically applied to schools that I thought were good in engineering. I also really liked biology in high school, so I was considering biological engineering, but as soon as I got to MIT, I got the sense that electrical engineering and computer science was a department where, first, you'd get a really good engineering training, and it's also just a really great department at MIT, but also that it opens a lot of doors: it's something you can take in many different directions once you understand computer science and electrical engineering. I knew that if I wanted to do bio later in the future, I could do that with a CS or an EECS degree, whereas if I did biological engineering, I probably wouldn't be able to go into other paths. So from there I was pretty sold and pretty eager to do EECS at MIT. Then, in terms of the classes I was taking, I was never really that interested in some of the more systems-y classes; I was really just fascinated by how you might teach a computer to see, and how you might learn from data. Machine learning just seemed like a fascinating concept, and so did computer vision, so I took three computer vision classes as an undergraduate, I think all of which were graduate-level classes, starting in my sophomore year, and from there I did a little bit of research as well. I got the sense that AI was just a really cool area. It also involves probability and various forms of math, which I enjoyed as well and found a bit more intellectually deep. Then, when I was applying to PhD programs, and ultimately at visit days, I got the sense that robotics was an area I was particularly excited about, because you're going all the way to really the end system and the end goal, which is to get a physical robot to do something, and it's really rewarding if you can actually get the physical robot to do something well.

Pieter: I would agree on that front. That's a beautiful story. It's easy to forget, but you're actually still very early in your career: you're recognized as one of the leading AI researchers, but your PhD started only about eight years ago, you did it pretty quickly, in about four years, and you've been a professor for only three years, and you've done so much already in so little time. So I'm curious: are there any productivity tips you can share?

Chelsea: I think my biggest tip is to work on things that you're excited about. At least for me, if I'm excited about something, and I find it interesting and intriguing and I enjoy doing it, then it doesn't even feel like work. It doesn't feel like I'm necessarily being productive; it feels like I'm exploring something, learning something, and doing something that I enjoy doing. That's certainly how I felt about AI research, and about coding up things like MAML, and I still try, every once in a while, to code up a small idea. So I think that's probably my number one thing: trying to do things that I enjoy doing.
Doing something you enjoy allows you to bring a lot more energy to it, right? Now, one of the things that also stuck with me from your PhD is that I think you went swimming pretty much every single day. Do you think that also improves your productivity overall?

Yeah, absolutely. I really enjoy swimming, so again, I like doing things that I enjoy doing. I think swimming does a number of things for me. Beyond just enjoying it, I usually swim with other people, so it's a social activity as well. I also find that exercise is very good for stress relief, so it helps me reset: if there's something I'm stressed about, for whatever reason, even if it's not a good reason, after I go swimming I'll typically be less stressed about it, and that helps me focus on the things that matter rather than dwelling on things I shouldn't dwell on. I also think being in good health in general is good, and it helps me sleep well at night. On days I don't swim, or don't exercise, I usually have a harder time falling asleep because I'm just less tired. I suspect it also helps with energy levels, although I don't really know; I haven't done many controlled experiments, and I also don't drink any coffee or anything like that. Oh, the other thing I'll mention is that I also try to sleep eight hours a night. I think that's really important. In undergrad I was pretty good about sleeping enough, but I definitely found in grad school that if I didn't sleep enough, my brain wasn't good at the work I needed it to do. Making sure you're treating your body well is generally important.

There are so many students out there whose hope is to join your lab for their PhD, or maybe as an undergrad researcher or a volunteer. What do you look for when you recruit students?

That's a great question. One thing I look for is some amount of research experience: if a student has done research and is still eager to do a PhD, that suggests they can be successful at research in the future and that they know what they're getting themselves into. Another thing I look into a lot is the letters of recommendation; if there's someone who can say great things about an applicant, that means they're probably a great person to work with, and that people enjoy working with them. And one other thing I sometimes look for is an eagerness to work with real robots. There are so many applicants who are really excited about machine learning, and a smaller set who have worked with real robots, are excited to keep working with them, and find it just as rewarding to see a real system work. Although, I often find that students don't know exactly what they want to work on when they're applying. I've had situations where two students came in: one
I thought was going to be really excited about working with robots, and one I thought was really excited to do more algorithmic work, and it ended up being the complete opposite. The one I expected to do algorithmic work ended up really excited about working with robots, and the other ended up really excited about doing more algorithmic work. So those are some of the things I look for. It's also just really hard to evaluate applicants from their application alone, and even after talking to someone for 30 minutes it's still pretty hard. It's a noisy process, so I think it's good for students to apply to lots of different places, because there are so many really amazing labs out there doing really cool work.

Maybe one last question. Do you have any advice for, let's say, high school students, who are probably still too early in their careers to join a research lab? What are some things they can do on their own that could guide them toward an AI career in the future?

That's a great question. Certainly when I was in high school I was swimming and in marching band, not thinking about AI, so first of all, I don't think students need to have it all figured out in high school. It's good to have a rough idea of what you want to do and a rough plan, but also to understand that things won't go exactly according to plan. Beyond that, there are lots of opportunities to learn online. When I was in high school, things like YouTube didn't really exist, or I certainly didn't know about them at the time, but now there are all sorts of online resources for learning how to code, learning a little about machine learning, or learning about probability and statistics. So the first thing I would say is to learn the basics: learn how to program, and learn how to debug and build things. Then, beyond the basics, if there are opportunities to do projects and get your feet wet actually building things, take them; robotics teams are great for this. Taking on projects is a really great way to explore something, because you'll learn a lot through the process of trying to build it. If you try to build a spam classifier, for example, you'll learn a lot about the process that isn't necessarily taught in courses (a minimal sketch of such a project appears after the transcript), and you'll also learn the really useful skill of debugging and solving problems on your own, which I think will be extremely valuable in the long term.

Thank you, Chelsea, this was such a great conversation. Thank you for joining us.

Yeah, happy to. Thanks!

Thanks so much for listening. If you enjoyed this conversation as much as I did, please give us a thumbs up, leave a comment, or leave a rating; it'll help other people find the show. Thank you.
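As referenced above, here is one minimal sketch of the kind of spam classifier Chelsea suggests as a starter project, using scikit-learn. The four inline messages are placeholder data; a real project would load a public labeled corpus instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; a real project would load thousands of labeled
# emails or SMS messages from a public corpus.
texts = [
    "Win a free prize now, click here",
    "Lowest price on meds, limited offer",
    "Meeting moved to 3pm tomorrow",
    "Can you review my draft before Friday?",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# TF-IDF bag-of-words features plus logistic regression: a classic,
# simple baseline for text classification.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Free offer, click now"]))   # likely [1]
print(clf.predict(["See you at the meeting"]))  # likely [0]
```

Even at this scale, the project surfaces the lessons she describes: turning text into features, checking predictions on held-out messages, and debugging the cases the model gets wrong.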
Info
Channel: The Robot Brains Podcast
Views: 5,071
Keywords: The Robot Brains Podcast, Podcast, AI, Robots, Robotics, Artificial Intelligence, LLMs, Reinforcement Learning, Meta Learning
Id: ZD15OtMbaNw
Length: 66min 54sec (4014 seconds)
Published: Wed Mar 22 2023