MIT RoboSeminar - Ben Recht - Trying to Make Sense of Control from Pixels

Captions
Thank you, everyone, for coming. It's an honor to introduce Ben. Ben is an associate professor at-- of course, we have people coming in. Ben is an associate professor in the EECS department at Berkeley. And Ben actually graduated from MIT. He spent 2000 to 2006, I think, at the MIT Media Lab, so we should be proud of that. And then he spent three years at Caltech as a postdoc. You started your faculty career at the University of Wisconsin-Madison and then moved to Berkeley, where you've been doing fundamental research on understanding the relationship between control and machine learning, looking at the feedback loops that happen, or are made to happen, when you have a system that is learning through experience.

And he has received many awards, out of which maybe I'll just highlight one. In 2017 he got the Test of Time Award at NeurIPS, which is awarded to a contribution from 10 years earlier, if I understand correctly. I don't know how many of you have read, and how many times you've read, the paper "Random Features for Large-Scale Kernel Machines," which was awarded in 2017 and written in 2007, a paper that clearly helped define a generation of people working in machine learning. And I would be curious, maybe later, to hear your thoughts on how things have changed since then-- how many people use kernel machines, or how much we still embrace those lessons.

But the one thing that I really wanted to say today to introduce Ben is that he's an avid communicator. He makes an excellent effort to communicate research through very different avenues. He teaches many courses at Berkeley on optimization, machine learning, and statistical learning. He has given many, many, many talks-- if you look at the list of talks in his CV, you can spend pages just skimming through it. It gives you an idea of his investment in communicating ideas. But that's not it, right? He has a widely followed Twitter account, on the order of 11,000 followers, which helps create many debates in the community and starts many discussions. And the last part is that he has a blog, which he uses to explain some of his ideas and to try-- at least that's my impression-- to put complex ideas in very simple words, both from a technological or scientific perspective as well as a humanistic one. He has posts like "You cannot serve two masters: the harms of dual affiliations," trying to explain what it means for academics to have a foot here and a foot there, and his perspective on that. But maybe the most important one is a 14-part series, "An Outsider's Tour of Reinforcement Learning," where he tries to explain the intricate relationship between the machine learning community and the controls community, both from a scientific perspective and also from a more humanistic perspective-- all the turf wars that happen over taking credit for great ideas, and what we can do to merge them and put them together. I think that effort of trying to distill complex ideas and put them in simple words, to broaden the community that can access them, is to me one of the most noble arts of an academic. And I think that Ben, with his efforts and his dedication to that, embodies it. So thank you very much for coming.

Man, what an intro. [APPLAUSE] Alberto, thank you for a wonderful intro.
That was really-- that was really kind. Yeah, I will say about my Twitter account, my number one rule there is: never tweet. So I break the rule a lot. But, yeah, that's the goal.

So today-- as Alberto mentioned, I've been interested in reinforcement learning for about half a decade now. That actually isn't a very long time, but I've realized it's become much longer than I thought it would be. There was a lot of excitement about this at Berkeley, a lot of things that seemed to be happening, and it was something I wanted to get my head around. Over that time, I really feel like I learned a lot. I figured out a lot about what our challenging problems are. And what I want to talk about today is something I just don't have a solution to. So this is going to be a weird talk that I've given a few times, and I'm still trying to figure out how to do it, where I want to tell you about stuff that I don't know the answer to at all. I think it's a really pressing and challenging problem for us as a community.

Today and tomorrow, there is a conference called LIDS@80. It's a retrospective history of the Laboratory for Information and Decision Systems. It turns out it's been named LIDS for 40 years. The original name was the Servomechanisms Lab, which is amazing. I really feel like maybe we need more labs with servomechanisms. In any event, it's really interesting to look at the intellectual history of MIT and to see all of these things that have happened. And we just had a discussion which closed with Sanjoy Mitter suggesting that perhaps what I'm going to talk about today is one of the grand challenges in that space. So I was like, amazing. And I should have told Sanjoy to come, but anyway-- yeah, he's busy. But we'll see where that goes today. I felt better about the talk after hearing Sanjoy say that. I'm like, fine, if Sanjoy doesn't know how to solve it, that means we have a real problem here.

So this is joint work with a lot of great people at Berkeley. We've been-- sorry. Go ahead, man. AV problems. Got it? Got it. AV problems fixed. So there's been a lot of interesting work with great people at Berkeley. It's actually a nice collaboration between Francesco Borrelli's group and my group and a bunch of really smart and enthusiastic students and postdocs.

OK, so right. Why did people get excited about reinforcement learning? Well, let's pinpoint why. It was hard to separate what was supposed to happen and not happen, I think-- when was this science paper? Does anybody remember? Someone said 2017, but that seems way too recent. I think it was earlier, right? It was like '15? 2015? Right around then? Why did we get excited? Well, one, because people thought Go was hard. That's fine. Two, we were in the middle of a deep learning revolution, so people thought you could use learning for anything. And three, people were starting to have a lot of success fusing complicated sensors like cameras into robotic systems. There were like three things floating around each other, and you want to figure out: what part of this is actually the part we should be taking home?

Also, to be fair, because there are people in the audience who know this: reinforcement learning is not five years old, right?
There is a long intellectual history there as well, much of which was done at MIT by various faculty. But I think for a long time it wasn't part of the mainstream excitement. And then, for some reason, around 2015, 2016, reinforcement learning was going to solve every problem. How we made that jump from niche subject to the solution to everybody's problems is an interesting thing. Why did that happen? I'm not sure.

The other thing, which I think is really challenging, is that while reinforcement learning works in games, which are beautiful closed environments with very well understood rules, in robotics I think the wins were a lot less grandiose-- even though you could get places like DeepMind or OpenAI to suggest that we were a step away from artificial general intelligence. And we know that the robotic systems we'd want to actually put out into the world have to be robust and safe before we're really going to put them in mission-critical tasks. So one thing we are also very interested in is: what are the wins that we could take away and put into robust engineered robotic and autonomous systems?

So my lab invested a bunch of time trying to get at what exactly is the thing that makes this work. Where is the nugget of something that's different from things we already knew before? I'm not going to talk about a lot of that; hopefully people have seen the tutorials I've given on that perspective, both on the blog that Alberto pointed out-- I have a survey that tries to sum some of this up, published in the Annual Review of Control, Robotics, and Autonomous Systems last year, and I gave a talk at CoRL about it and a talk at [INAUDIBLE], which are both online. So I'm not going to talk about any of that stuff other than to say that once you start looking under the hood, a lot of this stuff doesn't work. Because a lot of this stuff doesn't work and the promises are a little bit grandiose, when you start to get down to it, you ask: what actually works? Why are people excited?

I think the thing that made people excited was that they were using cameras with robots. That seemed to be the really salient thing. Even in Go, at the end of the day, the main revolution-- the big thing in Go-- was realizing that I could treat the board as an image and then just predict the next board. And that alone would play like a three-dan amateur. And then from there you add tree search, you do all this other stuff, and all of a sudden you're up to nine dan. But that first insight-- that I could just treat this as an image and predict the next image-- was already a huge leap over where people had been. So that was amazing. And then, of course, there were maybe less basic things, like people could solve Atari. I actually don't think that's as impressive. But since then, people have done a lot of cool stuff. I've got a picture from Alberto's lab here of actually using vision in the loop for doing complex processing and sorting and grabbing and manipulation-- trying to have that be part of the sensing technology. And like I said, cameras are amazing sensors, right? You get millions of time series per second to process, which is wonderful.
So in most signals and systems courses or controls courses, when you say multi-input, multi-output, you usually mean five; here we have millions, OK? It's a different kind of setting-- a different meaning of "multi." And, thinking back through this, the thing that seemed to really capture the imagination is this idea of policies that map pixels to actions. That was the thing that was exciting. The question is, what's the right way to do that? What's the right way to put pixels inside of complex feedback loops? And this is the question I don't have a good answer for. So I want to talk about some things we've been thinking about in this space and how we've been thinking about them. I don't think any of them are final answers at all-- I'm not even sure they're first answers-- but they're trying to poke at what makes this hard and how we can start to make progress towards understanding it.

So let me get us to that question. I usually use this silly cartoon in a lot of my talks where we say, OK, how do you actually go control a quadrotor? Everybody knows you want to move the quadrotor from point A to point B. You write down everything you know, which includes Newton's laws-- acceleration is the derivative of velocity, velocity is the derivative of position, and F equals ma. We write those down, we write down a few other things about the actual geometry and shape and moments of inertia of the quadrotor, and then from that we maybe solve an optimal control problem. And there's a lot of work, both from the theory perspective-- and it's actually what I was hinting at, a lot of stuff my group has done as well-- trying to understand the right way to solve this problem, which is a Markov decision problem, when I don't know the dynamics.

And what I want to say today is that I don't think that problem is actually that interesting. We've been doing a lot of work on it, and I think a lot of the fun demos and [INAUDIBLE] like to approach things this way, but at the end of the day, I told you: it's Newton's laws. Most of these things you know. There's parametric uncertainty-- it's a real thing-- but we also know how to handle parametric uncertainty. Now, the other thing is we have this crazy assumption here that I measure state. Right? In all of these things, we measure state. And in how many systems do you actually measure state? Really measure state, right? So the crazy thing is I'm going to go to a much harder problem. The question is, do you need sophisticated learning in MDPs? And I would say no. If you measure state, system identification is least squares. We spent a lot of time showing that just least squares is optimal for this. No matter what situation you put yourself in, you're not going to beat least squares if you're measuring the state. That problem's too easy, and we've known that. Standard engineering works, so it has to be true. But making that formal, I think, was important. The way you make this hard is to say instead: I don't actually observe state, I observe a picture that's a function of the state. OK, now I have a weird problem. And all of a sudden I've gone from the case of an MDP to a POMDP.
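To make the fully observed claim concrete-- that with state measurements, identification is just least squares, and you can then treat the fitted model as true-- here is a minimal sketch in Python. The system, dimensions, data, and cost weights are made up purely for illustration, and scipy is assumed available; this is not the code from the talk.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Collect state-input-next-state data from a system with full state measurement.
# The "true" system here is invented purely for illustration.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(500):
    u = rng.normal(size=1)                       # excite the system with random inputs
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next

# System identification is just least squares: regress x_{k+1} on [x_k; u_k].
Z = np.hstack([np.array(X), np.array(U)])        # shape (N, n + m)
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# Certainty equivalence: treat (A_hat, B_hat) as true and solve LQR via a Riccati equation.
Q, R = np.eye(2), np.eye(1)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)   # control law u = -K x
print(A_hat, B_hat, K)
```

The point of the sketch is how little machinery is involved: one least-squares solve and one Riccati equation, which is the fit-the-model-and-treat-it-as-true recipe he returns to a little later.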
The reason it becomes a POMDP is that I have to figure out some way to go back from the image to the state. I mean, I know some people like to say: it's 20 megapixels, and that's over-redundant, so I have a complete copy of the state. I do not think that is helpful. You do not want to have 20-million-dimensional states. That's too high dimensional, and it makes control impossible. So most of the problems that we have, especially if you're working from images, are POMDP problems. To be completely fair, as soon as you have a state measured with noise, it is a POMDP problem, because you have to filter to estimate your state. So POMDPs, I actually think, are much more prevalent.

And so, is pixel-driven control actually well modeled by MDPs? I would argue no. You could, and people do when they do these Atari things, take a chain of these very highly pixelated small images and then make predictions. You could do that, but for the most part, especially in robotic systems, we know the state. We know what states are important to be able to do the kinds of things we want. Not always-- there are exceptions-- but the challenging part is actually fusing some of these more complicated sensing modalities with that kind of state representation. And I can't say this enough. Actually, Leslie is not here, but she wanted to make sure I told everybody this again: as soon as you have imperfect state information, you have a POMDP, not an MDP problem. Leslie would go even farther than I would. She would say that if you have a game, that can maybe be an MDP, and everything else is a POMDP. Everybody who knows Leslie knows that she would say that. I'm not putting words in her mouth.

OK, so lastly, one other thing I want to point out as a challenge is that most people, when they do these complex things with cameras, just take a standard off-the-shelf deep net, plug it into their control system, and then assume that machine learning is going to help them, right? And actually, I think there is a really weird, subtle thing that happens once we start putting machine learning in a feedback loop. And this is why I like this problem so much-- this particular problem of when you have noise in your state-- because when you have perceptual sensors, you see how quickly our IID view of machine learning breaks. Hopefully everybody has taken a course in machine learning, and hopefully they told you that the promises of machine learning and learning theory are actually super weak. I think we've solved them. I do think that machine learning is effectively solved-- I think machine learning was effectively solved in 1970, and we can talk about that at the reception. I don't think anything has really changed from our initial view. What machine learning promises you is that if I sample a bunch of data from a distribution and then do something-- minimize empirical risk-- then I'm going to perform well if I sample from that distribution again and evaluate. But there's something weird there. As soon as you stop sampling from the same distribution, machine learning doesn't work anymore. And there's another line of work I'm not going to talk about today where we've been studying how quickly those kinds of shifts can happen.
I'm not going to talk about it today, but we have a paper where we recreated the test set for ImageNet, and you see that even trying to recreate somebody's own rules for sampling from a distribution causes distribution shifts that produce very, very large unexpected errors in the models that were trained on the original set. So these distribution shifts are real. And if you train some robotic policy using one sensing setup and then deploy it using only the vision system, you're using a different policy. So some work I really like, by Chelsea Finn, Sergey Levine, Pieter Abbeel, and Trevor Darrell-- what do they call it? Guided policy search-- actually trains an optimal controller first using knowledge of where everything is, and then they turn that off and just use the camera itself. That is deploying a policy that is different from what you were using when you were collecting the data. And so the camera is seeing things that are immediately off-policy, and you have to account for that. OK, so standard generalization does not work and we need something else.

So we have some issues. One: MDPs, which I spent the last five years thinking about, which is great, don't necessarily solve all the hard problems. They don't get us everywhere. And pixel-driven control in particular is probably not best modeled as an MDP problem. I would also say that the first one was amazing to me. We spent five years working on this-- we wrote all these hard papers, we bashed our heads against the wall-- and what it looks like now is: there was a paper at the last [INAUDIBLE] by Stephen Tu, Horia Mania, and myself which essentially shows that for linear control systems, the best thing to do is to take some data, fit your model, and then treat the model as true. That's optimal. You could prove-- I don't see Max or Dylan here-- Max Simchowitz and Dylan Foster just proved this is minimax optimal. And that's crazy. I mean, of course, it's what we've been doing since 1920, or I don't know, a very long time, in controls. But it's interesting that from the perspective of MDPs, that problem is relatively straightforward. And in all the extensions, the tabular cases, you see the same thing. For nonlinear things, we never have guarantees, but I'd be surprised if something were amazingly different there.

But these other two things are big problems. How do we actually deal with this kind of imperfect measurement, and how do we actually quantify errors? So I want to look at a simple case today. Again, I think these are huge problems. As soon as you say POMDP, everybody should be terrified. Everything is hard as soon as you move away from MDPs. So I don't have good answers, but I want to talk about one relatively simple thing that we've done and that we're looking at. Again, I do not believe it's an answer at all. I want to show you a demo of us trying to do it, which I'm amazed works, and that's cool, and maybe where we go from there.

OK, so let me give you an example. The demo will be at the end. Francesco Borrelli, for his undergraduate course in vehicular dynamics, built this cool platform called the Berkeley Autonomous Racing Car System. What do we have on here? Well, we put a camera on it-- they don't usually have cameras. It has a dumb IMU, some encoders, and an ODROID.
It's a really cheap, janky remote-control car. Autonomous driving. [CHUCKLES] There it is. It's nice that if it crashes into something, nobody gets hurt and nobody cares too much. Vicky, the student who's been doing most of the engineering on this, calls the car Oscar because it belongs in the trash. [LAUGHS] It's fine. It's fine. We love Oscar. He's been good for us.

So let me give you a goal for this kind of car. It has very limited sensing capability, and I'm going to actually cripple it-- I'm going to tie its hands behind its back. I want to have it follow a demonstration and find the optimal way to trace that demonstration as fast as possible, given only one demonstration and only using the sensors I wrote here: the camera, an IMU, and the encoders. So no external-- no global positioning system. And this is what that webcam sees. This is the Wozniak Lounge at Berkeley. So the question is: from images like this and the wheel encoders and the IMU, can we actually get this thing to follow that trajectory and do something faster than the initial driving? And it's challenging because there's no depth and all the coordinates are relative.

OK, so it's not-- I mean, I know you guys can do this. I'm not saying you can't do this. I want to ask: what does this highlight? I'm sure anybody here could give you that platform-- well, actually, probably not that platform. You would build a better one. It's MIT. You would build a better car platform that would solve this problem. You would do it in a weekend. But still, let's just think about what's hard and what makes this problem challenging. And really, the reason is that we'd like to scale this to something more interesting and larger scale. Francesco does a lot of this stuff on real cars on a couple of proving grounds, where they can actually learn to improve maneuvers in real cars using a variety of sensors, but those usually have much better, more accurate state estimators. There are also really great platforms from a bunch of other people. It's funny, this car is considerably more robust and heavy-duty than ours: if anybody's visited Georgia Tech, they have a really cool dirt track where they race these things. They cost a lot more, so it goes. They have GPUs on board. It's nice. And then there's great work from Scaramuzza's lab at ETH where they actually do racing with quadrotors. I think racing is one of these things where I don't actually think it's a real application-- it's fun. There are fun things that could come out of it, and drone races are obviously going to be cool. But it has a lot of the character of a lot of robotics applications, and it brings perception in in a nice way as well. So that's kind of why it's nice. Their videos are better than mine, but that's fine. They have really cool videos, you get nice demos, and it highlights a lot of the issues.

OK, so how do we model the abstractions of doing this kind of autonomous racing? Let me give you one view. One thing would be that you have some unknown, locally linear dynamics. I actually do believe that in this case we do have unknown dynamics, with the tire forces being the hardest thing to identify. You have some observation model, which is the camera and some other sensors. And then our model is that you also have this thing that takes whatever comes from the camera and gives you back a state estimate.
It doesn't even have to be a state estimate, honestly. It just has to be an estimate of a linear function of your state. So, for example, it could just be a position estimate without the velocity, something like that. OK, so we get some kind of measurement. So we're mapping back to what would be linear control after applying the perception, but we have this error term that's induced by the fact that we're using perception. And quantifying what the heck that error term is, is actually most of the problem. And it's weird. If you look at the errors that come out of your SLAM system, they're weird. They're a little bit hard to follow. Then we build a controller around that, and that's kind of one abstraction for this problem.

In particular, today, let me ignore, at least for now, the fact that we have to learn the dynamics too. We do it in practice. I don't really know how to think about those things together, but we'll get there. What's wild about the fact that we have to learn the dynamics is that-- maybe it's not that wild-- you're getting an output, and your outputs don't have any global grounding. They're just what your perception system tells you. So you have to learn the dynamics model from the output of your-- I'm saying SLAM. It could be a neural net, it could be whatever the heck you want it to be. Random kitchen sinks. I don't care. Whatever the thing is that maps camera images back to these positions.

OK, so we're going to make the machine learning basically a black box. This perception thing is our machine learning, and I'm not going to talk much about what it does. We're going to try to quantify what comes out of it, and then we want to understand how we use it. If someone hands us this black box that they've tuned-- actually, we've looked at three things: random kitchen sinks, somebody's neural network that they've trained from images, or just ORB-SLAM. One of those three. All of those will give you some kind of position back. And then what does this tell us about how we do learning and control co-design?

So, in some sense-- I think Russ would be thinking this immediately-- he'd be saying, look, this is just output feedback. This has been studied since the '60s. A lot of the development was here, I learned this morning. I know that. I knew that already. But it's always fun to hear about the origins of robust control and how MIT was connected to them. So it is just: we have a state that evolves over time, we want to minimize some cost-- in this case, some cost to go around the track-- we have some linear observation that is corrupted by noise, and we want to build a controller. And so, if you wanted to do this, what's the problem? Honestly, it's a little weird, to be totally honest. Solving optimal control in this setting is a little weird. I think most people should know that LQG is much harder than LQR, or much less robust than LQR. It's also fundamentally harder. With LQR you just take your state and apply state feedback. Then when you do LQG-- what is LQG? LQG is just the linear quadratic problem, but now I observe things through some Gaussian noise process instead of perfectly. At that point, you have to build a Kalman filter.
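Concretely, the LQG structure he is describing is the classic separation principle. In standard textbook notation (mine, not the slides'), the optimal controller is a Kalman filter feeding an LQR gain:

\[
\hat{x}_{k+1} = A\hat{x}_k + B u_k + L\,(y_k - C\hat{x}_k), \qquad u_k = -K\hat{x}_k,
\]

where the estimator gain L is computed from the noise covariances, the feedback gain K is the LQR gain computed from the cost, and the two designs decouple.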
And then you take the output of the Kalman filter as true and you do state feedback. That's the optimal solution. It gets more complicated as this noise process gets more complicated. Obviously, things get more complicated, and the hard part here is just figuring out what the heck this e is. The question is how we can quantify and talk about the e.

So here's a model. This was actually the hardest thing for us, and I'm still not happy about it at all. Once you even just decide that that's the problem, the question becomes: how do I model the noise? So what I have here is that e_k is going to be-- now there's this h, which is my appearance function. I do not remember why I called it h-- just because f and g were taken, I guess. h lifts things up into the high-dimensional space, so every state corresponds to some image. And then maybe that image is noisy, and that noise depends on the state, and maybe we have some other noise there. This is too much crap, but it's fine. We can put all sorts of stuff in the appearance. And then your neural network, your ORB-SLAM, your whatever, maps you back into some function of the state, and the error is just how far off you are from that Cx. And we're going to say-- I shouldn't have this be equals. Well, no, I can have it be equals. This is fine. This is the nice thing about adversarial noise models: it's some matrix times your state, so a linear function of your state. I don't know what delta is, and that delta can be time-varying. But I just want to say it's some kind of map-- some kind of state-related quantity plus noise. That's the adversarial part. So it is an equality. And you can see that the balance between these two things is actually what makes it kind of complicated. I could put everything in eta. I have to make some assumptions about how all these things play together, and I think that the way you make those assumptions, and the way you quantify them, is actually what makes this really challenging.

So, being a control-theory-minded person, what I would say is that you have this idealized system that gives you a beautiful model, and then the perception errors come in. Instead of having this thing give us perfect state, we have it introducing these perception errors, and I'm just drawing that as a block diagram. I don't know if there are any control theorists in the room. Go ahead.

Does the fact that this error depends on where you are in your state-- I mean, time-varying is fine, but a lot of times I think of those errors as being very correlated with where you are. A SLAM system is going to be more accurate in certain parts of the world.

Exactly. That's what the state-- no, but you're saying that I could actually index it by state.

Yeah, but this is a linear relationship.

Yeah.

That's great. Is that going to be critical in the treatment here or not?

No. I'm going to show you how we do it in this case. We're also not sure-- what we're trying to figure out right now, and trying to quantify, is what's better. Because it's definitely true-- you've done SLAM before, so it's definitely true that SLAM is not spatially homogeneous. And we're trying to quantify that and come up with a good model. I think the next revision-- we have a version of this on arXiv.
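To put the noise model he has been describing into symbols (this is a reconstruction from the talk; the exact notation on the slides may differ, and the bound symbols are just names for the assumed bounds):

\[
\begin{aligned}
x_{k+1} &= A x_k + B u_k + w_k && \text{(unknown, locally linear dynamics)}\\
z_k &= h(x_k) && \text{(appearance map: state to image)}\\
y_k &= p(z_k) = C x_k + e_k && \text{(perception map: SLAM, neural net, kitchen sinks)}\\
e_k &= \Delta_k x_k + \eta_k, \quad \|\Delta_k\| \le \varepsilon_{\Delta}, \quad \|\eta_k\| \le \varepsilon_{\eta} && \text{(state-dependent part plus bounded noise)}
\end{aligned}
\]

The point of the split is exactly the balance he mentions: how much of the perception error you attribute to the state-dependent term and how much to the bounded term eta is a modeling choice, and it drives everything downstream.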
The next version of that paper is going to have a different error model, to be totally honest, because I think this one is not as realistic as it could be. It's just trying to account for the fact that if you go faster, you're going to have blur-- it would do something like that. It's not going to account for the fact that you have spatial inhomogeneity, which is definitely a real problem. But do bear with me, because I think the treatment, at least what we try to do, does carry over. So eta is the part that we're just going to say is bounded. It's just bounded. And that's why we'd like to have this thing that's state dependent-- because otherwise you could just have one unbounded error. Again, I just want to show there are lots of choices in how you model the noise and how you fit it, and that's what we're still trying to figure out: what the right thing to do is. I think spatial inhomogeneity actually does seem to be the bigger issue-- well, you'll see at the end. The complicated thing is that blur actually doesn't seem to be the issue at higher velocity; it's dropouts. I'll show you that at the end. Quantifying the error is the hard part, and then you can build some stuff around it.

OK. I don't have a ton of time, but let me give you a very high-level, bird's-eye view of what we're doing here at the end of the day. What we came down to, again, is that we're doing control with these disturbances mapped in, and we want to build robust control around it. And I have a way of doing robust control now that I've been trained to do by Nikolai Matni, who is now at Penn. I like it a lot. I think the way these kinds of errors that come from machine learning propagate comes through in a natural way. I feel like there are a lot of other places where this is not the right thing, but for some reason the kinds of errors introduced by measurement error seem to be handled more transparently in this framework. I do not believe it's the only solution, though. I just can't think about it better. Let me at least tell you the way we think about it. You don't have to think about it this way; I can just tell you how we think about it. Russ is skeptical, but it's fine. I like this view.

So here's the classic view. Also, for me, it's like I get to be a convex optimizer again. The classic view of things: you have this abstraction box, and I want to design K. Everybody knows that K is just going to be some policy function, but of course this is immediately not convex. Even in the linear case it's not convex if you write things this way. This is known in robust control, and most of robust control is fancy tricks to make things convex in one way or another. I like this particular fancy trick. There are traces of this idea throughout the control literature; it's just not the most popular thing. But if you look at Doyle, Francis, and Tannenbaum, they do this in the first chapter. Which is to say that I can actually think of the whole thing as a big system. When you [INAUDIBLE] pose this as an interconnection, you are thinking about the whole system-- a global view rather than the local view.
So in particular, if I have my controller be a linear, possibly time-varying function of the state, then the map from the disturbance to both the control and the state is some convolution. And the system is actually completely determined by that convolution-- I never have to know anything other than what those Phis are. So I can just think of the whole global system as some map that takes disturbances and maps them into these internal configurations. OK, so this is, again, not new. And what's nice about it-- this is also not new either; it's in the first chapter of Doyle, Francis, and Tannenbaum's Feedback Control Theory-- is that now you can look at compositions of these maps to get out what the controller in the real world would be. In this case, and this is kind of obvious, if you take the map that goes from disturbance to state and the map that goes from disturbance to control input, it seems like some composition of those two should be your controller. And it is. It turns out you invert the thing that maps disturbance to state and then multiply by the thing that maps disturbance to control, and that actually is the realization you would use in the world.

And so what this lets you do is take the original optimal control problem-- where maybe you want to enforce some safety, maybe you have robustness issues, maybe you have uncertainty-- and map it into this high-dimensional but convex problem over these mappings. This is called system level synthesis, because you synthesize the closed-loop response operators and then you take those and build the thing you would build in the real world. So it's kind of a nice way of lifting this thing into a higher-dimensional space. Some people might say this is essentially a juiced-up version of what's called disturbance feedback. It's nothing new, but it's powerful. It lets us make everything convex, and I think because of that, it lets us be very transparent about how the errors propagate.

So what we do in the case where you have an output is, again, all we care about are these maps between the disturbances and the state and the input. Your perception errors here would be eta-- I'm lumping everything into eta now, just to make my life easier-- and your disturbances for your model are w, so those would be your equation errors. And then I just have to make this matrix that maps these disturbances to x and u. And it turns out there's an affine set of things that are realizable, that map to actual implementations. I can write those down if I know A, B, and C. And then-- it's more complicated now-- I can construct the controller from the Phis themselves. And what's cool, and what we use, and part of the reason why I like this, is that when you want to be robust, what you can do is say: instead of assuming I have a perfect A, B, and C, let's imagine I have A plus delta A, B plus delta B, and C plus delta C. We were talking about delta C; this is why it's coming up. Now what you do is say: I just do synthesis treating this as true. Treat the A, B, and C that I fit as true, and then account for the fact that I'll have a delta. And this is the beautiful thing I love about system level synthesis. In this case, we're only looking at C plus delta C, and it turns out you get some new Delta out. This is some operator. It doesn't really matter.
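In symbols, the state-feedback version of the parameterization he is describing looks roughly like this (this is the standard system level synthesis setup and the notation is mine; the output-feedback version used for the perception problem has more response maps, but the idea is the same). Writing the closed-loop responses from the disturbance w to the state and the input as Phi_x and Phi_u:

\[
\mathbf{x} = \Phi_x \mathbf{w}, \qquad \mathbf{u} = \Phi_u \mathbf{w},
\qquad
\begin{bmatrix} zI - A & -B \end{bmatrix}
\begin{bmatrix} \Phi_x \\ \Phi_u \end{bmatrix} = I,
\qquad
K = \Phi_u \Phi_x^{-1},
\]

so the synthesis problem becomes: minimize a norm of the closed-loop responses, for example

\[
\min_{\Phi_x,\, \Phi_u} \;
\left\|
\begin{bmatrix} Q^{1/2} \Phi_x \\ R^{1/2} \Phi_u \end{bmatrix}
\right\|
\quad \text{subject to the affine constraint above,}
\]

which is convex in the Phis, and the controller you actually implement is recovered at the end as K = Phi_u Phi_x^{-1}.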
But you end up with some extra term here. This Phi-hat is saying: let's take the models as true, even though we know they're false. And the question is, how do I have to transform them to get the actual realization when I go to the real world? You end up multiplying by this map, I minus Delta. And if I can quantify the size of this Delta, I can quantify the suboptimality of this solution. That's why I like SLS: you always get equations like this, and usually just by dumb linear algebra. If I want to bound the norm of this, I bound the norm of this one times the norm of that one. And if I want to bound the norm of I minus Delta, we know how to do that. Or sometimes you'll get an I-minus-Delta inverse. Again, you start to apply these kinds of rules. So we're using a lemma from a paper by Ross Boczar, Nik Matni, and myself to deal with the delta C. As I said, the actual map is I minus Delta, times Phi-hat, applied to w and eta. And so now we can see how the trajectories that are realized-- the true trajectories-- arise from the designed system response, the actual noise, and these errors in perception. And the errors in perception are all Delta and eta in this case, which is cool.

OK, so now you're like, OK, I should be done. But this is where the machine learning comes in, this is where the generalization error comes in, and this is what's weird. You take this thing, you have these perception errors, and you synthesize your controller using those perception errors. And now the question is: how do you show that the new controller is actually going to respect everything, given that I have perception errors? Yeah, Russ?

[INAUDIBLE] The cost? What cost do you want? We do all-- I mean-- [INAUDIBLE] worst case? No, so we are actually doing-- [WHISTLING] See the skip slide? [CHUCKLES] I'm going to skip ahead. Just for Russ, I had that ready. So there's a lot you can do, and it really just depends on how you want to characterize the different disturbances. Typically the objective is related to how we think about the noise. LQR tends to be for things where you're thinking you have either sensor noise or some kind of natural stochastic process. You could do worst case. We could do L1. I actually think L1 is surprisingly useful in a lot of cases, because for saturation limits, L1 seems to be the right model. But it's just a norm: you end up having a norm, and then you have to propagate the error through it. So you can treat them all-- it's whichever one you feel most comfortable with. [INAUDIBLE] Yeah. Oh, for your cost? It probably does. So I think they're both design decisions, right? Which makes it aggravating, because everything now is coupled. Yeah. Where was I? Let me get back out of here. I was down here.

OK, this brings me to this idea of generalization. Classic machine learning-- I've mentioned this before-- has this annoying property that the generalization results rely on statistical arguments about closeness of the training and test distributions. You assume the same distribution. And the thing is, as soon as you have a closed loop-- so you collected this data in open loop somehow, or with a different sensor, and then I put things in closed loop-- I have a different distribution. And so you end up moving from something close to something far away.
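The norm bookkeeping he mentions a few sentences back is just submultiplicativity plus a small-gain style bound. For any submultiplicative operator norm, for example,

\[
\|(I - \Delta)\hat{\Phi}\| \;\le\; (1 + \|\Delta\|)\,\|\hat{\Phi}\|,
\qquad
\|(I - \Delta)^{-1}\| \;\le\; \frac{1}{1 - \|\Delta\|} \quad \text{whenever } \|\Delta\| < 1,
\]

which is how a bound on the size of Delta turns into a bound on the suboptimality of the controller synthesized from the fitted model.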
And I think the most important thing in our paper-- because we've done this in different settings; Russ is asking about what costs we look at, the gentleman in the front is asking me how I actually model my perception errors-- in both cases, the way we actually prove that we get suboptimal but not terrible control is to leverage this idea that I can build a controller that keeps me close to my data. You know that if you put this controller out in the world, it's going to do something different; I'm going to see different data. So you can actually impose, as a design constraint, that I should stay close to my data-- I should try to make the controller move into parts of the space that I've already seen. Which is weird, but at the same time, that's kind of what we want to do. We don't want to be surprised. We don't want to be surprised. We would like to be boring. The way that actually pops up in our theory is that you end up with a constraint which says: you want to make sure that the mapping from the sensor errors to your state-- which is, again, one of the design variables-- is small, and it's bounded by quantities that depend on the noise. And, again, this is the part that will change: if you change your noise model, whatever is on the left-hand side is going to change. Go ahead.

I guess I have a question. This is assuming what you see depends on your state. But most of the things we worry about are like: you saw something crazy in the road. It doesn't really matter if you're driving in the middle of the lane or on the left or on the right-- there's a crazy thing there you can't control.

That's right. That's right. I'm not talking about that problem today, but I do think that this gets you there. If you build a controller that's designed to stay in the regime of things that you believe are true, and all of a sudden you get a spurious sensor measurement that doesn't map to what you saw before, shouldn't that be a good signal? I haven't thought about it. I don't know yet. That's a great question, because I totally agree-- that's where you want to go, right? It's hard to simulate those things, and I don't think we want to rely on simulation. I really do not believe in this mindset that we just capture all edge cases by simulation and that will solve our robust control problem. And we know that's not true, right? Because Tesla had this thing where a guy drove his car under a trailer truck in Florida on a two-lane highway when the truck was taking an unprotected left. And then two years later, some guy drove his car under a truck that was taking an unprotected left, also in Florida. It's like: you saw that edge case already, guys. It's kind of a dark joke because both of them died, but still. [CHUCKLES] Anyway.

OK, so basically what this means is that we're now stuck with how we train these things. The training has to be done in a way where either you have a dense sampling of the space, so I can stay close to my samples-- for racing we could probably do that as long as the track doesn't change-- or, again for racing, imitation learning is a possibility, where you want to stay close to the things you've seen in previous laps but you can improve as you move along.
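One way to write down the "stay close to your data" requirement he is describing, in the notation of the error model above (the specific norm and the form of the bound are assumptions for illustration, not the precise condition from the paper): add to the synthesis problem a constraint like

\[
\left\| \Phi_{x e} \right\| \;\le\; \gamma(\varepsilon_{\Delta}, \varepsilon_{\eta}),
\]

where Phi_{xe} is the designed closed-loop map from perception errors e to the state x, and gamma is a budget set from the assumed error bounds. Keeping that map small is what forces the closed loop to stay near the regime where the perception errors were actually characterized.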
And, indeed, that's kind of the demo I'd like to show. Let me skip out of that again. I forgot which slide to go to. I'm going to skip it-- these simulations are boring. This simulation is less boring. Let's go back to this one. Here, we're trying to fuse a bunch of these things together. It's not perfect synergy. As everybody knows, theory and practice are farther apart in-- well, actually, I don't know. Are they farther apart in practice or in theory? We'll see. But we're fusing a bunch of these ideas together and trying to bring these two worlds together.

So in particular, for this car demo, we used a single demonstration-- a single thing to track. And that actually was really important. We do see that as you try to move this thing away from your demonstration, everything goes to hell. So we have one demonstration from a human, and then the way this works is we use more laps to generate more data that I can stay close to. In some sense, while the math isn't quite the same as for our synthesizer here-- you'll see in a second, we move to MPC, of course we do, because you go from beautiful control theory to MPC, of course you do-- the lesson from generalization stays the same: I start with data from a human, and the first step is to do something dumb and slow to get more data that I can stay close to. Once I have that, we implement what's called learning model predictive control, which was a brilliant idea by Ugo Rosolia that I feel like everybody should know, so I wanted to talk about it. It's one of the most amazing reinforcement learning ideas that nobody in reinforcement learning knows. So I just wanted to talk about that very briefly and show you how we do it. At this point we're just using a smart data structure to stay close to the data. We are using ORB-SLAM here. We're recording more data to stay close to the data. And what you'll see in a second is, again, using previous data to stay close to the data. In all of these cases, what we're trying to do is stay close to what we did before and allow ourselves a little bit of room inside that boundary to add improvement. So it really is the kind of imitation learning setup that I was talking about.

Let me skip SLAM. You guys all know SLAM, right? If you don't, we can talk at the reception. I don't want to go through SLAM. I didn't want to talk about [INAUDIBLE]. And why are we not using a neural net? I don't know. We couldn't. I don't care. OK, actually, we're not using a neural net because-- I don't know if you guys know Vicky E., a graduate of MIT, if anybody has met her before. She's amazing. She worked with Bill Freeman. Bill has induced some biases, so she doesn't like neural nets. So we'll blame Bill. No, actually, I don't have anything against neural nets either. Also, SLAM is really good for a lot of things, so we'll just stick with that. Anyway, it works for a lot of things.

So, iterative learning MPC. I just wanted to tell you about this. Everybody should know about it because it's amazing. Standard MPC, hopefully everybody knows: you want to maximize reward subject to your dynamics, and what you do is build this terminal constraint-- somehow, this magical terminal cost function that induces robustness and allows you to work on short time horizons and extrapolate to long time horizons. That's standard MPC. This Q function, your terminal Q function, you usually design.
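For reference, the standard MPC problem he's sketching looks roughly like this, written as cost minimization rather than reward maximization (the notation is mine):

\[
\min_{u_0,\dots,u_{N-1}} \;\sum_{k=0}^{N-1} \ell(x_k, u_k) + Q(x_N)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \quad x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad x_0 = x(t),
\]

where the terminal cost Q (and often a terminal constraint set) is the piece you normally have to design by hand to get good behavior out of a short horizon.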
And there are lots of tricks to designing that terminal cost. There are whole books about it. The goal of making MPC really robust comes down to how you pick that Q. Learning MPC learns the Q function. It is a Q-learning algorithm, but it does not look at all like standard Q-learning. It's really beautiful. The idea is you let SS, your safe set, be the set of all the data you've ever seen before. Since we're doing an iterative task, we actually know what the value is, because you know how long it takes to get to the end of the task. So if you've seen a safe point before, I can always assign it the value from the previous trajectory, because there's always something that's come before you on this track. So what you say is: I have to land in a safe point, and the value there is just whatever the value was before. I'm constraining myself to land in a state I've seen before, but in between I can explore. And this is super weird. The exploration is now just saying: I give myself a horizon to explore, but I want to end up somewhere that I feel is reliable. And that's kind of the idea of learning MPC. It's such a brilliant idea. I love this idea, and it works really well. And weirdly, this kind of Q-learning is the opposite of standard Q-learning, because standard Q-learning explores in the Q: it uses Q and says, where am I uncertain about Q, and goes to places it hasn't been before. Here, it's saying: no, you have to only go to high-certainty places, and I'm going to allow myself some exploration before I get there. It's just a weird turning of things on their head, and we haven't been able to find a good connection in most of reinforcement learning, including all of Dimitri's books. We had to look through them.

OK, and for autonomous racing, the cost is actually really nice: it's just the time to get to the end of the track. That's pretty easy. You pay a penalty of 1 at every step where you're not at the end, and 0 once you get there. That way, you're minimizing the amount of time it takes to get to the end. And I don't want to talk about this too much, but I do want to say that while we can write down the vehicular dynamics, we never use them. And this is another part that I didn't understand, but when we put it in, it works so much better. Rather than having this nice model that we know-- the tire forces and the complex interaction between the headings and the velocities and this other weird thing with the moment of inertia governing the [INAUDIBLE] car-- we fit all of that. We just look at the previous data we've seen and fit locally linear dynamics, and then take those as given. This is just conversion from local to global, so that's fine. OK, linear [INAUDIBLE] OK.

Oh, there's a video. Great, there's a video. So here is the car driving in the lounge. I've got to fast forward over one part in a second. All right, so this is the first demonstration given by the human. And this thing is actually annoying to drive, so it's slow. It's actually not that easy. And this is what the dashboard looks like. Poor Oscar. I say so many mean things about him. This is the kind of thing that you're seeing. Obviously, there's a lot of good clutter in this room, so it allows us to get a lot of key points for SLAM. You guys can probably all see where those are. I should have done another jump cut. There it is. Key points. Obviously, on all the table legs.
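A minimal sketch of the learning MPC bookkeeping described above: keep every visited state in a safe set with its recorded cost-to-go, and require the MPC terminal state to land on one of those points, inheriting its value. This is illustrative pseudocode of the idea only-- the real formulation uses a convex hull of safe-set points and, as he says, refits locally linear dynamics from the most recent laps-- and `solve_mpc` stands in for whatever finite-horizon solver you have.

```python
import numpy as np

class SafeSet:
    """Stores every visited state together with its recorded cost-to-go."""
    def __init__(self):
        self.states = []   # visited states (np.ndarray)
        self.values = []   # cost-to-go recorded for each stored state

    def add_lap(self, lap_states):
        # Time-optimal racing cost: pay 1 per step until the finish line, then 0,
        # so the cost-to-go at step k of a lap of length T is simply T - k.
        T = len(lap_states)
        for k, x in enumerate(lap_states):
            self.states.append(np.asarray(x, dtype=float))
            self.values.append(T - k)

    def candidates_near(self, x, radius):
        # Previously visited states near x, usable as terminal targets.
        return [(s, v) for s, v in zip(self.states, self.values)
                if np.linalg.norm(s - x) <= radius]

def lmpc_step(x, safe_set, solve_mpc, horizon=10, radius=1.0):
    """One learning-MPC step: explore over the horizon, but the terminal state must
    land on a safe-set point, whose recorded value serves as the terminal cost."""
    best = None
    for x_terminal, terminal_value in safe_set.candidates_near(x, radius):
        # solve_mpc(x0, xN, N) is assumed to return (stage cost, first input) or None.
        result = solve_mpc(x, x_terminal, horizon)
        if result is None:
            continue
        stage_cost, u0 = result
        total = stage_cost + terminal_value
        if best is None or total < best[0]:
            best = (total, u0)
    return None if best is None else best[1]
```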
By the way, that white rope in the middle was there for Vicky, because she's driving. It's really just there for us and for Vicky-- the car never sees it. You'll see it's never in the frame. The car has no idea it's there. OK, and we had this first phase where it does the PID lap. The PID lap is super, super boring. I can't remember how long it takes, so I'm going to skip ahead. Now we learn from previous lap data. Some parts of this video-- if you guys want to stick around at the end, I will show them to you. But I wanted to show you here: these green dots are the things out in front that we think are safe, and those are the places it's trying to go. The red heading is the trajectory that it picks to explore. So the green places are places it thinks it can go, and the red heading is how it's trying to optimize to get there. And it's stitching through-- the red that's going around here is the initial training data. And finally-- [INAUDIBLE] I have to go-- there's some more data, and I think they're just going to let it go. And here's after 20 laps. Here we go. Got my cue. Now it drives much faster. And there, you see it's actually trying to extrapolate a lot further into the future. What I also think is fascinating here is that we gave it a certain amount of space it could drive in-- we said you have to stay inside the-- we draw the red and then we draw a [INAUDIBLE] on the red-- and it finds that, since it's just minimizing the time, it's much better to create this ellipse instead of the initial trajectory, which I think is-- it just learns it.

And this is what the camera view looks like. Again, this is the main sensor. I think it's going to show-- there's Vicky. And now you'll see the dashboard camera, and as I mentioned before, watch how it flashes. So we're already starting to see the state dependence here. It's not that there's a blur effect; it's that you lose tracking. So trying to quantify what's happening in the sensor is something we're doing right now. I'm going to skip that one. And just in case you don't believe me, we did it in a different room-- we did this in three rooms. Here's room two, driving around. It does work in different environments, let's put it that way. We didn't just do one room. OK.

[INAUDIBLE] Yeah, Russ is right. It does have a Kalman filter inside of it, yeah. So that's fair too. Russ is saying it's not using the actual car dynamics. We didn't plug that in, but it does have that kind of kinematic smoothing-- that's Newton's laws. So that is a fair point. That's a fair point. I can turn it off, though. If that would make you happier, we could do it with that turned off. I don't know.

OK, so how do we sum this up? I do think there's something interesting about trying to understand the uncertainty of these perception maps as a visual sensor. And I think the main thing we found that's interesting is that I can get some suboptimality guarantees, and I can get some kind of predicted safe execution, as long as I don't try to be too crazy. The perception will still be a bit of a-- these sensors are not panaceas. We still kind of have to tie our hands, because, sadly, what we're handed from machine learning is that I can only replicate what I've seen before.
I think knowing that, and knowing a little bit about how to do the uncertainty quantification, does allow us to use these sensors. How we go beyond that, I'm not sure. We have this one idea-- we like this system level synthesis idea. What's been interesting in the group is that we've figured out nice ways, within the same framework, to include all of these things. Although, if you have all of these things at once, maybe we shouldn't be deploying a robot. But I do think the hard part, and we're still nowhere near close, is really understanding the right way to do this last one. I feel like the uncertain dynamics and safety constraints are things we can handle-- things we can do. The perceptual sensing is a huge one. And also dealing with the fact that your perceptual sensor is supposed to be designed, as my friend here in the front pointed out, to also give us error signals. I mean, most people don't use cameras to guide their low-level control. That's kind of insane. We're doing it, though. That's fine. Let's just see what happens. That's what we did today. But we do use cameras. We're supposed to use them for detecting static objects that possibly shouldn't be in our scene, and moving them out or getting out of their way. And I don't think we're anywhere near understanding how to integrate the low-level control with those kinds of detections in a safe and reliable way. But that's why life is exciting. That's why we have lots of things to do.

Let me just close with one more thing, which is a plug: many of you may have come to Learning for Dynamics and Control 2019. Learning for Dynamics and Control 2020 will be in Berkeley. I'll give the plug. We're going to take contributed submissions this time, so if you have something you think is cool that you would like to share, please consider sending it. The deadline is not November 15th-- oh, I forgot to fix that; I should fix it on the fly. Let's say it is December 6th. Sorry. Man, that was an ambitious early version. Things take longer. But, yes, they're just six-page papers-- extended abstracts. The best ones, the ones we like the most, will get orals, and then everybody gets a poster, and we're excited to see how this goes. All right, with that, I'll stop. Thank you very much. [APPLAUSE]

Any questions?

Two questions about the error. You have the term C times x plus eta. First, your system is nonlinear, so you're basically controlling a local, time-varying linearization. Yeah. When your x is large, your nonlinear terms are also crammed in there. That delta-times-x is like a second-order term, so why consider it to begin with, if when your x is large the nonlinear terms are also in the error? And my second question: you said eta is a bounded error. Do you consider it like [INAUDIBLE], or-- I mean, if it's coming from a neural network, is it something weirder?

So I know other people have asked this. When it comes from a neural network, what is the error? Let's start there. Does anybody know? Nobody knows. So I think this is actually a great question. Honestly, even with ORB-SLAM or just normal SLAM methods, what the errors look like under Gaussian assumptions is kind of easy to write down.
Under real conditions, when they come out of an actual camera, they don't look Gaussian. And they're definitely not spatially homogeneous. So, actually, I think quantifying what the heck these perceptual sensors are doing is super, super interesting and super hard. I think that's kind of the next step for what we do with these things. Quantifying them from data itself is also really hard. It's something I would like to do. With regard to the nonlinearity question, I will say that we're trying to account for part of that in the delta-x term. The other thing we're doing is making these local linear assumptions. So it's not like we're linearizing around equilibrium. In the car, what you saw, we would only do a linearization over the last two laps, so we're only linearizing at the higher speeds. Because otherwise, you're right, the errors get really huge.

Yeah, go ahead, Russ. So if you-- I'll repeat the question. OK, that's fine.

All right. So your ORB-SLAM, it's like you're taking and sticking a Kalman filter in a priori. So if you took an LQG problem and you imposed your perception system as your Kalman filter and you did SLS on that, would you get K? Would you get LQR out?

That's an awesome question. OK, sorry, let me put it this way. If you do LQG and you solve it using SLS, you do recover the separation principle.

This is with you imposing the Kalman filter?

No, without imposing the Kalman filter. The analogy here is quadratic cost and Gaussians.

Right, right, but here the analogy is that you've written in the Kalman filter. That's your ORB-SLAM, right? [INAUDIBLE]

Sorry, what was the question? You've a priori constructed your Kalman filter-- Right. --into the--

That's a different question. So now, we know that the separation principle is what everybody does, and it's cool. It was a brilliant idea. Who do we attribute that to? I always forget. Simon? For LQG? [INTERPOSING VOICES] Kalman, right? I was going to say maybe [INAUDIBLE], but, OK, anyway, it's old. But the idea is that, amazingly, the optimal solution is: do a Kalman filter, treat the output of the Kalman filter as true, and then do state feedback. And that's a miraculous thing that happens. In most systems, we do that anyway. We build a filter that gives a state estimate, we treat the output of the state estimator as true, and we do some kind of optimal control around it. That's kind of what we do. We can, using SLS, bound the suboptimality of that. So we can incorporate that model. We're not doing that in the demo.

But you're asking SLS to solve an output feedback problem here. And it does not give you a state--

It gives you time-varying, whatever, crap out, right. The K that's coming out of SLS is a more complicated thing. It's much more complicated.

But in your setup, it should converge to the K from LQR if that was--

It does if you have a quadratic cost, the H2 cost. If you have a quadratic cost and you're assuming Gaussian noise, that is what SLS gives you. But if you don't -- say you put in an L1 cost -- you get a different solution out. It's still something you can implement, and it will be some kind of filter design, but it's more complicated. [INAUDIBLE] Yeah, K is not a matrix. Yeah.
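To make the separation-principle recipe in that exchange concrete, here is a minimal toy sketch. The double-integrator dynamics, cost weights, and noise levels are assumptions, not anything from the talk; it simply shows the classical pipeline: compute an LQR gain, compute a steady-state Kalman gain, treat the filter's estimate as the true state, and feed it back.

import numpy as np
from scipy.linalg import solve_discrete_are

# Toy double integrator with a noisy position measurement. All matrices are assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[0.01]])        # quadratic state / input costs
W, V = 0.01 * np.eye(2), np.array([[0.1]])  # process / measurement noise covariances

# LQR gain from the control Riccati equation.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Steady-state Kalman gain from the dual (filtering) Riccati equation.
S = solve_discrete_are(A.T, C.T, W, V)
L = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)

# Certainty equivalence: filter, treat the estimate as true, do state feedback.
rng = np.random.default_rng(0)
x, x_hat = np.array([1.0, 0.0]), np.zeros(2)
for _ in range(50):
    u = -K @ x_hat                                    # state feedback on the estimate
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
    y = C @ x + rng.normal(0, np.sqrt(V[0, 0]), 1)
    x_pred = A @ x_hat + B @ u                        # Kalman predict
    x_hat = x_pred + L @ (y - C @ x_pred)             # Kalman correct
print("final state:", x, "estimate:", x_hat)

With a quadratic cost and Gaussian noise this filter-then-feedback structure is optimal; as the answer notes, with other costs (for example L1 or robust bounds) the SLS solution is still implementable but no longer has this simple static-gain form.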
My question is, if we want to do output feedback, and K is [INAUDIBLE] these phis, like the whole history [INAUDIBLE], how does that relate to building like a safe set [INAUDIBLE] on having state? Like, I can't [INAUDIBLE] history of [INAUDIBLE]. So how do I connect those two?

First question. OK, the question is about Q-learning. Q-learning comes from MDP land, which depends on states. So how do you actually use Q-learning ideas when you have outputs? Great question. My question to you is why can't you. I had [INAUDIBLE]. Oh, good. That's great. So, OK, we can come back to that, because maybe I wasn't clear. The first part of the talk and the second part of the talk are connected, and I think that's the really important part.

[LAUGHTER] What connected them? What connected them? [INAUDIBLE] We were trying to figure that out because-- You were asking.

Sorry, I didn't say it clearly. The only thing that connected the first part and the second part, really, was the fact that the only thing we could do is imitation learning. And I think that actually informed the design, because we got stuck a bunch of times trying to do this. Really, the only thing that connects those two, and the thing I'm really trying to connect together, is that the theory says the only things that should work are either you densely sample every possible point -- which, to be fair, is the Elon Musk model, and also the Waymo model, to be really fair. Waymo and Elon Musk say we map everything: dense, complete-coverage mapping of everything, all weather conditions, all animals, all obstructions. Or we imitate things that we've seen before. That's what the theory said. That's the only thing we could bound, and we could construct examples where, if you don't do that, everything goes to hell. And that actually led to the engineering design of this system: we tried to put everything in as close to an imitation learning setting as possible. Now, how do I actually start to connect everything back together? We'll get there. We're not there yet. Sorry, that was kind of the point of this slide. The only thing we know how to do is imitation learning. And actually, if you look at all the other racing examples I gave you, that's all they do too. So there might be something there. But again, I don't have the answers yet.

Any other questions? If not, I guess we'll continue in the reception. OK, great. Outside. Thank you very much.

[APPLAUSE]
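For readers following along, here is a minimal sketch of the imitation-learning (behavior-cloning) setup that the last answer refers to: fit a policy to expert data and only trust it near observations that were actually seen. The linear policy class, the synthetic data, and the trust radius are assumptions for illustration only, not the system from the talk.

import numpy as np

# Toy behavior cloning: fit a policy that maps observations to expert actions
# by least squares, then only trust it near the training observations.
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 4))                 # observations seen during demonstrations
expert_actions = obs @ np.array([0.5, -1.0, 0.2, 0.0]) + 0.01 * rng.normal(size=500)

# Linear policy fit (the "imitate what you've seen" step).
theta, *_ = np.linalg.lstsq(obs, expert_actions, rcond=None)

def policy(o, training_obs=obs, radius=3.0):
    """Return the imitation action, or None if o is far from anything seen in training."""
    if np.min(np.linalg.norm(training_obs - o, axis=1)) > radius:
        return None  # outside the data: no guarantee, defer to a fallback controller
    return float(o @ theta)

print(policy(obs[0]))           # near the data: imitation action
print(policy(10 * np.ones(4)))  # far from the data: None

The "return None outside the data" branch is the code-level version of the point made above: either you densely cover everything, or you stay close to what you have already imitated.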
Info
Channel: MIT Robotics
Views: 1,311
Rating: 4.7037039 out of 5
Keywords: MIT, robotics, seminar, talk
Id: o7qENRDB2ug
Length: 62min 6sec (3726 seconds)
Published: Tue Nov 05 2019