Thank you, everyone, for coming. It's an honor to introduce Ben. Ben is an associate
professor at-- of course, we have
people coming in. Ben is an associate
professor at Berkeley, in the EECS department at Berkeley. And Ben actually graduated from MIT. He spent 2000 to 2006, I think, at the MIT Media Lab. So he should be proud of that. We should be proud of it. And then he spent three years
at Caltech as a postdoc. He started his faculty career at the University of Wisconsin-Madison and then moved to Berkeley, where he has been doing fundamental research in understanding the relationship between control and machine learning, sort of looking at those feedback loops that happen, or are made to happen, when you have a system that is learning through experience. And he has received many awards. Out of which, maybe
I'll just highlight one. In 2017 he got the Test of Time Award at NeurIPS, which is
awarded to a contribution from 10 years earlier, if
I understand correctly. I don't know how many
of you have read, and how many times you've read,
the paper "Random Features for Large Scale Kernel
Machines," which was awarded in 2017, written in 2007,
a paper that clearly helped define a generation
of people that worked in machine learning. And I would be
curious maybe later to hear what are your
thoughts of how things have changed since then-- how many people
use kernel machines, or how much we have sort of still embraced those lessons? But I think that the
one thing that I really wanted to say today
to introduce Ben is that he's an
avid communicator. He makes an excellent effort to communicate research through very different avenues. So he teaches many courses on optimization, on machine learning, or just statistical learning at Berkeley. He has given many,
many, many talks. Like, if you look at the
list of talks in his CV, you can spend like pages
just going through, skimming through. It just gives you an idea
of sort of his investment in communicating ideas. But that's not it, right? He also has a widely followed Twitter account, on the order of 11,000 followers, which helps create many debates in the community and start many discussions. And the last part is that he has
a blog, which he sort of uses to explain some of his ideas and to try, at least that's my impression, to write complex ideas in very simple words. Both academic ideas, from a technological or scientific perspective, as well as from a humanistic perspective. So he has posts in his blog like "You Cannot Serve Two Masters," on the harms of dual affiliations, sort of trying to explain what it means for academics to have a foot here and a foot there, and his perspective on that. But maybe the most important
one is a 14-part series, "An Outsider's Tour of
Reinforcement Learning," where he tries to explain that
intricate relationship between the machine learning
community and the controls community, both from a
scientific perspective but also from a more humanistic perspective. Like all the wars that happen over taking credit for great ideas, and what we can do to merge them and put them together. So I think that effort of
trying to distill complex ideas and put them in simple
words to sort of broaden the community that can access those ideas, to me, is one of the most noble arts of an academic. And I think that Ben, with his efforts and his dedication to that, sort of embodies that ideal. So thank you very
much for coming. Man, what an intro. [APPLAUSE] Alberto, thank you
for a wonderful intro. That was really--
that was really kind. Yeah, I will say about
my Twitter account, my number one rule
there is never tweet. So I break the rule a lot. But, yeah, that's the goal. So today, so as
Alberto mentioned, I've been interested in
reinforcement learning probably for about half a decade now. It actually hasn't
been a very long time, but now I've just realized that
it's become much longer than I thought it would. It was this thing that-- there was a lot of excitement
at Berkeley about this. There was a lot of things
that seemed to be happening and this is something I kind
of want to get my head around. Over that time, I really feel like I learned a lot. I figured out a lot of things about what the challenging problems are. And what I want to talk
about today is something I just don't have a solution to. And so this is going
to be a weird talk. I've given it a few times, and I'm still trying to figure out how to do it, where it's like, I want to tell
you about stuff that I don't know the answer to at all. And I think it's really a
pressing and challenging problem for us as a community. Today and tomorrow, there is a conference called LIDS@80. It's a retrospective history of the Laboratory for Information and Decision Systems. Turns out, it's been
named LIDS for 40 years. The original name was
the Servomechanisms Lab, which is amazing. I really feel like maybe we need
more labs with servomechanisms. In any event, but it's a crazy-- it's really interesting to look
at the intellectual history of MIT and to see all of these
things that have happened. And we just had a discussion
which closed with Sanjoy Mitter suggesting that, perhaps what
I'm going to talk about today, is one of the grand
challenges in that space. So I was, like, amazing. And I should have told Sanjoy
to come, but anyway. Yeah, he's busy. But we'll see where
that goes today. I feel like, to me, it was like,
I felt better about the talk after hearing Sanjoy say that. I'm like, fine. Sanjoy doesn't how to solve
it, that means that we have a real problem here. So this is joint work. It's been with a lot of
great people at Berkeley. We've been-- Sorry. Go ahead, man. AV problems. Got it? Got it. AV problems fixed. So there's been a lot
of interesting work with great people at Berkeley. It's actually a
nice collaboration between Francesco
Borrelli's group and my group and a bunch of
really smart and enthusiastic students and postdocs. OK, so right. Why did people get excited
about reinforcement learning? Well, let's pinpoint why. It was hard to kind
of separate what was supposed to happen
and not happen, I think-- when was this science paper? Does anybody remember? Someone said 2017, but
that seems way too recent. I think it was earlier, right? It was like 15? 2015? Right around then? Why do we get excited? Well, one, because people
thought Go was hard. That's fine. Two, we're in the middle of a deep learning revolution, so people thought you could use learning for anything. And three, people started to do something where they were taking-- they were really having a lot of success in fusing kind of complicated sensors like cameras into robotic systems. I think these were three things kind of floating around each other. You want to figure out what part of this is actually the part we should be taking home. So, also to be
fair, because there are people in the
audience who know this, we've been interested--
reinforcement learning is not five years old, right? There is a long
intellectual history of that as well, much of which had been
done at MIT by various faculty. But I think for a long
time, it wasn't part of a mainstream excitement. And then for some
reason, around 2015, 2016, reinforcement learning was
going to solve every problem. And how we made that
jump from niche subject to like the solution
to everybody's problem is an interesting thing. Why did that happen? I'm not sure. The other thing, which I think never quite happened and which I think is really challenging, is that, while reinforcement learning works in games, which are beautiful closed environments with very well understood rules, in robotics the wins were a lot less grandiose. Even though-- you could get places
like DeepMind or OpenAI to suggest that we
were a step away from artificial
general intelligence. And we know that these
robotic systems that we'd want to actually put
out into the world, they have to be robust and
safe before we're really going to actually put these
in mission critical tasks. So one thing that we are
also very interested in is how do you actually
take-- what are the wins that we could take
away that we can put into robust engineered
robotic systems and autonomous systems? So my lab invested
a bunch of time trying to just get at
like what exactly is the thing that makes this work. Where is actually the kind
of nugget of something that's different than things
we already knew before? I'm not going to talk
about a lot of that, so hopefully people
maybe saw I've given a few tutorials
on that perspective. Both on this blog that Alberto pointed out-- I have a survey that tries to sum up some of this that was published in the Annual Review of Control, Robotics, and Autonomous Systems last year. I gave a talk at CoRL about it and a talk at [INAUDIBLE], which are both
online, so I'm not going to talk about
any of that stuff other than to say that once you
start looking under the hood, a lot of this
stuff doesn't work. Because a lot of this
stuff doesn't work and the promises are a
little bit grandiose. When you start to get
down to it, you're like, what actually works? Why are people excited? I think the thing that
made people excited was that they were using
cameras with robots. That seemed to be the
really salient thing. Even in Go, right, at the end of the day, the main revolution in Go-- like the big thing in Go was realizing that I could take a picture. I could treat the board as an image and then just predict the next board. And that alone would play like a three-dan amateur. And then from there
you add tree search, you do all this other stuff,
and all of a sudden you're up to nine dan. But the main--
that first insight was I could just
treat this as an image and predict the
next image, and that was already a huge leap
over where people had been. So that was amazing. And then, of course, there were maybe less basic things, like that people could solve Atari. I actually don't think
that's as impressive. But since then, people have
done a lot of cool stuff. I got a picture from
Alberto's lab here of being able to
actually use vision in the loop for doing
complex processing and sorting and grabbing
and manipulation. So trying to have that be part
of the sensing technology. And like I said, cameras
are amazing sensors, right? You get millions of time series to process, which is wonderful. When you take a signals and systems course or a controls course and you say multi-input, multi-output, you usually mean five; here we have millions, OK? So it's a different kind of setting-- what multi means. And, I think, I just remember, as I was trying to come back through this, the thing that seemed to really capture the imagination is this
idea of policies that would map pixels to actions. That was the thing
that was exciting. The question is, what's
the right way to do that? What's the right way
to put pixels inside of complex feedback loops? And this is the question I
don't have a good answer for. So I want to talk
about some things we've been thinking
about in this space, how we've been
thinking about it. I don't think any of them are final answers at all. I mean, I'm not even sure
they're first answers, but they're just trying to poke
at a little bit what makes this hard and how actually can
we start to make progress towards understanding it? So let me begin-- OK, let me get us
to that question. Let me get to that question. So I usually use this silly
cartoon in a lot of my talks where we say, OK,
how do you actually go control a quad-rotor? OK, everybody knows you want to
move the quad-rotor from point A to point B. You write
down everything you know, which includes Newton's laws-- acceleration is the derivative of velocity, velocity is the derivative of position, and F equals ma, right? We write those down, and we write down a few other things about the actual geometry and shape and moments of inertia of the quad-rotor. And then from that, we maybe solve an optimal control problem.
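Just to make that concrete, here is a toy version of what "write down Newton's laws and solve an optimal control problem" means -- a single-axis double integrator of my own, not the actual quad-rotor model from the talk:

```latex
% one axis: position p, velocity v, mass m, thrust u, sample time \Delta t
\dot p = v, \qquad \dot v = \tfrac{1}{m}\,u
\;\;\Longrightarrow\;\;
\begin{bmatrix} p_{k+1} \\ v_{k+1} \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix}
+
\begin{bmatrix} \Delta t^2 / (2m) \\ \Delta t / m \end{bmatrix} u_k
```

The optimal control problem is then to pick the inputs $u_k$ that move you from point A to point B while minimizing some cost, subject to those dynamics.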
And there's a lot of work, both from the theory perspective-- and it's actually
what I was hinting at. A lot of stuff that my group
has done as well, trying to understand
what's the right way to solve this problem, which is
a Markov Decision problem, when I don't know the dynamics. And what I want
to say today is I don't think that problem is
actually that interesting. We've been doing a
lot of work on it. I think a lot of the fun
demos and [INAUDIBLE] like to approach things this
way, but at the end of the day, I told you, it's Newton's laws. Most of these things you know. You get down to like-- there's
parametric uncertainty. It's a real thing. But we know also how to
tune parametric uncertainty. Now, the other thing is we
have this crazy assumption here that I measure state. Right? And in all of these
things, we measure state. And how many systems do you
actually measure state, man? Really measure state, right? So the crazy thing is I'm going
to go to a much harder problem. The question is, do you need sophisticated learning in MDPs? And I would say, no. If you measure state, system identification is least squares. We spent a lot of time showing that just least squares is optimal for this. No matter what situation you put yourself in, you're not going to beat least squares if you're measuring the state. That problem's too easy, and we've known that. Standard engineering works, so it has to be true. But making that formal, I think, was important.
make this hard, is you say instead, I don't
actually observe state, I observe a picture that's
a function of the state. OK, now I have a weird problem. And all of a sudden I've gone
through the case of an MDP to a POMDP. Because I have to figure
out some way to go back-- I mean, I know
people like to say, some people say, 20 megapixels
and that's over redundant. So I have a complete
copy of the state. I do not think that is helpful. You do not want to have 20
million dimensional states. That's too high dimensional and
that makes control impossible. So we actually have-- most of the problems
that we have, especially if you're
working from images, are POMDP problems. To be completely fair, as soon
as you have a state measured with noise, it is POMDP
problem, because you have to filter to
estimate your state. So POMDP he probably actually
think are much more prevalent. And so, is pixel driven control
actually well modeled by MDPs? I would argue no. You could, and people do when
they do these Atari things, take a chain of these very
highly pixelated small images and then make predictions. And you could do that,
but for the most part, especially in robotic systems,
we know all that state. We know what states are
important to be able to do the kinds of things we want. Not always, there's
exceptions to this, but like the challenging
part is actually fusing some of these
more these more complicated sensing modalities
with that kind of state representation. And I can't say this enough. This is-- actually,
Leslie is not here. She wanted to make sure I
told everybody this again. So that like as soon as you have
imperfect state information, you have POMDP,
not a MDP problem, Leslie would go even
farther than I would, she would say that if you have
a game, that can maybe be an MDP and everything else is POMDPs. Everybody who knows Leslie
knows that she would say that. I'm not putting
words in her mouth. OK, so lastly, one other
thing that I want to point out that is a challenge is that
most people when they do these complex things
with cameras, just take like a standard
off-the-shelf deep net, plug it into their
control system. and then assume that
machine learning is going to help them, right? And actually, I think there is a
really weird, subtle thing that happens once we
start putting machine learning in feedback loop. And, actually, why I like
this problem so much-- this particular problem of when
you have noise in your state-- is that when you have
perceptual sensors, is you see how quickly are IID view
of machine learning breaks. Machine learning,
hopefully everybody has taken a course
machine learning, hopefully they tell
you, the promises of machine learning and learning
theory are actually super weak. I think we've solved them. I do think that machine
learning is effectively solved. I think machine learning was
effectively solved in 1970 and we can talk about
that at the reception. I don't think
anything really has changed from our initial view. But I think-- so the
main difference between-- So what machine
learning promises you is that if I sample a bunch
of data from a distribution and then do something,
minimize empirical risk, that I'm going to
perform well if I sample from that distribution
again and then evaluate. But there's something
weird there. Like, how do-- as
soon as you stop sampling from the
same distribution, machine learning
doesn't work anymore. And there's another
line of work I'm not to talk about
today where we've been studying how quickly those
kinds of shifts can happen. And I think if you
plug that paper-- I'm not going to
talk about it today, but we have a paper where
we created the test set for ImageNet and
you see that even trying to recreate somebody's
own rules for sampling from a distribution causes
distribution shifts that have very, very large unexpected
errors in the models that were trained on the one set. So these distribution
shifts are real, and if you do
something where you train some robotic policy
using your vision system and then turn the
vision system off and deploy without
the vision, you're using a different policy. So some of the work-- I mean, work I really like by
Chelsea Finn, Sergey Levine, Peter [INAUDIBLE],, Trevor
Darrell, on doing-- what do they call it? Guided policy search. They actually train
an optimal controller first using knowledge
of where everything is and then they turn that off
and just use the camera itself. And that is deploying
now a policy that is different than
what you were using when you were collecting the data. And so the camera is
actually seeing something that are immediately
outside of policy and you have to
account for that. OK, so standard generalization
actually does not work and we need something else. So we have some issues. One MDPs, which I spent the
last five years thinking about, which is great,
don't necessarily solve all the hard problems. They don't get us everywhere. And pixel-driven
control in particular is probably not going to be
best modeled by an MDP problem. I would also say that that
first one was amazing to me. If we spent five years
working on this-- we wrote all these hard papers. We bashed our heads
against the wall. And what it looks like now--
there was a paper at the Last [INAUDIBLE] by Stephen Tu,
Horia Mania, and myself, which essentially shows that
for linear control systems-- the best MDPs for linear control
systems-- the best thing to do is you take some data,
you fit your model, and then you treat
the model as true. Like, that's optimal. You could prove, I think-- I don't see Max or Dylan here. Max Simchowitz and
Dylan Foster just proved this minimax optimal. And that's crazy. I mean, of course, it's what
we've been doing since 1920. I don't know, a very long time. It controls. But it's interesting that
from the perspective of MDPs, that problem is relatively
straightforward. And all the extensions,
the tabular cases, you see the same thing. To nonlinear things, we
never have guarantees, but I'd be surprised
if there was something that amazingly different there. But these other two
things are big problems. How do we actually
deal with this kind of imperfect measurement and how
do we actually quantify errors? And so I want to just of
look at a simple case today. Again, I think there's
is a huge problems. As you say POMDP, everybody
should be terrified. Everything is hard as soon
as you move away from MDPs. And so I don't
have good answers, but I want to talk about
one relatively simple thing that we've done that
we're looking at. Again, I do not believe
it's an answer at all. I want to show you a demo
of us trying to do it, which I'm amazed that it
works and that's cool, and maybe where
we go from there. OK, so let me give
you an example. This demo will be at the end. Francesco Borelli, for
his undergraduate course in vehicular dynamics,
built this cool platform called the Berkeley
Autonomous Racing Car System. It is-- what do we have on here? Well, we put a camera on it. They don't usually have cameras. It has a dumb IMU, some
encoders, and an ODROID. It's a really cheap,
janky remote control car. Autonomous driving. [CHUCKLES] There it is. It's nice that if it
crashes into something, nobody hurts and nobody
cares too much because-- Vicky, the student
who's been doing most of the engineering on
this, calls this car Oscar because it belongs in the trash. [LAUGHS] It's fine. It's fine. We love Oscar. He's been good for us. So let me give you a goal
to do with this kind of car. It has very limited
sensing capability and I'm going to
actually cripple it. I'm going to tie its
hands behind its back. I want to actually have
this follow a demonstration and find the optimal way
to trace a demonstration as fast as possible. Only given one
demonstration and only using the sensor I wrote here. The camera, an IMU,
and the encoders. OK, so no external-- no global positioning system. And so this is what
that web camera sees. This is the Wozniak
Lounge at Berkeley. And so it's from images like
this and the wheel encoders and the IMU can
we actually give-- get this thing to actually
follow that trajectory and do something faster
than the initial driving. And it's challenging
because there's no depth and all the coordinates
are relative. OK, so it's not-- I mean, I know you
guys can do this. I'm not saying
you can't do this. I want to say, what
does this highlight? All right, I'm sure
anybody here could give you that platform-- well, actually
probably not that platform. You would build a better one. It's MIT. So, you would build
a better car platform that would solve this problem. You would do it in a weekend. But still, let's just
think about what's hard and what makes this
problem challenging. And really, the
reason why is we'd like to scale this to something
more interesting and more large scale. Right, so Francesco does a
lot of this stuff on real cars on a couple proving grounds,
where they can actually learn to improve maneuvers
in real cars using a variety of sensors. But they're usually much better,
more accurate state estimators. There's also a bunch of people--
there's really great platforms. It's funny, this car is
considerably more robust and heavy duty than ours. There's this really cool-- if anybody's visited
Georgia Tech, they have a really
cool dirt track where they race these things. They cost a lot
more, so it goes. They have GPUs on board. It's nice. And then there's also-- there's great work done
by Scaramuzza's lab at ETH where they actually do
racing with quad-rotors. I think racing is one of these
things where I don't actually think it's a real
application-- it's fun. There's fun things that
could come out of it and drone races are
obviously going to be cool. But I think it has a
lot of the character of a lot of robotics
applications and it brings a perception
in in a nice way as well. So I that's kind
of why it's nice. And the videos-- their
videos are better than mine, but that's fine. They have really cool
videos, you get nice demos, and it highlights a
lot of the issues. OK, so how do we
model the abstractions of doing this kind
of autonomous racing? Now, let me give
you a bit of it. And we'll give one view. So one thing would be you have
some unknown locally linear dynamics. I actually do
believe-- in this case, we do have unknown dynamics. And the biggest thing being the
tire forces being the hardest thing identify. You have some
observation model, which is the camera as
some other sensors, and then are our model is that
you also have this thing that takes whatever comes
with the camera and it gives you back
a state estimate. It doesn't even have to be
a state estimate, honestly. It just has to be a estimate
of a linear function of your state. So, for example, it could
just be position estimate without the velocity, right? Something like that. OK, so we get some
kind of measurement. So we're mapping
back to what would be-- this would
be linear control after applying the
perception, but we have this error term
that's induced by the fact that we're using perception. And quantifying what
the heck that error term is actually most of the problem. And it's weird. It's weird. If you look at the errors that
come out your SLAM system, they're weird. They're a little
bit hard to follow. Now we plug a-- we build a controller around
that, and that is our-- that's kind of one
abstraction for this problem. And in particular, today,
let me just ignore-- I'm going to, at least for
now, ignore the fact that we have to learn the dynamics too. We do it in practice. I don't really know how to think
about those things together, but we'll get there. What's wild about
the fact that we have to learn the dynamics is
that you're never getting-- I mean, maybe it's
not that wild-- you're getting an output. All your outputs
don't have any-- there's no global grounding. They're just what your
perception system tells you. So you have to learn
the dynamics model from the output of your-- I'm saying SLAM. Could be a neural
net, could be whatever the heck you want it to be. Random kitchen sinks. I don't care. Whatever is the thing that
maps camera images back to these positions. OK, so we want to
say, we're going to make the machine learning
basically black box. This perception thing
is our machine learning and I'm not going to talk
much about what it does. We're going to try to
quantify what comes out of it. And then we want to
understand how we use it. If someone hands us this
black box that they've tuned, it's either-- actually, we've looked at three
things-- random kitchen sinks, somebody's neural network
that, again, they've trained from images,
or it's just ORB-SLAM. One of those three. And then I think all
of those will give you some kind of position back. And then what does this tell
us about how we do a learning and control co-design? So, in some sense, I think
Russ would be thinking this immediately, he'd be
saying, look, this is just output feedback. This has been studied
for since the '60s. A lot of the development was
here I learned this morning. I know that. I knew that already. But it's always fun to
hear about the origins of robust control and how
MIT was connected to them. And so it is just kind
of we have a state that evolves according to time. We want to minimize some costs. In this case, it would be some
cost to go around the track. We have some linear observation
that is corrupted by noise and we want to
build a controller. And so, if you wanted to do
this, what's the problem? The biggest problem-- well,
it's actually-- honestly, it's a little weird to
be totally honest, right? Solving optimal control in
this setting is a little weird. I think most people
should know that LQG is much harder than LQR. Or is much less robust than LQR. It's also harder, fundamentally. All of a sudden, instead of just taking my state and applying state feedback, when you do LQG-- what is LQG? LQG is just doing a linear quadratic problem, but now I observe things through Gaussian noise instead of perfectly. At that point, you have to build a Kalman filter. And then you take the output of the Kalman filter as true and you do state feedback. That's the optimal solution.
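Spelled out, that separation-principle recipe is the standard textbook LQG structure -- my notation, not slides from the talk:

```latex
\begin{aligned}
x_{k+1} &= A x_k + B u_k + w_k, \qquad y_k = C x_k + v_k, \qquad w_k, v_k \ \text{Gaussian} \\
\hat x_{k+1} &= A \hat x_k + B u_k + L\,(y_k - C \hat x_k)
  && \text{(Kalman filter: estimate the state from noisy outputs)} \\
u_k &= -K \hat x_k
  && \text{(treat the estimate as true and apply LQR state feedback)}
\end{aligned}
```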
It gets more complicated as this noise process gets more complicated. Obviously, things get more complicated, and the hard part here is just figuring out what the heck this E is, how we can quantify and talk about the E. So here's a model. And this was actually
the hardest thing for us and I'm still not happy
about this at all. So once you even just decide
that that's the problem, it's like, now how
do I model the noise? So what I have here is that
the EK is going to be-- now there's this h, which is
my appearance function. I do not remember why I
called it H, but fine. H lifts the-- just because
F and G were taken, I guess. H is going to lift
things up into the high dimensional space. So every state
corresponds to some image. And then maybe
that image is noisy and that noise
depends on the state and maybe we have some
other noise there. This is like too much
crap, but it's fine. We can put all sorts of
stuff in the appearance. And then your neural network,
your ORB-SLAM, your whatever, maps you back into some
function of the state. And the error is just how
off you are from that Cx. And we're going to say-- I shouldn't have this be equals. Well, no, I can have it be equals. This is fine. This is nice. The nice thing about adversarial noise models: it's some matrix times your state, so it's a linear function of your state. I don't know what delta is, and that delta can be time-varying. But I just want to say it's some kind of map, some kind of state-related quantity, and then noise. Now, that's the adversarial part. So it is equal.
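Putting the pieces he just described together, the error model is roughly the following -- my own notation for what was being described, where h is the appearance function and p is whatever perception module you plug in:

```latex
\begin{aligned}
z_k &= h(x_k)              && \text{(appearance map: state $\to$ image, possibly noisy)} \\
\hat y_k &= p(z_k)         && \text{(ORB-SLAM / neural net / kitchen sinks: image $\to$ estimate)} \\
e_k &= \hat y_k - C x_k \;=\; \Delta_k\, x_k + \eta_k, \qquad \|\eta_k\| \le \varepsilon
     && \text{(a state-dependent, possibly time-varying part plus a bounded part)}
\end{aligned}
```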
So you can see that the balance between these two things is actually what makes it kind of complicated. I could put everything in eta, or I could push more into the state-dependent part. I have to make some
assumptions about how all these things play together. And I think that
the way you make those assumptions and
the way that you quantify those assumptions
is actually what makes this really challenging. So, being a control theory
minded person, what I would say is that you have this idealized
system, which we'll just take this error and give
you a beautiful model and then the perception
errors are going to come in. Instead of having this thing
actually give us perfect state, we have this thing kind of
introducing these perception errors and I'm just drawing
it as a block diagram. I don't know if there are any
control theorists in the room. Go ahead. Does the fact that
this error is sort of when you're in your state-- I mean, time-varying
is fine, but a lot of times I think of
those errors being very correlated with what's a
very time-varying SLAM system's going to be more accurate in
certain parts of the world. Exactly. That's what the state-- that's what the-- No, but
you're saying that I could actually index it by state. Yeah, but this is a
linear relationship. Yeah. That's great. Is that Going to be critical
in the treatment here or not? No, this is just going to
be, I'm going to show you how we do it in this case. We're also-- we're not sure--
what we're trying to figure out right now and what we're
trying we're trying to quantify is what's better. Because it's definitely true. You've done SLAM before. So it's definitely
true that SLAM is not spatially homogeneous. And we're trying to actually
quantify that and come up with a good model. I think that the next
revision-- we have a version of this on archive. The next version of it is
going to have a different error model to be totally honest. Because I think this is not
as realistic as it could be. This is just trying to
account for the fact that you go faster,
you're going to have blur. That would do
something like that. It's not going to
account for the fact that you have spatial
and homogeneity, which is definitely a real problem. But, do bear with
me, because I think that the treatment, at least
what we try to do, I think, does carry over. OK. So eta is like the part
that we're just going to say is bounded. It's just bounded. And so that's why we'd like to
have this thing that's state dependent and not have--
because if you just wanted-- you could just have
one unbounded error. We're not-- again, these
are the kind of things. I just want to
show there are lots of choices with how
you model the noise and actually how you fit that. And that's what we're still
trying to figure out-- what the right thing to do is. I think spatial and
homogeneity actually does seem to be bigger--
well, you'll see at the end. The complicated thing is
blur is actually doesn't seem to be the issue with velocity. It's dropouts. I'll show you that at the end. Quantifying the
error, the hard part, and then you can build
some stuff around it. OK. OK, so here's what-- OK, I just want to
give-- let's see. I don't have a ton
of time, but let me give you a very
high bird's level view of what we're
doing here at the end of the day with this thing. What we came down to, again,
is that we're doing control with these
disturbances mapped in and we want to build
robust control around it. And I have a way of
doing robust control now that I've been trained
by Nikolai Matni, who is now at Penn, to do. I like it a lot. I think that the way
that these kind of errors that come from things
machine learning propagate come through in a natural way. I feel like there's
a lot of other places where this is not
the right thing. For some reason
the kinds of errors we get introduced by measurement
error seem to be able-- we are able to handle
them more transparently in this framework. I do not believe it's
the only solution though. I just can't think
about it better. Let me just at least tell you
the way we think about it. You don't have to think
about it this way. I just can't tell you
how we think about it. Russ is skeptical,
but it's fine. I like this view. So here's our classic
view Also for me, it's like I get to be a
convex optimizer again. The classic view of things. You have you have
this abstraction box. I want to divine K. Everybody
knows that K is just going to be some
policy function, but, of course, this is
immediately not convex. Even in the linear
case it's not convex if you write things this way. This is known in robust control. And most of robust
control is fancy tricks to make things convex
in one way or another. I like this particular
fancy trick. It's actually-- there
are traces of this idea throughout the
control literature. It's just not the
most popular thing. But if you look at like Doyle,
Francis, and Tannenbaum, they do this in
the first chapter. Which is to say
that I can actually think of the whole
thing as a big system. When you talk about
[INAUDIBLE] pose in this as an
interconnection, you are thinking about the whole
system and a global view rather than the local view. So in particular, if
I have my controller be a linear time-varying,
possibly time-varying function of the state, then the
map from the disturbance to both the control and the
state is some convolution. And so I can-- the system
actually completely determined by that convolution. I never have to
know anything else other than what those phis are. And so I could just think
of the whole global system as some map that takes
disturbances and maps them into these internal
configurations. OK, so this is, again, not
new, but what's nice about this is that you can actually--
and this is also not new. Again, it's in the first
chapter of Doyle, Francis, and Tannenbaum's
feedback systems. Is that now you can look at it
like compositions of these maps to get out what the controller
in the real world would be. In this case, and this
is kind of obvious, if you take the map that goes
from state to disturbance and then the map from goes
to disturbance to output, that seems like the
composition of those two should be your control. And it is. So it turns out it is. It's that you invert the thing
that maps disturbance to state and then you
multiply by the thing that maps disturbance
to controller. And that actually
is the realization that you would use in the world. And so what this
lets us do is you take this original optimal
control problem, where maybe you want to
enforce some safety, maybe you have
robustness issues, maybe you have
uncertainty, and you map it into this kind of
high dimensional but convex problem
with these mappings. And so this is called systems
level synthesis, because you synthesize the
completion operators and then you take
those out and you build the thing you would
build in the real world. So it's kind of a nice way. It's lifting this thing into
this higher dimensional space, working with essentially--
some people might-- this is essentially
a juiced up version of what's called
disturbance feedback. It's nothing new,
but it's powerful. It lets us make
everything convex and I think because of that. It lets us be very transparent
for how the errors propagate. So what we do in the case
when you have an output is, again, all we care
about are these maps between the disturbances
and the state and the input. And so your perception
errors here would be eta. I'm lumping everything to eta
now, just make my life easier. Your disturbances for
your model are W. OK, so that would be
your equation errors. And then I just have to
make this matrix that maps these the services to X and U. And it turns out that
you can get affine-- there's an affine set of
things that are realizable that map to actual implementations. I could write those down
if I know A, B, and C. And then I can, again,
now it's more complicated, but I can construct
out the controller from the phis themselves. And what's cool and what we use,
and I think that there's lots of the reason why
I like this again, is that when you
want to be robust, what you could do is
say-- instead of saying, I have a perfect A, B, and C-- Let's imagine I have A plus
delta A and B plus delta B and C plus delta C. We
were talking about delta C, this is why it's coming up. Now what you can do is you say,
I just actually do synthesis treating this as true. Treat the A, B, and
C that I fit as true and then account for the
fact that I'll have a delta. And this is this beautiful
thing I love about system level synthesis. In this case, we're only looking
at C plus delta C. And turns out you get some new delta out. This is some operator. It doesn't really matter. But you have some term here. Multiply-- this
phi hat is saying, let's take the
models as true, even though we know they're false. And saying, how do I
have to transform them to get the actual realization? When I go to the real
world, is you end up multiplying by this
map "I" minus delta. And if I can quantify
the size of this delta, I can now quantify the
suboptimality of this solution. And that's kind of-- that's why I like SLS,
because you always get equations like this. And usually it's just
by dumb linear algebra. And then now if I want to
bound the norm of this, I bound the norm of this one
times the norm of that one. And if I want to bind
the normal 1 minus delta, we know how to do that. Or like sometimes you'll
get 1 minus delta inverse. Again, you start to plate
these kind of rules. So we're using this
lemma from a paper by Ross Boczar, Nik Matni,
and myself in this paper to deal with the delta
C. And so, as I said, the actual map is "I" minus
delta times phi hat times WN eta. And so now we can
actually see how the trajectories
that are realized, the true trajectories, are this. And so we can see how the
true trajectories arise as the design system
response, the actual noise, and these errors in perception. And the errors in perception
are all delta and eta in this case, which is cool. OK, so now you're like,
OK, I should be done. But this is where the
machine learning comes in and this is where the
generalization error comes in and this is what's weird. OK, so you take
this thing and you have these perception errors. And you synthesize
your controller using those perception errors. And now the question
is actually how do you show that the new
controller is actually going to respect everything,
because I have perception errors that I have in-- yeah, Russ? [INAUDIBLE] The cost? What cost do you want? We do all-- I mean-- [INAUDIBLE] worst case? No, so we are actually doing-- [WHISTLING] See the skip slide? [CHUCKLES] I'm
going to skip slide. Just for Russ, I had that ready. So there are lots you can
do, and they really just depend on how you
want to characterize the different disturbances. So typically the objective
is related to how we think about the noise. So LQR tends to be
things where you're thinking you have
either sensor noise or some kind of natural
stochastic process. You could do worse case. We could do L1. I actually think L1 is
actually surprisingly useful in a lot of cases,
because like for saturation limits, L1 seems to be
like the right model. But we can do-- it's just a norm. You end up just
having a norm and then you have to just propagate
the error through it. So you can treat them all-- it's whichever one you
feel most comfortable with. [INAUDIBLE] Yeah. Oh, for your cost? It probably does. So I think you have near both
design decisions though, right? Which makes it aggravating,
because everything now is coupled. Yeah. Where was I? Get back out of here. I was down here. OK, this did bring me to
this idea of generalization. So right, the classic
machine learning, I've mentioned this before,
has this annoying property that like the
generalization results on-- we rely on statistical
arguments about closeness of the training and test. You assume the
same distribution. And the thing is,
as soon as you have a closed-loop distribution--
so you collected this data and open-loop somehow
or the different sensor, I put things in closed-loop, I
have a different distribution. And so you end up kind of
moving from the something close to something far away. And I think the most
important thing in our paper, and the only thing I
think is probably really-- because we've done this in
these different settings. Russ is asking about
what costs we look at. The gentlemen in the
front here is asking me about how actually model
my perception errors. In both cases, actually,
the way that we actually prove that we get suboptimal
but not terrible control is to leverage this
idea that I can actually build a controller that
keeps me close to my data. I could build a
controller to say-- so you know that if you put this
controller out in the world, it's going to do
something different. I'm going to see different data. So you can actually
impose as a design and impose as a constraint I
should stay close to my data, I should try to
make the controller move into parts of space
that I've already seen. Which is weird, but at
the same time, that's kind of what we want to do. We don't want to be surprised. We don't want to be surprised. We would like to be boring. So the way that that actually
pops up in our theory is that you end up having
this thing which says: you want to really make sure that the mapping from the sensor errors to your state-- which is, again, one of the design variables-- is small. And it's bounded
by the things that depend on the noise. And, again, this is the
part that will change. You change your
homogeneous noise thing. Whatever is on the left hand
side is going to change. Go ahead. I guess I have a question. Yeah. This is assuming what you see
is depending on your state. But most of the
things we worry about is like you saw something
crazy in the road, it doesn't really
matter if you're driving in the middle of
the lane or on the left or on the right. There's a crazy thing
there you can't control. That's right. That's right. I'm not talking about
that problem today, but I do think that
this gets you there. So if you build a
controller that's designed to stay in the regime
of things that you believe are true, and all
of a sudden you get a spurious sensor
measurement that doesn't map to what
you saw before, shouldn't that be good signal? I haven't thought about it. I don't know yet. I don't know yet that's
a great question. That is a great question
because I totally agree. I totally agree. That's I think where
you want to go, right? It's hard to simulate
those things. And I don't think we want
to rely on simulation to-- like, I really do not
believe in this mindset that we just capture all
edge cases by simulation and that will solve our
robust control problem. And we know that's
not true, right? Because Tesla had this
thing where this guy drove his car under a trailer truck
in Florida on a two lane highway when the truck was taking
an unprotected left. And then two years
later some guy drove his car in
Florida under a truck that was taking unprotected
left in Florida. It's like, you saw that
edge case already, guys. That's kind of dark joke there
because both of them died, but still, that's like,
[CHUCKLES] anyway. OK, so let me now go-- So basically what this
means is that now we're stuck with how we
train these things. The training actually
has to be done in this way where
either you have a dense sampling of the
space, so I could stay close to my sampling-- for racing
we could probably do that as long as the track
doesn't change. Or, again, for racing,
imitation learning is a possibility, where you
want to stay close to the things you've seen in
previous laps, but you can improve as you move along. And, indeed, that's kind of
the demo I'd like to show. Let me skip out of that again. I forgot which slide to go to. I'm going to skip it. These simulations are boring. This simulation is less boring. Let's go back to this one. This is actually like-- Here, we're trying to fuse a
bunch of these things together. It's not perfect synergy. As everybody knows,
right, the theory and practice are farther apart in-- well, actually, I don't know. Are they farther apart in
practice or in theory? We'll see. We'll see. But we're fusing a
bunch of these ideas together and trying to bring
these two worlds together. So in particular, for
this car demo that we did, we built a single imitation. A single thing to track. And that actually
was really important. We do see, as you try
to move this thing away from your demonstration,
that everything goes to hell. So we have one
demonstration from a human. And then we use--
the way this works is used to laps to generate
more data that I could follow. So in some sense, while the
math isn't quite the same for our synthesizer here,
you'll see in a second, we move to MPC, of course we do. Because you go from
beautiful control theory to MPC, because of course we do. But, the lesson
from generalization stays the same, which is
that I start with data I get from a human and
then the first step is to do something dumb and
slow to get more data that I can stay close to. Once I have that, we implement
what's called learning model predicted control, which
was a brilliant idea by Ugo Rosolia that I feel
like everybody should know, so I wanted to talk about it. It's like one of the most
amazing reinforcement learning ideas that nobody in
reinforcement learning knows. So I just wanted to talk
about that very briefly. I'll show you how we do it. And we kind of are
just at this point just using a smart data structure
to stay close to the data. We are using ORB-SLAM here. We're recording more data
to stay close to the data. And again, what you'll
see in the second is, again, using previous data
to stay close to the data. In all of these cases
what we're trying to do is stay close
to what we did before and allow ourselves
a little bit-- inside that boundary doing a
little bit to add improvement. So it really is just kind
of imitation learning setup that I was talking about. Let me skip SLAM. You guys all know SLAM right? If you don't, we could
talk at the reception. Don't want to go through SLAM. I didn't want to talk
about [INAUDIBLE].. And why are we not
using a neural net? I don't know. We couldn't. I don't care. I actually, OK, we're
not using neural net, because, I don't
know if you guys know Vicky E, graduate
of MIT, if anybody has met Vicky before. She's amazing. She worked with Bill Freeman. Bill has induced some biases,
so she doesn't like neural nets. So we'll blame Bill. No, actually, I don't
have neural nets either. Also, I think people just-- SLAM is really good
for a lot of things, so we'll just stick with that. Anyway, it works
for a lot of things. So iterative learning MPC. I just wanted to
tell you about this. Everybody should know about
it because it's amazing. Standard MPC, hopefully
everybody knows. You want to maximize reward
subject to your dynamics. And what you do is you build
this terminal constraint. Somehow, this magical terminal
cost function that induces robustness and allows you to
work on short time horizons and extrapolate to
large time horizons. That's standard MPC. This Q function, your terminal
Q function, you usually design. And there's lots of
tricks to designing it. There are whole books about it. This is this kind of like-- the
goal to make MPC really robust is how you pick that Q.
Learning MPC learns the Q, learns the Q function. It is a Q learning algorithm,
but does not look at all like standard Q learning. It's really beautiful. The idea is you let
SS, you safe set, be the set of all the data
you've ever seen before. Since we're doing
an iterative task, we actually know
what the value is, because you know
how long it takes to get to the end of the task. So I can always
have the value which is just the last
trajectory you did if you saw that safe point
before, because there's always stuff that's been previous
to you on this track. OK, so what you say is I
have to land in a safe point and the value is just whatever
the value was there before. So I'm just constraining
myself to land in a state I've seen before. But in between I can explore. And this a super weird. The exploration is
now just saying, I give myself a
horizon to explore, but I want to end up somewhere
that I feel like is reliable. And that's kind of our
area of learning MPC. It's such a brilliant idea. I love this idea. And it works really well. And it's like weirdly,
this kind of Q learning is the opposite of
standard Q learning, because standard Q
learning explores in the Q. It tries to visit
places-- it uses Q and says, where am I uncertain about Q
and go to places that I haven't been before. Here, it's saying, no, you have
to only go to high certainty places and I'm going to
allow myself some exploration before I get there. It's just a weird
turning on the head and we haven't been able
to find a good connection in most reinforcement
learning, including all of Dimitri's book. We had to look through it.
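Here is a minimal sketch of the LMPC idea in code -- my own toy one-dimensional "track," nothing to do with the actual car: the safe set stores every state visited on previous laps together with its recorded cost-to-go, and each MPC step brute-forces short action sequences that must land near the safe set, using the stored cost-to-go as the terminal cost.

```python
import itertools
import numpy as np

# Toy LMPC sketch: a point mass must reach position >= GOAL as fast as possible.
# safe_set holds (state, cost-to-go) pairs from all previous laps; cost-to-go is
# just the number of steps that were left in that lap. All numbers are made up.
DT, GOAL, HORIZON = 0.1, 10.0, 4
ACTIONS = (-1.0, 0.0, 1.0)                       # acceleration choices

def step(state, a):
    p, v = state
    v = float(np.clip(v + DT * a, 0.0, 3.0))
    return (p + DT * v, v)

def run_lap(policy):
    traj, state = [], (0.0, 0.0)
    while state[0] < GOAL and len(traj) < 2000:
        traj.append(state)
        state = step(state, policy(state))
    traj.append(state)
    return traj

def add_lap(safe_set, traj):
    for k, s in enumerate(traj):                 # cost-to-go = steps left in the lap
        safe_set.append((np.array(s), len(traj) - 1 - k))

def lmpc_policy(safe_set):
    def policy(state):
        best_cost, best_action = np.inf, 0.0
        for seq in itertools.product(ACTIONS, repeat=HORIZON):
            s, steps = state, 0
            for a in seq:
                if s[0] >= GOAL:
                    break
                s, steps = step(s, a), steps + 1
            dists = [np.linalg.norm(np.array(s) - q) for q, _ in safe_set]
            j = int(np.argmin(dists))
            if dists[j] > 0.3:                   # terminal constraint: land near the safe set
                continue
            total = steps + safe_set[j][1]       # terminal cost: recorded cost-to-go
            if total < best_cost:
                best_cost, best_action = total, seq[0]
        return best_action
    return policy

safe_set = []
add_lap(safe_set, run_lap(lambda s: 0.5))        # one slow demonstration lap
for lap in range(3):
    traj = run_lap(lmpc_policy(safe_set))
    add_lap(safe_set, traj)
    print("lap", lap, "finished in", len(traj) - 1, "steps")
```

Even on this toy, the lap time drops over iterations, because each lap densifies the safe set that the next lap is allowed to aim for.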
OK, and so for autonomous racing, the cost that you have to
get is actually really nice. It's just the time to get
to the end of the track. That's pretty easy. You pay a penalty of 1 if you're not at the end, and then 0 once you get there. That's the way that-- you want to minimize
the amount of time it takes to get to the end. And I did want to-- I don't want to talk
about this too much, but do I want to say that, while
we can write down the vehicular dynamics, we never use them. We just fit-- and
this is another part that I didn't understand,
but we put it in there it so much better. Rather than having
this nice model that we know, we
fit all the tire forces and this
complex interaction between the headings
and the velocities and this other weird thing
with the moment of inertia and governing the [INAUDIBLE]
car, we fit that all. Just like look at the previous
data we've seen before and fit locally linear dynamics.
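Roughly, the local fit works like this -- a hedged sketch of my own, not their code: take the samples from previous laps that are close to where you are now, and do a weighted least-squares regression for a local A, B.

```python
import numpy as np

def fit_local_dynamics(X, U, Xnext, x_query, bandwidth=0.5):
    """Weighted least squares for x_{t+1} ~ A x_t + B u_t + c near x_query.

    X, U, Xnext are arrays of states, inputs, and next states collected on
    previous laps; points closer to x_query get exponentially more weight.
    """
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * bandwidth ** 2))
    Z = np.hstack([X, U, np.ones((len(X), 1))])      # regressors with offset term
    sw = np.sqrt(w)[:, None]                          # weight rows by sqrt(w)
    Theta = np.linalg.lstsq(sw * Z, sw * Xnext, rcond=None)[0]
    n, m = X.shape[1], U.shape[1]
    A, B, c = Theta[:n].T, Theta[n:n + m].T, Theta[n + m:].T
    return A, B, c
```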
And then we'll take these guys as given. This is just conversion from
local to global so that's fine. OK, linear [INAUDIBLE] OK. Oh, there's a video. Great, there's video. So here is the car
driving in the lounge. I got to fast forward
over one part in a second. All right, so this is
the first demonstration given by the human. And this thing is actually
annoying to drive, so it's slow. It's actually not as easy. And so this is what the
dashboard looks like. Poor Oscar. I say so may mean
things about him. This is the kind of
thing that you're seeing. Obviously, there's a lot of
good clutter in this room so it allows us to get a
lot of key points for SLAM. You guys can probably
all see where those are. Should have done
another jump cut. There it is. Key points. Obviously, on all
the table legs. By the way, that white
rope in the middle was there for Vicky,
because she's driving. It's not there for-- it is really just there to-- the car never sees it. You'll see it's
never in the frame. So the white rope is there
for us and for Vicky. It's actually not part of--
the car has no idea it's there. OK, and we had this first thing
where it does the PID lap. This PID lap is
super, super boring. I can't remember
how long it takes, so I'm going to skip to here. Now we learn from
previous lap data. Some parts of this
video, I mean, if you guys want to
stick around at the end, I will show it to you. But I wanted to show you
here, these green dots are the things out in front
that we think are safe. And those are the places
that's trying to go. And the red heading
is the trajectory that it picks to explore. So what you're seeing
there is the green places are places that it
thinks it can go. The red heading
is how it's trying to optimize to get there. And it's stitching through-- the red that's going around here
is the initial training data. And finally, [INAUDIBLE]
I have to go. There's some more data. And I think right-- they're just going to let it go. And here's after 20 laps. Here we go. Got my cue. Now it drives much faster. And there, now you see actually
that's trying extrapolate a lot further into the future. I think what also
is fascinating here is that we gave it a certain
amount of space it could drive. Like, we said you have
to stay inside the-- we draw the red and then we
draw a [INAUDIBLE] on the red, and it finds that actually
it's much better to not-- if it wasn't just
minimize the time, it's much better to create
this weird ellipse-- well, it's not weird-- it creates an
ellipse instead of the initial trajectory, which I think is-- It just learns it. And this is what the
camera view looks like. Again, this is the main sensor. I think it's going to
show-- there's Vicky. And there is-- now you'll
see the dashboard camera, and as I mentioned before,
watch how it flashes. So we're already starting to
see the state dependents here. It's not like there's
a blur effect, that you lose tracking effect. So try to actually quantify
what's happening in the sensor is something that
we're doing right now. I'm going to skip that one. And just in case you
don't believe me, we did it in a different room. You could just go-- we did this in three rooms. Here's room two. Driving around. It does work in
different environments let's just put it that way. We didn't just do one room. OK. [INAUDIBLE] Yeah, Russ is right. It does have a Kalman
filter inside of it, yeah. So that's fair too. So Russ is saying it's not
using the actual car dynamics. We didn't plug that in. But it does have this kind
of that kinematic smoothing. That's Newton's laws. So that is a fair point. That's a fair point. I can turn it off though. If that would make
you happier, we could do it with a turned off. I don't know. OK, so how do we sum this up? OK I do think there's something
interesting about trying to understand the uncertainty
of these perception maps as a visual sensor. And I think that the
main thing that we found that's
interesting is that I can get some
suboptimality guarantees and I can get some kind of
predicted safe execution as long as I don't
try to be too crazy. And that the perception
still will be a bit of a-- these sensors are not panaceas. We still kind of have
to tie our hands, because, sadly, what we're
handed from machine learning, is that I can only replicate
that what I've seen before. I think knowing that
and knowing a little bit about how to do the
uncertainty quantification, does allow us to
use these sensors. So how we go beyond
that, I'm not sure. We have this one idea. We like this system
level synthesis idea. What's been interesting
in the group is we've figured out
kind of nice ways with the same framework to kind
of include all of these things. Although, it starts to-- if
have all of these things maybe we shouldn't be
deploying a robot. But in different
ways, I do think that the hard part, and we're
still nowhere near close, is really understanding what the
right way to do this last one is. I feel like the uncertain
dynamics and safety constraints, these are
things we can handle. These are things we can do. The perceptual
sensing is a huge one. And also dealing with the fact that your perceptual sensor, as my friend here in the front has pointed out, is supposed to be designed to also give us error signals. So they're both-- I mean, most people don't use cameras to guide their low-level control. That's kind of insane. We're doing it though. That's fine. Let's just see what happens. That's what we did today. But we do use cameras. We're supposed to use them for
detecting static objects that possibly shouldn't
be in our scene and moving them out or
getting out of their way. And I don't think
we're anywhere near understanding how to integrate the low-level control with those
kind of detections in a safe and reliable way. But that's why life is exciting. That's why we have
lots of things to do. I'll just close with one more thing, which is a plug. Many of you may have come to Learning for Decision-- sorry, Learning for Dynamics and Control 2019. Learning for Dynamics
and Control 2020 will be in Berkeley. I'll give the plug. We're going to take-- we're going to take contributed
submissions this time. So if you have
something that you think is cool that you
would like to share, please consider sending it. The deadline is
not November 15th. Oh, I forgot to fix that. I should fix that on the fly. Let's say it is December 6th. Sorry. Man, that was an
ambitious early version. Things take longer. But, yes, they're
just six-page papers. They're extended abstracts. The best ones-- the ones we like the most-- will get orals. And then everybody gets a poster, and we're excited to see how this goes. All right, with that, I'll stop. Thank you very much. [APPLAUSE] Any questions? Two questions about the error. So you have the term
C times x plus eta. So first, your
system is nonlinear. So you're basically controlling a local time-varying linearization. Yeah. So when your x is large-- first, your nonlinear terms also come into play. That CK is a second-order term, so why do you consider that to begin with? If your x is large, the nonlinear terms are also in the error. And my second question
is, so you talked about eta being a bounded error. Is it like a-- do you consider like [INAUDIBLE] or is it-- I mean, if it's coming from a neural network, is it something more-- weird? So I know other
people have asked it. So when it comes from a neural
network, what is the error? Let's start there. Does anybody know? Nobody knows. So I think this is
actually a great question. Honestly, even-- I'll say this-- even with ORB-SLAM or
just normal SLAM methods, what the errors look like under Gaussian assumptions is kind of easy to write down. Under real assumptions, when they come out of a camera, they don't look Gaussian. And they're definitely not spatially homogeneous. So, actually, I think quantifying what the heck these perceptual sensors are doing is super, super interesting and super hard. I think this is kind of the next step of what we do with these things. Quantifying them with data itself is also really hard. It's something I
would like to do. With regards to the nonlinearity question, I will say that we're trying to account for part of that in the delta of x. The other thing we're doing, right, is we're making these local linear assumptions. So it's not like we're linearizing around an equilibrium. In the car, what you saw, we would only do a linearization over the last two laps. So we're only linearizing at the higher speeds. Because, otherwise, you're right, I mean, the errors get really huge.
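To make the model being discussed concrete-- this is a schematic consistent with the discussion here, not the exact formulation from the paper-- the perception map is treated as a noisy linear measurement of the state that is only trusted near the training data, and the dynamics are linearized around the recent trajectory rather than around a fixed equilibrium:

$$ y_t = C x_t + \eta_t, \qquad \|\eta_t\| \le \varepsilon \ \text{while } x_t \ \text{stays near the demonstrated states}, $$
$$ x_{t+1} \approx A_t x_t + B_t u_t, \qquad (A_t, B_t) \ \text{fit from the last couple of laps}. $$

Both the bound on $\eta_t$ and the local linearization degrade once the state leaves the regime that was demonstrated, which is exactly the concern behind the question.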
Yeah, go ahead, Russ. So if you-- I'll repeat the question. OK, that's fine. All right. So your ORB-SLAM, it's like you're taking and sticking a Kalman filter in a priori. So if you took an LQG problem
and you imposed your perception system as your Kalman filter
and you did SLS on that, would you get K? Would you get LQR out? That's an awesome question. OK, so the answer is-- OK, sorry, let me put it this way. If you do SLS and you solve LQR in the standard thing-- sorry, if you do LQG and you solve it using SLS, you do recover the
separation principle. This is with you imposing
the Kalman filter. No, without imposing
the Kalman filter. The analogy here. Quadratic cost and Gaussians. Right, right, but
here the analogy is you've written in
the Kalman filter. That's your ORB-SLAM, right? [INAUDIBLE] Sorry, what was the question? You've a priori constructed
your Kalman filter-- Right. --into the-- That's a different question. So now, we know that the
separation principle, which is like what everybody does,
it's what everybody does and it's cool. Right, it was a brilliant idea. Who do we attribute that to? I always forget. Simon. Simon? OK, so but the
idea is-- and it's-- for LQG? [INTERPOSING VOICES] But for-- Kalman. For LQG, Kalman, right? I was going to say maybe [INAUDIBLE], but, OK, anyway, it's old. But the idea is that,
right, so, amazingly, the structure goes through a filter, a Kalman filter, and the optimal solution is: do a Kalman filter, treat the output of the Kalman filter as true, and then do state feedback. And that's a miraculous thing that happens. In most systems, we do that anyway. We build a filter that gives a state estimate. We treat the output of the state estimator as true and we do some kind of optimal control around it. Right, that's kind of what we do.
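For reference, the structure being described is the standard LQG certainty-equivalence controller (textbook notation, not notation from the talk): a Kalman filter produces the state estimate, and the LQR gain is applied to that estimate as if it were the true state,

$$ \hat{x}_{t+1} = A \hat{x}_t + B u_t + L \,(y_t - C \hat{x}_t), \qquad u_t = -K \hat{x}_t, $$

where the filter gain $L$ depends only on the noise covariances and the control gain $K$ only on the cost matrices; the separation principle says designing them independently is optimal for the quadratic cost with Gaussian noise.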
We can, using SLS, bound the suboptimality of that. So we can incorporate
that model. We're not doing
that in the demo. But you're asking SLS to
solve an output feedback problem here. And it does not give you a state. It does not give you-- It gives you time-varying, whatever, crap out, right. The K that's coming out of SLS is a more complicated thing. It's much more complicated. Well-- But it should, in your setup, it should converge to the K from LQR if that was-- It does if you have a quadratic. If you have the H2 cost. If you have a quadratic cost and you're assuming Gaussian noise, that is what SLS gives you. But if you don't-- so you put an L1-- you get a different solution out. You get a different solution. It's still something you can implement and it will be some kind of filter design, but it's more complicated. [INAUDIBLE] Yeah, K is not a matrix. Yeah.
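For anyone who hasn't seen system level synthesis, here is the simplest, state-feedback version of the parametrization being alluded to-- the output-feedback case discussed above is more involved, so treat this only as a sketch of why the controller ends up dynamic. Instead of optimizing over a gain directly, SLS optimizes over the closed-loop responses from the disturbance to the state and the input, subject to an affine realizability constraint:

$$ \begin{bmatrix} zI - A & -B \end{bmatrix} \begin{bmatrix} \Phi_x \\ \Phi_u \end{bmatrix} = I, \qquad K = \Phi_u \Phi_x^{-1}, $$

and the $K$ that achieves a feasible pair $(\Phi_x, \Phi_u)$ is in general a dynamic, possibly time-varying controller rather than a single static matrix-- hence "K is not a matrix."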
My question, how does the-- if we want to do output feedback with K [INAUDIBLE] these Phis, like the whole history [INAUDIBLE], how does that relate to building like a safe set [INAUDIBLE] on having state? Like, I can't [INAUDIBLE] history of [INAUDIBLE]. So how do I connect those two? First question. OK, the question
is in Q-learning. Q-learning comes from MDP land, which depends on states. So how did you actually use Q-learning ideas when you have outputs? Great question. My question to you is, why can't you? I had [INAUDIBLE]. Oh, good. That's great. That's great. So, OK, we can come back to that, because maybe I wasn't clear. The first part of the talk and the second part of the talk are connected. And I think that's the
really important part. [LAUGHTER] What connected them? What connected them? [INAUDIBLE] We were trying to figure
that out because-- You were asking. Sorry, I didn't say it clearly. The only thing that connected the first part and the second part, really, was the fact that the only thing we could do is imitation learning. And I think that it actually informed the design, because we got stuck a bunch of times here trying to do this. Really, the only thing that kind of connects those two, and the thing I'm really trying to connect together-- the theory says all that should work is either you completely, densely sample every kind of point of existence. Which, to be fair, is the Elon Musk model. Or it's also the Waymo
model, to be really fair. So Waymo and Elon Musk
say we map everything. Dense complete coverage mapping
of everything, all weather conditions, all animals,
all obstructions. Or we imitate things
that we've seen before. That's what the theory said. That's the only thing
that we could bound and we could construct examples
where you don't do that and everything goes to hell. And that's how we kind of-- that actually led to the engineering design of this system, which was that we tried to put everything in as close to an imitation learning setting as possible.
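To make "imitation learning" concrete for anyone who hasn't seen it: the simplest version, behavior cloning, just fits a policy to logged expert observation-action pairs by supervised regression. This is a generic sketch with made-up file names and a generic regressor, not the code from the demo.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Logged demonstrations: observations (e.g. camera features) and the expert's
# actions (e.g. steering and throttle) recorded on earlier laps. These file
# names are placeholders, not artifacts from the talk.
observations = np.load("expert_observations.npy")   # shape (N, obs_dim)
actions = np.load("expert_actions.npy")             # shape (N, act_dim)

# Behavior cloning: plain supervised regression from observation to action.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
policy.fit(observations, actions)

# At run time the learned policy is queried on the current observation. It can
# only be trusted near the states covered by the demonstrations, which is
# exactly the caveat discussed above.
def act(obs):
    return policy.predict(obs.reshape(1, -1))[0]
```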
Now, how do I actually start to connect everything back together? We'll get there. We're not there yet. So sorry, I should-- that was kind of the
point of this slide. The only thing we know how
to do is imitation learning. And actually, if you look
at all the other racing examples I gave you,
that's all they do too. So, there is something. I mean, there might
be something there. But again, I don't
have the answers yet. Any other questions? If not, I guess we'll
continue in the reception. OK, great. Outside. Thank you very much. [APPLAUSE]