Drago Anguelov (Waymo) - MIT Self-Driving Cars

Reddit Comments

One of the best AI talks I've seen recently. This is a lot more relevant than many "pie in the sky" approaches, imo -- This is robustly building real systems that interact in the real world at full complexity (and weirdness), interacting with people in real time and pretty challenging control requirements. And it works.

One topic many dream about is meta-learning, and it's interesting to see it used here effectively, but you also get a sense of the gigantic scale meta-learning needs. If training one large network is difficult, try training tens of thousands of large networks. That's only viable because of the scale of the problem.

Maybe one day governments and companies will pool resources and create a massive Meta-Learning-Architecture-Searcher, the scale requirements are truly colossal w.r.t. the speed of current computers, the speed of silicon.

At least until we can improve algorithmic efficiency at the higher levels... (e.g. more human-like reasoning)

Also it makes me pretty confident in estimating that just about any task is already almost within reach of hybrid ML/non-ML systems. It will just take lots of engineering effort. More general intelligence could possibly necessitate more computing than we have (per Moore's law limitations), and besides a few systems in the world, most AIs doing those valuable tasks will be hybrids with huge capital behind them (e.g. one huge company makes LawyerBot, one makes MedicDiagnosisBot, and probably eventually ProgrammerBot (probably further specialized in specific fields like FrontEndDesignBot, BackEndBot, etc.) and TheoremProverBot). The tasks that will be tackled first are the ones that have a large payoff product

P = Salary x Number of human workers

(note for driving cars this number is huge), for a more or less uniform task.

I don't think computational difficulty puts any approximately "uniform" existing task outside the reach of this kind of approach, given the technology we already have -- as long as there is a large payoff to be had.

Humans are quite general thinking, environmentally-aware, etc. because we needed it given our natural background and natural environments. It's not clear, actually quite the opposite, that general AIs are something economically so desirable. Unless of course you're trying to design them per se, as a new form of creature.

👍︎︎ 2 👤︎︎ u/darkmighty 📅︎︎ Feb 13 2019 🗫︎ replies

I've mentioned this before but the improvement in YouTube's captioning thanks to neural networks is huge. Drago has an accent and this is just a random lecture, no special audio, but the captioning is still dead-on and even the mistakes like the transcription of 'NAS' or 'CIFAR10' make a lot of sense.

👍︎︎ 1 👤︎︎ u/gwern 📅︎︎ Feb 12 2019 🗫︎ replies

Captions

All right, welcome back to 6.S094, Deep Learning for Self-Driving Cars. Today we have Drago Anguelov, principal scientist at Waymo. Aside from having the coolest name in autonomous driving, Drago has done a lot of excellent work in developing and applying machine learning methods to autonomous vehicle perception, and more generally in computer vision and robotics. He's now helping Waymo lead the world in autonomous driving: 10-plus million miles achieved autonomously to date, which is an incredible accomplishment. So it's exciting to have Drago here with us to speak. Please give him a big hand. [Applause]

Hi, thanks for having me. I will tell you a bit about our work, the exciting nature of self-driving, the problem, and our solutions. My talk is called "Taming the Long Tail of Autonomous Driving Challenges." My background is in perception and robotics. I did my PhD at Stanford with Daphne Koller and worked closely with one of the pioneers in the space, Professor Sebastian Thrun. I spent eight years at Google doing research on perception, and also worked on Street View, developing deep models for detection and neural net architectures. I was briefly at Zoox, where I was heading the 3D perception team; at Zoox we built another perception system for autonomous driving. And I've been leading the research team at Waymo most recently.

So I want to tell you a little bit about Waymo and when we started. Waymo actually has its 10-year anniversary this month. It started when Sebastian Thrun convinced the Google leadership to try an exciting new moonshot. The goal they set for themselves was to drive 10 different segments that were 100 miles long, and later that year they succeeded and drove an order of magnitude more than anyone had ever driven. In 2015 we brought this car to the road. It was built from the ground up as a study in what fully driverless mobility would be like. In 2015 we put this vehicle in Austin, and it completed the world's first fully autonomous ride on public roads, and the person
inside this car is a fan of the project, and he is blind. We did not want this to be just a demo of a fully driverless experience. We worked hard, and in 2017 we launched a fleet of fully self-driving vehicles on the streets of the Phoenix metro area, and we have been doing fully driverless operations ever since. So I wanted to give you a feel for what the fully driverless experience is like. [Music]

And so we continued. Last year we launched our first commercial service in the metro area of Phoenix. There, people can call a Waymo on their phone; it can come pick them up and help them with errands or get to school. We've already been learning a lot from these customers, and we're looking to grow and expand the service and bring it to more people.

In the process of growing the service, we have driven 10 million miles on public roads, like I said, both driverlessly and with human drivers to collect data. We've driven in all kinds of scenarios and cities, capturing a diverse set of conditions and a diverse set of situations in which we develop our systems.

I want to tell you about the long tail of events. This is all the things we need to handle to enable a truly self-driving future. I'll go over the problems that come with this, offer some solutions, and show you how Waymo has been thinking about these issues. As we drove 10 million miles, of course we still find scenarios, new ones, that we have not seen before, and we still keep collecting them.

When you think about self-driving vehicles, they need to have the following properties. First, a vehicle needs to be capable: it needs to be able to handle the entire task of driving. You cannot handle just a subset and remove the human operator from the vehicle. And all of these tasks obviously need to be done well and safely. That is the requirement for achieving self-driving at scale. When you think about this, the question is: how many of these capabilities and how many scenarios do
you really need to handle? Well, it turns out the world is quite diverse and complicated, and there are a lot of rare situations, and all of them need to be handled well. They call this the long tail, the long tail of situations. It's one type of effort to get self-driving working for the common cases, and then it's another effort to tame the rest, and they really, really matter.

So I'll show you some. For example, this is us driving in the street; let's see if you can tell what is unusual in this video. I can play it one more time. So there's a bicyclist, and he is carrying a stop sign. I don't know where he picked it up, but it's certainly not a stop sign we need to stop for, unlike others. And you need to understand that.

Let me show you another scenario. This is another case where we are happily staying there, and then the vehicle stops and a big pile of poles comes our way. You need to understand that and learn to avoid it. Generally, different types of objects can fall on the road; it's not just poles.

Here's another interesting scenario. This happens a lot; it's called construction, and there are various aspects of it. One of them is that someone closed the lane and put out a bunch of cones, and this is our vehicle correctly identifying where it's supposed to be driving between all of these cones and successfully executing it. So yeah, we drive for a while. This is something that happens fairly often if you drive a lot.

Another case is this one. I think you can understand what happened here. You can notice, actually, that we hear the siren: we have the ability to understand the sirens of special vehicles. You can see we hear it and stop, and some guys are much later than us, braking at the last moment to let the emergency vehicle pass.

And here's another scenario I want to show you. Let's see if you can understand what happened. So let me play it one more time.
Did you guys see? So we stopped; there's a green light, we're about to go, and someone goes at high speed, running a red light without any remorse. We successfully stop and prevent issues. So sometimes you have the rules of the road and you have your right of way, and people don't always abide by them. That's something that, you know, I don't want to just directly go in front of that person even if they're breaking the law.

So hopefully with this I convinced you that the situations that can occur are diverse and challenging, and there are quite a few of them. I want to take you on a little tour of what makes this challenging, and then tell you some ways in which we think about it and how we're handling it. To do this, we're going to delve a little bit more into the main tasks for self-driving, which are perception, prediction, and planning. These are the core AI aspects of the car, usually. There are other tasks, and we can talk about those in a little bit as well, but let's focus on these first.

So perception is mapping from sensory inputs, and potentially prior knowledge of the environment, to a scene representation. That scene representation can contain objects and scene semantics; potentially you can construct a map, you can learn about object relationships, and so on. The space of things you need to handle in perception is fairly hard; it's a complex mapping. You have sensors: the pixels come, lidar points come, or radar scans come, and you have multiple axes of variability in the environment.

Obviously there are a lot of objects, and they have different types, appearances, and poses. I don't know if you see this well: there are a bunch of people dressed as dinosaurs in this case. People generally are fairly creative in how they dress. Vehicles can also be of different types, people come in different poses, and we have seen it all. So that's one of the aspects. There are different environments
that these objects appear in. So there are times of day, seasons, day and night, and different settings, for example a highway environment, a suburban street, and so on. And then there's a different variability axis, and this is slightly more abstract: different objects can come in these environments in different configurations and can have different relationships. So things like occlusion, there's a guy carrying a big board; there are reflections; there are, well, people riding on horses; and so on.

So why am I showing this? Because I just want to show you the space. In most cases you care about most objects, in most environments, in most reasonable configurations, and that's a space you need to map from the sensor inputs to a representation that makes sense. You need to learn this mapping function or represent it somehow.

So let's go to the next step, which is prediction. Apart from just understanding what's happening in the world, you need to be able to anticipate and predict what some of the actors in the world are going to do, the actors being mostly people. And people are, honestly, what makes driving quite challenging; this is one of the aspects that do. The vehicle needs to be out there and be a full-fledged traffic scene participant. This anticipation of agent behavior sometimes needs to be fairly long-term. When you want to make a decision, you want to validate, or convince yourself, that it does not interfere with what anyone else is going to do, and you may need to anticipate the future anywhere from one second to maybe ten seconds or more.

So what goes into anticipating the future? Well, you can watch past behavior. I'm going this way, so maybe I will continue going there. Maybe I'm walking very aggressively, and maybe I'm more likely to do aggressive motions in the future. There are high-level scene semantics: well, I'm in a presentation room, I'm sitting here at the front giving a talk, so I'll probably stay here and continue
even though stranger things have happened. And of course there are subtle appearance cues. For example, if a person is watching our vehicle while we're moving towards them, we can be fairly confident they're paying attention and not going to do anything particularly dangerous. Someone may not be paying attention or may be distracted, or, you know, there is a person in a car waving at us. Various gesture cues, the blinkers of vehicles: these are all signals, subtle signals, that we need to understand in order to be able to behave well.

Last but not least, even when you predict how other agents behave, agents are also affected by the other agents in the environment. Everyone can affect everyone else, and you need to be mindful of this. I'll give you an example. I think this is one of the issues that really needs to be thought about: we are all interacting with each other. So here's a case: our Waymo vehicle is driving, and there are two bicyclists, in red, going around a parked car. What happens is we correctly anticipate that as they bike, they will go around the car, and we slow down and let them pass. So we are reasoning that they will interact with the parked car. This is our most likely prediction for the rear bicyclist: we anticipate that they will do this, and we handle it correctly. So this illustrates prediction.

And here is planning. This is our decision-making machine. It produces vehicle behavior, which typically ends up as control commands to the vehicle: accelerate, slow down, steer the wheel. You need to generate behavior that ultimately has several properties, and it's important to think of them. It has to be safe (safety comes first), comfortable for the passengers, and it also has to send the right signals to the other traffic participants, because they can interact with you and will react to your actions, so you need to be mindful. And of course you need to make progress; you need to deliver your passengers. So you need to trade all of
these off in a reasonable way, and it can be fairly sophisticated reasoning in complex environments. I'll show you just one scene. This is a complex, I think, school gathering. There's a bicyclist trailing us, vehicles really close, hemming us in, and a bunch of pedestrians, and we need to make progress. And here is us: we're driving reasonably well in crowded scenes. Being able to do that is part of the prerequisite for bringing this technology to all the dense urban environments.

So how are we going to do it? Well, I gave it away: I'm a machine learning person. I think when you have these complicated models and systems, machine learning is a really great tool to model complex actions, complex mapping functions, and features. So we're going to learn our system. We've been doing this, and we're not the only ones; obviously this is now a machine learning revolution. Machine learning is permeating all parts of the Waymo stack, all of these systems I'm talking about. It helps us perceive the world, it helps us make decisions about what others are going to do, and it helps us make our own decisions. And machine learning is a tool to handle the long tail. I'll tell you a little more about how.

So I have this allegory about machine learning that I like to think about. There is a classical system and there is a machine learning system. To me, a classical system (and I've been there; early machine learning systems can also be a bit classical) is one where you're the artisan, you're the expert. You have your tools, you need to build this product, you have your craft, and you go and take your tools and build it. You can fairly quickly get something reasonable, but then it's harder to change, it's harder to evolve. If you learn new things, now you need to go back, and maybe the tools don't quite fit, and you need to essentially keep tweaking it. The more complicated the product becomes, the harder it is
to do. And modern machine learning is like a factory. You build the factory, which is the machine learning infrastructure, and then you feed data into this factory and get nice models to solve your problems. So infrastructure is at the heart of this new paradigm. You need to build the factory, and once you do, you can iterate. It's scalable: just keep feeding in the right data, and the machine keeps giving you good models.

So what is the ML factory for self-driving models? Roughly, it goes like this. We have a software release. We put it on the vehicle, and we're able to drive. We drive, we collect data, and we store it. Then we select some parts of this data and send them to labelers, and the labelers label the parts of the data we find interesting. That's the knowledge we want to extract from the data: these are the labels, the annotations, the results we want from our models. Then we train machine learning models on this data. After we have the models, we do testing and validation, to validate that they're good to put on our vehicles. Once they're good to put on our vehicles, we go and collect more data, and then the process starts going again and again. You collect more data, you select new data that you have not selected before, you add it to your dataset, you keep training the model, and you iterate, iterate, iterate. It's a nice scalable setup. Of course this needs to be automated, and it needs to be scalable itself. It's a game of infrastructure.

At Waymo we have the beautiful advantage of being really well set up with regard to machine learning infrastructure, and I'll tell you a bit about its ingredients and how we go about it. Ingredient one is computing and software infrastructure. We're part of Alphabet, of Google, and we are able, first of all, to leverage TensorFlow, the deep learning framework. We have
access to the experts that wrote TensorFlow and know it in depth. We have data centers to run large-scale parallel compute and train models. We have specialized hardware for training models, which makes it cheaper, more affordable, and faster, so you can iterate better.

Ingredient two: high-quality labeled data. We have the scale to collect and store hundreds of thousands of miles, and more, up to millions of miles. But just collecting and storing 10 million miles is not necessarily the best thing you can do, because there is decreasing utility to the data. Most of the data comes from common scenarios, and you may already be good at them. That's where the long tail comes in. So it's really important how you select the data, and this is an important part of the pipeline. While you're running a release on the vehicle, we have a bunch of models, we have a bunch of understanding about the world, and we annotate the data as we go. You can use this knowledge to decide what data is interesting, how to store it, and which data we can potentially even ignore.

Once we do that, again, we need to be very careful how we select data. We want to select examples that are interesting in some way and that capture these long-tail cases that we may not be doing so well on. For this we have active learning and data mining pipelines: given exemplars, find the rare examples; look for parts of your system that are uncertain or inconsistent over time; and go and label those cases.

Last but not least, we also produce auto labels. How can you do that? Well, when you collect data, you also see the future for many of the objects, what they did. Because of that, knowing the past and the future, you can annotate your data better, and then go back to your model, which does not know the future, and try to replicate that with the model. All of this is part of the system.

Ingredient number
three: high-quality models. We're part of the larger Alphabet, with Google and DeepMind, and generally Alphabet is the leader in AI. When I was at Google, we were very early in the deep learning revolution. I happened to have the chance to be there at the time; it was 2013 when I got into deep learning. A lot of things were not understood, and we were there working on it earlier than most people. Through that we had the opportunity and the chance to develop some of this. In my time, the team I managed invented neural net architectures like Inception, which became popular later. We invented what was at the time the state-of-the-art fast object detector, called SSD, and we won ImageNet 2014. And now, if you go to the conferences, Google and DeepMind are leaders in perception and reinforcement learning and smart agents. There are state-of-the-art semantic segmentation networks, pose estimation, and so on; object detection, of course, goes without saying. We collaborate with Google and DeepMind on projects to improve our models.

So this is my factory for self-driving models. I want to tell you about something that captures all of these ideas, infrastructure, data, and models, in one. This is a project we did recently, which we put online today on our blog, about automatic machine learning for tuning and adjusting the architectures of neural networks.

So what did we do? There is a team at Google working on AutoML, automatic machine learning. Networks themselves usually have complex architectures; they're crafted by practitioners, artisans of networks in some way. Sometimes we have very strict latency constraints on the models, we have some compute constraints, and the networks are specialized. It often takes people months to find the right architecture that's most performant, low latency, and so on. So there's a way to offload this work to the machines: once you have posed the problem, you can have the machines themselves go and find you a good
network architecture that's both low latency and high performance. So that's what we do. We drive in a lot of scenarios, and as we keep collecting data and finding new cities or new examples, the architectures may change, and we want to keep finding and evolving them without too much effort.

So we worked with the Google researchers. They had strong work where they developed a system that searched the space of architectures and found a set of components of neural networks. It's a small subnetwork called a NAS cell, and this is a diagram of a NAS cell. It's a set of layers put together that you can then replicate in the network to build a larger network. They discovered it on a small vision dataset called CIFAR-10. It's from the early days of deep learning; it was a very popular dataset, and you can quickly train models on it and explore a large search space.

So the first thing we did is take some problems that we have in our stack, one of them being lidar segmentation. You have a map representation and some lidar points, and you essentially segment the lidar points: you say this point is part of a vehicle, that point is part of vegetation, and so on. This is a standard problem. What we first did at Waymo is we explored several hundred NAS cell combinations to see what performs better on this task. We saw one of two things happen for the various versions that we found. One is that we can find models with similar quality but much lower latency and less compute. And then there are models of a bit higher quality at the same latency. Essentially, we found better models than the human engineers did. Similar results were obtained for other problems, lane detection as well, with this transfer learning approach.

Of course, you can also do end-to-end architecture search. There's no reason why what was found on CIFAR-10 is best suited for our more specialized problems. And so
we went about this more from the ground up. So let's do a deeper search over a much, much larger space, not limited to the NAS cells themselves. The way to do this: because our networks are trained on quite a lot of data, take quite a while to converge, and take some compute, we went and defined a proxy task. This is a smaller, simplified task that correlates with the larger task, and we find it through some experimentation on what would make a good proxy task. Once we establish a proxy task, we execute the search algorithms developed by the Google researchers. We train up to 10,000 architectures with different topology and capacity, and once we find the top hundred models, we train the large networks for those models all the way and pick the best ones. This way we can explore a much larger space of network architectures.

So what happened? On the left, these are 4,000 different models spanning the range of latency and quality, and in red is the transfer model. After the first round of search, we actually did not produce a better model than the transfer model, which already leveraged their insight. So we took the learnings and the best models from this search and did a second round of search, shown in yellow, which allowed us to beat it. Third, we also executed a reinforcement learning algorithm developed by the Google researchers on 6,000 different architectures, and that one was able to significantly improve on the red dot, which itself significantly improves on the in-house algorithm. So that's one example where infrastructure, data, and models combine, and it shows how you can keep automating the factory.

That is all good, but we keep finding new examples in the world, and for some situations we have fairly few examples as well. So there are cases where the models are uncertain or can potentially make mistakes, and you need to be robust to those. I mean, you cannot put out the product and say, well, our network just doesn't handle some case.
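
[Editor's sketch, not part of the talk.] The proxy-task search described above (score many candidate architectures on a cheap proxy task, then fully train only the top few and pick the best) might be outlined roughly as follows. Everything here is an assumption for illustration: the toy search space, the names `sample_architecture` and `two_stage_search`, and the use of plain random sampling, which stands in for the search algorithms the speaker attributes to the Google researchers. In the talk's numbers, `num_candidates` would be up to 10,000 and `num_finalists` would be 100.

```python
import random
from typing import Callable, Dict, List

# Hypothetical architecture description: a dict of integer hyperparameters.
Arch = Dict[str, int]


def sample_architecture(rng: random.Random) -> Arch:
    """Draw one candidate from a toy search space (illustrative only)."""
    return {
        "num_layers": rng.randint(2, 12),
        "width": rng.choice([16, 32, 64, 128]),
    }


def two_stage_search(
    proxy_score: Callable[[Arch], float],
    full_score: Callable[[Arch], float],
    num_candidates: int,
    num_finalists: int,
    seed: int = 0,
) -> Arch:
    """Rank candidates on a cheap proxy, then fully evaluate only the best few.

    Random sampling is an assumption made to keep the sketch short; it is
    not the search method used in the project described in the talk.
    """
    rng = random.Random(seed)
    candidates: List[Arch] = [
        sample_architecture(rng) for _ in range(num_candidates)
    ]
    # Stage 1: score every candidate with the cheap proxy task.
    ranked = sorted(candidates, key=proxy_score, reverse=True)
    finalists = ranked[:num_finalists]
    # Stage 2: spend the expensive full evaluation only on the finalists.
    return max(finalists, key=full_score)
```

The two scoring functions are passed in as callables, so a caller can plug in anything from a toy formula to a real training run; the sketch only captures the control flow of "cheap filter first, expensive training second".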
So we have designed our system to be robust even when ML is not particularly confident. How do you do this? One part is, of course, that you want redundant and complementary sensors. We have a 360-degree field of view on our vehicles, in camera, lidar, and radar, and they're complementary modalities. First of all, an object is seen in all of them. Second of all, they all have different strengths and different modes of failure, so whenever one of them tends to fail, the others usually work fine. That helps a lot to make sure we do not miss anything.

Also, we design our system to be a hybrid system, and this is the point I want to make. Some of these mapping problems, problems with neural net models, are very complicated. They're high-dimensional: the image has a lot of pixels, lidar has a lot of points, the networks can end up pretty big, and they may not be so easy to train with very few examples given the current state of the art. The state of the art keeps improving, of course; there is zero-shot and one-shot learning. But while the state of the art in the models is improving, we can also leverage expert domain knowledge.

So what does that do? Humans can help develop the right input representations. They can put in an expert bias that constrains the representation to fewer parameters that already describe the task, and with that bias it is easier to learn models from fewer examples. And of course experts can also put in their knowledge by designing the algorithm, which incorporates it as well. So our system is this hybrid.

An example of what that looks like for perception: in cases where the machine learning system may not be confident, we still have tracks and obstacles from lidar and radar scans, and we make sure that we drive safely relative to those. And in prediction and planning, if we're not confident in our predictions,
we can drive more conservatively. Over time, as the factory keeps running, our models become more powerful, and we get more data on all the cases, the scope of ML grows, and the set of cases that you can handle with it increases. So there are two ways to attack the tail: you protect against it, but you also keep growing ML and making the system more performant.

I'm going to tell you now how we deal with large-scale testing, which is another key problem. It's very important in the pipeline and also in getting the vehicles on the road. So how do you normally develop a self-driving algorithm? Well, the ideal thing is you make your algorithm change, put it on the vehicle, drive a bunch, and say, now it looks great, all right, let's make the next one. The problem is, we have a big fleet and a lot of data, but some of the conditions and situations occur very, very rarely, so if you do this you're going to wait a long time. Furthermore, you don't just want to take your code and put it on a vehicle. You need to test it even before that; you want only very strongly tested code on public streets.

So you can do structured testing. We have a 90-acre Air Force base site where we can test very important situations and situations that occur rarely. Here is an example of such a situation. You can select and deliberately stage conditions, safely. But again, you cannot do this for all situations. So what do you do? A simulator.

And how much do we need to simulate? Well, we simulate a lot. We simulate the equivalent of 25,000 virtual cars driving ten million miles a day, with over seven billion miles simulated. It's a key part of our release process. So why do you need to simulate this much? Well, hopefully I convinced you there is a variety of cases to worry about and that you need to test thoroughly. Furthermore, it goes all
the way bottom-up. As you change perception, for example with a slightly different segmentation or detection, the changes can go through the system and the results can change significantly. You need to be robust to this; you need to test all the way.

So what to simulate? One thing you can do is create scenarios from scratch, working with safety experts, NHTSA, and analyzing what conditions typically lead to accidents. You can do that, of course; you can create them manually. What else could you do? Well, you want to leverage your driving data. You have all your logs, and you have a bunch of situations there, so you can pick interesting situations from your logs. Furthermore, you can take all these situations and create variations of them, so you get even more scenarios.

Here's an example of a log simulation. I'll play it twice. The first time, look at the image: this is what happened in the real world. In the real world we mostly stayed in the middle lane and stopped. Now see what happened in simulation: in simulation our algorithm decided this time to merge into the left lane and stop, and everything was fine. Things were safe, things were happy.

What can go wrong in simulation from logs? Well, let's say this is another scenario, with a slightly different visualization. Our vehicle, when it drove in the real world, was where the green vehicle is. Now in simulation we drove differently, and we are the blue vehicle. And so we're driving, and bam. What happened? Well, there is a purple agent, a pesky purple agent, who in the real world saw that we passed them safely, and so it was safe for them to go. But it's no longer safe, because we changed what we did. So the insight is that in simulation our actions affect the environment, and this needs to be accounted for.

What does that mean? If you want to have effective simulations at a large scale, you need to simulate realistic driver and pedestrian behavior. So, you know, you could think of a simple
model. Well, what's a proxy, or what's a good approximation, of realistic behavior? Well, you can do a brake-and-swerve model. You just say, well, there is some normal way reactions happen: there is a reaction time and a braking profile, and maybe a swerving profile. So if an agent sees someone in front of them, maybe they just apply it; it's an algorithm. Hopefully I convinced you that behavior can be fairly complicated, and this will not always produce a believable reaction, especially in complex interactive cases such as merges, lane changes, intersections and so on.

So what could you do? You could learn an agent from real demonstrations. You went and collected all this data in the world; you have a bunch of information about how vehicles and pedestrians behave. You can learn a model and use that.

Okay, so what is an agent? Let's look a little bit. An agent receives sensor information, maybe context about the environment, and it develops a policy, it develops a reaction. That's the driver agent. It applies acceleration and steering, then gets new sensor information, new map information, its place in the map, and it continues. And if it's our own vehicle, then you also have a router, an explicit intent generator, which says, "Well, the passenger wants you to go over there, why don't we try to make a right turn now?" So you also get an intent. And this is an agent. It could be in simulation, it could be in the real world; roughly this is the picture. And this is an end-to-end agent. End-to-end learning is popular, right? To a first approximation, if you learn a good policy this way, you can apply it and have very believable agent reactions.

And so I'm going to tell you a little bit about work we did in this direction. We put a paper on arXiv about a month ago, I believe. We took 60 hours of footage of driving and we tried to see how well we can imitate it using a deep neural network. One option is to do exactly the same end-to-end agent policy, but we wanted to make
the task easier. How? Well, we have a good perception system at Waymo, so why don't we use its products for that agent? We can also simplify the input representation a bit. That is good; the task becomes easier. Controllers are well understood, so we can use an existing controller. So there's no need to worry about acceleration and torques; we can generate trajectories.

Now, if you want to see in a little more detail what the representation is: this is our agent vehicle, which is the self-driving vehicle in this case, but it could be a simulation agent. We render an image with it at the center, and we can augment it with a little bit of rotation of the image, just so we don't over-bias the orientation in a specific way. It's an 80 by 80 box, so we roughly see about 60 meters in front of us and 40 meters to the side. And now we render a road map in this box, which is the map, like which lanes you're allowed to drive on. There are traffic lights, and generally at intersections we render which lanes are allowed to go and how the traffic lights permit it or do not permit it. Then you can render speed limits and the objects that are the result of your perception system. You render your current vehicle where it believes it is, and you render the pose history, so you give an image of where the agent has been in the last few steps. And last but not least, you render the intent, so the intent is where you want to go. So conditioned on this intent and this input, you want to predict the future waypoints for this vehicle. That's the task, and you can phrase it as a supervised learning problem: just learn a policy with this network that approximates what you've seen in the world with 60 hours of data.

Of course, with learning agents there is a well-known problem. It's identified in a paper, the DAgger paper, by Stéphane Ross, who is actually at Waymo now, and Drew Bagnell. It's easy to accumulate small errors over time. So even
though in each step you could do a relatively good estimate, if you string 10 steps together you can end up very different from where agents have been before. And there are techniques to handle this. One thing we did is synthesize perturbations. So you have a trajectory, and we synthetically deform the trajectory and force the vehicle to learn to come back to the middle of the lane. That's something you can do that's reasonable.

Now, if you just have direct imitation based on supervision: we are trying to pass a parked vehicle in the street, and it's stopping and never continuing. So then we did perturbations, and, well, it kind of ran through the vehicle. So that's not enough; we need more. It's not actually an easy problem.

So in addition to having this agent RNN, which essentially takes the past and creates a memory of its past decisions and keeps iterating, predicting multiple points in the future, so it predicts the trajectory piecemeal into the future, how about we also learn about collisions and staying on the road and so on? So we augment the network, and now the network also starts predicting a mask for the road. And now we have a loss here, I don't know if I can point. So here you have a road mask loss: you say, hey, if you ever generate motions that take you outside the road, that's probably not good. And hey, if you ever cause collisions: we have a perception network which takes the other objects and predicts their motions. So we predict our motion, where the road is, and the other agents' motion in the future, and we're trying to make sure there are no collisions and that we stay on the road. So you add this structure, and that adds a lot more constraints to the system as it trains. So it's not just limited to what it's explicitly seeing; it allows it to reason about things it has not explicitly seen as well.

And so now here's an example of us driving with this network, and you can see that we're
predicting the future with the yellow boxes, and we're driving safely through intersections and complex scenarios. It actually handles a lot of scenarios very well. If you're interested, I welcome you to go read the paper. It handles most of the simple situations fine. So now, with our past two approaches on passing a parked car, one of them stops and never restarts, and the other one hits the car. Now it actually handles it fine, and beyond that, afterwards we can stop at a stop sign happily, which is the red line over there. It does all of these operations. And what we did beyond this is we took the system that was learned on imitation data and we actually drove our real Waymo car with it. We took it to the Castle Air Force Base staging grounds, and this is it driving a road it's never seen before, and stopping at stop signs and so on.

So that's all great. We could use it as an agent in the simulation world and we could drive a car with it, but it has some issues. So let's look on the left. Here it is driving, and it was driving too fast, and because our range is limited it didn't know it had to make a turn, and it overran the turn, so it just drove off the road. That's one thing that can happen. So one area of improvement: more range. Here it is another time. Yellow, by the way, is what we did in the real world, and green is what we do in the simulation in that example. Here we're trying to execute a complex maneuver, a U-turn. We're sitting there and we try to do it, and we almost do it but not quite, and at least we end up in the driveway. And there are the interactive situations: when they get really complex, this network also does not do too well.

And so what does that tell us? Well, the long tail came again, in testing. Again, you can learn the policy for a lot of the common situations, but actually in testing, some of the things you really care about are the long tail. You want to test the corner cases. You want to test the scenarios where someone is obnoxious and
adversarial and doing something not too kosher, right? So one way to think of it is this. This is the distribution of human behavior, and of course it goes along multiple axes. You could be aggressive or conservative, or somewhere in between. You could be a super expert driver or super inexperienced, or somewhere in between, and so on. So our end-to-end model is a fairly unbiased representation, meaning it could in theory learn any policy; I mean, it sees everything you want to know about the environment, by and large. But it's complex, and this is a bit similar to some of the models we talked about before. You can end up with a complex model if you have complex input. These are images that are 80 by 80 with multiple channels. It's a large input space, and the model can have tens of millions of parameters. Now, if you have a case where you have two or three examples in your whole 60 hours of driving, there's no guarantee that your 10-million-parameter model will learn it well. So it's really good when you have a lot of examples; it's really trying to do well on those. And then you have the long tail.

So what do you do? Well, we can improve the representation, we can improve our model. There is a lot of room to keep evolving this, and then this area will keep expanding. That's one good direction. There are a lot of interesting questions about how to do that, and we're working on a lot of them. There is actually some exciting work that hopefully I get to share with you another time.

Something else you can do, if you remember from my slide about the hybrid system: when you go to the long tail, you can do essentially a similar thing, which is a simpler, biased, expert-designed input distribution that is much easier to learn with few examples. You can also, of course, use expert-designed models. And so in this case you will still produce something reasonable by injecting this human knowledge. And you could
have many models. I mean, there's not just one: you could tune them to various aspects of this distribution. You can have little models for all the aspects you care about, and you can mix and match. So that's another way to do it.

So let me tell you about one such model: the trajectory optimization agent. We take inspiration from motion control theory, and we want to plan a good trajectory for the vehicle, the agent vehicle, that satisfies a bunch of constraints and preferences. One insight here is that we already know what the agent did in the environment last time, so you have a fairly strong idea about the intent. That helps you when you specify the preferences, because you can say, okay, give me a trajectory that minimizes some set of costs, which are preferences on the trajectory, typically called potentials. What is a potential? Well, at different parts of the trajectory you can add an attractor potential saying, for example, try to go where you used to be before. That's the benefit in simulation: you have observed what was done, so this is a bit simpler. And of course you can have repeller potentials: don't hit things, don't run into vehicles. So to a first approximation, that's what it roughly looks like.

And so now, where is the learning? Well, it's still a machine learning model. There is a representation; these potentials have parameters. There's the steepness of this curve, and sometimes they are multi-dimensional. There are a few parameters; typically we're talking a few dozen parameters or less. And you can learn them too. There is a technique called inverse reinforcement learning: you want to learn the parameters that produce trajectories that come close to the trajectories you've observed in the real world. So if you pick a bunch of trajectories that represent a certain type of behavior you want to model, you tune the parameters to behave like it. Then you want to generate reasonable trajectories, continuous
and feasible, that satisfy this. This is part of the optimization, and you can actually solve it. And so then you can tune these agents.

So here are some agents I want to show you. This is a complex interactive scenario with two vehicles. Blue is the agent; red is our vehicle that we're testing in simulation. Let me play it one more time. Essentially, on the left is the conservative driver and on the right is the aggressive driver. They pass us, and that induces very different reactions in our vehicle. The aggressive guy went and passed us and pushed us further into that lane, and we merged much, much later. In the other case, when you have a conservative driver, we are in front of them, they're not bugging us, and we execute much earlier and can switch into the right lane where we want to go. So these are agents that can test your system well. Now you have different scenarios in this case, depending on what agent you put in.

And I'll show you a few more scenarios. It's not just a two-agent game. I mean, we can do things like merging from one side of the highway to the other, and this type of agent can generate fairly reasonable behaviors: it slows down for an annoying slow vehicle in front, lets the vehicles on this side pass, and still completes the mission. And you can generate multiple futures with this agent. So here's an example. Again, on the right is the aggressive guy and on the left is the more conservative person. The aggressive guy found a gap between the two vehicles and just went for it. And you can test your stack this way. One more I wanted to show you is aggressive motorcycle driving. So you can have an agent that tests the reaction to a motorcycle that is weaving in the lane.

So I guess what's my takeaway from this story about testing in the long tail? You need a menagerie of agents at the moment, right? So if you think of it, right, and
learning from demonstration is key. You can encode some simple models by hand, but ultimately the task of modeling agent behavior is complex, and it's much better learned.

And so here's the space of models. You can have non-learned ones: you can just replay the log, like I showed. Then you can have hand-designed trajectories for agents: for this reaction do this, for that reaction do that. Then you can have the brake-and-swerve model, where if there's someone in front of an agent it just does a deterministic brake. Then trajectory optimization, which I just showed, then our mid-to-mid model, and potentially an end-to-end top-down model, top-down meaning you have a top view of the environment. There are many other representations possible; this is a very interesting space. Ultimately I wanted to show you that there are many possible agents, and they have different utility, and they need different numbers of examples to train them.

And so one other takeaway I wanted to tell you is that smart agents are critical for autonomy at scale. This is something I truly believe, working in the space. This direction is exciting, and ultimately it's one of the exciting problems where there's still a lot of interesting progress to be made. And why? Well, if you have accurate models of human behavior, of drivers and pedestrians, they help you achieve several things. First, you will make better decisions when you drive yourself: you'll be able to anticipate better what others will do, and that will be helpful. Second, you can develop a robust simulation environment with those insights, which is also very important. Third, well, our vehicle is also one more agent in the environment. It's an agent we have more control over than the others, but a lot of these insights apply. So this is very exciting and interesting.

So I wanted to finish the talk maybe with a mental exercise. When you think of a system that is tackling a complex AI challenge like self-driving, what are good properties for the system to have, and how do
you think about a scalable system? To me there's this mental test. We want to grow and bring our service to more and more environments, more and more cities. How do you scale to dozens or hundreds of cities? As we talked about with the long tail, each new environment can bring new challenges, and they can be complex: intersections in cities like Paris, there's Lombard Street in San Francisco, there are narrow streets in European towns. There's all kinds; the long tail keeps coming as you keep driving in new environments. In Pittsburgh people drive the famous Pittsburgh left; they take different precedence than usual. The local customs of driving, of behaving, all of this needs to be accounted for as you expand, and this makes the system potentially more complex and harder to tune to all environments. But it's important, because ultimately that's the only way you can scale.

So what should the scalable process do? In my mind, let's say you have a very good self-driving system. This very much parallels the factory analogy; I'm just going to repeat it one more time. You take your vehicles, we put out a bunch of Waymo cars, and we drive a long time in that environment with drivers, maybe 30 days, maybe more, at least that long, and you collect all the data. And then your system should be able to improve a lot on the data you have collected. So you drive a bunch. Obviously you don't want to train the system too much in the real world while it's driving, but you want to train it after you've collected data about the environment. So it needs to be trainable on collected data.

It's very important for a system to be able to quantify, or for you to be able to elicit from it, whether it's incorrect or not confident, because then you can take action. This is an important property that I think people should think of when they design systems: how do you elicit this? Then you can take an action. You can ask questions to raters; that's
fairly legit. Typical active learning is a bit like this, right? It's usually based on some amount of low confidence or surprise; those are the examples you want to send. And even better, the system could potentially directly update itself. This is an interesting question: how do systems update themselves in light of new knowledge? And we have a system that clearly does this, right, and we typically do it with reasoning. And what is reasoning? So I have an answer. It is one answer; there are possibly others. But one way is that you can check and enforce consistency of your beliefs, and you can look for explanations of the world that are consistent. If you have a mechanism in the system that can do this, it allows the system to improve itself without necessarily being fed purely labeled data. It can improve itself from just collected data. And I think it's interesting to think of systems where you can do reasoning, and of the representations that these models need to have.

And last but not least, we need scalable training and testing infrastructure. This is part of the factory I was talking about. I'm very lucky at Waymo to have wonderful infrastructure, and it allows this virtuous cycle to happen. Thank you.

Thank you so much for the talk, really appreciate it. So if you were to train off of image and lidar data, synthetic image and lidar data, would you weight the synthetic data differently than real-world data when training your models?

So there's actually a lot of interesting research in the field. There are people who train on simulator data but also train adaptation models that make simulator data look like real data. So essentially you're trying to build consistency. At least you're training on simulator scenarios, but if you learn a mapping from simulator scenes to real scenes, you could potentially train on the transformed simulator data, already transformed with other models. There are many ways to
do this, ultimately. So achieving realism in the simulator is an open research problem.

I assume there are a lot of rules that you have to put into a system to be able to trust it. So how do you find the balance between these automatic models, where you're not quite sure what they will do, and rules, where you're sure what they do but it's not scalable?

I mean, through lots and lots of testing and analysis. You keep track of the performance of your models and you see where they come up short, and then those are the areas where you most need expert design to complement them. But the balance can change over time, and it's a natural process of evolution, evolving your system as you go. I mean, generally, the ML part grows as the capabilities and the data sets grow.

So you stressed at the end of both the first half and the second half of your talk the importance of quantifying uncertainty in the predictions that your models are making. Have you developed techniques for doing that with neural nets, or are you using some probabilistic graphical models or something?

So for a lot of the models, and neural nets, there are many ways to capture this. Actually, I'm just going to give a general answer and not comment specifically on what we are doing. I think, first of all, there are techniques in neural nets that can predict their own uncertainty fairly well, either by directly regressing the uncertainty for certain products, or by using ensembles of networks, or dropout, or techniques like this that also provide a measure of uncertainty. Another way of handling uncertainty is to leverage constraints in the environment. So if you have temporal sequences, you don't want, for example, objects to appear or disappear, or generally unreasonable changes in the environment. Inconsistent predictions in your models are a good area to look at.

I'm just wondering, do you guys train and deploy different
models depending on where the car is driving, like what city, or do you train and deploy a single model that adapts to most scenarios?

Well, ideally you would have one model that adapts to most scenarios, and then you complement it as needed.

Yeah, so first off, thanks for your talk. I find the simulator work really, really exciting, and I was wondering if you could talk more about, or maybe provide some insights into, simulating pedestrians. Because as a pedestrian myself, I feel like my behavior is a lot less constrained than a vehicle's. And I imagine, I mean, there's an advantage in that you're sensing from a vehicle, and you kind of know what your sensors see first-person from a vehicle, but not from a pedestrian.

That's correct. I mean, if you want to simulate pedestrians far away in an environment, and you want to simulate them at very high resolution, right, and you've collected log data, you may not have the detailed data on that pedestrian. At the same time, the subtle cues for that pedestrian matter less at that distance as well, because it's not like you observed them or reacted to them in the first place. So there is an interesting question of at what fidelity you need to simulate things. And there are levels of realism in simulation that at some level need to parallel what your models are paying attention to.

Thank you for the talk, it was very interesting. Since you titled it and talked about the long tail, it makes me wonder: is the bulk of the problem solved? Do you think we're going to have this figured out, and within the next couple of years there can be self-driving cars everywhere? Or do you think it's closer to, you know, actually there could be decades before we've really worked out everything necessary? What are your thoughts about the future?

That's a good question. It's a bit hard to give this prognosis. I think, I mean, I'm not completely sure. One thing I would say is it will take a while for self-driving cars to
roll out at scale. This is not a technology where you just turn a crank and it appears everywhere. There's logistics and algorithms and all this tuning and testing needed to make sure it's really safe in the various environments. So it will take some time.

When you were talking about prediction, you mentioned looking at context and saying that if someone is looking at us, we can assume that they will behave differently than if they're not paying attention to what we're doing. Is that something you're actively doing? Do you take into consideration whether pedestrians or other participants in traffic are paying attention to your vehicles?

So I can't comment on our model designs too much, but I think these are generally cues one needs to pay attention to. They're very significant. I mean, even when people drive, for example, there's someone sitting in the vehicle next to you waving "keep going," right? These natural interactions in the environment are something you need to think about.

First of all, thank you, it's a really cool talk. In one of your last slides you talked about resolving certain uncertainties by means of establishing a set of beliefs and checking to see if they were consistent.

That's my own theory, by the way, right? But I feel that the concept of reasoning is underexplored in deep learning, and what it means. So if you read Tversky and Kahneman, type 1 and type 2 reasoning, we're really good at the instinctive, mapping type of tasks, so low-level to maybe high-level perception at this point. But the reasoning part with neural networks, and generally with models, that's a bit less explored. I think long term it's fruitful. That's my personal opinion.

I guess the question I was going to ask is if you could elaborate on that concept in connection with the models you guys are working with.

I guess, so, to give an example from current work, right, there's a lot of work on
weakly supervised learning. Sure, and that's kind of been a big topic in 2018, and there were a lot of really strong papers, including by Google Brain, Anelia Angelova and her team, and so on. And essentially, if you used to read the books about 3D reconstruction and geometry and so on, there's a bunch of rules with which you can encode geometric expectations about the world. So when you have video and when you have 3D outputs in your models, there is a certain amount of consistency. One example is ego-motion versus depth estimation. There is a very strong constraint that if you predict the depth and you predict the ego-motion correctly, then you can reproject certain things and they will look good. That's a very strong constraint. That's a consistency you know about the environment, you expect it, and this can help train your model. And so more of this type of reasoning may be interesting.

You mentioned expert-designed algorithms, and I was wondering, from your perspective, or from Waymo's perspective, how important are those, say, non-machine-learning type algorithms or approaches to tackling the challenges of autonomous driving?

Could you say, how important is which aspect of them, of expert-designed algorithms?

Every now and then you sprinkle it in, like, here we can try expert-designed algorithms because we actually understand some parts of the problem. And I was wondering what is really important for the challenges in autonomous driving outside of the field of machine learning.

I mean, generally, the problem is you want to be safe in the environment. That makes it such that you don't want to make errors in perception, prediction and planning. And the state of machine learning is not at the point where it never makes errors, given the scope that we're currently addressing. And so if you start with the current state of machine learning, it needs to be complemented. And so we've carefully done it,
and I think as machine learning improves, there'll be less and less need to do it. It's somewhat effort-intensive, especially in an evolving system, to have a hybrid system. But right now I think this is the main thing that keeps you able to do complex behaviors in some cases for which it is very hard to collect data and which you still need to handle. Then it's the right thing to do. So the way I view it is, machine learning, personally, I like it doing better and better, but that said, we're not religious about it, and we should not be. We just need to solve the problem, and right now the right mix is a hybrid system, is my belief.

We're really excited to see what Waymo has in store for us in 2019, so please give Drago a hand again. [Applause]
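
The trajectory optimization agent described in the talk (an attractor potential pulling each waypoint toward where the agent was in the log, and a repeller potential pushing it away from things it should not hit) can be illustrated with a small sketch. Everything below is an assumption made for illustration and not Waymo's implementation: the quadratic shape of the potentials, the smoothness term, the function names `trajectory_cost`, `optimize` and `min_clearance`, and the hand-set weights. In the talk the potential parameters are learned with inverse reinforcement learning; here they are simply fixed, and the cost is minimized with plain gradient descent using finite differences.

```python
import math


def trajectory_cost(traj, logged, obstacles, w_attract=1.0, w_repel=4.0,
                    radius=2.0, w_smooth=0.5):
    """Sum of illustrative potentials over a list of 2D waypoints."""
    cost = 0.0
    for (x, y), (lx, ly) in zip(traj, logged):
        # Attractor: stay close to the logged waypoint.
        cost += w_attract * ((x - lx) ** 2 + (y - ly) ** 2)
        # Repeller: penalize waypoints closer than `radius` to an obstacle.
        for ox, oy in obstacles:
            d = math.hypot(x - ox, y - oy)
            if d < radius:
                cost += w_repel * (radius - d) ** 2
    # Smoothness: penalize large second differences between waypoints.
    for (x0, y0), (x1, y1), (x2, y2) in zip(traj, traj[1:], traj[2:]):
        cost += w_smooth * ((x0 - 2 * x1 + x2) ** 2 + (y0 - 2 * y1 + y2) ** 2)
    return cost


def optimize(logged, obstacles, steps=300, lr=0.02, eps=1e-5, **weights):
    """Gradient descent on the waypoints; start and end points stay fixed."""
    traj = [list(p) for p in logged]
    for _ in range(steps):
        base = trajectory_cost(traj, logged, obstacles, **weights)
        grads = []
        for i in range(1, len(traj) - 1):
            for k in range(2):
                old = traj[i][k]
                traj[i][k] = old + eps
                bumped = trajectory_cost(traj, logged, obstacles, **weights)
                traj[i][k] = old
                grads.append((i, k, (bumped - base) / eps))
        for i, k, g in grads:
            traj[i][k] -= lr * g
    return [tuple(p) for p in traj]


def min_clearance(traj, obstacles):
    """Smallest waypoint-to-obstacle distance along the trajectory."""
    return min(math.hypot(x - ox, y - oy)
               for x, y in traj for ox, oy in obstacles)


if __name__ == "__main__":
    logged = [(float(x), 0.0) for x in range(11)]   # straight logged path
    obstacles = [(5.0, 0.5)]                        # something now in the way
    planned = optimize(logged, obstacles)
    print("clearance before:", round(min_clearance(logged, obstacles), 2))
    print("clearance after: ", round(min_clearance(planned, obstacles), 2))
```

In this toy setup, a larger `w_repel` or `radius` gives a more cautious agent and a smaller one gives an agent that cuts closer, which is one hypothetical way to picture the conservative and aggressive agents in the talk.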

Info

Channel: Lex Fridman

Views: 144,119

Keywords: self-driving cars, artificial intelligence, deep learning, machine learning, self driving cars, waymo, google, lex fridman, autonomous cars, computer vision, waymo one, deep learning mit, google self-driving car, robotaxi, driverless cars, reinforcement learning, chauffeurnet, simulation, Drago Anguelov, level 4, level 5, mit lex, self-driving car, autonomous vehicles, self-driving cars 2019, neural networks

Id: Q0nGo2-y0xY

Length: 65min 15sec (3915 seconds)

Published: Tue Feb 12 2019