Watch Tesla’s Self-Driving Car Learn In a Simulation! 🚘

Video Statistics and Information

Reddit Comments

This channel is awesome

👍︎︎ 13 👤︎︎ u/GhostAndSkater 📅︎︎ Sep 08 2021 🗫︎ replies

I know the channel 'Two minute papers' well. If this guy is impressed by the innovation and application of new techniques then that says a lot because he is constantly following and reviewing the most interesting papers in the field.

👍︎︎ 10 👤︎︎ u/CybrQuest 📅︎︎ Sep 08 2021 🗫︎ replies

Károly Zsolnai-Fehér's 2 Minute Papers is one of the 3 Youtube Channels I receive notifications about (the other 2 are Chiken Genius Singapore and TierZoo)

👍︎︎ 3 👤︎︎ u/swissiws 📅︎︎ Sep 09 2021 🗫︎ replies

But, but, but…

Navigant is qualified to determine this, not some silly Two Minute Papers guy. /s

👍︎︎ 5 👤︎︎ u/UselessSage 📅︎︎ Sep 08 2021 🗫︎ replies
Captions
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to see how Tesla uses no less than a simulated game world to train their self-driving cars. And more. In their AI Day presentation video, they really put on a clinic of recent AI research results and how they apply them to develop self-driving cars. Of course, there is plenty of coverage of the event, but, as always, we are going to look at it from a different angle. We're doing it Papers style. Why? Because after nearly every Two Minute Papers episode where we showcase an amazing paper, I get a question along the lines of "okay, but when do I get to see or use this in the real world?". And rightfully so, that is a good question. In this presentation, you will see that the papers shown here get transferred into real-world products so fast, it really makes my head spin.

Let's see this effect demonstrated by looking through their system. First, their cars have many cameras and no depth information, just the pixels from these cameras, and one of their goals is to create the vector space view that you see here. That is almost like a map, or a video game version of the real roads and objects around us. That is a very difficult problem. Why is that? Because the car has many cameras. Is that a problem? Yes… kind of. I'll explain in a moment.

You see, there is a bottom layer that processes the raw sensor data from the cameras mounted on the vehicle. In go the raw pixels, and out comes more useful, high-level information that can be used to determine whether a clump of pixels is a car or a traffic light. Then, in the upper layers, this data can be used for more specific tasks, for instance, estimating where the lanes and curbs are. So, what papers are used to accomplish this? Looking through the architecture diagrams, we see transformer neural networks, BiFPNs, and RegNets. All papers from the last few years. For instance, RegNet is a neural network variant that is great at extracting useful information from the raw sensor data. And that is a paper from 2020. From just one year ago. Already actively used in training self-driving cars. That is unreal.

Now, we mentioned that having many cameras is a bit of a problem. Why is that? Isn't that supposed to be a good thing? Well, look! Each of the cameras only sees part of the truck. So how do we know where exactly it is, and how long it is? We need all of this information to put the truck accurately into the vector space view. What we need is a technique that can intelligently fuse information from many cameras together. Note that this is devilishly difficult because each camera has a different calibration, location, view direction, and other properties. So who is to say which point in a different camera view a given point here corresponds to? This is accomplished through, yes… a transformer neural network. A paper from 2017.
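To make the fusion step above a bit more concrete, here is a minimal PyTorch-style sketch of the general idea: a small CNN backbone turns each camera image into feature tokens, and a transformer cross-attends from a learned top-down "vector space" query grid into the tokens of all cameras at once, so an object split across several views can end up as a single entry in the map. Every name, size, and layer choice below is an illustrative assumption made for this example, not Tesla's actual architecture or code.

```python
# Sketch: multi-camera fusion via cross-attention into a top-down query grid.
# All module names and sizes are illustrative assumptions, not Tesla's code.
import torch
import torch.nn as nn


class CameraEncoder(nn.Module):
    """Tiny stand-in for a RegNet/BiFPN-style image feature extractor."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1),
        )

    def forward(self, x):                       # x: (B, 3, H, W)
        f = self.net(x)                         # (B, C, H/8, W/8)
        return f.flatten(2).transpose(1, 2)     # (B, H*W/64, C) feature tokens


class MultiCameraFusion(nn.Module):
    """Cross-attention from a learned BEV query grid into tokens of every camera."""
    def __init__(self, dim=128, bev_size=32, num_cameras=8):
        super().__init__()
        self.encoder = CameraEncoder(dim)
        self.cam_embed = nn.Parameter(torch.randn(num_cameras, 1, dim) * 0.02)
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, 16)          # e.g. per-cell occupancy/class logits

    def forward(self, images):                  # images: (B, num_cams, 3, H, W)
        B, N = images.shape[:2]
        tokens = []
        for i in range(N):
            t = self.encoder(images[:, i])      # (B, T, C) tokens for camera i
            tokens.append(t + self.cam_embed[i])    # tag tokens with their camera
        kv = torch.cat(tokens, dim=1)           # all cameras in one token set
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)
        fused, _ = self.attn(q, kv, kv)         # each map cell attends to every camera
        return self.head(fused)                 # (B, bev_size**2, 16) vector-space map


if __name__ == "__main__":
    model = MultiCameraFusion()
    cams = torch.randn(2, 8, 3, 128, 256)       # batch of 2 frames, 8 cameras each
    print(model(cams).shape)                    # torch.Size([2, 1024, 16])
```

The key design choice in this sketch is that the queries live in the output, top-down space: the network itself learns which image regions each map cell should look at, instead of relying on a hand-written correspondence between camera views.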
So, does this multi-camera technique work? Does it improve anything? Well, let's see! Oh yes, the yellow predictions here are from the previous single-camera network, and as you see, unfortunately, things flicker in and out of existence. Why is that? It is because a passing car is leaving the view of one camera, and as it enters the view of the next one, there is no correspondence technique to say exactly where it is. And, look! The blue objects show the prediction of the multi-camera network that can do exactly that, and while things aren't perfect, they are significantly better than with the single-camera network.

That is great, however, we are still not taking time into consideration. Why is that important? Let's have a look at two examples. One, if we are only looking at still images and not at how they change over time, how do we know whether this car is stationary? Is it about to park somewhere? Or is it speeding? And two, this car is now occluded, but we saw it a second ago, so we should know what it is up to.

That sounds great. And what else can we do if our self-driving system has a concept of time? Much like humans do, we can make predictions. These predictions can be about the map, what is likely to come next, an intersection, a roundabout, and so on. But, perhaps even more importantly, we can also make predictions about vehicle behavior. Let's see how that works. The green lines show how far away the next vehicle is and how fast it is going; this is the real, true information about it. Do you see the green? No? That's right, it is barely visible, because it is occluded by a blue line, which is the prediction of the new video network. That means its predictions are barely off from the real velocities and distances, which is absolutely amazing. And, as you see with orange, the old network that was based on single images is off by quite a bit.

So now, a single car can make a rough map of its environment wherever it drives, and the readings of multiple cars can also be stitched together into an even more accurate map. Putting this all together, these cars have a proper understanding of their environment, and this makes navigation much easier. Look at those crisp, temporally stable labels. There is very little flickering. Still not perfect by any means, but this is remarkable progress in so little time. And we are at the point where predicting the behavior of other vehicles and pedestrians can also lead to better decision making.
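As a toy illustration of what a "concept of time" can buy, here is a small sketch in the same spirit: per-frame features are pushed into a short rolling queue, a recurrent module summarizes the recent past, and a tiny head predicts a tracked object's position and velocity, so the estimate can coast through a brief occlusion. The plain GRU, the sizes, and the single-object head are simplifying assumptions for this example, not the video networks described in the presentation.

```python
# Sketch: a rolling queue of per-frame features plus a recurrent summary,
# used to estimate an object's position and velocity through occlusions.
# Sizes and module choices are illustrative assumptions only.
from collections import deque
import torch
import torch.nn as nn


class TemporalTracker(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, history=12):
        super().__init__()
        self.buffer = deque(maxlen=history)      # rolling queue of past frame features
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)         # (x, y, vx, vy) for one tracked object

    @torch.no_grad()
    def step(self, frame_feat):                  # frame_feat: (feat_dim,) for this frame
        self.buffer.append(frame_feat)
        seq = torch.stack(list(self.buffer)).unsqueeze(0)   # (1, t, feat_dim)
        _, h = self.gru(seq)                     # summary of the recent past
        return self.head(h[-1]).squeeze(0)       # position + velocity estimate


if __name__ == "__main__":
    tracker = TemporalTracker()
    for t in range(20):
        # An occluded frame still produces some feature vector (here: zeros);
        # the recurrent state carries the object's motion through the gap.
        feat = torch.zeros(128) if 8 <= t <= 10 else torch.randn(128)
        est = tracker.step(feat)
    print(est)    # 4 values: estimated x, y, vx, vy after the last frame
```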
But we are still not done yet. Not even close. Look! The sad truth of driving is that unexpected things happen. For instance, this truck makes it very difficult for us to see, and the self-driving system does not have a lot of training data to deal with that. So, what is a possible solution? There are two. One is fetching more training data. One car can submit an unexpected event and request that the entire Tesla fleet send over similar situations they have encountered. Since there are so many of these cars on the streets, tens of thousands of similar examples can be fetched from them and added to the training data to improve the entire fleet. That is mind-blowing. One car encounters a difficult situation, and then every car can learn from it. How cool is that?

That sounds great. So what is the second solution? Not fetching more training data, but creating more training data. What, just make stuff up? Yes, that's exactly right. And if you think that is ridiculous and are asking how that could possibly work, well, hold on to your papers, because it does work… you are looking at it right now! Yes, this is a photorealistic simulation that teaches self-driving cars to handle difficult corner cases better. In the real world, we can learn from things that already happened, but in a simulation, we can make anything happen.

This concept really works, and one of my favorite examples is OpenAI's robot hand that we showcased earlier in this series. It also learned its rotation techniques in a simulation, and it did so well that the software could be uploaded to a real robot hand and work in real situations too. And now, the same concept for self-driving cars. Loving it. With these simulations, we can even teach these cars about cases that would otherwise be impossible or unsafe to test. For instance, in this system, the car can safely learn what it should do if it sees people and dogs running on the highway. A capable artist can also create miles and miles of these virtual locations within a day of work. This simulation technique is truly a treasure trove of data, because the scenes can also be procedurally generated, and the moment the self-driving system makes an incorrect decision, a Tesla employee can immediately create an endless set of similar situations to teach it.

Now, I don't know if you remember, but we talked about a fantastic paper a couple of months ago that looked at real-world videos, then took video footage from a game and improved it to look like the real world. Converting video games to reality, if you will. This had an interesting limitation. For instance, since the AI was trained on the beautiful, lush hills of Germany and Austria, it had never really seen the dry hills of LA. So, what did it do with them? Look, it redrew the hills the only way it had ever seen hills exist, which is covered with trees. So, what does this have to do with Tesla's self-driving cars? Well, if you have been holding on to your papers so far, now squeeze that paper, because they went the other way around! Yes, that's right! They take the video footage of a real, unexpected event where the self-driving system failed, run it through the automatic labeler used for the vector space view, and what do they make out of it? A video game version! Holy mother of papers. And, in this video game, it is suddenly much easier to teach the algorithm safely. You can also make the situation easier or harder, replace a car with a dog, or a pack of dogs, and make many similar examples so that the AI can learn from these "what if" situations as much as possible.

So, there you go. Full tech transfer into a real AI system in just a year or two. So, yes, the papers you see here are for real. As real as it gets. And yes, the robot is not real, just a silly joke. For now. And two more things make all this even more mind-blowing. One, remember, they don't showcase the latest and greatest that they have. Just imagine that everything you heard today is old news compared to the tech they have now. And two, we have only looked at one side of what is going on; for instance, we haven't even talked about their amazing Dojo chip. If all this comes to fruition, we will be able to travel cheaper, more relaxed, and, perhaps most importantly, safer. I can't wait. I really cannot wait. What a time to be alive! Thanks for watching and for your generous support, and I'll see you next time!
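To close, here is a minimal sketch of the procedural "what if" generation discussed in the captions above: one recorded failure case is turned into many variations by swapping the actor, jittering speeds and positions, and changing conditions, so the system can practice the difficult versions safely in simulation. The scenario format and every field below are hypothetical placeholders, not Tesla's actual simulation tooling.

```python
# Sketch: procedurally generate many simulated variations of one failure case.
# The Scenario fields and value ranges are hypothetical placeholders.
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    actor: str          # what crosses our path
    actor_speed: float  # m/s
    offset_m: float     # lateral offset from our lane center
    weather: str
    time_of_day: str


def variations(seed_case: Scenario, n: int, rng: random.Random) -> list[Scenario]:
    """Generate n procedural variations around one recorded failure case."""
    actors = ["car", "truck", "pedestrian", "dog", "pack_of_dogs"]
    weathers = ["clear", "rain", "fog", "snow"]
    times = ["noon", "dusk", "night"]
    out = []
    for _ in range(n):
        out.append(replace(
            seed_case,
            actor=rng.choice(actors),
            actor_speed=max(0.0, seed_case.actor_speed + rng.uniform(-3.0, 3.0)),
            offset_m=seed_case.offset_m + rng.uniform(-1.5, 1.5),
            weather=rng.choice(weathers),
            time_of_day=rng.choice(times),
        ))
    return out


if __name__ == "__main__":
    # The original hard case, e.g. reconstructed from real footage by a labeler.
    failure = Scenario(actor="truck", actor_speed=8.0, offset_m=0.0,
                       weather="clear", time_of_day="noon")
    rng = random.Random(42)
    for s in variations(failure, n=5, rng=rng):
        print(s)    # each line is one new simulated training scenario
```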
Info
Channel: Two Minute Papers
Views: 417,782
Keywords: tesla ai day, tesla fsd, self-driving cars, tesla learning in simulation, tesla dojo, tesla bot, full self-driving, elon musk, tesla robot, tesla autopilot
Id: 6hkiTejoyms
Length: 13min 28sec (808 seconds)
Published: Wed Sep 08 2021