Tesla Autonomy Day 2019 - Full Self-Driving Autopilot - Complete Investor Conference Event

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] hi everyone I'm sorry for being late welcome to our very first analyst day for autonomy I really hope that this is something we can do a little bit more regularly now to keep you posted about the development we're doing with regards to autonomous driving about three months ago we were getting prepped up for our q4 earnings call with Elon and quite a few other executives and one of the things that I told the group is that from all the conversations that I keep having with investors on a regular basis the the biggest gap that I see with what I see inside the company and what the outside perception is is our ability of autonomous driving and it kind of makes sense because for the past couple of years we've been really talking about model three ramp and you know a lot of the debate has revolved around model three but in reality a lot of things have been happening in the background we've been working on the new full self-driving chip we've had a complete overhaul of our neural net for vision recognition etc so now that we finally started to produce our full self-driving computer we thought it's a good idea to just open the veil invite everyone in and talk about everything that we've been doing for the past two years so about three years ago we wanted to use we wanted to find the best possible chip for full autonomy and we found out that there's no chip that's been designed from ground up for neural nets so we invited my colleague Pete Bannon the VP of silicon engineering to design such chip for us he's got about 35 years of experience of building chips and designing chips about 12 of those years where for a company called PA semi which was later acquired by Apple so he worked on dozens of different architectures and designs and he was the lead designer I think for Apple iPhone 5 but just before joining Tesla and he's gonna be joined on the stage by Elon Musk thank you actually I was gonna introduce Pete but what's done so he's just the the best trip and and System Architect that that I know in the world and and it's a honor to have you and your and your team at Tesla and we'll take away just tell him up incredible work at you and her team are done thanks Elon it's a pleasure to be here this morning and a real treat really to tell you about all the work that my colleagues and I've been doing here at Tesla for the last three years I think will tell you a little bit about how the whole thing got started and then I'll introduce you to the full self-driving computer and tell you a little bit about how it works we'll dive into the chip itself and go through some of those details I'll describe how the custom neural network accelerator that we design works and then I'll show you some results and hopefully it will all still be awake by then I was hired in February of 2016 I asked Elon if he was willing to spend all the money it takes to do full custom system design and he said well are we gonna win and I said well yeah of course so he said I'm in and so that got us started we hired a bunch of people and started thinking about what a full what a custom-designed chip for full autonomy would look like we spent eighteen months doing the design and in August of 2017 we released the design for manufacturing we got it back in December it powered up and it actually worked very very well in the first try we made a few changes and released a B zero Rev in April of 2018 in July of 2018 the chip was qualified and we started full production of production quality parts in December of 2018 we had the 
autonomous driving stack running on the new hardware and we're able to start retrofitting employee cars and testing the hardware and software out in the real world just last March we started shipping of the new computer in the Model S and X and just earlier in April we started production in model 3 so this whole program from the hiring of the first few employees to having it in full production in all three of our cars is just a little over three years and it's probably the fastest system development program I've ever been associated with and it really speaks a lot to the advantages of having a tremendous amount of vertical integration to allow you to do concurrent engineering and speed up deployment in terms of goals we were totally focuses exclusively on Tesla requirements and that makes life a lot easier if you have one and only one customer you don't have to worry about anything else one of those goals was to keep the power under 100 watt so we could retrofit the new machine into the existing cars we also wanted a lower part cost so we could enable for redundancy for safety at the time we had a thumb in the wind I submit that it would take at least 50 trillion operations a second of neural network performance to drive a car and so we wanted to get at least that much and really as much as we possibly could batch sizes how many items you operate on at the same time so for example Google's TPU one has a batch size of 256 and you have to wait around until you have 256 things to process before you can get started we didn't want to do that so we designed our machine with a batch size of one so as soon as an image shows up we process it immediately to minimize latency which maximizes safety we needed a GPU to run some post-processing at the time we were doing quite a lot of that but we speculated that over time the amount of post-processing on the GPU would decline as the neural networks got better and better and that has actually come to pass so we took a risk by putting a fairly modest GPU in the design as you'll see and that turned out to be a good bet security is super important if you don't have a secure car you can't have a safe car so there's a lot of focus on security and then of course safety in terms of actually doing the chip design as Elon alluded earlier there was really no ground-up neural network accelerator in existence in 2016 everybody out there was adding instructions to their CPU or GPU or DSP to make it better for inference but nobody was really just doing it natively so we set out to do that ourselves and then for other components on the chip we purchased industry standard IP for CPUs and GPUs that allowed us to minimize the design time and also the risk to the program another thing that was a little unexpected when I first arrived was our ability to leverage existing teams at Tesla Tesla had wonderful power supply design teams signal integrity analysis package design system software firmware board designs and a really good system validation program that we were able to take advantage of to accelerate this program here's what it looks like over there on the right you see all the connectors for the video that comes in from out the cameras that are in the car you can see the two self-driving computers in the middle of the board and then on the left is the power supply and some control connections and sorry I really love it when a solution is boiled down to its barest elements you have video computing and power and and it's straightforward and simple here's the original 
hardware 2.5 enclosure that the computer went into and we've been shipping for the last two years here's the new design for the FSD computer it's basically the same and that of course is driven by the constraints of having a retrofit program for the cars I'd like to point out that this is actually a pretty small computer it fits behind the glove box between the glove box and the firewall in the car it does not take up half your trunk as I said earlier there's two fully independent computers on the board you can see them they're highlighted in blue and green to either side of the large SOC you can see the DRAM chips for that we use for storage and then below left you see the flash chips that represent the file system so these are two independent computers that boot up and run their own operating system yeah if I can add something that general principle here is that if any part of this could fail and the call will keep driving so you're gonna have cameras fail you could have power circuits fail you can have one of the Tesla pulse rifle self-driving computer chips fail car keeps driving the probability of this computer failing is substantially lower than somebody losing consciousness that's that's the key metric least an order of magnitude yep so one of the things that we additional thing we do to keep the machine going is to have redundant power supplies in the car so one one machine is running on one power supply and the other ones on the other the cameras are the same so half of the cameras run on the blue power supply of the other half run the green power supply and both chips receive all of the video and process it independently so in terms of driving the car the basic sequence is collect lots of information from the world around you not only do we have cameras we also have radar GPS maps the I M use ultrasonic sensors around the car we have wheel ticks steering angle we know what the acceleration and deceleration of the car is supposed to be all of that gets integrated together so form a plan once we have a plan the two machines exchange their independent version of the plan to make sure it's the same and assuming that we agree we then act and drive the car now once you've driven the car with some new control you have what Klaus want to validate it so we validate that what we transmitted it was what we intend to transmit to the other actuators in the car and then you can use the sensor suite to make sure that it happens so if you ask the car to accelerate or break or steer right or left you can look at the accelerometers and make sure that you are in fact doing that so there's a tremendous amount of redundancy and overlap in both our data acquisition and our data monitoring capabilities here moving on to talk about the full self-driving chip a little bit it's packaged in the 37 point five millimeter BGA with 1600 balls most of those are used for powering ground but plenty for signal as well if you take the lid off it looks like this you can see the package substrate and you can see the dye sitting in the center there if you take the dye off and flip it over it looks like this there's 13,000 C four bumps scattered across the top of the dye and then under net underneath that are 12 metal layers and if you which is obscuring all the details of the design so if you strip that off it looks like this this is a 14 nanometer FinFET CMOS process it's 260 millimeters in size which is a modest-sized iso for comparison typical cell phone chip is about a hundred millimeters square which so we're 
quite a bit bigger than that but a high end GPU would be more like six hundred eight hundred millimeters square so so we're sort of in the middle I would call it the sweet spot it's it's a comfortable size to build there's 250 million logic gates on there and a total of six billion transistors which even even though I work on this all the time that's mind boggling to me the chip is manufactured and tested to a ecq 100 standards which is a standard automotive criteria next I'd like to just walk around the chip and explain all the different pieces to it and I'm sort of gonna go in the order that a pixel coming in from the camera would visit all the different pieces so up there in the top left you can see the camera cellular interface we can ingest 2.5 billion pixels per second which is more than enough to cover all the sensors that we know about we have an on-chip network that distributes data from the memory system so the pixels would travel across the network to the memory controllers on the right and left edges of the chip we use industry standard LPD ddr4 memory running at four hundred four thousand two hundred and sixty six gigabits per second which gives us a peak bandwidth the sixty eight gigabytes a second which is a pretty healthy bandwidth but again this is not like ridiculous so we're sort of trying to stay in the comfortable sweet spot for cost reasons the image signal processor has a 24-bit internal pipeline that allows us to do take full advantage the HDR sensors that we have around the car it does advance tone mapping which helps to bring out details and shadows and then it has advanced noise reduction which just improves your overall quality of the images that we're using in the neural network the neural network accelerator itself there's two of them on the chip they each have 32 megabytes of SRAM to hold temporary results and minimize the amount of data that we have to transmit on and off the chip which helps reduce power each array has a 96 by 96 multiply add array with in place accumulation which allows us to to almost 10,000 multiply ads per cycle as dedicated riilu Hardware dedicated pooling hardware and the each of these delivered 306 excuse me each one delivers 36 trillion operations per second and they operate at 2 gigahertz the two of them together on a diet delivers 72 trillion operations a second so we exceeded our goal of 50 tariffs by a fair bit there's also a video encoder we encode video and use it in a very variety of places in the car including the backup camera display there's optionally a user feature for - camp and also for a clip logging data to the cloud which Stewart and Andre will talk about more later there's a GPU on the chip it's modest performance it has a support for both 32 and 16 bit floating point and then we have 12 a 72 64-bit CPUs for a general-purpose processing they operate at 2.2 gigahertz and this represents about two and a half times the performance available in the current solution there's a safety system that contains two CPUs that operate in lockstep this system is the final arbiter of whether it's safe to actually drive the actuators in the car so this is where the two plans come together and we decide whether it's safe or not to move forward and lastly there's a safety system and then basically the job of the safety system is to ensure that this chip only runs software that's been cryptographically signed by Tesla if it's not been signed by Tesla then the chip does not operate now I've told you a lot of different performance 
numbers and I thought it'd be helpful maybe to put it into perspective a little bit so throughout this talk I'm going to talk about a neural network from our narrow camera it uses 35 Giga 35 billion operations 35 Giga apps and if we use all 12 CPUs to process that network we could do one and a half frames per second which is super slow I'm not nearly adequate to drive the car if we use the 600 gigaflop GPU the same network we'd get 17 frames per second which is still not good enough to drive the car with a cameras the neural network accelerators on the chip can deliver 21 frames per second and you can see from the scaling as we moved along that the amount of computing in the CPU and GPU are basically insignificant to what's available in the neural network accelerator it's it's really is night and day so moving on to talk about the neural network accelerator we're just going to stop for some wire on the left there's a cartoon of a neural network just to give you an idea what's going on the data comes in at the top and visits each of the boxes and the data flows along the arrows to the different boxes the boxes are typically convolutions or D convolutions with real ooze the green boxes are pooling layers and the important thing about this is that the data produced by one box is then consumed by the next box and then you don't need it anymore you can throw it away so all of that temporary data that gets created and destroyed as you flow through the network there's no need to store that off chip and DRAM so we keep all that data in SRAM and I'll explain why that's super important in a few minutes if you look over on the right side of this you can see that in this network of the 35 billion operations almost all of them are convolution which is based on dot products the rest are deconvolution also based on dot product and then riilu and pooling which are relatively simple operations so if you were designing some hardware you'd clearly target doing dot products which are based on multiply ad and really kill that but imagine that you sped it up by a factor of 10,000 so a hundred percent all of a sudden turns into 0.1 percent 0.01 percent and suddenly the riilu and pooling operations are going to be quite significant so our hardware doesn't our hardware design includes dedicated resources for processing riilu and pooling as well now this chip is operating in a thermally constrained environment so we had to be very careful about how we burn that power we want to maximize the amount of arithmetic we can do so we picked integer add it's nine times less energy than a corresponding floating-point add and we picked 8-bit by 8-bit integer multiply which is significantly less power than any other multiply operations and is probably enough accuracy to get good results in terms of memory we chose to use SRAM as much as possible and you can see there that going off chip to DRAM is approximately a hundred times more expensive in terms of energy consumption than using local SRAM so clearly we want to use the local SRAM as much as possible in terms of control this is data that was published in a paper by Mark Horowitz at is SCC where he sort of critiqued how much power it takes to execute a single instruction on a regular integer CPU and you can see that the add operation is only 0.15 cent percent of the total power all the rest of the power is control overhead and bookkeeping so in our design reset to basically get rid of all that as much as possible because what we're really interested in is arithmetic so 
here's the design that we finished you can see that it's dominated by the 32 megabytes of SRAM there's big banks on the left and right and in the center bottom and then all the computing is done in the upper middle every single clock we read 256 bytes of activation data out of the SRAM array 128 bytes of weight data out of the SRAM array and we combined it in a in a 96 by 96 mole at a rate which performs 9000 multiply ads per clock at 2 gigahertz that's a total of 3.6 336 Point air at 8 Tara ops now when we're done with a dot product we unload the engine so that we shift the data out across the dedicated riilu unit optionally across a cooling unit and then finally into a write buffer where all the results get aggregated up and then we write out 128 bytes per cycle back into the SRAM and this whole thing cycles along all the time continuously so we're doing dot products while we're unloading previous results doing pooling and writing back into the memory if you add it all up to year Hertz you need one terabyte per second of SRAM bandwidth to support all that work and and so the hardware why's that so one terabyte per second a bandwidth per engine there's two on the chip two terabytes per second the chip has the accelerator has a relatively small instruction set we have a DMA read operation to bring data in from memory we have a DMA write operation to push results back out to memory we have three dot product based instructions convolution deconvolution inner product and then two relatively simple a scale is a one input one output up operation and L wise is two inputs and one output and then of course stop when you're done we had to develop a neural network compiler for this so we take the neural network that's been trained by our vision team as it would be deployed in the older cars and when you take that and compile it for use on the new accelerator the compiler does layer fusion which allows us to maximize the computing each time we read data out of the SRAM and put it back it also does some smoothing so that the demands on the memory system aren't too lumpy and then we also do channel channel padding to reduce bang conflicts and we do Bank aware SRAM allocation and this is a case where we could have put more hardware in the design to handle Bank conflicts but by pushing it into software we save Hardware in power at the cost of some software complexity we also automatically insert DMAs into the graph so that data arrives just in time for computing without having to stall the machine and then at the end we generate all the code we generate all the weight data we compress it and we add a CRC check sum for reliability to run a program all the neural network descriptions our programs are loaded into SRAM at the start and then they sit there ready to go all the time so to run a network you have to program the address of the input buffer which presumably is a new image that just arrived from a camera you set the output buffer address you set the pointer to the network weights and then you set go and then the machine goes off and will sequence through the entire neural network all by itself usually running for a million or two million cycles and then when it's done you get an interrupt and can post-process the results so moving on to results we had a goal to stay under 100 watts this is measured data from cars driving around running the full autopilot stack and we're dissipating 72 watts which is a little bit more power than the previous design but with the dramatic improvement in performance 
it's still a pretty good answer of that 72 watts about 15 watts is being consumed running the neural networks in terms of costs the silicon cost of this solution is about 80% of what we were paying before so we are saving money by switching to this solution and in terms of performance we took the narrow camera a neural network which I've been talking about that has 35 billion operations in it we ran it on the old hardware in a loop as quick as possible and we delivered a 110 frames per second we took the same data the same Network compile it for hardware for the new FST computer and using all four accelerators we can get 2,300 frames per second processed so a factor of 21 I think this this is perhaps the most significant slide it's night and day I've never worked on a project where the performance increase was more than three so this was pretty fun if you compare it to say in videos drive xaviar solution a single chip delivers 21 ter ops our full self-driving computer with two chips is 144 ter ops [Music] so to conclude I think we've created a design that delivers outstanding performance 144 tear ups for a neural network processing has outstanding power performance we managed to jam all of that performance into the thermal budget that we had it enables a fully redundant computing solution has a modest cost and really the important thing is that this FSD computer will enable a new level of safety and autonomy in Tesla's vehicles without impact they're cost or range something that I think we're all looking forward to yeah I think when we do your QA after each segment so if people have questions about the hardware they can ask right now the the reason I asked Pete to do just a detailed far more detailed and perhaps most people would appreciate dive into the Tesla full self-driving computer is because it at first it seems improbable how could it be that Tesla who has never designed a chip before were designed the best chip in the world but that is objectively what has occurred not not best by a small margin best by a huge margin it's in the cars right now old Tesla's being produced right now have this computer we switched over from the Nvidia solution for SMX about a month ago switched over model three about ten days ago all cars being produced have the have all the hardware necessary compute and otherwise for full self-driving I'll say that a game all Tesla cars being produced right now have everything necessary for full self-driving all you need to do is improve the software and later today you will drive the cars with the development version of the improved software and you will see for yourselves questions repeat a trip to three global equities research very very impressive in every shape and form I was wondering like I've I took some notes you are using activation function re Lu the rectify linear unit but if we think about the deep neural network it has multiple layers and some algorithms may use different activation functions for different hidden layers like soft Max or tan H do you have flexibility for incorporating different activation functions rather than Lu in your platform then I have a follow-up yes we do we have inflammation of ten inch and sigmoid for example beautiful one last question like in the nanometers you mentioned 14 nanometers as I was wondering what didn't make sense to come to the lower maybe ten nanometers two years down or maybe seven at the time we started the design not all the IP that we wanted to purchase was available in ten nanometer we had to finish the 
design in fourteen it's maybe worth pointing out that we finished this design like maybe one and a half two years ago and began design of the next generation we're not talking about the next generation today but we're about halfway through it that will all the things that are obvious for next generation chip we're doing oh hi you talked about the software's the piece now you did a great job I was blown away understood ten percent of what you said but I trust that it's in good hands Thanks so it feels like you got the hardware pieces done and that was really hard to do and now you have to do the software piece now maybe that's outside of your expertise how should we think about that software piece what can ask for better introduction to so Andre and Stuart I think yeah are there any fun dating questions for the chip part before the next part of the presentation is neural nets and software so maybe I'm the chip side the last slide was 144 trillions of operations per second versus was it Nvidia 21 that's right and maybe can you just contextualize that for a finance person why that's so significant that gap thank you well I mean it's a factor of seven and performance Delta so that means you can do seven times as many frames you can run neural networks that are seven times larger and more sophisticated so it's a it's a very big currency that you can spend on on lots of interesting things to make the car better I think that Savior power usage is higher than ours Xavier powers I have a comparable don't know that I believe it's like the the best my knowledge the D power requirements would increase at least to the same degree of factor of seven and and costs would also increase by a factor of seven great sure yeah I mean how power is a real problem because it also reduces range so it has the healthful power is very high and then you have to get rid of that power by the thermal problem becomes really significant because you had to go to get rid of all that power thank you very much I think we have you know a lot of quite a bit this app ask the questions so if you guys don't mind the day of running a bit long just we're gonna do that the drive demos afterwards so if you've got if you're if you if anybody needs to pop out and do drive demos a little sooner you're welcome to do that I do want to make sure we answer your questions yep Pradeep Romani from UBS Intel and AMD to some extent have started moving towards a chip lab based architecture I did not notice a chaplet based design here do you think that looking forward that would be something that might be of interest to you guys from an architecture standpoint a chip based architecture yes we're not currently considering anything like that I think that's mostly useful when you need to use different styles of technology so if you want to integrate silicon germanium or DRAM technology on the same silicon substrate that gets pretty interesting but until the die size gets obnoxious I wouldn't go there okay to be clear like the strategy here in it the started you know basically three little over three years ago where's design and build a computer that is fully optimized and aiming for full self-driving then write software that is designed to work specifically on that computer and get the most out of that computer so you have tailored to hardware that is that is a master of one trade self-driving the Nvidia is a great company but they have many customers and so when as they as they apply their resources they need to do a generalized solution we care about 
one thing self-driving so that it was designed to do that incredibly well the software is also designed to run on that hardware incredibly well and the combination of the software in the hardware I think is unbeatable I the chip is designed to process video input in case you use let's say lidar would it be able to process that as well or or is that is it primarily for video I explained to you today is that lidar is is a fool's errand and anyone luck relying on lidar is doomed doomed expensive expensive sensors that on are unnecessary it's like having a whole bunch of a expensive pen just fantasies like an pet one appendix is bad well now don't put a whole bunch of them that's ridiculous you'll see so just two questions on just on the power consumption is there way to maybe give us like a rule of thumb on you know every watt is reduces range by certain percent or certain amount just so we can get a sense of how much a Model 3 the the target consumption is 250 watts per mile it depends on the nature of the driving as to how many miles that effect in city it would have a much bigger effect than on highway so you know if you're driving for an hour in a city and you had a solution hypothetically that you know was it was it was a kilowatt you'd lose four miles on a model three so if you're only going say 12 miles an hour then then that's like that would it be a 25 cent impact in range in city it's a basically powers up the power that the the power of the system has a massive impact on city range which is where we think of most most of the Robo taxi market will be its own power is extremely important I'm sorry thank you what's the primary design objective of the next-generation ship we don't want to talk too much about the next-generation ship but it's it'll be at least let's say three times better than the current system has about two years away is is the chip being main you you don't mean you facture the chip you contract that out and how much cost reduction does that save in the overall vehicle cost but the 20% cost reduction I cited was the the piece cost per vehicle reduction not that wasn't a development cost I was just the action yeah I'm saying but like if I'm manufacturing these in mass is this saving money in doing it yourself yes a little bit I mean most chips are made for most people don't make chips with their art valve it's very unusual I think you don't see any supply issues without getting the chip mass-produce the cost saving pays for the development I mean the basic strategy going to Elon was we're gonna build this chip it's gonna reduce the cost an Elon said hmm times a million cars a year deal that's correct yes sorry if they're really chip specific questions we can answer them others there will be a Q&A opportunity after after Andre talks and and after Stuart talks so there will be two other Q&A opportunities this is very tough specific then also I'll be here all afternoon yeah and exactly if people be here at the end as well so okay and your REO Thanks um that died photo you had there's the neural processor takes up quite a bit of the dye I'm curious is that your own design or there's some external IP there yes that was the custom design for by Tesla okay and then I guess the follow-on would be there's probably a fair amount of opportunity to reduce that footprint as you tweaked the design it's actually quite dense so in terms of reducing it I don't think so it'll will greatly enhance their functional capabilities in the next generation okay and then last question can you 
share where your your vaping at this part what what where are we found yet Oh is Samsung yes Texas Thank You Graham Tanaka Tanaka Apple I'm just curious how defensible your chip technologies and design is from it from a IP point of view and hoping that you won't won't be offering a lot of the IP the outside for free Thanks we have files on the order of a dozen patents on this technology fundamentally it's linear algebra which I don't think you can patent ah I'm not sure I I think if somebody started today and they were really good they might have something like what we have right now in three years at but in two years we'll have sometimes something three times better talking about the intellectual property protection you have the best intellectual property and some people just steal it for the fun of it I was wondering if we look at a few interactions with Aurora that companies to industry believe they stole your intellectual property I think the key ingredient that you need to protect is the weights that associate to various parameters do you think your chip can do something to prevent anybody maybe encrypt all the weights so that even you don't know what the weights are at the chip level so that your intellectual property remains inside it and nobody knows about it and nobody can just feel it man I'd like to meet the person that could do that because they were I would hire them in a heartbeat yeah it's a real hard problem yeah joining it I mean we do encrypt the the it's it's a hard trip to crack so if they can crack it's very good so give any crack it and then also also figure out the software and the neural net system and everything else they can design it from scratch like that's that's all it's our intention to prevent people from stealing all that stuff I mean if they do we hope it at least takes a long time it will definitely take them a long time yeah I mean I felt like if we were if it was our goal to do that how would we do it be very difficult but the thing that's I think a very powerful sustainable advantage for us is the fleet nobody has the fleet those weights are constantly being updated and improved based on billions of miles driven Tesla has a hundred times more cars with the full self-driving Hardware than everyone else combined you know we we have my the end of this quarter will have five hundred thousand cars worth the full eight camera setup twelve ultrasonics someone will still be on hardware two but we still have the data gathering ability and then by a year from now we'll have over a million cars with full self-driving computer hardware everything yeah should we have fun it's just a massive data advantage it's similar to like you know how like the Google search engine has a massive advantage because people use it and people the people are programming effectively program Google with the queries and the results yeah just pressing on that and please reframe the questions I'm tackling and if it's appropriate but you know when we talked to weigh Moe or Nvidia they do speak with a equivalent conviction about their leadership because of their competence in simulating miles driven can you talk about the advantage of having real-world miles versus simulated miles because I think they express that you know by the time you get a million miles they can simulate a billion and no Formula One racecar driver for example could ever successfully complete a real-world track without driving in a simulator can you talk about the advantages it sounds like the that you perceived to have 
associated with having data ingestion coming from real-world miles versus simulated miles absolutely the simulator we have a quite a good simulation too but it's just it does not capture the long tail of weird things that happen in the real world if the simulation fully captured the real world well I mean that would be proof that we're living in a simulation I think yeah it doesn't I wish but it simulations do not capture the real world they don't the real world is really weird and messy you need the you need the priority cars in the road and we actually get it get into that too in ondrea's to his presentation yeah so okay when we move on to 200 great thank you [Applause] the last question was actually a very good Segway because one thing to remember about our F is the computer is that it can run much more complex neural nets for much more precise image recognition and to talk to you about how we actually get that image data and how we analyze them we have our senior director of AI Andre Potty who's gonna explain all of that to you Andre has a PhD from Stanford University where he studied computer science focusing on oxidation recognition and deep learning Andre why don't you just talk do your own intro if there's a lot of PhD from Stanford that's not important yes okay very care come on Thank You Andre started the computer vision class at Stanford that's much more significant that's what matters just a said if you please talk about your background in I work a way that is not bashful just tell me talk really tell what about the SEC redun yeah I mean sure yeah so yeah I think I've been training neural networks basically for what is now a decade and these neural networks were not actually really used in the industry until maybe five or six years ago so it's been some time that I've been trained these neural networks and that included you know institutions at Stanford at at open e I at Google and really just training a lot of neural networks not just for images but also for natural language and designing architectures that coupled those two modalities for for my PhD so every computer computer science class oh yeah and at Stanford actually taught the convolutional neural Norks class and so I was the primary instructor for that class I actually started the course and designed the entire curriculum so in the beginning it was about 150 students and then it grew to 700 students over the next two or three years so it's a very popular class it's one of the largest classes at Stanford right now so that was also really a successful I mean Andre is like really one of the best computer vision people in the world arguably the best okay thank you yeah so hello everyone so Pete told you all about the chip that we've designed that runs neural networks in the car my team is responsible for training of the these neural networks and that includes all of data collection from the fleet neural network training and then some of the deployment to that so what do then you know that works do exactly in the car so what we are seeing here is a stream of videos from across the vehicle across the car these are eight cameras that send us videos and then these neural networks are looking at those videos and are processing them and making predictions about what they're seeing and so the some of the things that we're interested in there's some of the things you're seeing on this visualization here our lane line markings other objects the distances to those objects what we call drivable space shown in blue which is where the 
car is allowed to go and a lot of other predictions like traffic lights traffic signs and so on now for my talk I will talk roughly into in three stages so first I'm going to give you a short primer on neural networks and how they work and how they're trained and I need to do this because I need to explain in the second part why it is such a big deal that we have the fleet and why it's so important and why it's a key enabling factor to really training this neural networks and making them work effectively on the roads and in the first stage I'll talk about a vision and lidar and how we can estimate depth just from vision alone so the core problem that these networks are solving in the car is that a visual recognition so four unites these are very this is a very simple problem you can look at all of these four images and you can see that they contain a cello about an iguana or scissors so this is very simple and effortless for us this is not the case for computers and the reason for that is that these images are to a computer really just a massive grid of pixels and at each pixel you have the brightness value at that point and so instead of just seeing an image a computer really gets a million numbers in a grid that tell you the brightness values at all the positions it makes arrows if you will it really is the matrix yeah and so we have to go from that grid of pixels and brightness values into high level concepts like iguana and so on and as you might imagine this iguana has a certain pattern of brightness values but iguanas actually can take on many appearances so they can be in many different appearances different poses and different brightness conditions against the different backgrounds you can have a different crops of that iguana and so we have to be robust across all those conditions and we have to understand that all those different brightness palette patterns actually correspond to a goannas now the reason you and I are very good at this is because we have a massive neural network inside our heads there's processing those images so light hits the retina travels to the back of your brain to the visual cortex and the visual cortex consists of many neurons that are wired together and that are doing all the pattern recognition on top of those images and really over the last I would say about five years the state-of-the-art approaches to processing images using computers have also started to use neural networks but in this case artificial neural networks but these artificial neural networks and this is just a cartoon diagram of it are a very rough mathematical approximation to your visual cortex we'll really do have neurons and they are connected together and here I'm only showing three or four neurons in three or four in four layers but a typical neural network will have tens to hundreds of millions of neurons and each neuron will have a thousand connections so these are really large pieces of almost simulated tissue and then what we can do is we can take those neural networks and we can show them images so for example I can feed my iguana into this neural network and the network will make predictions about what it's seen now in the beginning these neural networks are initialized completely randomly so the connection strengths between all those different neurons are completely random and therefore the predictions of that network are also going to be completely random so it might think that you're actually looking at a boat right now and it's very unlikely that this is actually an 
iguana and during the training during a training process really what we're doing is we know that that's actually in iguana we have a label so what we're doing is we're basically saying we'd like the probability of iguana to be larger for this image and the probability of all the other things to go down and then there's a mathematical process called back propagation a stochastic gradient descent that allows us to back propagate that signal through those connections and update every one of those connections and update every one of those connections just a little amount and once the update is complete the probability of iguana for this image will go up a little bit so it might become 14% and the property of the other things will go down and of course we don't just do this for this single image we actually have entire large data sets that are labeled so we have lots of images typically you might have millions of images thousands of labels or something like that and you are doing forward backward passes over and over again so you're showing the computer here's an image it hasn't and then you're saying this is the correct answer and it Tunes itself a little bit you repeat this millions of times and you sometimes you show images the same image to the computer you know hundreds of times as well so the network training typically will take on the order a few hours or a few days depending on how big of a network you're training and that's the process of training a neural network now there's something very unintuitive about the way neural networks work that I have to really get into and that is that they really do require a lot of these examples and they really do start from scratch they know nothing and it's really hard to wrap your head around it around this so as an example here's a cute dog and you probably may not know the breed of this dog but the correct answer is that this is a Japanese spaniel now all of us are looking at this and we're seeing Japanese spaniel more like okay I got it I understand kind of what this Japanese spaniel looks like and if I show you a few more images of other dogs you can probably pick out other Japanese spaniels here so in particular those three look like a Japanese spaniel and the other ones do not so you can do this very quickly and you need one example but computers do not work like this they actually need a ton of data of Japanese spaniels so this is a grid of Japanese spaniels showing them you need a source of examples showing them in different poses different brightness conditions different backgrounds different crops you really need to teach the computer from all the different angles what this Japanese spaniel looks like and it really requires all that data to get that to work otherwise the computer can't pick up on that pattern automatically so with us all this imply about the setting of self-driving of course we don't care about dog breeds too much maybe we will at some point but for now we really care about Ling line markings objects where they are where we can drive and so on so the way we do this is we don't have labels like iguana for images but we do have images from the fleet like this and we're interested in for example England markings so we a human typically goes into an image and using a mouse annotates the lane line markings so here's an example of an annotation that a human could create a label for this image and it's saying that that's what you should be seeing in this image these are the lane line markings and then what we can do is we can go 
to the fleet and we can ask for more images from the fleet and if you ask the fleet if you just do an Evo of this and you just ask for images at random the fleet might respond with images like this typically going forward on some highway this is what you might just get like a random collection like this and we would annotate all that data if you're not careful and you only annotate a random distribution of this data your network will kind of pick up on this this random distribution on data and work only in that regime so if you show it a slightly different example for example here is an image that actually the road road is curving and it is a bit of a more residential neighborhood then if you show the neural network this image that network might make a prediction that is incorrect it might say that okay well I've seen lots of times on highways lanes just go forward so here's a possible prediction and of course this is very incorrect but the neural network really can't be blamed it does not know that the Train on the the tree on the left whether or not that matters or not it does not know if the car on the right matters for not towards the lane line it does not know that the that the buildings in the background matter or not it really starts completely from scratch and you and I know that the truth is that none of those things matter what actually matters is that there are a few white lane line markings over there and in a vanishing point and the fact that they curl a little bit should pull the prediction except there's no mechanism by which we can just tell the neural network hey those Lin line markings actually matter the only tool in the toolbox that we have is labelled data so what we do is we need to take images like this when the network fails and we need to label them correctly so in this case we will turn the lane to the right and then we need to feed lots of images of this to the neural net and neural that over time will accumulate will basically pick up on this pattern that those things there don't matter but those leg line markings do and we learn to predict the correct lane so what's really critical is not just the scale of the data set we don't just want millions of images we actually need to do a really good job of covering the possible space of things that the car might encounter on the roads so we need to teach the computer how to handle scenarios where it's light and wet you have all these different specular reflections and as you might imagine the brightness patterns in these images will look very different we have to teach a computer how to deal with shadows how to deal with Forks in the road how to deal with large objects that might be taking up most of that image how to deal with tunnels or how to do with construction sites and in all these cases there's no again explicit mechanism to tell the network what to do we only have massive amounts of data we want to source all those images and we want to annotate the correct lines and the network will pick up on the patterns of those now large and varied datasets basically make these networks work very well this is not just a finding for us here at Tesla this is a ubiquitous finding across the into our industry so experiments and research from Google from facebook from Baidu from alphabets deepmind all show similar plots where neural networks really love data and love scale and variety as you add more data these neural networks start to work better and get higher accuracies for free so more data just makes them work better now 
a number of companies have number of people have kind of pointed out that potentially we could use simulation to actually achieve the scale of the data sets and we're in charge of a lot of the conditions here maybe you can achieve some variety in a simulator now at Tesla and that was also kind of brought up into question questions just before this now at Tesla this is actually a screenshot of our own simulator we use simulation extensively we use it to develop and evaluate the software we've also even used it for training quite successfully so but really when it comes to training data for neural networks there really is no substitute for real data the simulator simulations have a lot of trouble with modeling appearance physics and the behaviour of all the agents around you so there are some examples to really try that point across the real world really throws a lot of crazy stuff at you so in this case for example we have very complicated environments with snow with trees with wind we have various visual artifacts that are hard to simulate potentially we have complicated construction sites bushes and plastic bags that can go in that can kind of go around with the wind a complicated construction sites that might feature lots of people kids animals all mixed in and simulating how those things interact and flow through this construction zone might actually be completing completely intractable it's not about the movement of any one pedestrian in there it's about how they respond to each other and how those cars respond to each other and how they respond to you driving in that setting and all of those are actually really tricky to simulate it's almost like you have to solve the self-driving problem to just simulate other cars in your simulation so it's really complicated so we have dogs exotic animals and in some cases it's not even that you can't simulate it is that you can't even come up with yeah so for example I didn't know that you can have truck on truck on truck like that but in the real world you find this and you find lots of other things that are very hard to really even come up with so really the variety that I'm seeing in the data coming from the fleet is just crazy with respect to what we have in a simulator we have a really good simulator it's everything like simulation you're fundamentally a grain you're grading your own homework so you you know you if you know that you're gonna simulate it okay you can definitely solve for it but as laundry is saying you don't know what you don't know the world is very weird and has millions of corner cases and if you somebody can produce a self-driving simulation that accurately matches reality that in itself would be in a monumental achievement of human capability they can't there's no way yeah so I think the three points are I really try to drive home until now are to get neural networks to work well you require these three essentials you require a large data set a very data set and a real data set and if you have those capabilities you can actually train your networks and make them work very well and so why is Tesla is such a unique and interesting position to really get all these three essentials right and the answer to that of course is the fleet we can really source data from it and make our neural network systems work extremely well so let me take you through a concrete example of for example making the object detector work better to give you a sense of how we develop these in all that works how we iterate on them and how we actually get 
them to work overtime so object detection is something we care a lot about we'd like to put bounding boxes around say the cars and the objects here because we need to track them and we need to understand how they might move around so again we might ask human annotators to give us some annotations for these and humans might go in and might tell you that ok those patterns over there are cars and bicycles and so on and you can train your neural network on this but if you're not careful the neural network hole will make miss predictions in some cases so as an example if we stumble by a car like this that has a bike on the back of it then the neural network actually went when I joined would actually create two deductions it would create a car deduction and a bicycle deduction and that's actually kind of correct I guess both of those objects actually exist but for the purposes of the controller and a planner downstream you really don't want to deal with the fact that this bicycle can go with the car the truth is that that bike is attached to that car so in terms of like just objects on the road there's a single object a single car and so what you'd like to do now is you'd like to just potentially annotate lots of those images as this is just a single car so the process that we that we go through internally in the team is that we take this image or a few images that show this pattern and we have a mechanism a machine learning mechanism by which we can ask the fleet to source us examples that look like that and the fleet might respond with images that contains those patterns so as an example these six images might come from the fleet they all contain bikes on backs of cars and we would go in and we would annotate all those as just a single car and then the performance of that detector actually improves and the network internally understands that hey when the bike is just attached to the car that's actually just a single car and it can learn that given enough examples and that's how we've sort of fixed that problem I will mention that I talked quite a bit about sourcing data from the fleet I just want to make a quick point that we've designed this from the beginning with privacy in mind and all the data that we used for training is anonymized now the fleet doesn't just respond with bicycles on backs of cars we look for all the thing we look for lots of things all the time so for example we look for boats and the fleet can respond with boats we look for construction sites and the fleet can send us lots of construction sites from across the world we look for even slightly more rare cases so for example finding debris on the road is pretty important to us so these are examples of images that have streamed to us from the fleet that show tires cones plastic bags and things like that if we can source these at scale we can annotate them correctly and the neural network will learn how to deal with them in the world here's another example animals of course also a very rare occurrence an event but we want the neural network to really understand what's going on here that these are animals and we want to deal with that correctly so to summarize the process by which we iterate on neural network predictions looks something like this we start with a seed data set that was potentially sourced at random we annotate that data set and then we train your networks on that data set and put that in the car and then we have mechanisms by which we notice inaccuracies in the car when this detector may be misbehaving so for 
For example, if we detect that the neural network might be uncertain, or if there's a driver intervention, or any of those settings, we can create this trigger infrastructure that sends us data on those inaccuracies. So for example if we don't perform very well on lane line detection in tunnels, we can notice that there's a problem in tunnels; that image would enter our unit tests, so we can verify that we're actually fixing the problem over time. But to fix the inaccuracy you need to source many more examples that look like that, so we ask the fleet to please send us many more tunnels, and then we label all those tunnels correctly, incorporate that into the training set, retrain the network, redeploy, and iterate that cycle over and over again. We refer to this iterative process by which we improve the predictions as the data engine: iteratively deploying something, potentially in shadow mode, sourcing inaccuracies, and incorporating them into the training set over and over again, and we do this for basically all the predictions of these neural networks. Now, so far I've talked about a lot of explicit labeling; like I mentioned, we ask people to annotate data, and that is an expensive process in time and money, so these annotations can be very expensive to get. What I want to talk about as well is how to really utilize the power of the fleet: you don't want to go through this human annotation bottleneck, you want to just stream in data and annotate it automatically, and we have multiple mechanisms by which we can do this. As one example, a project that we recently worked on is the detection of cut-ins: you're driving down the highway, someone is on the left or on the right, and they cut in in front of you into your lane. So here's a video showing the autopilot detecting that this car is intruding into our lane. Of course we'd like to detect a cut-in as fast as possible, and the way we approach this problem is that we don't write explicit code for "is the left blinker on, is the right blinker on, track this vehicle over time and see if it's moving horizontally"; we actually use a fleet learning approach. The way this works is that we ask the fleet to please send us data whenever they see a car transition from the right lane to the center lane, or from left to center, and then we rewind time backwards and we can automatically annotate that, hey, that car will cut in in front of us in 1.3 seconds, and then we can use that for training the neural net. The neural net will automatically pick up on a lot of these patterns, for example that the car is typically yawed a certain way and moving sideways, maybe the blinker is on; all of that happens internally inside the neural net, just from these examples. So we ask the fleet to automatically send us all this data, we can get half a million or so images, all of them annotated for cut-ins, and then we train the network. We then took this cut-in network and deployed it to the fleet, but we didn't turn it on yet; we ran it in shadow mode, and in shadow mode the network is always making predictions: hey, I think this vehicle is going to cut in, from the way it looks this vehicle is going to cut in. And then we look for mispredictions. As an example, this is a clip that we got from shadow mode of the cut-in network, and it's kind of hard to see, but the network thought that the vehicle right ahead of us and on the right was going to cut in.
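A toy sketch of that rewind-and-auto-label step (illustrative only; the data structures and the two-second horizon are assumptions): once a logged track is observed crossing into the ego lane, the earlier frames of that track get labeled with whether, and how soon, the cut-in happens, with no human in the loop.

```python
from dataclasses import dataclass

@dataclass
class TrackFrame:
    t: float        # timestamp in seconds
    lane: str       # which lane the tracked car occupies at time t
    features: dict  # whatever the vision stack saw (yaw, blinker, position, ...)

def auto_label_cut_in(track, horizon=2.0):
    """Rewind a logged track: if the car ends up in the ego lane, label earlier
    frames with 'cuts in within `horizon` seconds' and the time-to-cut-in."""
    cut_in_time = next((f.t for f in track if f.lane == "ego"), None)
    labels = []
    for f in track:
        if f.lane == "ego":
            break
        will_cut_in = cut_in_time is not None and (cut_in_time - f.t) <= horizon
        time_to_cut_in = None if cut_in_time is None else round(cut_in_time - f.t, 2)
        labels.append((f.features, will_cut_in, time_to_cut_in))
    return labels

track = [TrackFrame(0.0, "right", {"yaw": 0.02}),
         TrackFrame(0.7, "right", {"yaw": 0.10}),
         TrackFrame(1.5, "right", {"yaw": 0.15}),
         TrackFrame(2.0, "ego",   {"yaw": 0.08})]
for features, label, tta in auto_label_cut_in(track):
    print(features, "cut-in" if label else "no cut-in", tta)
```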
You can sort of see that it's slightly flirting with the lane line, sort of encroaching a little bit, and the network got excited and thought that this was going to be a cut-in, that that vehicle would end up in our lane. That turned out to be incorrect; the vehicle did not actually do that. So what we do now is just turn the data engine: the network running in shadow mode is making predictions, it makes some false positives and some false negatives, so it got overexcited sometimes and sometimes it missed a cut-in when it actually happened. All of those create a trigger that streams the clip to us, and it gets incorporated, now for free, with no humans harmed in the process of labeling this data, into our training set; we retrain the network and redeploy it in shadow mode. We can spin this loop a few times, and we always look at the false positives and false negatives coming from the fleet, and once we're happy with the false positive to false negative ratio we actually flip the bit and let the car control to that network. You may have noticed that we actually shipped one of our first versions of the cut-in detector approximately three months ago, so if you've noticed that the car is much better at detecting cut-ins, that's fleet learning operating at scale. It actually works quite nicely; that's fleet learning, no humans were harmed in the process, it's just a lot of neural network training based on data and looking at the results. Essentially everyone is training the network all the time is what it amounts to: whether autopilot is on or off, the network is being trained; every mile that's driven by a car with Hardware 2 or above is training the network. Another interesting way that we use this scheme of fleet learning, the other project I'll talk about, is path prediction. While you're driving the car, what you're actually doing is annotating the data, because you are steering the wheel; you're telling us how to traverse different environments. What we're looking at here is a person in the fleet who took a left through an intersection, and what we do is this: we have the full video from all the cameras, and we know the path this person took because of the GPS, the inertial measurement unit, the wheel angle and the wheel ticks, so we put all of that together and we understand the path this person took through this environment, and then of course we can use that as supervision for the network. We just source a lot of this from the fleet, we train a neural network on those trajectories, and then the neural net predicts paths just from that data. What this is typically referred to as is imitation learning: we're taking human trajectories from the real world and just trying to imitate how people drive in the real world, and we can apply the same data engine crank to all of this and make it work over time. Here's an example of path prediction going through a kind of complicated environment: what you're seeing is a video, and we're overlaying the predictions of the network; this is the path that the network would follow, in green. And the crazy thing is that the network is predicting paths it can't even see, with incredibly high accuracy; it can't see around the corner, but it's saying the probability of that curve is extremely high, so that's the path, and it nails it.
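A minimal sketch of what that kind of supervision looks like (entirely illustrative; a linear model stands in for the actual network, and all the data here is synthetic): camera features at each logged moment are paired with the path the human actually drove, recovered from GPS, IMU and wheel odometry, and a model is fit to reproduce it.

```python
import numpy as np

# Each logged moment: a feature vector from the cameras (stand-in: 4 numbers)
# and the lateral offsets of the path the human actually drove over the next
# few meters, recovered from GPS / IMU / wheel odometry.
rng = np.random.default_rng(1)
true_w = rng.normal(size=(4, 5))              # hidden "how humans steer" mapping
camera_features = rng.normal(size=(200, 4))   # 200 logged moments from the fleet
human_paths = camera_features @ true_w        # 5 future lateral offsets per moment

# Imitation learning in its simplest form: regress the human path from features.
w, *_ = np.linalg.lstsq(camera_features, human_paths, rcond=None)

new_scene = rng.normal(size=(1, 4))
predicted_path = new_scene @ w                # the path the model "would follow"
print(np.round(predicted_path, 2))
```

The real system predicts a 3D path with a deep network rather than a linear map; the point of the sketch is only that the supervision signal is human-driven trajectories harvested at fleet scale.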
You will see that in the cars today, but we're also going to turn on augmented vision so you can see the lane lines and the path predictions of the car overlaid on the video. Yeah, there's actually more going on under the hood than you can even tell; it's kind of scary. Of course there are a lot of details I'm skipping over: you might not want to imitate all the drivers, you might want to imitate just the better drivers, and there are many technical ways that we slice and dice that data. But the interesting thing here is that this prediction is actually a 3D prediction that we project back onto the image; the path forward is a three-dimensional thing that we're just rendering in 2D, and we know about the slope of the ground from all of this, which is actually extremely valuable for driving. Path prediction is live in the fleet today, by the way. If you're in a cloverleaf on the highway: until maybe five months ago or so your car would not be able to do the cloverleaf, but now it can; that's path prediction running live on your cars, we shipped it a while ago, and today you're going to get to experience this for traversing intersections; a large component of how we go through intersections in your drives today is all sourced from path prediction with automatic labels. So what I've talked about so far is really the three key components of how we iterate on the predictions of the network and make it work over time: you require a large, varied and real data set; we can really achieve that here at Tesla, and we do it through the scale of the fleet, the data engine, shipping things in shadow mode and iterating that cycle, and potentially even using fleet learning, where no human annotators are harmed in the process and data is used automatically, and we can really do that at scale. In the next section of my talk I'm going to talk about depth perception using vision only. You might be familiar with the fact that there are at least two sensor types here: one is vision, cameras just getting pixels, and the other is lidar, which a lot of other companies also use, and lidar gives you point measurements of distance around you. One thing I'd like to point out first of all is that you all came here, many of you drove here, and you used your neural net and vision; you were not shooting lasers out of your eyes, and you still ended up here. So clearly the human neural net derives distance, and all the measurements, and a 3D understanding of the world, just from vision; it actually uses multiple cues to do so, and I'll briefly go over some of them just to give you a sense of roughly what's going on inside. As an example, we have two eyes pointed out in front, so you get two independent measurements at every single time step of the road ahead of you, and your brain stitches that information together to arrive at a depth estimate, because you can triangulate any point across those two viewpoints. A lot of animals instead have eyes positioned on the sides of their heads, so they have very little overlap in their visual fields; they will typically use structure from motion, and the idea is that they bob their heads, and because of the movement they get multiple observations of the world and can again triangulate depth. And even with one eye closed and completely motionless, you still have some sense of depth perception; you could still tell whether I'm standing two meters in front of you or a hundred meters back, and that's because there are a lot of very strong monocular cues that your brain also takes into account.
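As a concrete illustration of the triangulation idea, here is the textbook two-view relation, depth from the disparity between two horizontally separated, rectified views (idealized pinhole geometry; the numbers are made up):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic two-view triangulation for rectified cameras:
    depth = focal_length * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("the point must appear shifted between the two views")
    return focal_px * baseline_m / disparity_px

# A point 20 px apart between two viewpoints 6.5 cm apart, 1000 px focal length.
print(round(depth_from_disparity(1000.0, 0.065, 20.0), 2), "meters")  # 3.25 m
```

The same relation, applied across many frames of a moving camera rather than two eyes, underlies the multi-view stereo reconstruction shown next.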
This is an example of a pretty common visual illusion: these two blue bars are identical, but your brain, in the way it stitches up the scene, just expects one of them to be larger than the other because of the vanishing lines of the image. So your brain does a lot of this automatically, and artificial neural nets can as well. Let me give you three examples of how you can arrive at depth perception from vision alone: one classical approach and two that rely on neural networks. Here's a video of a Tesla going down, I think this is San Francisco; this is what our cameras are sensing; I'm only showing the main camera, but all eight cameras of the autopilot are turned on. If you just have this six-second clip, you can stitch up this environment in 3D using multi-view stereo techniques. So, there we go: this is the 3D reconstruction of those six seconds of the car driving through that path, and you can see that this information is very well recoverable from just videos, roughly through a process of triangulation, multi-view stereo as I mentioned, and we've applied similar techniques, slightly sparser and more approximate, in the car as well. It's remarkable: all that information is really there in the sensor, and it's just a matter of extracting it. The other project I want to briefly talk about: as I mentioned, neural networks are very powerful visual recognition engines, and if you want them to predict depth, you need, for example, labels of depth, and then they can actually do that extremely well; there's nothing limiting networks from predicting monocular depth except label data. So one example project that we've actually looked at internally is to use the forward-facing radar, which is shown in blue; that radar is looking out and measuring the depth of objects, and we use the radar to annotate what vision is seeing, the bounding boxes that come out of the neural networks. So instead of human annotators telling you, okay, this car in this bounding box is roughly 25 meters away, you can annotate that data much better using sensors; we call that sensor annotation. Radar is quite good at measuring distance, so you can annotate with it and then train your network on it, and if you just have enough of this data, the neural network becomes very good at predicting those patterns. Here's an example of the predictions: the circles are showing radar objects, and the cuboids are coming purely out of vision; the depth of those cuboids is learned through sensor annotation from the radar. If this is working well, the circles in the top-down view should agree with the cuboids, and they do, and that's because neural networks are very competent at predicting depth: they can learn the different sizes of vehicles internally, they know how big those vehicles are, and you can derive depth from that quite accurately. The last mechanism I will talk about, very briefly, is slightly more fancy and gets a bit more technical, but it's a mechanism on which there have been a few papers basically over the last year or two.
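A toy version of that sensor-annotation idea (illustrative only; the features, the linear model and the synthetic data are assumptions): radar range measurements become the regression targets for depth predicted from vision features, so no human ever has to type in "25 meters".

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in vision features for detected vehicles (e.g. box height, width, position).
vision_features = rng.uniform(0.5, 2.0, size=(500, 3))
# Radar supplies the depth label automatically; here a synthetic ground-truth
# relation (smaller apparent box height -> farther away) plus radar noise.
radar_depth_m = 40.0 / vision_features[:, 0] + rng.normal(0, 0.5, size=500)

# Fit a simple model: predict depth from vision features, using radar as supervision.
X = np.column_stack([1.0 / vision_features[:, 0], vision_features[:, 1:]])
coef, *_ = np.linalg.lstsq(X, radar_depth_m, rcond=None)

new_box = np.array([[1.0 / 1.6, 1.2, 0.8]])   # a newly detected car, vision only
print("predicted depth ~", round(float((new_box @ coef)[0]), 1), "m")  # ~25 m
```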
The approach is called self-supervision. What you do in a lot of these papers is feed only raw video into the neural networks, with no labels whatsoever, and you can still get the networks to learn depth. It's a little bit technical, so I can't go into the full details, but the idea is that the neural network predicts depth at every single frame of the video, and there are no explicit targets that the network is supposed to regress to with labels; instead, the objective for the network is to be consistent over time: whatever depth you predict should be consistent over the duration of the video, and the only way to be consistent is to be right, so the neural network automatically predicts the correct depth for all the pixels. We've reproduced some of these results internally, and this also works quite well. So in summary, people drive with vision only, no lasers are involved, and that seems to work quite well. The point that I'd like to make is that visual recognition, really powerful visual recognition, is absolutely necessary for autonomy; it's not a nice-to-have. We must have neural networks that actually, really understand the environment around the car, and lidar points are a much less information-rich signal: vision really understands the full details, while just a few points around you carry much less information. As an example, on the left here: is that a plastic bag or is that a tire? The lidar might just give you a few points on it, but vision can tell you which one of the two it is, and that impacts your control. Is that person who is looking slightly backwards trying to merge into your lane on their bike, or are they just going forward? In construction sites, what do those signs say, how should I behave in this world? The entire infrastructure that we have built up for roads is designed for human visual consumption; all the signs, all the traffic lights, everything is designed for vision, and that's where all the information is, so you need that ability. Is that person distracted and on their phone, are they going to walk into your lane? The answers to all these questions are only found in vision and are necessary for level 4 and level 5 autonomy, and that is the capability we are developing at Tesla, through a combination of large-scale neural network training, the data engine, getting it to work over time, and using the power of the fleet. In this sense lidar is really a shortcut: it sidesteps the fundamental problem, the important problem of visual recognition, that is necessary for autonomy, and so it gives a false sense of progress and is ultimately a crutch. It does give really fast demos, though. So if I were to summarize my entire talk in one slide, it would be this: for autonomy you want level 4 and level 5 systems that can handle not just 99 percent of the possible situations but all of them, and chasing those last few nines is going to be very tricky and very difficult and is going to require a very powerful visual system. I'm showing you some images of what you might encounter in any one slice of that long tail: in the beginning you just have very simple cars going forward, then those cars start to look a little bit funny, then you get into rarer and rarer sightings, and eventually really rare events like cars turned over or even cars airborne. We see a lot of things like this coming from the fleet.
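To make the self-supervision idea above slightly more concrete, here is a deliberately tiny, one-dimensional version of "the only way to be consistent is to be right" (entirely synthetic; real methods use full photometric reprojection across images): a depth hypothesis for a tracked point is scored only by whether it explains where the point reappears in the next frame, and the true depth is the one that does.

```python
import numpy as np

# One tracked point, with the camera driving straight forward by `motion_m` between frames.
focal_px, motion_m = 1000.0, 1.0
true_depth_m, lateral_m = 12.0, 1.5
u_t  = focal_px * lateral_m / true_depth_m               # pixel position in frame t
u_t1 = focal_px * lateral_m / (true_depth_m - motion_m)  # observed position in frame t+1

def consistency_loss(depth_guess: float) -> float:
    """Predict where the pixel should land in the next frame if `depth_guess` were
    right, and compare with where it actually landed. No depth label is ever used."""
    u_pred = u_t * depth_guess / (depth_guess - motion_m)
    return abs(u_pred - u_t1)

candidates = np.linspace(5.0, 30.0, 2501)                # 0.01 m grid of depth hypotheses
best = candidates[np.argmin([consistency_loss(z) for z in candidates])]
print("depth recovered from consistency alone:", round(float(best), 2), "m")  # ~12.0 m
```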
And we see them at a really good rate compared to all of our competitors. The rate of progress at which you can actually address these problems, iterate on the software, and really feed the neural networks the right data is proportional to how often you encounter these situations in the wild, and we encounter them significantly more frequently than anyone else, which is why we're going to do extremely well. Thank you. [Applause] This is all super impressive, thank you so much. How much data, how many pictures, are you collecting on average from each car per period of time? And it sounds like the new hardware, with the dual active-active computers, gives you some really interesting opportunities to run one copy of the neural net in full simulation while the other one is driving the car, and compare the results to do quality assurance. And I was also wondering if there are other opportunities to use the computers for training when they're parked in the garage for the 90 percent of the time that I'm not driving my Tesla around. Thank you very much. Yep, so for the first question, how much data do we get from the fleet: it's really important to point out that it's not just the scale of the data set, it really is the variety of that data set that matters. If you just have lots of images of something going forward on the highway, at some point the neural net just gets it; you don't need that data. So we're really strategic in how we pick and choose, and the trigger infrastructure that we've built up does quite sophisticated analysis to get just the data that we need right now; so it's not a massive amount of data, it's very well picked data. For the second question, with respect to redundancy: absolutely, you can run basically a copy of the network on both, and that is actually how it's designed, to achieve a level 4 or level 5 system that is redundant, so that's absolutely the case. And your last question, sorry: no, not training; the car is an inference-optimized computer. We do have a major program at Tesla, which we don't have enough time to talk about today, called Dojo; that's a super powerful training computer, and the goal of Dojo is to be able to take in vast amounts of data and train at the video level, to do massive unsupervised training on vast amounts of video with the Dojo computer, but that's for another day. I'm kind of a test pilot in a way, because I drive the 405 and the 10, and all these really tricky, really long-tail things happen every day. The one challenge that I'm curious how you're going to solve is changing lanes, because whenever I try to get into a lane with traffic, everybody cuts you off, and human behavior is very irrational when you're driving in LA; the car wants to do it safely, and you almost have to do it unsafely, so I was wondering how you're going to solve that problem. Yeah, so one thing I will point out is that I spoke about the data engine as iterating on neural networks, but we do the exact same thing at the level of the software and all the hyperparameters that go into the choices of when we actually lane change and how aggressive we are. We're always changing those, potentially at random, in shadow mode, and seeing how well they work, so to tune our heuristics around when it's okay to lane change we would also utilize the data engine and shadow mode. Ultimately, actually designing all the different heuristics for when it's okay to lane change is a little bit intractable in the general case, I think,
and so ideally you actually want to use fleet learning to guide those decisions: when do humans lane change, in what scenarios, and when do they feel it's not safe to lane change; let's just look at a lot of the data and train machine learning classifiers to distinguish when it is safe to do so. Those machine learning classifiers can write much better code than humans, because they have the maximum amount of data backing them, so they can really tune all the right thresholds, agree with humans and do something safe. Well, we will probably have a mode that goes beyond Mad Max mode, to LA traffic mode. Yeah, well, you know, Mad Max would have a hard time in LA traffic, I think. So it really is a trade-off: you don't want to create unsafe situations, but you want to be assertive, and that little dance of how you make that work as a human is actually very complicated; it's very hard to write in code, but it really does seem like machine learning approaches are kind of the right way to go about it, where we just look at a lot of the ways people do this and try to imitate that; we're just being more conservative right now. And then as we gain higher and higher confidence, we'll allow users to select a more aggressive mode; that will be up to the user. But in the more aggressive modes, in trying to merge in traffic, there is, no matter what, a slight chance of a fender bender, not a serious accident; you basically will have a choice: do you want a nonzero chance of a fender bender in freeway traffic, which is unfortunately the only way to navigate LA traffic. Yes, it's a real LA story; there's a great movie in it; it's all because this is a game of chicken that's going on. Yeah, we will offer more aggressive options over time, and they will be user-specified. Mad Max Plus, exactly. Hello, hi, it's Jed from Canaccord Genuity; thank you, and congratulations on everything that you've developed. When we look at the AlphaZero project, it had very defined and limited variables in terms of its parameters, which allowed the learning curve to be so quick. The risk in what you're trying to do here, which is almost to develop consciousness in the cars through the neural network, is, I guess, the challenge of how you avoid creating a circular reference in terms of pulling from the centralized model of the fleet to the handoff where the car has enough information. Where is that line, I guess, in terms of the point in the learning process of handing it off, where there's enough information in the car and it doesn't have to pull from the fleet? Well, look, the car can operate if it's completely disconnected from the fleet; it just uploads data, and the training gets better and better as the fleet gets better and better. If it's simply disconnected from the fleet, from that point onwards it would stop getting better, but it would still function fine. In the previous presentation you talked about a lot of the power benefits of not storing a lot of the images, and in this portion you're talking about the learning that's going on by pulling from the fleet. I guess I'm having a hard time reconciling how, if there's a situation where I'm driving up the hill as you showed and predicting where the road is going to go, and that intelligence is coming from all of the other fleet variables, how I'm getting the
benefit of the low power usage of the camera with the neural network; that's where I'm losing the thread, but maybe it's just me. I mean, the compute power in the full self-driving computer is incredible, and maybe we should mention that if it had never seen that road before, it would still have made those predictions, provided it was a road in the United States. In the case of lidar, the march of nines: I want to get to your slam on lidar, because it's pretty clear you don't like lidar. Isn't there a case where, at some point, nine nine nine nine nines down the road, lidar may actually be helpful, and why not have it as some sort of redundancy or backup? That's my first question; you can still have your focus on computer vision but just have it as a redundancy. And my second question is, if that is true, what happens to the rest of the industry that's building their autonomy solutions on lidar? They're all going to dump lidar, that's my prediction; mark my words. I should point out that I don't actually super hate lidar as much as it may sound. At SpaceX, Dragon uses lidar to navigate to the space station and dock; not only that, SpaceX developed its own lidar from scratch to do that, and I spearheaded that effort personally, because in that scenario lidar makes sense. On cars it's freaking stupid; it's expensive and unnecessary, and, as Andrej was saying, once you solve vision it's worthless, so you have expensive hardware that's worthless on the car. We do have a forward radar, which is low cost and is helpful especially for occlusion situations: if there's fog or dust or snow, the radar can see through that. If you're going to use active photon generation, don't use the visible wavelength, because with passive optical you've already taken care of all the visible-wavelength stuff; you want to use a wavelength that is occlusion-penetrating, like radar. Lidar is just active photon generation in the visible spectrum; if you're going to do active photon generation, do it outside the visible spectrum, in the radar spectrum, at millimeter wavelengths versus 400 to 700 nanometers, and you're going to get much better occlusion penetration; that's why we have a forward radar. And then we also have twelve ultrasonics for near-field information, in addition to the eight cameras and the forward radar. You only need the radar in the forward direction, because that's the only direction you're going really fast. I mean, we've gone over this multiple times: are we sure we have the right sensor suite, should we add anything more? No. Hi, right here. You had mentioned that you ask the fleet for the information that you're looking for, for some of the vision; I have two questions about that. It sounds like the cars are doing some computation to determine what kind of information to send back to you; is that a correct assumption, and are they doing that in real time or based on stored information? They absolutely do computation in real time on the car: we basically specify a condition that we're interested in, and the cars do that computation there. If they did not, we'd have to send all the data and do that offline in our back end, and we don't want to do that, so all of that computation happens on the car. So, based on that, it sounds like you guys are in a
really good position, having currently half a million cars, and in the future potentially millions of cars, that are essentially computers, representing almost free data centers for you to do computation. Is that a huge future opportunity for Tesla? It is a current opportunity, and that's not really factored into anything yet. That's incredible, thank you. We have four hundred twenty-five thousand cars with Hardware 2 and beyond, which means they've got all eight cameras, the radar, the ultrasonics, and at least a computer that is enough to essentially figure out what information is important and what is not, compress the important information to the most salient elements, and upload it to the network for training, so it's a massive compression of real-world data. You have this sort of network of millions of computers, essentially distributed data centers of computational capacity; do you see it being used for other things besides self-driving in the future? I suppose it could possibly be used for something besides self-driving; we're focused on self-driving. As we get that really nailed, maybe there's going to be some other use for millions, and then tens of millions, of computers with the Hardware 3 full self-driving computer; maybe there would be, could be; maybe there's some sort of AWS angle here, it's possible. Hello, hi Elon, a question from Loup Ventures: I own a Model 3 in Minnesota, where it snows a lot. Since camera and radar cannot see road markings through snow, what is your technical strategy to solve this challenge? Does it involve high-precision GPS at all? Yeah, so actually, today autopilot will do a decent job in snow, even when lane markings are covered or faded, or when there's lots of rain on them; we still seem to drive relatively well. We didn't specifically go after snow yet with our data engine, but I actually think this is completely tractable, because in a lot of those images, even when things are snowy, when you ask a human annotator where the lane lines are, they can actually tell you, and they're relatively consistent. In training those networks, as long as the annotators are consistent on your data, the neural network will pick up on those patterns and do just fine. So it's really just about whether the signal is there even for the human annotator; if the answer to that is yes, then your neural network can do it just fine. Yeah, there are actually a number of important signals, as Andrej is saying. Lane lines are one of those things, but one of the most important signals is drivable space: what is drivable space and what is not drivable space, and what actually matters most is drivable space, more than lane lines, and the prediction of drivable space is extremely good. I think especially after this upcoming winter it will be incredible; it will be like, how could it possibly be that good, that's crazy. The other thing to point out is that maybe it's not even only about human annotators: as long as you as a human can drive through that environment, with fleet learning we actually know the path you took, and you obviously used vision to guide you through that path; you didn't just use the lane line markings, you used the entire geometry of the entire scene.
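Purely as a cartoon of what a drivable-space prediction can look like as a data structure (not Tesla's actual representation, which isn't described here): a per-cell mask over a top-down grid that a planner can query, independent of whether any lane paint is visible.

```python
import numpy as np

# Top-down grid around the car, e.g. 0.5 m cells: True = predicted drivable.
# In the real system this would come out of a neural network; here it's hand-made.
grid = np.zeros((20, 20), dtype=bool)
grid[:, 8:12] = True          # a clear corridor ahead of the car
grid[10:14, 12:16] = True     # a side area, e.g. a driveway entrance

def corridor_clear(mask: np.ndarray, lane_cols=slice(8, 12), ahead_rows=slice(0, 10)) -> bool:
    """Planner-style query: is the strip directly ahead entirely drivable?"""
    return bool(mask[ahead_rows, lane_cols].all())

print(corridor_clear(grid))   # True: the path ahead is open
grid[5, 9] = False            # a snow bank or parked car blocks one cell
print(corridor_clear(grid))   # False: the planner has to re-plan around it
```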
You see how the world is roughly curving, you see how the cars are positioned around you, and the network will pick up on all of those patterns automatically if you just have enough data of people traversing those environments. Yeah, and it's extremely important that things not be rigidly tied to GPS, because GPS error can vary quite a bit, and the actual situation on a road can vary quite a bit: there could be construction, there could be a detour. If the car is using GPS as primary, that's a really bad situation; it's asking for trouble. It's fine to use GPS for tips and tricks, in the sense that you can drive your home neighborhood better than a neighborhood in some other country or some other part of the country; you know your own neighborhood well, and you use that knowledge to drive with more confidence, to maybe take counterintuitive shortcuts, that kind of thing. But the GPS overlay data should only ever be helpful, never primary; if it's ever primary, it's a problem. Question back here in the back corner: I just wanted to follow up partially on that, because several of your competitors in the space over the past few years have talked about how they are augmenting all of their perception and path-planning capabilities on the car platform with high-definition maps of the areas they're driving. Does that play a role in your system, do you see it adding any value, and are there areas where you would like to get more data that is not collected from the fleet but is more mapping-style data? I think high-precision GPS maps and lanes are a really bad idea: the system becomes extremely brittle, because any change to the environment means it can't adapt, if it locks onto GPS and high-precision lane lines and does not allow vision to override. In fact, vision should be the thing that does everything; lane lines are a guideline, but they're not the main thing. We briefly barked up the tree of high-precision lane lines and then realized that was a huge mistake and reversed it; it's not good. This is very helpful for understanding annotation, where the objects are and how the car drives, but what about the negotiation aspect, for parking and roundabouts and other things where there are other cars on the road that are human-driven, where it's more art than science? It's pretty good actually; with cut-ins and stuff it's doing really well. Yeah, in that connection, we're using a lot of machine learning right now to create an explicit representation of what the road looks like, and then there's an explicit planner and a controller on top of that representation, and there are a lot of heuristics for how to traverse and negotiate and so on. Just as there's a long tail in what the visual environment looks like, there's a long tail in those negotiations, in the little game of chicken that you play with other people, and so I think we have a lot of confidence that eventually there must be some kind of fleet learning component to how you actually do that, because writing all those rules by hand is quickly going to fall apart, I think. Yeah, we've dealt with this issue with cut-ins, and we'll allow gradually more aggressive behavior on the part of the user; they can just dial the setting up and say be more aggressive or be less aggressive.
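A minimal sketch of the fleet-learning idea for those negotiation decisions (illustrative only; the features, the synthetic data and the nearest-centroid rule are stand-ins for whatever would actually be trained): label moments by what human drivers did, go or wait, and fit a classifier on those examples instead of hand-writing thresholds.

```python
import numpy as np

rng = np.random.default_rng(3)

# Features per logged moment: [gap to the oncoming car in m, its closing speed in m/s].
# Labels: 1 if the human driver went, 0 if they waited.
go   = np.column_stack([rng.uniform(35, 80, 300), rng.uniform(0, 8, 300)])
wait = np.column_stack([rng.uniform(5, 30, 300),  rng.uniform(5, 15, 300)])
X = np.vstack([go, wait])
y = np.array([1] * 300 + [0] * 300)

# "Training": per-class means; classify by the nearest class centroid.
mu_go, mu_wait = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def human_like_decision(gap_m: float, closing_mps: float) -> str:
    x = np.array([gap_m, closing_mps])
    return "go" if np.linalg.norm(x - mu_go) < np.linalg.norm(x - mu_wait) else "wait"

print(human_like_decision(60, 3))   # wide gap, slow closing -> go
print(human_like_decision(12, 10))  # tight gap, fast closing -> wait
```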
You know, chill mode, aggressive mode, yeah. Incredible progress, phenomenal. Two questions. First, in terms of platooning: somebody asked about when there is snow on the road, but if you have a large platooning feature you can just follow the car in front; is your system capable of doing that? And I have two follow-ups. So you're asking about platooning: I think we could absolutely build those features, but again, if you just train your neural networks, for example, on imitating humans, humans already follow the car ahead, and so that neural network actually incorporates those patterns internally; it figures out that there's a correlation between the way the car ahead of you is facing and the path that you are going to take, but that's all done internally in the net. You're just concerned with getting enough data, and the tricky data, and the neural network training process is actually quite magical; it does all the other stuff automatically, so you turn all the different problems into just one problem: collect your data set and use neural network training. Yeah, there are three steps to self-driving: there's being feature complete; then there's being feature complete to the degree where we think the person in the car does not need to pay attention; and then there's being at a reliability level where we've also convinced regulators that that is true. So there are kind of three levels. We expect to be feature complete in self-driving this year, and we expect to be confident enough, from our standpoint, to say that we think people do not need to touch the wheel or look out of the window sometime probably around, I don't know, the second quarter of next year; and then we expect to start to get regulatory approval, at least in some jurisdictions, towards the end of next year. That's roughly the timeline that I expect things to go on. And probably for trucks, platooning will be approved by regulators before anything else; if you're doing long-haul freight, you could have one driver in the front and then four semis trailing behind in a platooning manner, and I think the regulators will probably be quicker to approve that than other things. Regarding lidar, of course you don't have to convince us; lidar is a technology which, in my opinion, is an answer looking for a question, and probably dead. I mean, this is very impressive, what we saw today, and probably the demo will show something more. I'm just wondering, what is the maximum dimension of a matrix that you might have in your training or in your deep learning pipeline, a ballpark figure? You're asking about the matrix multiply operations inside a neural network? There are many different ways to answer that question, and I'm not a hundred percent sure they're useful answers, but these neural networks typically have, like I mentioned, about tens to hundreds of millions of neurons, and each of them on average has about a thousand connections to the neurons below; those are the typical scales that are used across the industry, and that will evolve over time as well. Yeah. I've actually been very impressed by the rate of improvement of autopilot over the past year on my Model 3; there are two scenarios from last week I wanted your feedback on. The first scenario was, I was in the right-most lane of the freeway and there was a highway on-ramp, and
then my Model 3 was able to detect two cars on the side, slow down, and let one car go in front of me and one car go behind me, and I was like, oh my gosh, this is insane, I didn't think my Model 3 could do that; that was super impressive. But the same week, another scenario: I was in the right-hand lane again, but my lane was merging with the left lane; it wasn't an on-ramp, just a normal highway lane, and my Model 3 wasn't really able to detect that situation, it wasn't able to slow down or speed up, and I had to intervene. So can you, from your perspective, share some background on how Tesla might adjust for that, and how that could be improved over time? Yeah, so like I mentioned, we have a very sophisticated trigger infrastructure; if you intervened, it's actually quite likely that we received that clip, and we can analyze it, see what happened, and tune the system. It probably enters some statistics over, okay, at what rate are we correctly merging in traffic, and we look at those numbers, we look at the clips, we see what's wrong, and we try to fix those cases and make progress against those benchmarks. So yes, we would potentially go through a phase of categorization, look at some of the biggest categories that actually seem to be semantically related to the same problem, and then develop software against that. Okay, we do have one more presentation, which is the software: there's essentially the autopilot hardware with Pete, the neural-net vision with Andrej, and then there's the software engineering at scale, and that's presented by Stuart. So thanks, and we'll be up here afterwards to answer questions. [Applause] Oh, I just wanted to very briefly say, if you have an early flight and you want to do a test ride with our latest development software, could you please speak to my colleague or drop her an email, and we can take you out for a test ride. And Stuart, over to you. [Music] All right, so that's actually a clip from a longer-than-thirty-minute uninterrupted drive, with no interventions, of Navigate on Autopilot on the highway system, which is in production today on hundreds of thousands of cars. I'm Stuart, and I'm here to talk about how we build some of these systems at scale. Just a really short introduction of where I'm coming from and what I do: I've been at a couple of companies, and I've been writing software professionally for about twelve years. The thing that excites me most, and that I'm really passionate about, is taking the cutting edge of machine learning and actually connecting it with customers through robustness and scale. At Facebook I worked initially inside our ads infrastructure to take the machine learning from some really, really smart people and build it into a single platform that we could then scale to all the other aspects of the business, from how we rank the news feed to how we deliver search results to how we make every recommendation across the platform, and that became the Applied Machine Learning group, something I was incredibly proud of. And a lot of that wasn't just the core algorithms and the really important improvements that happened there; those mattered, but a lot of it was actually the engineering practices to build these systems at scale. The same thing was
true at Snap, where I went next. We were really, really excited to actually help monetize the product, but the hardest part was that we were using Google at the time, and they were effectively running us at a fairly small scale, and we wanted to build that same infrastructure: take the understanding of these users, connect it with cutting-edge machine learning, build that at a massive scale, and handle billions and then trillions of both predictions and auctions every day in a way that is really robust. So when the opportunity came to come to Tesla, that was something I was incredibly excited to do: specifically, to take the amazing things that are happening both on the hardware side and on the computer vision and AI side, and actually package that together with all the planning and controls, the testing, the kernel patching of the operating system, all of our continuous integration, our simulation, and build that into a product that we get onto people's cars in production today. So I want to talk about the timeline for how we did that with Navigate on Autopilot, and how we're going to do it as we take Navigate on Autopilot off the highway and onto city streets. So we're at 770 million miles already for Navigate on Autopilot, which is really, really cool, and one thing that is worth calling out here is that we're continuing to accelerate and keep learning from this data, like Andrej talked about with the data engine. As this accelerates, we actually make more and more assertive lane changes; we're learning from the cases where people intervene, either because we failed to detect a merge correctly or because they wanted the car to be a little more peppy in different environments, and we just want to keep making that progress. To start all of this, we begin with trying to understand the world around us. We talked about the different sensors in the vehicle, but I want to dig in a little bit more here: we have eight cameras, but we also additionally have twelve ultrasonic sensors, a radar, an inertial measurement unit, GPS, and, one thing we forget about, we also have the pedal and steering actions, so not only can we look at what's happening around the vehicle, we can look at how humans chose to interact with that environment. So I'll talk to this clip: this is basically showing what's happening today in the car, and we'll continue to push this forward. You start with a single neural network and you see the detections around it; we then build all of that together, multiple neural networks and multiple detections; we bring in the other sensors, and we convert that into what Elon calls a vector space, an understanding of the world around us. And as we continue to get better and better at this, we're moving more and more of this logic into the neural networks themselves, and the obvious endgame here is that the neural network looks across all the cameras, brings all the information together, and just ultimately outputs a source of truth for the world around us. This is actually not an artist's rendering, in many senses: it's actually the output of one of the debugging tools that we use on the team every day to understand what the world looks like around us. Another thing that is really, really exciting to me: when I hear about sensors like lidar, a common question is around just having extra sensory modalities, like why not have some redundancy on the vehicle, and I want to dig in on one thing that's not always
obvious about neural networks themselves. We have a neural network running on, say, our wide fisheye camera; that neural network is not making one prediction about the world, it's making many separate predictions, some of which actually audit each other. As a real example, we have the ability to detect a pedestrian; that's something we train very, very carefully on and put a lot of work into. But we also have the ability to detect obstacles in the roadway, and a pedestrian is an obstacle; it's shown differently to the neural network, which says, oh, there's a thing I can't drive through, and these together combine to give us an increased sense of what we can and can't do in front of the vehicle and how to plan for it. We then do this across multiple cameras, because we have overlapping fields of view in many places around the vehicle, and in front we have a particularly large number of overlapping fields of view. Lastly, we can combine that with things like the radar and the ultrasonics to get extremely precise understandings of what's happening in front of the car, and we can use that both to learn future behaviors that are very accurate and to build very accurate predictions of how things will continue to unfold in front of us. One example that is really exciting: we can actually look at bicyclists and people and ask not just where are you now, but where are you going, and this is actually at the heart of our next-generation automatic emergency braking system, which will not just stop for people in your path, it will stop for people who are going to be in your path. That's running in shadow mode right now and will go out to the fleet this quarter; I'll talk about shadow mode in a second. So when you want to start a feature like Navigate on Autopilot on the highway system, you can start by learning from data: you just look at how humans do things today, what their assertiveness profile is, how they change lanes, what causes them to either abort or change their maneuvers, and you can see things that are not immediately obvious, like, oh yeah, simultaneous merging is rare but very complicated and very important, and you can start to build opinions about different scenarios, such as a fast overtaking vehicle. So this is what we do when we initially have some algorithms we want to try out: we can put them on the fleet and we can see what they would have done in a real-world scenario, such as this car that's overtaking us very quickly; this is taken from our actual simulation environment, showing the different paths that we considered taking and how those overlay on the real-world behavior of a user. When you get those algorithms tuned up and you feel good about them, and this is really taking the output of the neural network, putting it in that vector space, and building and tuning these parameters on top of it, which ultimately I think we can do through more and more machine learning, you go on to a controlled deployment, which for us is our early access program: you get this out to a couple thousand people who are really excited to give you highly vigilant but useful feedback about how it behaves, not in an open-loop but in a closed-loop way, in the real world, and you watch their interventions. Like we talked about, when somebody takes over we can actually get that clip and try to understand what happened, and one thing we can really do is play it back in an open-loop way and ask, as we build our software, whether we're getting closer to or further from how humans behave in the real world.
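A small sketch of that kind of open-loop replay metric (illustrative; a real evaluation would compare full trajectories over many clips, and the function and data here are invented): replay a logged clip through a candidate planner and score how far its chosen path deviates from what the human actually drove.

```python
import numpy as np

def replay_score(human_path: np.ndarray, planned_path: np.ndarray) -> float:
    """Open-loop comparison: mean distance (m) between the path the human drove
    and the path the candidate software would have taken on the same logged clip."""
    return float(np.mean(np.linalg.norm(human_path - planned_path, axis=1)))

# Logged human path through a merge (x, y in meters) and two candidate planners.
human = np.array([[0, 0], [10, 0.2], [20, 0.8], [30, 1.9], [40, 3.5]], dtype=float)
candidate_a = human + np.array([0, 0.1])      # hugs the human line
candidate_b = human * np.array([1.0, 2.0])    # merges far too aggressively

print("candidate A deviation:", round(replay_score(human, candidate_a), 2), "m")
print("candidate B deviation:", round(replay_score(human, candidate_b), 2), "m")
```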
One thing that was just super cool: with the full self-driving computers, we're actually building our own racks and infrastructure, so we can basically take our full self-driving computers, fully wrap them up, build them into our own cluster, and run this very sophisticated data infrastructure to understand, over time, as we tune and fix these algorithms, whether we're getting closer and closer to how humans behave, and ultimately whether we can exceed their capabilities. So once we had this and we felt really good about it, we wanted to do our wide rollout, but to start we actually asked everybody to confirm the car's behavior via a stalk confirm. So we started making lots and lots of predictions about how we should be navigating the highway, and we asked people to tell us: is this right or is this wrong? This again is a chance to turn that data engine, and we did spot some really tricky and interesting long tails; in this case, one really fun example: there are these very interesting cases of simultaneous merging, where you start going and then somebody moves either behind or in front of you, not noticing you, and what is the appropriate behavior there, and what tuning of the neural network do we need to do to be super precise about the appropriate behavior? We worked on these in the background, we made them better, and over the course of time we got 9 million successfully accepted lane changes, and we use these, again with our continuous integration infrastructure, to understand when we think we're ready. This is one area where the full self-driving computer is also really exciting to me: since we own the entire software stack, straight from the kernel patching all the way to the tuning of the image signal processor, we can start to collect even more data that is even more accurate, and that allows us to do even better tuning and faster iteration cycles. So earlier this month we felt we were ready to deploy an even more seamless version of Navigate on Autopilot on the highway system, and that seamless version does not require a stalk confirm, so you can sit there, relax, keep your hand on the wheel, and just oversee what the car is doing, and in this case we're actually seeing over a hundred thousand automated lane changes every single day on the highway system, which is just super cool to deploy at scale. The thing that I'm most excited about from all of this is the life cycle, how we're able to turn that data engine crank faster and faster with time, and I think it's becoming very clear that with the combination of the infrastructure we've built, the tooling we've built on top of it, and the power of the full self-driving computer, I believe we can do this even faster as we move Navigate on Autopilot from the highway system onto city streets. And so, yeah, with that I'll hand it off to Elon. All right. To the best of my knowledge, all of those lane changes have occurred with zero accidents. That is correct; yeah, I watch every single one. It's conservative, obviously, but to have hundreds of thousands, going to millions, of lane changes with zero accidents is, I think, a great achievement by the team. Yeah, thank you. Well, so let's see, a few other things that are worth mentioning: in order to have a self-driving car, a robotaxi, you really need redundancy throughout the vehicle at the hardware level, so starting in roughly October 2016, all cars made by
Tesla have redundant power steering, so there are dual motors on the power steering; if one motor fails, the car can still steer. All of the power and data lines have redundancy, so you can sever any given power line or any data line and the car will keep driving. Even if you lose complete power in the main pack, the car is capable of steering and braking using the auxiliary power system, so you can completely lose the main pack and the car is still safe. The whole system, from a hardware standpoint, has been designed to be a robotaxi since basically October 2016, when we rolled out autopilot hardware version 2. We do not expect to upgrade cars made before that; we think it would actually cost more to upgrade those cars than to make a new car, just to give you a sense of how hard this is to do unless it's designed in from the start; it's not worth it. So we've gone through the future of self-driving: there's the hardware, there's vision, and then there's a lot of software, and the software problem here should not be minimized; it is a massive software problem: managing vast amounts of data, training against that data, controlling the car based on the vision; it's a very difficult software problem. So going back over the Tesla master plan: obviously we've made a bunch of forward-looking statements, as they're called, so let's go through some of the forward-looking statements we've made. Way back when we created the company, we said we'd build the Tesla Roadster; they said it was impossible, and that even if we did build it, nobody would buy it; the universal opinion was that building an electric car was extremely dumb and would fail. I agreed that the probability of failure was high, but this was important. So we built the Tesla Roadster, got it into production in 2008 and shipped that car, and it's now a collector's item. Then we said we'd build a more affordable car with the Model S; we did that; again we were told it was impossible, I was called a fraud and a liar, it was not going to happen; this was all untrue. Famous last words. We were in production with the Model S in 2012, and it exceeded all expectations; there is still, in 2019, no car that can compete with the Model S of 2012, seven years later. Then we said we'd build a more affordable car with the Model 3; we built the Model 3, we're in production; I said we'd get to over five thousand cars a week with the Model 3, and at this point five thousand cars a week is a walk in the park for us, it's not even hard. We said we'd do large-scale solar, which we did through the SolarCity acquisition, and we're developing the solar roof, which is going really well; we're now on version 3 of the solar tile roof, and we expect significant production of the solar tile roof later this year; I have it on my house and it's great. And I said we'd make the Powerwall and the Powerpack; we made the Powerwall and the Powerpack, and in fact the Powerpack is now deployed in massive grid-scale utility systems around the world, including the largest operating battery projects in the world, above 100 megawatts, and probably by next year, or the year after at the most, we expect to have a gigawatt-scale battery project completed. So all these things: I said we'd do them, and we did them. We're going to do the robotaxi thing too. The only criticism, and it's a fair one,
is that sometimes I'm not on time. But I get it done, and the Tesla team gets it done. So what we're going to do this year is reach combined production of 10,000 a week between S, X and 3, and we feel very confident about that, and we feel very confident about being feature complete with self-driving. Next year we'll expand the product line with Model Y and Semi, and we expect to have the first operating robotaxis next year, with no one in them. When things are improving at an exponential rate, it's very difficult to wrap one's mind around it, because we're used to extrapolating on a linear basis, but when you've got massive amounts of hardware on the road, the cumulative data is increasing exponentially and the software is getting better at an exponential rate, so I feel very confident predicting autonomous robotaxis for Tesla next year. Not in all jurisdictions, because we won't have regulatory approval everywhere, but I'm confident we'll have regulatory approval at least somewhere, literally next year. Any customer will be able to add or remove their car from the Tesla Network, so expect this to operate as a combination of, say, the Uber and the Airbnb model: if you own the car, you can add or subtract it to the Tesla Network, and Tesla would take 25 or 30 percent of the revenue, and in places where there aren't enough people sharing their cars, we would just have dedicated Tesla vehicles. When you use the car, we'll show you our ride-sharing app: you're able to summon the car from the parking lot, get in and go for a drive. It's really simple: you take the same Tesla app that you currently have, we'll update the app and add "summon a Tesla" and "commit your car to the fleet", so you'll be able to summon a Tesla, or add or subtract your own car to the fleet, from your phone. We see potential for smoothing out the demand distribution curve and having a car operate at a much higher utility than a normal car does. Typically the use of a car is about 10 to 12 hours a week; most people drive one and a half to two hours a day, so 10 to 12 hours a week of total driving. But if you have a car that can operate autonomously, most likely that car could operate for a third of the week or longer; there are 168 hours in a week, so probably you've got something on the order of 55 to 60 hours a week of operation, maybe a bit longer, so the fundamental utility of the vehicle increases by a factor of five. If you look at this from a macroeconomic standpoint, as if we were operating inside some big simulation: if you could upgrade your simulation to increase the utility of cars by a factor of five, that would be a massive increase in the economic efficiency of the simulation, just gigantic. So we'll do Model 3, S and X as robotaxis, and note that we've made an important change to our leases: if you lease a Model 3, you don't have the option of buying it at the end of the lease; we want them back. If you buy the car, you can keep it, but if you lease it, you have to give it back. And as I said, in any locations where there's not enough supply of shared cars, Tesla will just make its own cars and add them to the network in that place. The current cost of a Model 3 robotaxi is less than $38,000.
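Re-running the utilization arithmetic quoted there, using only the figures from the talk:

```python
hours_per_week = 168
personal_use = (10, 12)            # hours/week of typical driving cited in the talk
robotaxi_use = hours_per_week / 3  # "a third of the week or longer"

print(robotaxi_use)                                        # 56.0, in the 55-60 h range cited
print([round(robotaxi_use / h, 1) for h in personal_use])  # ~4.7x to 5.6x, i.e. roughly 5x utility
```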
And we're redesigning the cars for this. The cars currently being built are all designed for a million miles of operation: the drive units are designed, tested and validated for a million miles of operation; the current battery pack is good for maybe three hundred to five hundred thousand miles, and the new battery pack that will probably go into production next year is designed explicitly for a million miles of operation. The entire vehicle, battery pack inclusive, is designed to operate for a million miles with minimal maintenance. We'll actually be adjusting tire design and really optimizing the car to be a hyper-efficient robotaxi, and at some point you won't need steering wheels or pedals and we'll just delete those — as these things become less and less important, we'll just delete parts; they won't be there. Probably two years from now we'll make a car that has no steering wheel or pedals, and if we need to accelerate that timeline we can always just delete parts — that's easy. Long term, say three years, a robotaxi with those parts eliminated may end up being $25,000 or less. And you want a super-efficient car so the electricity consumption is very low: we're currently at four and a half miles per kilowatt-hour, but we'll improve that to five and beyond.

There's really no other company that has the full-stack integration: we've got the vehicle design and manufacturing, the computer hardware in-house, the in-house software development, the AI, and by far the biggest fleet. It's extremely difficult — not impossible, perhaps, but extremely difficult — to catch up when Tesla has 100 times more miles per day than everyone else combined.

This is the cost of running a gasoline car — the average cost of running a car in the US, taken from AAA — currently about sixty-two cents a mile, and thirteen and a half thousand miles a year across the US vehicle fleet adds up to about two trillion dollars a year; this is literally just taken from the AAA website. The cost of ride-sharing, according to Uber and Lyft, is two to three dollars a mile. The cost to run a robotaxi we think is less than eighteen cents a mile, and dropping — this is current cost; future cost will be lower. If you ask what the probable gross profit from a single robotaxi would be, we think probably something on the order of $30,000 per year. And we're literally designing the cars the same way that commercial semi trucks are designed: commercial semi trucks are all designed for a million-mile life, and we're designing the cars for a million-mile life as well. In nominal dollars that would be a little over three hundred thousand dollars over the course of 11 years, and it might be higher — I think these assumptions are actually relatively conservative, and this assumes that 50 percent of the miles driven are not useful, so this is only at 50 percent utility.

By the middle of next year we'll have over a million Tesla cars on the road with full self-driving hardware, feature complete, at a reliability level where we would consider that no one needs to pay attention — meaning you could go to sleep, from our standpoint. If you fast-forward a year, maybe a year and three months, but next year for sure, we will have over a million robotaxis on the road. The fleet wakes up with an over-the-air update; that's all it takes.
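The per-mile figures above can be sanity-checked in a few lines. In the sketch below, the $1-per-mile fare (floated later in the Q&A) and the back-calculated annual mileage are assumptions, not numbers from the presentation.

```python
# Back-of-the-envelope checks on the per-mile economics quoted above (illustrative only).
aaa_cost_per_mile = 0.62        # quoted AAA average cost of driving, $/mile
miles_per_year = 13_500         # quoted average annual mileage per vehicle
total_spend = 2e12              # quoted ~$2 trillion/year US total
implied_fleet = total_spend / (aaa_cost_per_mile * miles_per_year)
print(f"implied US fleet size: {implied_fleet / 1e6:.0f} million vehicles")   # ~240 million

robotaxi_cost = 0.18            # quoted cost to run a robotaxi, $/mile
fare = 1.00                     # ASSUMED fare ("maybe a dollar" is floated in the Q&A)
target_gross_profit = 30_000    # quoted ~$30k/year gross profit per car
paid_miles = target_gross_profit / (fare - robotaxi_cost)
print(f"paid miles/year implied: {paid_miles:,.0f}")            # ~37,000 paid miles
print(f"total miles at 50% utility: {paid_miles * 2:,.0f}")     # ~73,000 miles/year
```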
If you ask what the net present value of a robotaxi is, it's probably on the order of a couple hundred thousand dollars, so buying a Model 3 is a good deal. Questions?

Well, in our own fleet — I don't know, I guess long term we'd have probably on the order of 10 million vehicles. Our production rate, generally: if you look at the compound annual production rate since 2012, which was around our first full year of Model S production, we went from 23,000 vehicles produced in 2013 to around 250,000 vehicles produced last year, so in the course of five years we increased output by a factor of ten. I would expect something similar to occur over the next five or six years. As for shared buses, I don't know, but the nice thing is that essentially customers are fronting us the money for the car — it's great.

Question: One thing is the snake charger — I'm curious about that — and also, how did you determine the pricing? It looks like you're undercutting the average Lyft or Uber ride by about 50 percent, so I'm curious if you could talk a little bit about the pricing strategy.

Solving for the snake charger is pretty straightforward. From a vision standpoint it's a known situation, and any kind of known situation with vision, like a charge port, is trivial. So yes, the cars will just automatically park and automatically plug in; there would be no human supervision required. Sorry, what was the other part — pricing? We just threw some numbers on there; definitely plug in whatever pricing you think makes sense. We just kind of randomly said, okay, maybe a dollar. There are on the order of two billion cars and trucks in the world, so robotaxis will be in extremely high demand for a very long time, and from my observation thus far the auto industry is very slow to adapt. As I said, there's still not a car on the road that you can buy today that is as good as the Model S was in 2012, which suggests a pretty slow rate of adaptation for the car industry, so a dollar is probably conservative for the next 10 years. There's actually not enough appreciation for the difficulty of manufacturing — manufacturing is insanely difficult. A lot of people I talk to think that if you just have the right design, you can instantly make as much of that thing as the world wants. This is not true; it's extremely hard to design a new manufacturing system for new technology. Audi is having major problems with production of the e-tron, and they are extremely good at manufacturing — and if they're having problems, what about others? So there are on the order of two billion cars and trucks in the world, and on the order of about a hundred million units per year of production capacity for vehicles, but only of the old design. It will take a very long time to convert all of that to full self-driving cars, and they really need to be electric, because the cost of operation of a gasoline or diesel car is much higher than an electric car; any robotaxi that isn't electric will absolutely not be competitive.
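Two figures quoted earlier in this exchange — the robotaxi net present value of "a couple hundred thousand dollars" and the ten-fold production increase — can be reproduced in a few lines. The 8% discount rate below is an assumption; the cash-flow horizon simply matches the roughly 11-year, million-mile life mentioned above.

```python
# Reproducing two figures from the answers above (illustrative; discount rate is assumed).
annual_gross_profit = 30_000     # quoted ~$30k/year per robotaxi
years = 11                       # quoted ~11-year / million-mile operating life
discount_rate = 0.08             # ASSUMPTION: not given in the talk
npv = sum(annual_gross_profit / (1 + discount_rate) ** t for t in range(1, years + 1))
print(f"NPV ~ ${npv:,.0f}")      # ~$214,000, i.e. "a couple hundred thousand dollars"

# Production growth: ~23,000 vehicles (2013) to ~250,000 (2018) is a ~10x increase
cagr = (250_000 / 23_000) ** (1 / 5) - 1
print(f"implied compound annual growth rate: {cagr:.0%}")   # ~61% per year
```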
Question: Elon, it's Colin Rusch from Oppenheimer over here. Obviously we appreciate that the customers are fronting some of the cash for this fleet getting built up, but it sounds like a massive financial commitment from the organization over time. Can you talk a little bit about what that looks like — what your expectations are in terms of financing over the next, call it three or four years, for building up this fleet and monetizing it with your customer base?

We're aiming to be approximately cash-flow neutral during the fleet build-up phase, and then I expect it to be extremely cash-flow positive once the robotaxis are enabled. But I don't want to talk about financing rounds in this venue — I think we'll make the right moves.

Question: If I'm Uber, why wouldn't I just buy all your cars? Why would I let you put me out of business?
There's a clause that we put in, I think about three or four years ago, that the cars can only be used in the Tesla Network.
— So even as a private person, if I go out and buy ten Model 3s, can I run them on the network? That's a business then, right?
— You can only add or remove them through the Tesla Network.
— Right, but if I use the Tesla Network, in theory I could run a car-sharing robotaxi business with my ten Model 3s?
— Yes, but it's like the App Store: you can only add or remove them through the Tesla Network, and Tesla gets a revenue share.
— But it's similar to Airbnb, though, in that I have this home — my car — and now I can just rent it out, so I can make extra income from owning multiple cars and renting them out. I have a Model 3, I aspire to get the Roadster next when you build it, and I'm going to just rent out my Model 3 — why would I give it back to you?
— I guess you could operate a rental-car fleet, but I think that would be very unwieldy.
— It seems easy.
— Okay, try it.

Question: In order to operate as a robotaxi, it sounds like you have to solve certain problems. For example, with Autopilot today, if you oversteer it, it lets you take over; but if it's a ride-sharing product where someone else is getting in as a passenger, moving the steering wheel can't let that person take over the car, because they might not even be in the driver's seat. So is the hardware already there for it to be a robotaxi? And it might get into situations, such as a cop pulling it over, where some human might need to intervene — say, a central fleet of operators that remotely interact with humans. Is all of that type of infrastructure already built into each of the cars? Does that make sense?

I think there will be sort of a phone-home thing where, if the car gets stuck, it will just phone home to Tesla and ask for a solution. Things like being pulled over by a police officer — that's easy for us to program in; that's not a problem. It will be possible for somebody to take over using the steering wheel, at least for some period of time, and then probably down the road we'll just cap the steering column so that there's no steering control — we'll just take the steering wheel off and put a cap on, give it a couple of years.
— Is that a hardware modification to the car in order to enable that?
— No, we'd literally just unbolt the steering wheel and put a cap on where the steering wheel currently is.
— But that's a future car that you would put out. What about today's cars, where the steering wheel is a mechanism to take over Autopilot? If it's in robotaxi mode, would someone be able to take it over by simply moving the steering wheel?
Yes — I think there will be a transition period where people will be able to take over from the robotaxi, and then once regulators are comfortable with us not having a steering wheel, we'll just delete that. For cars that are in the fleet — obviously with the permission of the owner, if it's owned by somebody else — we would just take the steering wheel off and put a cap where the steering wheel currently attaches.
— So there might be two phases to robotaxi: one where the service is provided and you come in as the driver but could potentially take over, and then in the future there might not be a driver option. Is that how you see it as well?
— In the future, the probability of the steering wheel being taken away is one hundred percent. Consumers will demand it. This is not me prescribing a point of view about the world; this is me predicting what consumers will demand. Consumers will demand in the future that people are not allowed to drive these two-ton death machines.
— I don't totally agree with that. But in order for a Model 3 today to be part of the robotaxi network, when you call it, you would then get into the driver's seat, essentially?
— Yeah, just to be on the safe side. It's a bit like amphibians: there were amphibians, and then things pretty much just became land creatures — there's a little bit of an amphibian phase.

Question: The strategy we've heard from other players in the robotaxi space is to select a certain municipal area and create geofenced self-driving; that way you're using an HD map to have a more confined area with a bit more safety. We didn't hear much today about the importance of HD maps — to what extent is an HD map necessary for you? And second, we also didn't hear much about deploying this into specific municipalities, where you're working with the municipality to get buy-in from them and you're also getting a more defined area. So what's the importance of HD maps, and to what extent are you looking at specific municipalities for rollout?

I think HD maps are a mistake. We actually had HD maps for a while and we canned that, because you either need HD maps — in which case, if anything changes about the environment, the car will break down — or you don't need HD maps, in which case why are you wasting your time doing HD maps? The two main crutches that should not be used, and that in retrospect will be obviously false and foolish, are lidar and HD maps. Mark my words: if you need a geofenced area, you don't have real self-driving.

Question: It sounds like battery supply could be the only bottleneck left toward this vision. And could you clarify how you get the battery packs to last a million miles?

I think cells will be a constraint — that's a subject for a whole separate discussion. I think we're actually going to want to push the Standard Range Plus battery more than our Long Range battery, because the energy content in the Long Range pack is 50 percent higher in kilowatt-hours, so essentially you can make about a third more cars if they're all Standard Range Plus instead of the Long Range pack — one is around 50 kilowatt-hours, the other is around 75 kilowatt-hours. So we're actually probably going to bias our sales intentionally toward the smaller battery pack in order to have a higher volume of vehicles. Basically, the obvious thing to do is to maximize the number of autonomous units — maximize the output that will subsequently result in the biggest autonomous fleet down the road — and we're doing a number of things in that regard, but that's a topic for another day. The million-mile life is basically just about getting the cycle life of the pack up: basic math, if you've got a 250-mile-range pack, you're going to need four thousand cycles, so it's very tractable. We already do that with our stationary storage — stationary storage solutions like Powerpack are already deployed with 1,000-cycle life capability.
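The cycle-life arithmetic above is easy to verify. In the sketch below, pairing the 250-mile range with the roughly 50 kWh pack size mentioned just above is my assumption, used only to show the implied efficiency.

```python
# Cycle-life arithmetic for the million-mile pack, using the figures quoted above.
target_miles = 1_000_000
range_per_charge_miles = 250                 # quoted 250-mile-range pack
cycles_needed = target_miles / range_per_charge_miles
print(f"full charge cycles needed: {cycles_needed:.0f}")        # 4,000 cycles

# ASSUMED pairing: ~50 kWh Standard Range Plus pack with the 250-mile figure
pack_kwh = 50
print(f"implied efficiency: {range_per_charge_miles / pack_kwh:.1f} miles/kWh")  # ~5.0
```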
Question: Can I ask — sorry, yeah, it's like ventriloquism over here. There are obviously very constructive margin implications to the extent you can drive the attach rate of the full self-driving option much higher. I'd be curious where you are in terms of those attach rates and how you expect to educate consumers about the robotaxi scenario so that attach rates do materially improve over time.
— Sorry, it's a bit hard to hear your question.
— Just curious where we are today in terms of full self-driving attach rates and the financial implications. I think it's hugely beneficial if those attach rates materially increase, because of the higher gross margin dollars that flow through when people do sign up for FSD. How do you see that ramping, what are the attach rates today, and how do you expect to educate consumers and get them aware that they should be attaching FSD to their vehicle purchases?

We expect that to ramp up massively after today. The really fundamental message that consumers should be taking away today is that it is financially insane to buy anything other than a Tesla — it will be like owning a horse in three years. I mean, fine if you want to own a horse, but you should go into it with that expectation. If you buy a car that does not have the hardware necessary for full self-driving, it's like buying a horse, and the only car that has the hardware necessary for full self-driving is a Tesla. People should really think about that before they purchase any other vehicle; it's basically crazy to buy any car other than a Tesla. We need to convey that argument clearly, and we have today.

Question: Thanks for bringing the future to the present — a very informative time today. I was wondering: you did not talk much about the Tesla pickup, and let me give some context for that. I could be wrong, but the way I'm looking at the Tesla Network, as an early adopter and as a test bed, I think Tesla's pickup may be the first phase of putting vehicles on the network, because the utility of the Tesla pickup would mostly be for people who are either loading a lot of stuff, are in the construction profession, or have odd items here and there, like picking up stuff from Home Depot. Maybe it needs to be a two-stage process — pickup trucks exclusively for the Tesla Network as a starting point, and then people like me can buy them later. What are your thoughts on that?

Well, today was really just about autonomy.
There's a lot that we could talk about, such as cell production, the pickup truck and future vehicles, but today was just focused on autonomy. But I agree it's a major thing — I'm very excited for the Tesla pickup truck unveil later this year; it's going to be great.

Question: Colin Langan, UBS. Just so we understand the definitions — when you refer to feature-complete self-driving, it sounds like you're talking Level 5, no geofence. Is that what's expected by the end of the year? And then on the regulatory process: have you talked to regulators about this? This seems quite an aggressive timeline compared with what other people have put out there. What are the hurdles, and what is the timeline to get approval? Do you need things like, in California, tracking miles with an operator behind the wheel — do you need those things, and what is that process going to look like?

We talk to regulators around the world all the time. As we introduce additional features like Navigate on Autopilot, this requires regulatory approval on a jurisdictional basis. But fundamentally, regulators in my experience are convinced by data: if you have a massive amount of data that shows that autonomy is safe, they listen to it. They may take time to digest the information — their process may take a bit of time — but they have always come to the right conclusion from what I've seen.

Question: I have a question over here, behind the lights and the pillar. Some of the work we've done trying to better understand the ride-hail market shows it's very concentrated in major dense urban centers. So is the way to think about this that the robotaxis would probably deploy more into those areas, and full self-driving for personally owned vehicles would be in the suburban areas?

Probably, yeah — Tesla-owned robotaxis would be in dense urban areas along with customer vehicles, and then as you get to medium- and low-density areas it would tend to be more that people own the car and occasionally lend it out. There are a lot of edge cases in Manhattan and, say, downtown San Francisco, and there are areas around the world that have challenging urban environments, but we do not expect this to be a significant issue. When I say feature complete, I mean it will work in downtown San Francisco and downtown Manhattan this year.

Question: Hi, I have a neural net architecture question. Do you use different models for, say, path planning and perception, or different types of AI? How do you split up that problem across the different pieces of autonomy?

Essentially, right now the neural nets are used mainly for object recognition, and we're still basically using them on still frames — identifying objects in still frames and tying it together in a perception and path-planning layer thereafter. But what's happening, steadily, is that the neural net is eating into the software base more and more, and over time we expect the neural net to do more and more. Now, from a computational cost standpoint, there are some things that are very simple for a heuristic and very difficult for a neural net, so it probably makes sense to maintain some level of heuristics in the system, because they're computationally a thousand times easier than a neural net that can do the same thing — it's like using a cruise missile to swat a fly; just use a flyswatter, not a cruise missile. Over time, though, I would expect it moves to just training against video — video in, and steering and pedals out, or really video in and lateral and longitudinal acceleration out, almost entirely. That's what we're going to use Dojo for; there's no system that can currently do that.
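To make the split described in that answer concrete, here is a deliberately toy sketch — a per-frame neural-network perception stage feeding a cheap heuristic planning layer. Every name, signature and threshold below is hypothetical and for illustration only; this is not Tesla's code or architecture.

```python
# Illustrative only: a toy "neural net for perception, heuristics for planning" split.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str          # e.g. "car", "pedestrian", "lane_line"
    distance_m: float   # estimated distance ahead of the ego vehicle
    lateral_m: float    # offset from the ego lane centre

def perceive(frame) -> List[Detection]:
    """Stand-in for a per-frame neural network doing object recognition on stills."""
    raise NotImplementedError("placeholder for the vision network")

def plan(detections: List[Detection], ego_speed_mps: float) -> dict:
    """Heuristic planning layer: cheap hand-written rules over the network's outputs."""
    in_lane = [d for d in detections if d.label == "car" and abs(d.lateral_m) < 1.5]
    lead = min(in_lane, key=lambda d: d.distance_m, default=None)
    # Simple time-headway rule: brake if the lead vehicle is closer than two seconds away.
    if lead is not None and lead.distance_m < 2.0 * ego_speed_mps:
        return {"accel_mps2": -2.0, "steer_rad": 0.0}
    return {"accel_mps2": 0.5, "steer_rad": 0.0}

if __name__ == "__main__":
    detections = [Detection("car", distance_m=25.0, lateral_m=0.2)]
    print(plan(detections, ego_speed_mps=20.0))   # lead 25 m away, 40 m headway -> brake
```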
Question: Maybe over here — just going back to the sensor suite discussion. The one area I'd like to ask about is the lack of side radars. In a situation where you have an intersection with a stop sign and maybe 35 to 40 mile-per-hour cross traffic, are you comfortable with the sensor suite — the side cameras — being able to handle that?

Yeah, no problem. Essentially the car is going to do kind of what a human would do. Think of a human as basically a camera on a slow gimbal: it's quite remarkable that people are able to drive the way they do, because you can't look in all directions at once, whereas the car can literally look in all directions at once with multiple cameras. Humans drive just by looking this way and looking that way; they're stuck in the driver's seat and can't really get out of it, so it's like one camera on a gimbal, and yet a conscientious driver is able to drive with very high safety. The cameras in the cars have a better vantage point than the person — they're up in the B-pillar or in front of the rearview mirror, so they've really got a great vantage point. If you're turning onto a road that has quite a lot of high-speed traffic, you can just do what a person does: gradually turn a little bit, don't go fully into the road, let the cameras see what's going on, and if things look good and the cameras don't show any oncoming traffic, you go; if it looks sketchy, you pull back a little, just like a person. The behavior starts to become remarkably lifelike — it's quite eerie, actually; the car just starts behaving like a person.

Question: Over here. Given all the value you're creating in your auto business by wrapping all of this technology around your cells, I'm curious why you would still be taking some of your cell capacity and putting it into Powerwall and Powerpack. Wouldn't it make sense to put every single unit you can make into this part of your business?

We've already taken almost all the cell lines that were meant to go to Powerwall and Powerpack and used them for Model 3. Last year, in order to make our Model 3 production and not be cell-starved, we had to convert all of the 2170 lines at the Gigafactory to car cells. Our output in total gigawatt-hours of stationary storage compared to vehicles is an order of magnitude different, and for stationary storage we can basically use a whole bunch of miscellaneous cells out there — we can gather cells from multiple suppliers all around the world, and you don't have a homologation issue or a safety issue like you have with cars. So our stationary battery business has basically been feeding off scraps for quite a while. We really think of production as a massive system with many, many constraints, and the
degree to which manufacturing and the supply chain are underappreciated is amazing. There is a whole series of constraints, and what is the constraint in one week may not be the constraint in another week. It's insanely difficult to make a car, especially one that is rapidly evolving. I'll take a few more questions and then I think we should break so you can try out the cars.

Question: Elon, Adam Jonas. A question on safety: what data can you share with us today on how safe this technology is? That would obviously be important in a regulatory or insurance discussion.

We publish the accidents per mile every quarter, and what we see right now is that Autopilot is about twice as safe as a normal driver on average, and we expect that to increase quite a bit over time. Like I said, in the future consumers will want to outlaw people driving their own cars because it is unsafe — I don't think they will succeed, nor am I saying I agree with this position, but that is what they will want. Think of elevators: elevators used to be operated with a big lever to move up and down between floors, with a big relay, and there were elevator operators. But periodically they would get tired, or drunk or something, and move the lever at the wrong time and sever somebody in half. So now you do not have elevator operators, and it would be quite alarming if you went into an elevator that had a big lever that could just move it between floors arbitrarily — there are just buttons. In the long term — again, not a value judgment, not me saying I want the world to be this way — I'm saying consumers will most likely demand that people not be allowed to drive cars.

Follow-up: Can you share with us how much Tesla is spending on Autopilot or autonomous technology, by order of magnitude, on an annual basis?

It's basically our entire expense structure.

Question: On the economics of the Tesla Network, just so I understand: you get a Model 3 off lease, $25,000 goes on the balance sheet as an asset, and then it would cash-flow roughly $30,000 a year — is that the way to think about it?
— Yeah, something like that.
— And then in terms of financing it: is it cash-flow neutral for the robotaxi program or cash-flow neutral for Tesla as a whole? The question earlier was about financing the robotaxis; it looks to me like they're self-financing, but you mentioned they would be basically cash-flow neutral — is that what you were referring to?
— I'm just saying that between now and when the robotaxis are fully deployed throughout the world, the sensible thing for us is to maximize the production rate and drive the company to cash-flow neutral; once the robotaxi fleet is active, I would expect it to be extremely cash-flow positive.
— So you were talking about production?
— Yes, about production: maximize the number of autonomous units made.

Okay, maybe one last question.
— If I add my Tesla to the robotaxi network, who is liable for an accident — Tesla or me — if the vehicle has an accident and harms somebody?
— Probably Tesla. It's probably Tesla, yeah. The right thing to do is make sure there are very, very few accidents. All right, thanks everyone — please enjoy the rides. Thank you. [Applause]
Info
Channel: TopSpeed
Views: 1,339,869
Rating: 4.7330155 out of 5
Keywords: elon musk, tesla autopilot, tesla autonomy day, full event, press conference, tesla model 3, tesla model 3 autopilot, tesla model s, tesla model x, tesla model y, tesla roadster, autonomous cars, latest autopilot update, autopilot beta, tesla robotaxi
Id: -b041NXGPZ8
Length: 154min 59sec (9299 seconds)
Published: Tue Apr 23 2019