Tesla Bot: Elon Musk answers all your questions (full AI team Q&A)

Video Statistics and Information

Captions
Yeah, great. So we're happy to answer any questions you have about anything on the software or hardware side, where things are going — fire away. Because the lights are like interrogation lights, we actually cannot see... ah, there we go. Great, all right, cool.

First off, thanks to all the presenters — that was just super cool to see everything. I'm curious at a high level, and this is a question for anyone who wants to take it: to what extent are you interested in publishing or open-sourcing anything that you do in the future?

Well, it is fundamentally extremely expensive to create the system, so somehow that has to be paid for. I'm not sure how to pay for it if it's fully open-sourced — unless people want to work for free. But I should say that if other car companies want to license it and use it in their cars, that would be cool. This is not intended to be limited just to Tesla cars.

This one is for the Dojo supercomputer: did you solve the compiler problem of scaling to this many nodes, and if it is solved, is it only applicable to Dojo? I'm doing research in deep learning accelerators, and getting the correct scalability — or the right distribution even within one chip — is extremely difficult from a research perspective, so I was just curious. Excuse me — mic for Bill.

Have we solved the problem? Not yet. Are we confident we will solve the problem? Yes. We have demonstrated networks on prototype hardware, and we have performance models showing the scaling. The difficulty is, as you said, how do we keep the locality. If we can do enough model parallelism and enough data parallelism to keep most things local, we can just keep scaling. We have to fit the parameters — our working set — in the SRAM that we have, and then we flow through the pipe. There are plenty of opportunities as we get to further scale: future processor nodes will have more local memory, and we can trade memory against bandwidth and do more things. But as we see it now, for the applications that Tesla has, we see a clear path, and our modularity story means we can build out different ratios, different aspects, from it. This is something we chose for our applications internally.

Sure — on the locality portion of it: given that training is such a soft-scaling application, even though you have all this compute and a high-bandwidth interconnect, it may not give you that performance, because you are doing computations against limited memory at different locations. So I was very curious when you said it's solved — I just jumped on the opportunity and would love to know more, given how much of it you can open-source.

Yep. Yeah, I guess the proof's in the pudding. We should have Dojo operational next year, and we'll obviously use it for training — video training. Fundamentally, the primary application initially is that we've got vast amounts of video, and the question is how we train on vast amounts of video as efficiently as possible — and also shorten the iteration time. If you're trying to train to a task, then in general, innovation is: how many iterations, and what is the average progress between each iteration? So if you can reduce the time between iterations, the rate of improvement is much better. If it sometimes takes a couple of days for a model to train versus a couple of hours, that's a big deal.
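To make the locality point above concrete, here is a toy data-parallel sketch in Python/NumPy — purely illustrative, not Dojo's compiler or scheduling: each simulated "tile" computes gradients on its own local shard of the batch, and only the gradients cross tile boundaries.

```python
# Toy sketch of the data-parallel locality idea discussed above: each "tile"
# keeps its slice of the batch (and its activations) entirely local, and only
# the resulting gradients are exchanged and averaged. This is a generic
# illustration, not Dojo's actual compiler or scheduling strategy.
import numpy as np

rng = np.random.default_rng(0)
n_tiles, batch, dim = 4, 64, 8

W = rng.normal(size=(dim, 1))              # shared model parameters (replicated per tile)
X = rng.normal(size=(batch, dim))          # one global batch of training data
y = X @ np.ones((dim, 1)) + 0.1 * rng.normal(size=(batch, 1))

def local_gradient(W, X_shard, y_shard):
    """Mean-squared-error gradient computed using only tile-local data."""
    pred = X_shard @ W                     # activations never leave the tile
    err = pred - y_shard
    return 2.0 * X_shard.T @ err / len(X_shard)

# Split the batch across tiles; each tile works out of its own local memory.
shards = zip(np.array_split(X, n_tiles), np.array_split(y, n_tiles))
grads = [local_gradient(W, Xs, ys) for Xs, ys in shards]

# The only cross-tile traffic is this reduction over gradients ("all-reduce").
W -= 0.05 * np.mean(grads, axis=0)
```

Model parallelism follows the same principle, except the parameters themselves are partitioned so that each tile's working set fits in its local SRAM.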
But the acid test here — what I've told the Dojo team — is that it's successful if the software team wants to turn off the GPU cluster. If they want to keep the GPU cluster on, it's not successful.

Hi, right over here. Loved the presentation, thank you for getting us out here — loved everything, especially the simulation part. It looked very, very realistic. Are there any plans to maybe expand simulation to other parts of the company in any way?

Hi, I'm Ian Glow — I manage the Autopilot simulation team. As we go down the path to full self-driving, we're going to have to simulate more and more of the vehicle. Currently we're simulating vehicle dynamics, but we're going to need the BMS, we're going to need the MCU — we're going to need every single part of the vehicle integrated, and that actually makes the Autopilot simulator really useful for places outside of Autopilot. So I want to expand — we want to expand — eventually into a universal simulation platform. But before that we're going to be spinning up a lot of Optimus support, and then a little further down the line we have some rough ideas on how to potentially get the simulation infrastructure, and some of the cool things we've built, into the hands of people outside the company.

Optimus is the code name for the Tesla Bot. Oops — Optimus Sub-Prime. [Laughter]

Hi, this is Ali Jahanian. Thank you for the great presentation and for putting all of these cool things together. For a while I have been thinking that the car is already a robot — so why not a humanoid robot? — and I'm so happy that today you mentioned you are going to build such a thing. I think this can open up opportunities for putting multiple modalities together. For instance, in the example you showed there was a dog running together with some pedestrians; language and symbolic processing can really help with visualizing that. So I was wondering if I could hear a little more about putting modalities together, including language and vision — I have been working with, for instance, the minGPTs that Andrej put out there — and I didn't hear much about other modalities going into the car, or at least into the simulation. Is there any comment you could give us?

Well, driving is fundamentally almost entirely vision neural nets — in humans it's basically running on a biological vision neural net, and what we're doing here is a silicon camera neural net. There is some amount of audio: you want to hear if there are emergency vehicles, or, I guess, converse with the people in the car — if somebody's yelling something at the car, the car needs to understand what that is. So, all the things that are necessary for it to be fully autonomous.

Yeah, thank you.

Hi, thank you for all the great work that you've shown. My question is for the team: the data that the FSD computer is being trained on seems to be predominantly from the United States, but as it gets rolled out to different countries, which have their own road systems and the challenges that come with them, how do you think it's going to scale? I'm assuming starting from the ground up is not a very viable solution, so how does it transfer to different countries?

Well, we actually do train using data from probably around 50 different countries.
But as we're trying to advance full self-driving, we need to pick one country, and since we're located here we picked the US. Then we get a lot of questions like, why not even Canada? Well, because the roads are a little different in Canada — different enough. When trying to solve a hard problem, you want to say, okay, let's not add additional complexity right now; let's just solve it for the US and then extrapolate to the rest of the world. But we do use video from all around the world.

Yeah, I think a lot of what we are building is very country-agnostic. Fundamentally, all the computer vision components and so on don't care too much about country-specific features — different countries all have roads, and they have curbs, and they have cars — so everything we're building is fairly general in that sense.

Yeah, and the prime directive is: don't crash. That's true for every country.

Yes, this is the prime directive. Even right now the car is pretty good at not crashing. So basically, whatever it is, don't hit it — even if it's a UFO that crash-landed on the highway, still don't hit it. You should not need to recognize something in order to not hit it. That's very important.

And I want to ask: when you do the photometric, multi-view geometry process, how much error do you see — is it like one millimeter, one centimeter? If it's not confidential.

Sorry — what's the difference between the synthetic...?

Sure: what is the difference between the synthetically created geometry and the actual geometry?

Yeah, it's usually within a couple of centimeters — three or four centimeters is the standard deviation.

Do you merge in different kinds of modalities to bring down that error?

We primarily try to find scalable ways to label. On some occasions we use other sensors to help benchmark, but we primarily try to use cameras for this system.

Okay, thanks.

Yeah, I think we want to aim for the car to be positioned accurately to the centimeter level, something on that order. Obviously it will depend on distance: close-by things can be much more accurate than farther-away things — and farther-away things matter less, because the car doesn't have to make decisions about them until they come closer, and as they come close they become more and more accurate. Exactly.

Lots of questions — thanks, everybody. My question has to do with AI and manufacturing. It's been a while since we've heard about the alien-dreadnought concept. Is the humanoid robot behind you guys something that was born out of the "production hell" timeline and the conclusion that humans are underrated in that process?

Well, sometimes something that I say is taken to too much of an extreme. There are parts of the Tesla production system that are almost completely automated, and then there are some parts that are almost completely manual; if you were to walk through the whole production system, you would see a very wide range, from fully automatic to almost completely manual. But the vast majority of it is already automated. And then with some of the design and architecture changes, like going to large aluminum high-pressure die-cast components, we can take the entire rear third of the car and cast it as a single piece — and now we're going to do the front third of the car as a single piece too — so the body line drops by something like 60 to 70 percent in size. But the
robot is not prompted specifically by manufacturing needs; it's just that we're obviously already making the pieces that are needed for a useful humanoid robot. So I guess we probably should make it — and if we don't, someone else will — so I guess we should make it and make sure it's safe. I should also say that manufacturing — volume manufacturing — is extremely difficult and underrated, and we've gotten pretty good at that. It's also important for the humanoid robot: how do you make the humanoid robot not be super expensive?

Hi, thank you for the presentation. My question is about the scaling of Dojo, in particular how you scale the compute nodes in terms of thermals and power delivery, because there is only so much heat that you can dissipate and only so much power that you can bring to a cluster rack. How do you plan to scale it, and how do you plan to scale it across multiple data centers?

Sure — you want to take it? Hi, I'm Bill, one of the Dojo engineers. From a thermal and power standpoint, we've designed it to be very modular. What you saw on the compute tile will cool the entire tile: once we hook it up, it is liquid-cooled on both the top and the bottom side and it doesn't need anything else. So when we talk about clicking these together — once we click it into power and click it into cooling, it will be fully powered and fully cooled, and all of that is less than a cubic foot.

Yeah, Tesla has a lot of expertise in power electronics and in cooling, so we took the power-electronics expertise from the vehicle powertrain, and the advanced cooling that we developed for the power electronics and for the vehicle, and applied that to the supercomputer — because, as you point out, getting heat out is extremely important; it really is heat-limited. It's funny: at the compute level it's operating at less than a volt, which is a very low voltage, so there are a lot of amps, and therefore a lot of heat — I²R is what really bites you on the ass.
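As a rough back-of-the-envelope on the low-voltage, high-current point — the tile power, supply voltage, and distribution resistance below are assumed round numbers for illustration, not Tesla's specifications:

```latex
% Illustrative numbers only: a hypothetical ~15 kW tile fed at ~0.8 V,
% with ~10 micro-ohms of resistance in the delivery path.
\[
I = \frac{P}{V} = \frac{15\,000\ \mathrm{W}}{0.8\ \mathrm{V}} \approx 1.9\times 10^{4}\ \mathrm{A},
\qquad
P_{\text{loss}} = I^{2}R \approx \bigl(1.9\times 10^{4}\bigr)^{2}\times 10\ \mu\Omega \approx 3.5\ \mathrm{kW}.
\]
```

Halving the supply voltage doubles the current and quadruples the I²R loss, which is why sub-volt power delivery is dominated by getting heat out.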
Hi, my question is also a question of scaling. It seems like a natural consequence of using significantly faster training hardware is that you'd either be training models over a lot more data, or training much more complex models which would potentially be significantly more expensive to run at inference time on the cars. I was wondering if there is a plan to also apply Dojo as something you'd be using on the self-driving cars, and if so, do you foresee additional challenges there?

I can take that. As you could see, Andrej's models are not just for the cars — there are auto-labeling models, and there are other models that go beyond the car application but feed into the car stack. Dojo will be used for all of those too, not just the part of training that feeds car inference.

Yeah, Dojo's first application will be consuming video data to train models that would then run on the inference engine in the car, and that I think is an important test to see whether it actually is better than a GPU cluster or not. But beyond that, it's basically a generalized neural-net training computer — very much optimized for neural nets. CPUs and GPUs were not designed specifically for training neural nets. We've been able to make GPUs in particular very efficient for training neural nets, but that was never their design intent, so GPUs are still essentially running your neural-net training in a kind of emulation mode. With Dojo we're saying, okay, let's just ASIC the whole thing — let's have a thing that's built for one purpose, and that is neural-net training. And generally, any system that is designed for a specific purpose will be better than one that is designed for a general purpose.

I had a question here. You described two separate systems: one for vision, and then the planner and control. Does Dojo allow you to train networks that cross that boundary? And second, if you were able to train such networks, would you have the onboard compute capability in the FSD system to run them within your tight latency constraints? Thanks.

Yeah, I think we should be able to train planner networks on Dojo or on any GPUs — it's really invariant to the platform. And I think, if anything, once we make this entire thing end-to-end it'll be more efficient than decoding a lot of these intermediate states: you should be able to run faster if you make the entire thing neural networks, because we can avoid a lot of the decoding of intermediate states and only decode the essential things required for driving the car.

Yep, certainly. End-to-endness is the guiding principle behind a lot of the network development, and over time neural networks have taken on more and more functionality in the stack. We want everything to be trained end-to-end because we see that that works best, but we are building it incrementally. So right now the interface is vector space, and we are consuming it in the planner, but nothing fundamentally prevents you from taking features and eventually fine-tuning end-to-end — I think that's definitely where this is headed.

Yeah, and the discovery really is: what are the right architectures, the right network blocks, that make the network amenable to the task? Like Andrej described, we can place spatial RNNs to help with the perception problem, and then it's just a neural network. Similarly, for planning we need to bake search and optimization into the network architecture, and once we do that you should be able to do planning very quickly, similar to the C++ algorithms.

I think I had a question very similar to what he was asking about. It seems like a lot of the neural nets are around computer vision, while in traditional planning you had model predictive control solving convex optimization problems very quickly. I wonder if there's a compute architecture that's better suited to solving convex optimization or model predictive control problems very quickly?

Yeah, a hundred percent. Like I said earlier, if you want to bake in these architectures that do, say, model predictive control, you can replace some of the blocks with neural networks — or, if we know the physics of it, we can use physics-based models as part of the neural network's forward pass itself. So we are going to go towards a hybrid system where we will have neural-network blocks placed together with physics-based blocks, and more neural networks later. It'll be a hybrid stack: what we know how to do will be placed in explicitly, and where the networks do better we'll use the networks and optimize with them — so we'll build an end-to-end stack with this architecture baked in.

I do think that as long as you've got surround-video neural nets for understanding what's going on, and can convert the surround video into vector space, then you basically have a video game. It's like being in Grand Theft Auto or whatever: you can make the cars drive around and the pedestrians walk around without crashing. So you don't have to have a neural net for control and planning — although it's probably ultimately better. I think you can probably — in fact I'm sure you can — get to much safer than human with control and planning primarily in C++, with perception and vision in neural nets.
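A minimal sketch in Python of the hybrid idea discussed above — a physics-based rollout (a kinematic bicycle model) sitting inside the planner's forward pass, with a stand-in for a learned proposal head. All constants, costs, and names here are illustrative, not Tesla's planner:

```python
# Hybrid planning sketch (illustrative only): a "learned" component proposes
# candidate control sequences, a known physics model rolls them out, and a
# simple cost picks the best one.
import numpy as np

DT, WHEELBASE = 0.1, 2.9          # timestep [s], wheelbase [m] (assumed values)

def rollout(state, controls):
    """Kinematic bicycle model: state = (x, y, heading, speed)."""
    x, y, th, v = state
    traj = []
    for accel, steer in controls:
        x += v * np.cos(th) * DT
        y += v * np.sin(th) * DT
        th += v / WHEELBASE * np.tan(steer) * DT
        v = max(0.0, v + accel * DT)
        traj.append((x, y, th, v))
    return np.array(traj)

def cost(traj, target_y=0.0, target_speed=15.0):
    """Penalize lateral offset from lane center and deviation from target speed."""
    lateral = np.mean((traj[:, 1] - target_y) ** 2)
    speed = np.mean((traj[:, 3] - target_speed) ** 2)
    return lateral + 0.1 * speed

def plan(state, n_candidates=64, horizon=30, rng=np.random.default_rng(0)):
    # Stand-in for a neural proposal head: sample candidate control sequences.
    candidates = np.stack([
        np.column_stack([rng.normal(0.5, 1.0, horizon),      # acceleration
                         rng.normal(0.0, 0.05, horizon)])    # steering angle
        for _ in range(n_candidates)])
    costs = [cost(rollout(state, c)) for c in candidates]
    return candidates[int(np.argmin(costs))]                 # best control sequence

best = plan(state=(0.0, 0.5, 0.0, 12.0))
print("first planned (accel, steer):", best[0])
```

In a real hybrid stack, the random sampling and the hand-written cost are the pieces one would replace with learned networks, while the known kinematics stay explicit in the forward pass.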
Hi, my question is: we've seen other companies, for example, use reinforcement learning and machine learning to optimize power consumption in data centers and all kinds of other internal processes. Is Tesla using machine learning within its manufacturing, design, or other engineering processes?

I discourage use of machine learning, because it's really difficult. Unless you have to use machine learning, don't do it. It's usually a red flag when somebody says, "we want to use machine learning to solve this task" — 99.9 percent of the time you do not need it. You reach for machine learning when you need to, not before. I've not found it to be a convenient, easy thing to do; it's a super hard thing to do. That may change if you've got a humanoid robot that can understand normal instructions, but generally: minimize use of machine learning in the factory.

Hi — based on your videos, the simulator looked like a combination of graphical and neural approaches. I'm curious what the set of underlying techniques used for your simulator is, and specifically for the neural rendering, if you can share.

Yeah. At the bottom of the stack it's just traditional game techniques — real-time rasterization, very similar to what you'd see in something like GTA. On top of that we're doing real-time ray tracing, and then those results were really hot off the press — we had that little asterisk at the bottom, that was from last night — where we're going into the neural rendering space. We're trying out a bunch of different things. We want to get to the point where neural rendering is the cherry on top that pushes it to the point where the models will never be able to overfit on our simulator. Currently we're doing things similar to photorealism enhancement — there's a recent paper, "Enhancing Photorealism Enhancement" — but we can do a lot more than what they could do in that paper, because we have way more labeled data, way more compute, a lot more control over the environments, and a lot of people who can help us make this run in real time. We're going to try whatever we can to get to the point where we could train everything with just the simulator if we had to — but we will never have to, because we have so much real-world data that no one else has; the simulator is just to fill in the little gaps in the real world.

Yeah, the simulator is very helpful for these rare cases, like collision avoidance right before an accident. And ironically, the better our cars become at avoiding accidents, the fewer accidents there are, so our training set gets smaller — so then we have to make them crash in the
simulation. So it's like: okay, minimize potential injury to pedestrians and people in the car — you have five meters, you're traveling at 20 meters per second — what actions would minimize the probability of injury? We can run that in sim. Or cars driving down the wrong side of the highway: that kind of thing happens occasionally, but not that often.

For your humanoid context, I'm wondering if you've decided what use cases you're going to start with, and what the grand challenges are in that context to make this viable.

Well, I think the humanoid — the Tesla Bot, Optimus — is basically going to start by just dealing with work that is boring, repetitive, and dangerous: basically, what is the work that people would least like to do?

Hi, a quick question about your simulations. Obviously they're not perfect right now, so are you using any sort of domain-adaptation techniques to bridge the gap between your simulated data and your actual real-world data? I imagine it's kind of dangerous to just deploy models that are solely trained on simulated data — so is some sort of explicit domain adaptation, or something like it, going on anywhere in your pipeline?

Currently we're producing the videos straight out of the simulator — the full clips, kinematics and everything — and we're just immediately training on them. But it's not the entire dataset, just a small targeted segment, and we only evaluate on real-world video. We're paying a lot of attention to make sure we don't overfit, and if we have to start doing fancier things we will, but currently we're not having an issue with overfitting on the simulator. We will as we scale up the data, and that's where we're hoping to use neural rendering to bridge the gap and push that even further out. We've already done things like taking the same network that runs in the car but retraining it to detect sim versus real, to drive our decisions, and that's actually helped prevent some of these issues as well.

Yeah, just to emphasize: overwhelmingly, the dataset is real video from the cars on actual roads. Nothing is weirder or has more corner cases than reality — it gets really strange out there. But if we find, say, a few examples of something very odd — some very hard pictures we've seen — then in order to train on it effectively we want to create simulations, say a thousand simulations that are variants of that quirky thing we saw, to fill in some important gaps and make the system better. And really, all of this is about reducing the probability of a crash or an injury over time. It's called the march of nines: how do you get to 99.999999 percent safe? And each nine is an order of magnitude increase in difficulty.
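The "march of nines" remark can be stated compactly (a generic reliability identity, not a Tesla-specific figure):

```latex
\[
\text{reliability with } n \text{ nines} = 1 - 10^{-n}
\quad\Longrightarrow\quad
\text{permitted failure rate} = 10^{-n}.
\]
```

Each additional nine (for example, 99.99% to 99.999%) cuts the permitted failure rate by a factor of ten, which is why rare-case coverage — including the thousand simulated variants of a single quirky real-world scene mentioned above — dominates the effort.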
Thanks so much for the presentation. I was curious about the Tesla Bot — specifically, I'm wondering if there are any particular applications that you think the humanoid form factor lends itself to, and secondly, because of its human form factor, is emotion or companionship thought about at all on the product roadmap?

We certainly hope this does not feature in a dystopian sci-fi movie. Really, at this point we're trying to be as literal as possible: can it do boring, dangerous, repetitive jobs that people don't want to do? Once you can have it do that, then maybe you can have it do other things too, but that's the thing that would be really great to have. It could be your buddy too — you could buy one and have it be your friend, or whatever. I'm sure people will think of some very creative uses.

So, firstly, thanks for the really incredible presentation. My question is on the AI side. One thing we've been seeing with some of these language-modeling AIs is that scaling has had incredible impacts on their capabilities and what they're able to do. I was wondering whether you're seeing similar kinds of scaling effects in your neural networks and your applications.

Absolutely. A bigger network typically performs better, provided you have the data to train it with, and this is also what we see ourselves. Definitely in the car we have latency considerations to be mindful of, so there we have to get creative to actually deploy much larger networks. But as we mentioned, we don't only train neural networks for what goes in the car: we have these auto-labeling pipelines that can utilize models of arbitrary size. In fact we've trained a number of models that are not deployable — that are significantly larger and work much better — because we want much higher accuracy for the auto-labeling, and there we definitely see this trend.

Yeah, the auto-labeling is an extremely important part of this whole situation. Without the auto-labeling I think we would not be able to solve the self-driving problem. It's kind of a funny form of distillation, where you use these very massive models, plus the structure of the problem, to do the reconstruction, and then you distill that into the neural networks that you deploy to the car. But we basically have a lot of neural networks, and a lot of tasks, that are never intended to go into the car.

Yeah, and also, as time goes on you get new frames of information, so you really want to make sure your compute is distributed across all the information, as opposed to taking a single frame and hogging compute on it for, say, 200 milliseconds — you have newer frames coming in, so you want to use all of the information and not just that one frame.

I think one of the things we're seeing is that the car's predictive ability is eerily good. It's really getting better than human at predicting, say, what the road will look like when it's out of sight, around the bend — it predicts the road with very high accuracy — and at predicting pedestrians or cyclists where it just sees a little corner of the bicycle, or a bit through the windows of a bus. Its ability to predict things is going to be much better than humans' — really way, way beyond.

Right. Yeah, we see this often, where we have something that is not visible, but the neural network is making up something that is actually very sensible. Sometimes it's eerily good, and you're left wondering: is this in the training set? In the limit, you can imagine the neural net has enough parameters to potentially remember Earth, so in the limit it could actually give you the correct answer — it's kind of like an HD map baked into the weights of the neural net.
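A minimal sketch of the distillation pattern Andrej describes — a large offline auto-labeling model produces labels that a small, deployable model is trained to imitate. The models, sizes, and data here are placeholders, not Tesla's pipeline:

```python
# Distillation sketch (illustrative only): a large "offline" teacher, too slow
# to ever run in the car, labels raw clips; a small "online" student is then
# fit against those pseudo-labels.
import numpy as np

rng = np.random.default_rng(0)
n_clips, n_features = 10_000, 32
raw_clips = rng.normal(size=(n_clips, n_features))      # stand-in for sensor data

# "Teacher": an arbitrarily large offline model (a wide random-feature
# regressor standing in for a huge auto-labeling network with no latency limit).
W_hidden = rng.normal(size=(n_features, 2048))
def teacher(x):
    return np.tanh(x @ W_hidden).sum(axis=1, keepdims=True)

pseudo_labels = teacher(raw_clips)                       # offline auto-labels

# "Student": a small model sized for deployment, fit to imitate the teacher
# (ordinary least squares as the simplest possible training step).
W_student, *_ = np.linalg.lstsq(raw_clips, pseudo_labels, rcond=None)

online_prediction = raw_clips[:5] @ W_student            # cheap enough to run online
print(np.c_[online_prediction, pseudo_labels[:5]])       # student vs. teacher labels
```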
Okay, I have a question about the design of the Tesla Bot, specifically: how important is it to maintain that humanoid form? Building hands with five fingers that also respect the weight limits could be quite challenging — you might have to use cable drives, and that causes all kinds of issues.

I mean, this is just going to be bot version one — we'll see. It needs to be able to do the things that people do and be a generalized humanoid robot. You could potentially give it, say, two fingers and a thumb or something like that; for now we'll give it five fingers and see if that works out okay. It probably will. It doesn't need to have incredible grip strength, but it needs to be able to work with tools, carry a bag, that kind of thing.

All right, thanks a lot for the presentation. An old professor of mine told me that the thing he disliked most about his Tesla was that the Autopilot UX didn't really inspire much confidence in the system, especially when objects are spinning and classifications are flickering. I was wondering: even if you have a good self-driving system, how are you working on convincing Tesla owners, other road users, and the general public that your system is safe and reliable?

Well, a while back the cars used to spin; they don't spin anymore. If you've seen the FSD Beta videos, they're pretty solid, and they will be getting more solid.

Yeah, as you add more and more data and train these multi-camera networks — these are pretty recent, actually, just a few months old — they're still improving. It's not a done product, but in our minds we can clearly see how this is just going to become essentially perfect vector space. Why not? All the information is there in the videos, so it should produce that, given lots of data and good architectures. This is just an intermediate point in the timeline.

I mean, it's clearly headed to way better than human, without question.

My turn? Oh, hi, here. I was wondering if you could talk a little bit about the short-to-medium-term economics of the bot. I understand the long-term vision of replacing physical labor, but repetitive, dangerous, and boring tasks tend not to be highly compensated, so I just don't see how to reproduce the approach of starting with a supercar and then breaking into the lower end of the market — how do you do that for a humanoid robot?

Well, I guess you'll just have to see.

Hello — hi. I was curious to know how the car's AI prioritizes occupant safety versus pedestrian safety, and what thought process goes into deciding how to build that into the AI.

Well, the thing to appreciate is that from the computer's standpoint, everything is moving slowly. To a human, things are moving fast; to the computer, they are not moving fast. So I think this is, in reality, somewhat of a false dichotomy — not that it will never happen, but it will be very rare. If you think about it going the other direction — rendering with full ray tracing and neural-net-enhanced graphics in something like Cyberpunk, or any advanced video game, at 60 perfectly rendered frames per second — how long would it take a person to render even one frame without any mistakes? It can't be done; it would take like a month just to render one frame out of the 60 in one second of a video game. Computers are fast and humans are slow. For example, on the rocket
side, you cannot steer the rocket to orbit by hand. We actually hooked up a joystick to see if anyone could steer the rocket to orbit, but you need to react at roughly six or seven hertz, and people can't do it — not even close. And that's pretty low; we're talking more like aiming for a 30-hertz type of thing.

Hi, over here. With Hardware 3, there's been lots of speculation that with larger nets it's hitting the limits of what it can provide. How much headroom has the extended compute mode provided, and at what point would Hardware 4 be required, if at all?

Well, I'm confident that Hardware 3 — the Full Self-Driving Computer 1 — will be able to achieve full self-driving at a safety level much greater than a human, probably, I don't know, at least two or three hundred percent better than a human. Then obviously there will be a future Hardware 4, or Full Self-Driving Computer 2, which we'll probably introduce with the Cybertruck — so maybe in about a year or so. It will probably be about four times more capable, roughly. But it's really just going to be: can we take it from, say, for argument's sake, 300 percent safer than a person to a thousand percent safer? Just like there are people on the road with varying driving abilities, but we still let them drive — you don't have to be the world's best driver to be on the road. So we'll see, I guess.

So, are you worried at all, since you don't have any depth sensors on the car, that people might try adversarial attacks — like printed-out photos or something — to try to trick the RGB neural network?

Yeah, like pull some Wile E. Coyote stuff, you know, paint a tunnel on the wall — it's like, oops. We haven't really seen much of that. For sure, right now, if you had a T-shirt with a stop sign on it — which I actually have, a T-shirt with a stop sign on it — and you flash the car, it will stop. I proved that. But obviously, as we see these adversarial attacks, we can train the cars to notice that, well, it's actually a person wearing a T-shirt with a stop sign on it, so it's probably not a real stop sign.

Hi, my question is about the prediction and the planning. I'm curious how you incorporate uncertainty into your planning algorithms. You mentioned that you run the Autopilot planner for all the other cars on the road — do you assume that they're all going to follow the rules, or are you accounting for the possibility that they might be bad drivers, for example?

Yeah, we do account for multi-modal futures. It's not that we just choose one: we account for the fact that this person can actually do many things, and we use actual physics and kinematics to make sure they're not doing something that would interfere with us before we act. So if there's any uncertainty, we are conservative and we yield to them. Of course there's a limit to this, because if you're too conservative it's not practical, so at some point we have to assert ourselves — and even then we make sure that the other person can yield to us and act sensibly.

I should say that before we introduce something into the fleet, we will run it in shadow mode, and we'll see what this neural net, for example, would have done in this particular situation. Effectively, the drivers are training the net. So if the neural net
would have, say, veered right, but the person actually went left — it's like, oh, there's a difference; why was there that difference? And all the human drivers are essentially training the neural net as to what the correct course of action is — assuming it doesn't end up in a crash; it doesn't count in that case.

Yeah, and secondly, we have various estimates of uncertainty, like flicker, and when we observe this — when we are not able to see something well — we actually slow down the car, to be safe and to get more information before acting. We don't want to be brazen and just go into something that we don't know about; we only go into places that we know about.

Yeah, so aspirationally, the less the car knows, the slower it should go.

Yeah — which is not quite true at some points right now, but yes: speed proportionate to confidence.
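A minimal sketch of two ideas from this exchange — shadow-mode disagreement logging and speed proportional to confidence. The thresholds, field names, and numbers are assumed for illustration only, not Tesla's implementation:

```python
# Shadow mode: compare what a candidate planner *would have* done against what
# the human driver actually did, and log clear disagreements for later review.
# Also: scale target speed down as perception confidence drops.
from dataclasses import dataclass

@dataclass
class Frame:
    human_steering: float         # what the driver actually did [rad]
    planner_steering: float       # what the candidate net would have done [rad]
    perception_confidence: float  # 0..1, e.g. how stable detections are (no flicker)

DISAGREEMENT_THRESHOLD = 0.15     # rad; assumed value
MAX_SPEED = 25.0                  # m/s; assumed ceiling

def shadow_mode_log(frames):
    """Collect frames where the shadow planner and the human clearly disagree."""
    return [f for f in frames
            if abs(f.human_steering - f.planner_steering) > DISAGREEMENT_THRESHOLD]

def target_speed(confidence):
    """Speed proportional to confidence: the less the car knows, the slower it goes."""
    return MAX_SPEED * max(0.0, min(1.0, confidence))

frames = [Frame(0.02, 0.01, 0.95), Frame(-0.30, 0.20, 0.40), Frame(0.10, 0.12, 0.80)]
print(len(shadow_mode_log(frames)), "disagreement(s) logged for review")
print(f"target speed at confidence 0.4: {target_speed(0.4):.1f} m/s")
```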
I'm sorry — thanks for the presentation. I'm curious: I appreciate the fact that FSD is improving, but if you had the ability to improve one component of the AI stack presented today — whether it's simulation, data collection, planning, control, etc. — which one, in your opinion, is going to have the biggest impact on the performance of the full self-driving system?

It's really the area under the curve of all of these pieces together, and if you improve anything, it should improve the whole.

I mean, in the short term, arguably we need all of the nets to be surround video. We still have some legacy — this is very short term, and obviously we're fixing it fast — but there are still some nets that are not using surround video, and ideally they would all use surround video.

Yeah, I think a lot of the puzzle pieces are there for success; we just need more strong people to help us make it work. That is the actual bottleneck, I would say, and really one of the reasons we are putting on this event.

Exactly — well said, Andrej. There's just a tremendous amount of work to do to make it work, and that's why we need talented people to join and solve the problem.

Thank you for the great presentation — lots of my questions were answered, but one thing: imagine that now you have a very large amount of data, maybe even more than necessary. How do you think about the forgetting problem in neural networks? And another one: are you considering online learning or continuous learning, so that maybe each driver could have their own version of the self-driving software?

I think I know the literature you're referring to; those are not problems that we've seen, and we haven't done much continuous learning. We train the system once, we fine-tune it a few times, and that goes into the car. We need something stable that we can evaluate extensively, and then, when we think it's good, it goes into the cars. So we don't do much on-the-spot or continuous learning, and we don't face the forgetting problem.

But there will be settings where you can say: are you typically a conservative driver, or do you want to drive fast or slow? It's like, I'm late for the airport, could you go faster? Basically the kind of instructions you'd give to your Uber driver — I'm late for the flight, please hurry — or take it easy, or whatever your style is.

Let's take a few more questions here, and then we'll call it a day.

All right. As our models become more and more capable, and you're deploying these models into the real world, one thing that's possible is for AI to become more misaligned with what humans desire. Is this something that you're worried about as you deploy more and more robots, or is it a case of, we'll solve that problem when we get there?

Yeah, I think we should be worried about AI now. What we're trying to do here is, I'd say, a narrow AI — pretty narrow: just make the car drive better than a human, and then have the humanoid robot be able to do basic stuff. At the point where you start to get superhuman intelligence — yeah, I don't know, all bets are off. That will probably happen, but what we're trying to do here at Tesla is make useful AI that people love and that is unequivocally good. That's what we try to aim for.

Okay, maybe one more question.

Hi, my question is about the camera sensor. In the beginning of the talk you mentioned building a synthetic animal, and if you think about it, a camera is a very poor approximation of a human eye — a human eye does a lot more than take a sequence of frames. Have you looked into things like event cameras, or are you looking into a more flexible camera design, or building your own camera, for example?

Well, with Hardware 4 we will have a next-generation camera, but I have to say that we have not reached the limit of the current cameras. I'm confident we can achieve full self-driving with much higher safety than humans with the current cameras and the current compute hardware. But are we going to be a thousand percent better rather than 300 percent better? So we'll see continued evolution on all levels in pursuit of that goal.

I think in the future people will look back and say, wow, I can't believe we had to drive these cars ourselves. Self-driving cars will just be normal, like self-driving elevators. Elevators used to have elevator operators — there was someone there with a big relay switch operating the elevator — and every now and then they'd get tired, or make a mistake, and shear somebody in half. So we made elevators automatic: you just press the button, and you can be in a 100-story skyscraper and not really worry about it; you press a button and the elevator takes you where you want to go. But it used to be that all elevators were operated manually. It'll be the same thing for cars: all cars will be automatic — and electric, obviously. There will still be some gasoline cars and some manual cars, just like there are still some horses.

All right, well, thanks everyone for coming. I hope you enjoyed the presentation, and thank you for the great questions. [Applause]
Info
Channel: CNET Highlights
Views: 505,465
Keywords: event, livestream, live, 2021, tesla, elon musk, ai day, tesla bot, tesla robot, autonomous driving, q&a, q and a
Id: tSa1kOOELrY
Length: 50min 30sec (3030 seconds)
Published: Fri Aug 20 2021