How Meta’s Chief AI Scientist Believes We’ll Get To Autonomous AI Models

Captions
Host: [Applause] Thank you, Yann, and welcome. My God, thank you for this — a highlight of my year, the opportunity to talk to you. I don't know what you can see right now, but there are 2,000 of the smartest people on the planet watching you from Cambridge, and boy, what an opportunity to pick your brain. He's in stereo, look at that.

LeCun: Well, I can see them from the back.

Host: Actually, if you want, Yann can see your face — he's behind you too. So, Yann, what an amazing coincidence: Llama 3 dropped just while we were meeting today. What are the odds? Unbelievable. Absolutely staggering. So what came out today was Llama 3 8B, 8 billion parameters, and 70B. So far, what we're hearing in the rumor mill is that the 8B performs as well as the old Llama 2 70B did, so we're looking at an order of magnitude change. Does that sound about right to you? Also, I noticed it was trained on 15 trillion tokens. Where did you come up with 15 trillion tokens?

LeCun: Okay, the first thing I have to say is that I deserve no credit whatsoever for Llama 3 — maybe a little bit of credit for making sure our models are open source, but the technical contributions are from a very large collection of people, and I had a very, very small part in it. So, 15 trillion tokens: you need to get all the data you can get — all the high-quality public data — and then you fine-tune, and you license data and everything. That's how you get to 15 trillion. But that's kind of saturating: there is only so much text you can get, and that's about it.

Host: Well, I've got to say, I owe a big fraction of my life journey to you. You didn't know it, but when you were doing optical character recognition way back in the day, I was reading your CNN papers — he invented convolutional neural nets, which really made those things work. The very first dollar of revenue I
ever made in a startup was from doing neural networks based on your work. It changed the course of my life. Now you're doing it again — especially for you young folks in the front here — by being the champion of open source. I think you're fundamentally giving them an opportunity to build companies that otherwise wouldn't be able to be built. So first of all, a huge debt of gratitude to you for championing that. [Applause] So the next thing that happens could be one of those events we look back on in history and say, that was a turning point for humanity: the 405B monster neural net will come out soon. It will also be open source, I assume? About 400 billion parameters, from what I gathered?

LeCun: About 400 billion, yeah. Dense, not sparse, which is interesting. It's still training, you know, despite all the computers we have our hands on. It still takes a lot of time — takes a lot of time to fine-tune. But it's going to come out, and a bunch of variations of those models are going to come out over the next few months.

Host: I was going to ask that question next. They didn't come out concurrently, which is interesting, which means it must still be in the training process. It's such a massive endeavor. I saw in the news that Facebook had bought another 500,000 Nvidia chips, bringing the total to about a million by my math. Unless you got a discount — you might have gotten a volume discount — that's $30 billion worth of chips, which would make the training of this model bigger than the Apollo moon mission in terms of research and development. Am I getting that about right? It's staggering, isn't it?

LeCun: Yeah, I mean, a lot of this — not just training but also deployment — is limited by computational abilities. One of the issues that we're facing, of course, is the supply of GPUs; that's one of them, and the cost of them at the moment.
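The host's chip math above is a quick back-of-envelope estimate, and it can be checked in a couple of lines. The per-GPU price below is the figure the host's arithmetic implies (roughly $30,000 per chip), used here purely as an illustrative assumption — it is not a confirmed purchase price.

```python
# Back-of-envelope check of the host's GPU math.
# Both inputs are the host's rough assumptions, not confirmed figures.
total_gpus = 1_000_000        # "bringing the total to about a million by my math"
price_per_gpu = 30_000        # implied price per chip, in dollars (assumption)

total_cost = total_gpus * price_per_gpu
print(f"${total_cost / 1e9:.0f}B")  # → $30B
```

At ~$30k per chip, a million GPUs indeed lands on the $30 billion the host quotes.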
But another one is actually scaling up the learning algorithms so that they can be parallelized on lots and lots of GPUs. Progress on this has been kind of slow in the community, so I think we're waiting for breakthroughs there. But we're also waiting for other breakthroughs in terms of architectures — new principles, brand new blueprints with which to build AI systems, that would enable them to do things they can't do today.

Host: Since you brought it up: the philosophy of taking an investment that size and then open sourcing it. There's no historical precedent for this. The equivalent would be if you built a gigafactory that builds Teslas and somehow gave it to society — but the thing is, once you open source it, it can be infinitely copied, so it's not even a good analogy to talk about a gigafactory being open sourced. There's no precedent for this in business history. What's the logic behind making it open source? What do you want to see happen from this?

LeCun: Well, the whole idea of open sourcing infrastructure software is very prevalent today, and it's been in the DNA of Meta — Facebook before that — since the beginning. There are a lot of open source packages, basically infrastructure software, that Meta has been open sourcing over the years, including in AI. So everybody is using PyTorch — well, everybody except a few people at Google — but pretty much everybody. And that's open source. It was built originally at Meta, and Meta actually transferred the ownership of PyTorch to the Linux Foundation, so it could be much more of a community effort. So that's really in the DNA of the company. And the reason is, infrastructure becomes better faster when it's open source, when more people contribute to it, when there are more
eyeballs looking at it. It's more secure as well. So what is true for internet infrastructure software is also true for AI. And then there is the additional thing for AI, which is that foundation models are so expensive to train that it would be a complete waste of resources to have 50 different entities training their own proprietary model. It's much better if there are only a few, but they make them open, and that basically creates the substrate for a whole ecosystem to take off. It's very much the same thing that happened to the internet in the '90s. If you remember, in the mid-'90s, when the internet started to get popular, the software infrastructure was dominated by proprietary platforms from either Microsoft or Sun Microsystems, and they both lost — they kind of disappeared from that market. Now it's all Linux, Apache, MySQL, PHP, all the open source stuff. Even the core of web browsers is open source. Even the software stack of cell phone towers is open source nowadays. So infrastructure needs to be open source. It just makes it progress faster, be more secure, and everything.

Host: Well, I'm so glad to hear you say that, because there are definitely diverging philosophies on this — if you think about where OpenAI is going and where you're going. The version of the world you're describing is one where all of these startups and all of these teams can thrive and be competitive and create and innovate. The alternate version is one where strong AI is invented in a box and controlled by a very small group of people, and all the benefit accrues to a very small group. I don't have skin in the game on this, but I certainly love your version of the future more than the alternate versions, so I'm very glad to hear you say it. So I want to spend a lot of our limited time talking about the implications
of this and where you see it going, and I also want to ask you about V-JEPA. You've been very clear in saying that LLMs will take us down a path of incredible things we can build, but they're not going to get you to a truly intelligent system — you need experience in the world. And V-JEPA, I think, is your solution to that. Is that going to carry us to that goal? Tell us about V-JEPA.

LeCun: First of all, I have to tell you where I believe AI research is going. I wrote a fairly long vision paper about this, about two years ago, that I put online — you can look for it, it's on OpenReview — called "A Path Towards Autonomous Machine Intelligence." I replace "autonomous" by "advanced" now, because people are scared by the word "autonomous." So we have this thing, autonomous or advanced machine intelligence, that's spelled AMI, and in French you pronounce it "ami," which means "friend" in French — which I think is a good analogy. Anyway, current LLMs are very limited in their abilities, and Stephen Wolfram, just before, actually pointed to those limitations as well. One of them is that they don't understand the world — they don't understand the physical world. The second is that they don't have persistent memory. The third is that they can't really reason, in the sense that we usually understand reasoning: they can regurgitate previous reasoning that they've been trained on and adapt it to the situation, but not really reason in the sense that we understand it for humans and many animals. And the last thing, which is also important: they can't really plan either. They can, again, regurgitate plans that they've been trained on, but in new situations they can't really plan. And there are a lot of studies by various people that show the limitations of LLMs for planning, reasoning, and understanding the world. So we need to basically design new architectures
which would be very different from the ones we currently have — architectures that will make AI systems understand the world, have persistent memory, reason, plan, and also be controllable, in the sense that you can give them objectives, and the only thing they can do is fulfill those objectives and not do anything else, subject to some guardrails. That's what would make them safe and controllable as well. So the missing part is: how do we get AI systems to understand the world by watching it, a little bit like baby animals and humans? It takes a very long time for baby humans to really understand how the world works. The whole idea that an object that is not supported falls because of gravity — it takes nine months for human babies to learn this. It's not something you're born with; it's something you learn by observing the world and understanding its dynamics. So how do we reproduce this ability with machines? For almost 10 years now, my colleagues and I have been trying to train systems to do video prediction, with the idea that if you get a system to predict what's going to happen in a video, it's got to develop some understanding of the nature of the physical world. And it's been basically a complete failure — we tried many, many things for many years. But then, a few years ago, what we realized is that the architectures that work for training deep learning systems to learn representations of images are not generative. They are not things for which you take an image, corrupt it, and then train a system to reconstruct the uncorrupted image — which is the way we train LLMs. That's how we train LLMs: we take a piece of text, we remove some of the words, and train some gigantic neural net to predict the words that are missing. If you do this with images or video, it doesn't work — or it kind of works, but you get representations of images and videos that are not very good. And the reason is, it's very difficult to actually reconstruct all the details of an image or a video that are hidden from you. So what we figured out a few years ago is that the way to approach the problem is through what we call a joint embedding architecture, or a joint embedding predictive architecture — which is what JEPA means; it's an acronym. The idea of joint embedding architectures goes back to the early '90s — some people worked on them, and we used to call them Siamese nets. The idea is basically this: if you have, say, a piece of video, and you mask some parts of it — say, the second half — and then you train a big net to try to predict what's going to happen next in the video, that would be a generative model. Instead of that, we run both pieces of video through encoders, and then we train a predictor in the representation space to predict the representation of the video — not all the pixels of the video — and you train the whole thing simultaneously. We didn't know how to do this four years ago, and we've since figured out a number of ways; we now have half a dozen algorithms for this. V-JEPA is a particular instance of this kind of thing, and the results are very promising. I think ultimately we're going to be able to build or train systems that basically have mental world models — that have some notion of intuitive physics, some possibility of predicting what's going to happen in the world as a result of taking an action, for example. And if you have a model of the world of this type, then you can do planning: you can plan a sequence of actions to arrive at a particular objective. That's really what intelligence is about — that's what we know from psychology.

Host: A really critical question, actually — because when you use diffusion algorithms to create pictures, they'll make six fingers or four fingers all the time; they never make five fingers. But these LLMs have a shocking
amount of common sense — and they're also missing a shocking amount of common sense. Once you roll in the V-JEPA data, you give it a lot more of an opportunity to think much more like we do, because all the real-world experiences of moving around and feeling things are folded into the training data. So do you think the result of that will be one massive foundation model, or are we still going to use the mixture-of-experts approach and glue them together in kind of synthetic ways?

LeCun: I think ultimately it's probably going to be one big model. Of course, it will be modular, in the sense that there are going to be multiple modules that interact but are not necessarily completely connected with each other. There's a big debate now in AI: if you want a multimodal system that deals with text as well as images and video, should you do early fusion — basically tokenize the images or videos, turn them into little vectors that you concatenate with the text tokens — or should you do late fusion, which means you run your images or video through some sort of encoder that is more or less specialized for them, and then have some merging at the top? I'm more in favor of the second approach, but a lot of the current approaches are actually early fusion, because it's easier — it's simpler.

Host: I'm going to do the dangerous thing of asking you to predict the future — but if you can't, then nobody can, so it has to be you. Once you roll in the V-JEPA data and you train these massive models, and suppose you go up another 10x — you know, buy another $30 billion or so of chips — will the combination of the V-JEPA data plus this massive scale be enough to solve fundamental problems like physics problems and biological experimentation problems? Or are we still missing something in the pathway that needs to be
thought of and added after that?

LeCun: Well, it's clear that we're missing a number of things. The problem is that we don't exactly know what. We can see the first obstacle, really, but where it goes afterward is not clear. The hope is that we're going to get systems that have some level of common sense. At first they're not going to be as smart as the top mathematician or physicist, but they're going to be as smart as your cat — and that would be a pretty good advance already. If we had systems that could understand the world like cats do; if we had systems that could be trained very easily, in 10 minutes, like any 10-year-old, to clear the dinner table and fill up the dishwasher — we would have domestic robots. If we had systems that could learn to drive a car in 20 hours of practice, like any 17-year-old, that would be a big, big advance.

Host: Hey, just stay here for a sec — this will take a while. So, you know, we spoke at Davos on this subject, and we enjoyed having you at Imagination in Action in the dome. This is the second of three of our events — I don't know if you realize this, but if you speak at all three (the next one is June 6), you get a Chia Pet. This is a photo of a Chia Pet — and I think a Chia Pet would go great there. Did you enjoy speaking under the dome — not the MIT dome, but the MIT event in Davos?

LeCun: Yeah, that was fun.

Host: All right, can I lock you in for next year?

LeCun: There was a spectrum of people there, from the techno-positive optimists — and I was not at that end of the spectrum — to, on the other side, doomers.

Host: It's Davos. All right. Well, we have someone from OpenAI next, and given that you work at Meta, you may not want to be seen in the same Zoom. So, ladies and gentlemen — Yann LeCun. Thank you, Yann. Well done.
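LeCun's contrast between generative training (reconstruct the masked pixels) and JEPA-style training (predict the representation of the masked part) can be sketched as a toy example. Everything below — the linear "encoders," the dimensions, the random data — is an illustrative assumption for exposition, not Meta's actual V-JEPA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 8 visible context frames and 8 masked future frames,
# each flattened to 64 pixels. All shapes are illustrative.
context = rng.normal(size=(8, 64))
target = rng.normal(size=(8, 64))

# Generative objective (LLM-style, applied to video): predict every
# masked pixel of the future frames directly from the context.
W_rec = rng.normal(size=(64, 64)) * 0.1
generative_loss = np.mean((context @ W_rec - target) ** 2)

# JEPA objective: encode both halves, then predict the *representation*
# of the masked half — never its pixels.
W_enc = rng.normal(size=(64, 16)) * 0.1   # shared toy encoder
W_pred = rng.normal(size=(16, 16)) * 0.1  # predictor in latent space
s_context = context @ W_enc               # representation of visible video
s_target = target @ W_enc                 # representation of masked video
s_pred = s_context @ W_pred               # predict target representation

# The error lives in a 16-dim representation space rather than the
# 64-dim pixel space, so unpredictable detail can be discarded by the
# encoder instead of being painfully reconstructed.
jepa_loss = np.mean((s_pred - s_target) ** 2)
print(generative_loss, jepa_loss)
```

One detail this sketch glosses over: in real joint embedding methods, the target-encoder branch needs special treatment during learning (e.g. a stop-gradient or a slowly updated copy of the encoder) to keep the representations from collapsing to a constant — one reason, per the transcript, it took years to make these architectures work.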
Info
Channel: Forbes
Views: 116,169
Id: 6RUR6an5hOY
Length: 18min 16sec (1096 seconds)
Published: Thu May 02 2024