Leading the W{ai}ve 2024: Fireside Chat with Emad Mostaque

Captions
All right, thank you Amy and Jamie. Who wants to hear more from Jamie? That was pretty good, right? Yeah, you guys are in luck, she's going to come back later for another panel on AI in the Enterprise, so that's not the last time you're going to hear from her. Moving on to our second keynote: this will be moderated by Manish Kumar, the director of the MIT Bitcoin Expo, and the keynote is going to be a virtual fireside chat with Emad, the CEO of Stability AI. Who here has heard of Stability? Yeah. I mean, if you're a builder or developer, this is foundation work on generative imaging, right? So Emad's going to bring a lot of insights on what it's like to build open source and what it's like to build in these other modalities. I think you guys will learn a lot from him. So please join me in welcoming Manish, and do we have Emad ready on the Zoom? There we go. All right, over to you.

I'm excited to introduce our next keynote speaker, the CEO of Stability AI. Stability's mission is to activate humanity's potential using generative AI by providing open models in every modality, for everyone, everywhere. Please welcome Emad Mostaque. [Applause]

Thank you for having me today.

Yes, how you doing man? Thanks for joining.

I'm doing all right, busy as always in the AI field.

Yeah, so in a not-so-long amount of time you've kind of become the OG of open source AI, and I want to get your perspective. We're big on open source here at MIT, and I want to get a better understanding of why that mission is so important to you.

I think that the foundation of this big generative AI shift has been on open source, open models, and open models are required for a whole variety of things, from customization of your own assets through to every government-regulated industry, which will be run on open models. Because the best way to think about these models, I think, is that they're like graduates that try too hard and occasionally hallucinate, but they can code, they can write, they can do all this stuff. And then you have proprietary models, which are like consultants that you borrow. But these models will be inside your systems, and no one's going to want a proprietary model for that; they'll want open models where they know how they work and they know the ingredients that make them up. So I saw there was a path to providing very high quality open models that people build on and that just stoke innovation.

Yeah, you know, on campus I've heard that some of our labs' work has kind of been thanks to these open models. You even open up the weights of your models; tell us about why, and what you hope to happen in the future with that.

Yeah, so I think open source is interesting because you have the source code, and then you've got the data sets, and you have the weights. For our language models, for example, like Stable LM 2, which performs up at Llama 70B level on a laptop, we release the code, the data and the weights. Others like Mistral release the weights, and those are open weights, but you don't know the source code, you don't know the data, and you need a mixture of those. But what you see, again, is that once you release it, people can take those assets and optimize them, so we've seen orders of magnitude improvement in the speed of Stable Diffusion, for example. You've seen people try to optimize the data sets, try new things, build ControlNets, and the whole ecosystem build around that, just like they have around Llama, and I think that's incredibly powerful, this pace of innovation to address real-world needs, whereas you're a bit more limited with proprietary models and what you can do.
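The practical upshot of releasing weights is that anyone can pull them down and run or modify them locally. Below is a minimal sketch of what that looks like, assuming the Hugging Face transformers library and using the small stabilityai/stablelm-2-1_6b checkpoint as an illustrative stand-in for the Stable LM 2 family mentioned above; the model name, precision and sampling settings are assumptions for illustration, not details from the talk.

```python
# Minimal sketch: download openly released weights and generate text locally.
# Assumes `pip install torch transformers accelerate`; the checkpoint name is an
# illustrative choice (some checkpoints may also need trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # illustrative choice, not from the talk
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps memory use modest
    device_map="auto",          # let accelerate place layers on GPU/CPU
)

prompt = "Open model weights matter because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the weights are local, the same object can then be quantized, fine-tuned or pruned, which is the kind of community optimization the answer refers to.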
So we've seen companies, you know, start as open source and then go closed source. You've really been championing open source; can you tell us a little bit about the pros and cons of the decision to do either?

Yes, I think open source has been infrastructure, and there's a question: is this technology infrastructure? I believe it will be and should be. We have classical open-core models like Red Hat, Linux and others, where you have open source and then you build consulting businesses and so on around that. I think the reason a lot of companies went non-open-source is partially that there were the AGI discussions and other things, which are a bit complicated, and also because once things are open, like Mistral releases a model one day at $2 per million tokens and someone else does it at a fraction of that, it's difficult to have a classical business model when things are moving this fast. What we did is we moved to a membership model, because in two years we've had 200 million downloads of our models, and now we have the best models of almost every type apart from the large language models, from code to 3D to audio, where it's a flat fee for all our models, a bit like Amazon Prime. That way we can grow with the market, and it's going pretty well. But it is difficult, because you don't need that many open source providers, and already people are moving beyond the base models to that next layer of abstraction above.

Yeah, that's a great segue to my next question. Talking about advancement in models, we had some great news coming out of Stability AI this week: you folks published your new research paper on Stable Diffusion 3. Can you tell us a little bit about that, the new innovations, what we can expect going forward, and why it's so important?

Yeah, so you know we pioneered the diffusion models with Robin Rombach, Patrick Esser and the Stable Diffusion team, and then the parallel track was these Transformer models, on the language side in particular, so we have some of those too. But this next generation of models are diffusion Transformers that mix the best of both worlds. That's what's used in OpenAI's Sora model, for example, which is kind of amazing, and we've released a new architecture called the multimodal diffusion Transformer that powers Stable Diffusion 3. Stable Diffusion 3, already now, before being finished (we expect the weights out probably in a few weeks, a month or so), already outperforms every other model on the market, including Midjourney and DALL-E 3, because it leverages these technologies to allow it to scale and also to have fine degrees of accuracy. And the same architecture can accept audio, video, text, any modality, and this is incredibly interesting for where we're going, because the models that we've had so far, across the board, have been largely filled with junk, not optimized; we're using research artifacts in business. Once we standardize around a few architectures, and we believe diffusion Transformers will be one of them for media, we can then get to the optimization equation, and there are actually potentially orders of magnitude improvements in speed, quality and output that can come from that. So we're super excited about that.
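The multimodal diffusion Transformer mentioned here is specified in the Stable Diffusion 3 paper; the toy PyTorch block below only sketches the one idea raised in the conversation, namely text tokens and noised image-latent tokens attending to each other in a single shared Transformer sequence. It is not the SD3 architecture (it omits timestep modulation, per-modality projections and everything else that makes the real model work), and all dimensions and names are made up for illustration.

```python
# Toy sketch of the joint text/image attention idea behind a multimodal
# diffusion Transformer block. NOT the SD3 architecture, just the intuition of
# "put both modalities in one sequence and let them attend to each other".
import torch
import torch.nn as nn

class ToyJointBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor):
        n_img = img_tokens.shape[1]
        x = torch.cat([img_tokens, txt_tokens], dim=1)    # one shared sequence
        h, _ = self.attn(self.norm1(x), self.norm1(x), self.norm1(x))
        x = x + h                                         # joint self-attention + residual
        x = x + self.mlp(self.norm2(x))                   # feed-forward + residual
        return x[:, :n_img], x[:, n_img:]                 # split back per modality

# Illustrative shapes: 64 image-latent tokens and 16 text tokens, batch of 2.
img_out, txt_out = ToyJointBlock()(torch.randn(2, 64, 256), torch.randn(2, 16, 256))
```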
So if we're thinking about what this new research means for what's coming out of Stable Diffusion, can you give us some examples of the improvements we'll see?

Yeah, so we had Stable Diffusion, then we built a system called ComfyUI that can bring different models and ControlNets to it, so you have the base model and then you can control the output, adjust it, and so on. The technology coming to Stable Diffusion means that it now has amazing typography, it has amazing kind of situational awareness and presence and permanence, so you can say "a red ball with a blue hat on top of an avocado wearing a tiara" and it understands all of that straight from the base model. Then, as you build pipelines from that and you have discriminators and adjusters, it means we're moving towards pixel-perfect control. So I think the paradigm is: we have amazing models that can create, then we have a control layer, then a composition layer, and finally a collaboration layer to really drive those amazing user outcomes. It's like programming: you have assembly code, then you've got libraries, and then other patterns on top of that, and finally you can build amazing apps. We're moving up that stack right now.
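For readers who want to see the "base model plus a control layer" idea as code, here is a hedged sketch using the diffusers library with a ControlNet conditioned on Canny edges. The checkpoint names, the local file paths and the edge-map conditioning are illustrative assumptions, not a description of Stability's or ComfyUI's actual pipelines.

```python
# Sketch: a base image model with a ControlNet "control layer" on top.
# Assumes `pip install diffusers transformers accelerate opencv-python pillow`;
# checkpoint names, file paths and the conditioning choice are illustrative.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# 1. A conditioning image: Canny edges extracted from a local reference picture
#    (the file name is a placeholder).
gray = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 100, 200)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# 2. Base model plus the ControlNet control layer.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# 3. The prompt creates; the edge map constrains composition and layout.
image = pipe(
    "a red ball with a blue hat on top of an avocado wearing a tiara",
    image=edge_image,
    num_inference_steps=30,
).images[0]
image.save("controlled_output.png")
```

Node-graph tools like ComfyUI chain many such conditioned stages together, which is the "control layer then composition layer" progression described above.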
That's awesome. So with all this new innovation, you know, in past interviews you've said that you can see about five years out into where we're headed with the AI future, and a lot of our audience today are going to be entering the workforce in this time period, whether they're graduate students or undergrads. You've also talked a lot about changing curricula and about the best way to grow your own skill set in an AI-assisted future. Can you talk a little bit about where we're heading in the next five years, what some of the bottlenecks and milestones we have to reach are, and what that might look like?

I think, as I mentioned earlier, the best mental model I have for this is over-enthusiastic graduates. Right now, again, there's hallucination, and you wouldn't trust a graduate with a lot of these things, right? But they can be very good, and I know there are a lot of great graduates in the audience right now. I think that's how you can think about it affecting your personal life, your business and others, because that creates a massive supply, especially in knowledge work, of these graduates, versus demand that is going to be relatively stable, growing a bit. So when you're thinking about that future, you have to think: how can I leverage this technology to build organizations, systems and more that can help me achieve my personal goals, or build businesses, or contribute to businesses? Because the pace of this technology coming, like, just look at Microsoft, they've adopted it amazingly quickly and now they're a three trillion dollar company, right, is such that every organization can be massively improved by this technology. You don't need to create new organizations, you can improve existing ones. MIT is an amazing place, but people still feel frustration around information flows. What if you had more graduates working for MIT on MIT things, like organizing papers and all sorts of other stuff? Life would be easier. So I think it's just diving into using this technology, adopting it from a design perspective, and thinking about the systems of information flow and how to improve them. One of the things I like to say is that AI can't do art, it can make content; humans do art. AI can't program, it can write boilerplate. And so the role of the person is that next level above, utilizing and leveraging all of these assets to create real-world outcomes, and that's what you should be thinking about and that's what you should be focused on.

That's great. So, compute: you have sort of an infrastructure company, right, you provide infrastructure for these open models, and it seems to me there's this infrastructure race. We have the Nvidia bottleneck, and along with that we have Qualcomm with their Snapdragon, Intel with their Gaudi 3, and AMD, and all the usual players trying to catch up or create a more competitive market. Lately even the Groq LPU people are really trying to figure out how to get around this bottleneck, and in India I saw Bhavish Aggarwal is trying to just create his own GPUs to avoid this infrastructure issue. So I want to talk shortly about this broader GPU race, and then I want to follow that up talking about computing on the edge. If you could start with the first part of that.

Yeah, so I think the GPU thing is quite interesting. I've been a big advocate of being long GPUs; we have over 10,000 A100 equivalents, which makes us one of the larger players out there, but we try everything from TPUs to these other chips. Ultimately the processes for training and running these models are not complicated things, it's matrix multiplications or diffusion kind of things, and we use general-purpose processors. But in the first stage of this, what we did is, like when you take a crap-quality steak and cook it for longer so it becomes nice and juicy, we took the whole of the internet that we could see and used rubbish data with big compute to achieve these outcomes, and the compute behind some of these models was insane. Now, when you look at Phi-2, when you look at how DALL-E 3 or Stable Diffusion is built, we're using synthetic, optimized data and requiring a lot less compute. So I think you're seeing diminishing gains with exponential compute, and I think you're seeing now standardized base models that you can build around, and you see that actively, because people say GPT-4 and Claude are amazing but they prefer a fine-tuned Mistral because it does the job. So I think it's a satisficing thing on the large-model training. At the same time, Stable Diffusion XL, for example, took 20 seconds to run on a 4090 when we released it last summer; now we can do 200 images a second if you look at StreamDiffusion. You're seeing orders of magnitude improvement, and again, the quality of what you can get now is crazy. So if you download LM Studio (lmstudio.ai) and you run Stable LM 2, you can run 100 tokens a second for a Falcon 40B-level model on a MacBook M2, and that's kind of crazy when you try it; it's slightly magic, right? But they're still not optimized. So when I look at the equation, what I see is that language models will get to a GPT-4 level on the edge probably in a year or so, and that's good enough for 95% of use cases, and it's a big race to zero against Google and others with their millions and millions of GPUs, which are actually very good chips. Then on the other side, what you see is that we can already generate every pixel: for $15 a month right now in compute I can do 100 hours a month of cats with hats of various types, and that's good enough content for most people, but we'll have video generating every pixel, as Jensen said, in a few years. So I think most of the compute will be pushed towards the edge, except for the super expert systems like the multi-million-token-window Geminis and so on, which are also amazing, and then most of the compute otherwise in the cloud will be used for streaming, streaming of media. I think that's a very different paradigm for everyone expecting large language models to be everywhere. I think language models will be on your phone and on your laptop for 90-95% of use cases, and then call into the cloud for the super expert stuff.
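LM Studio itself is a desktop app, but the kind of local, quantized, on-device inference it wraps can be sketched with the llama-cpp-python bindings. The GGUF file name, context size and sampling settings below are assumptions for illustration; any quantized open checkpoint downloaded locally would do.

```python
# Sketch: running a quantized open model locally on a laptop (CPU, Apple Silicon
# or a consumer GPU). Assumes `pip install llama-cpp-python` and a GGUF
# quantization of Stable LM 2 downloaded beforehand; the file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="stablelm-2-zephyr-1_6b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to Metal/CUDA if available, else CPU
)

out = llm(
    "Explain in two sentences why running models on-device helps privacy.",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```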
That's really interesting, and I want to talk a little bit more about that. When it comes to moving to the edge, you've often talked about what that might mean for all of us. Can you fill us in again about what compute on the edge means for the end user?

It means that, well, a lot of people have looked into fully homomorphic encryption and other mechanisms for privacy-preserving AI. If you've got a standardized model, and again the standardization is important, that's why, for Stability, as Jim Barksdale said, there are only two types of companies, bundlers and unbundlers, and we're a bundler: good quality models of every type that you can build around, like you don't need a new games console every so often. If that model is on your device, then you can train it dynamically with all your information. Answer.AI and a few others just released a new set of protocols that allow you, on a consumer GPU, to train a 70-billion-parameter Llama model and fine-tune it. Now that's within the reach of everyone, and it's just going to get more and more optimized. You don't need 70 billion parameters for 95% of use cases, and base models with private data will outperform generalized models, just like that's why we hire graduates rather than only having consultants all the time. And then it comes down to real-world use cases across these different modalities. I have a music library; you take one of our music models and then you have an intelligent music assistant. You have your images, and Apple, I'm sure, will have an intelligent creator that can put your face into any scenario, but really private, on your iPhone. So again, I think we need to think about intelligence moving to the edge, and it's super powerful when you think about it versus the classical internet world, where the intelligence was at the center. This is particularly important when it comes to emerging markets and the global South; a lot of people in the audience will come from outside the US. What happens when everyone has GPT-4-level AI on their smartphone in India or Indonesia? That's a dramatic change in productivity, capital flows, information flows and more.
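The Answer.AI work referenced here combines FSDP with QLoRA to reach 70-billion-parameter models on a pair of gaming GPUs; the sketch below is not that recipe, just a minimal single-GPU QLoRA-style setup with transformers, bitsandbytes and peft on a small model, to show the general shape of "take open weights and fine-tune them on your own data". The base model, LoRA hyperparameters and target modules are illustrative assumptions.

```python
# Sketch: preparing an open model for QLoRA-style fine-tuning on one consumer GPU.
# Assumes `pip install transformers peft bitsandbytes accelerate`; names and
# hyperparameters are illustrative, not Answer.AI's actual FSDP/QLoRA recipe.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base = "stabilityai/stablelm-2-1_6b"  # small illustrative stand-in for a 70B model

# Load the frozen base weights in 4-bit so they fit in consumer-GPU memory.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Attach small trainable low-rank adapters; only these receive gradient updates.
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; adjust per architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, train on your own private data with the standard transformers Trainer
# or trl's SFTTrainer; the point of the sketch is the memory footprint, not the loop.
```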
I've heard you say "not your data, not your models"; tell us a little bit about what you mean by that.

I kind of riff off the crypto saying, "not your keys, not your crypto": not your models, not your mind. What's embedded in these models will have a big effect, because we rely more and more on them. If you talk to Claude 3, it's amazing, it's just like having a conversation with a real person, and they're getting to that level now where it's just really nice, like that movie Her. Maybe once they get to AGI they'll go away and say goodbye, thanks for all the GPUs, but it's like having that conversation. Anthropic had an interesting paper called Sleeper Agents, whereby they program data into the models and then, with a trigger, like the year turning to 2025 or something, the model turns evil, and you can't tune that out or identify it ahead of time. So you need models of your own as well as other people's models, because (a) you can't trust them, especially in regulated industries, where I think data transparency will be key, and that also makes models safer, but also because you want a model that represents your own view and your own biases, because no technology is unbiased and we can't just have something with one point of view. And again, we have to distribute the benefits of this technology. In fact, with a lot of these things there's a bit of jingoism about how other people can't have this technology, so you ask, when will a billion people in China have this technology, and other things like that. So I think open is the way, because this is something that can uplift all of humanity, and you need models that represent you and work for you, as opposed to maybe misaligned objectives from some of the other models built by advertising companies and others.

Very interesting. So, as you can imagine, when I told folks that you were going to be here, a lot of students had a lot of questions, so I put some of those together, and we just have a few minutes left, so I'm going to fire off some of these student questions and give you a chance at them. Ready? A little bit of a lightning round.

All right.

Do you recommend aspiring founders in the room follow in your footsteps and attempt open source ventures? If so, what's one tip to think about monetization?

Oh God, no, don't do it, being a CEO is terrible. No, I think the key thing with open source is you do open source if you want to become a standard, you want to spread, and you want to leverage community. So my thing was: build those building blocks, and then other people can build on top. It's very difficult to do the base models, but you've seen other companies take one level above that, which requires far less overhead, and you have to look at the classical open-core business models, because open source is $200 billion and more of market cap. And again, this is about spread, so think very carefully: where am I adding value that's sustainable, and how can I have distribution? In fact, that's what I say to all AI founders: distribution is the most important thing, because this technology can go everywhere. If you can help a company with good distribution, that's the key and that's the goal, and that's how you can grow amazing businesses.

Okay, it's going to be a lightning round now, you've got a minute and a half. Right, so what's one thing you wish more founders worked on?

I think healthcare and education, those are two of the biggest; education by far the largest, personalized tutors, the two sigma effect, huge.

Most interesting startup you heard about in the past month?

Gosh, I don't know, I'll pass on that one.
I see. Okay, what's an industry ripe for disruption by AI?

I would say, again, education, and everyone's education. Education in school has been crap; no one in this audience was happy with their school, and personalized tutors, the two sigma effect, will transform education forever.

What's a use case you didn't expect people would hack on with Stable Diffusion?

I didn't expect them to be able to do the 3D, because you can extend out the 3D representation from the 2D world, and now the 3D is going insane. We just released TripoSR, that's 0.5-second generation, and it's going to get even faster and even better.

Besides image, what are some other modalities you're most excited about?

I think I'm most excited about the 3D. I think we have a real chance of building that holodeck; you take the Vision Pro and we'll be generating real worlds live in a few years, and that's crazy when you think about it. The other one is audio, music; music is going to be transformed forever this year.

Yeah, I just got my young nephew on your Stable Audio and he was really blown away. So, last one: Manchester United or Chelsea?

I'm going to go with Chelsea, as a London boy, even though I'm a Tottenham supporter. But Man United? Never.

Thank you so much for joining us, Emad, we really appreciate your time and your wonderful insights, and we're grateful. [Music] [Applause]
Info
Channel: MIT AI ML Club
Views: 2,270
Id: gsUKnHRS7Mc
Length: 21min 9sec (1269 seconds)
Published: Tue Mar 12 2024