Open sourcing the AI ecosystem ft. Arthur Mensch of Mistral AI and Matt Miller

Captions
I'm excited to introduce our first speaker, Arthur from Mistral. Arthur is the founder and CEO of Mistral AI. Despite the company being just nine months old, and having far fewer resources than some of the large foundation-model companies, I think they've really shocked everybody by putting out incredibly high-quality models, approaching GPT-4 caliber, out into the open. So we're thrilled to have Arthur with us today, all the way from Paris, to share more about the opportunity behind building in open source. Interviewing Arthur will be my partner Matt Miller, who is dressed in his best French wear to honor Arthur today, and who helps lead our efforts in Europe. Please welcome Matt and Arthur. [Applause]

With all the efficiency of a French train — right on time. We're sweating a little back there, because we just walked in the door. But good to see you. Thanks for coming all this way, and thanks for being with us here at AI Ascent today.

Thank you for hosting us.

Absolutely. I'd love to start with the background story of why you chose to start Mistral — take us to the beginning. We all know about your successful career at DeepMind and your work on the Chinchilla paper, but at Sequoia we always love to hear — and I know our founder community also loves to hear — about the spark that gave you the idea to break out and start your own company.

Yeah, sure. We started the company in April, but I guess the idea was out there for a couple of months before. Timothée and I were in master's together, Guillaume and I were in school together, so we knew each other from before, and we had been in the field for some ten years doing research. We loved the way AI progressed because of the open exchanges that occurred between academic labs and industrial labs, and how
everybody was able to build on top of one another's work. And that was still the case, I guess, even in the beginning of the LLM era, when OpenAI and DeepMind were actually contributing to one another's roadmaps. This kind of stopped in 2022. Basically, one of the last papers making important changes to the way we train models was Chinchilla, and that was the last important model in the field that Google published. For us it was a bit of a shame that the field stopped making open contributions that early in the AI journey, because we are very far away from finishing it. So when we saw ChatGPT at the end of the year, we reflected on the fact that there was an opportunity for doing things differently — for doing things from France, because, as it turned out, there were a lot of talented people who were a bit bored in big tech companies. That's how we figured out there was an opportunity to build very strong open-source models, going very fast with a lean team of experienced people, and to try to correct the direction the field was taking. We wanted to push the open-source model much more, and I think we did a good job at that, because various companies have followed us in our trajectory.

Wonderful. So the open-source movement was a lot of the drive behind starting the company?

Yeah, that was one of the drivers. Our intention, and the mission we gave ourselves, is really to bring AI into the hands of every developer. The way it was done — and the way it is still done by our competitors — is very closed. So we want to push a much more open platform, and we want to spread and accelerate adoption through that strategy. That's very much at
the core of — well, the reason why we started the company, indeed.

Wonderful. And fast-forward to today: you just released Mistral Large, and you've been on this tear of amazing partnerships — the Microsoft, Snowflake, and Databricks announcements. How do you balance what you're going to do open source with what you're going to do commercially? How do you think about the trade-off? It's something many open-source companies contend with: how do they keep their community thriving, but also build a successful business that contributes back to that community?

Yeah, it's a hard question. The way we've addressed it, currently, is through two families of models, though this might evolve with time. We intend to stay the leader in open source, and that puts pressure on the open-source family, because there are obviously some contenders out there. Compared to how various software providers have played this strategy, we need to go faster, because AI develops faster than software, faster than databases. MongoDB played a very good game at that, and it's a good example of what we could do — but we need to adapt faster. So yes, there's obviously this tension, and we're constantly thinking about how we should contribute to the community, but also how we should start getting commercial adoption, enterprise deals, et cetera. For now I think we've done a good job of it, but it's a very dynamic thing to think through — basically every week we think about what we should release next in both families.

And you have been the fastest in developing models, the fastest in reaching different benchmark levels, and one of the leanest in expenditure to reach those benchmarks out of any of the foundational model
companies. What do you think gives you that advantage — to move quicker and more efficiently than your predecessors?

Well, I think we like to get our hands dirty. Machine learning has always been about crunching numbers, looking at your data, doing a lot of extract-transform-load — things that are oftentimes not fascinating. So we hired people who were willing to do that stuff, and I think that has been critical to our speed. It's something we want to keep.

Awesome. In addition to the large model, you also have several small models that are extremely popular. When would you tell people to spend their time working with the small models, and when with the large models? And where do you think the economic opportunity for Mistral lies — in doing more of the big, or more of the small?

I think this is an observation every LLM provider has made: one size does not fit all. When you build an application, you typically have different large-language-model calls. Some should be low latency, because they don't require a lot of intelligence; some can be higher latency and require more intelligence. An efficient application should leverage both, potentially using the large model as an orchestrator for the small ones. The challenge is making sure everything works, because you end up with a system that is not just a model — it's really two models plus an outer loop calling your models, calling systems, calling functions. Part of the developer challenge we also want to address is how you make sure this works, how you evaluate it properly, how you do continuous integration — how you move from one version to another of a model
and make sure that your application has actually improved and not deteriorated. All of these things are addressed by various companies, but they are also things we think should be core to our value proposition.

And what are some of the most exciting things you see being built on Mistral? What do you get really excited about that the community or customers are doing?

I think pretty much every young startup in the Bay Area has been using it for fine-tuning purposes and for fast application-building. One part of the value of Mixtral, for instance, is that it's very fast, so you can make applications that are more involved. We've seen web-search companies using us, and all of the standard enterprise stuff as well — knowledge management, marketing. The fact that you have access to the weights means you can pour in your editorial tone much more. So we see the typical use cases, but the value of the open-source part is that developers have control: they can deploy everywhere, they can get very high quality of service because they can use dedicated instances, and they can modify the weights to suit their needs and bump the performance to a level close to the largest models while being much cheaper.

And what's the next big thing we're going to see from you? Can you give us a sneak peek of what might be coming soon, or what we should expect from Mistral?

Yeah, for sure. Mistral Large was good but not good enough, so we are working on improving it quite heavily. We have interesting open-source models on various vertical domains that we will be announcing very soon. The platform is currently just serverless APIs, so we are working on making customization part of
it — the fine-tuning part. And obviously, like many other companies, we're betting heavily on multilingual data and multilingual models, because as a European company we're well positioned, and the demand for that from our customers is, I think, higher than it is here. And then, in the months to come, we will also release some multimodal models.

Exciting — we look forward to that. As you mentioned, many of the people in this room are using Mistral models; many of the companies we work with every day here in the Silicon Valley ecosystem are already working with Mistral. How should they work with you and with the company? What's the best way to engage?

Well, they can reach out. We have developer relations who are really pushing the community forward — making guides, and gathering use cases to showcase what you can build with Mistral models — so we're investing a lot in the community. Something that makes the models better, and that we are trying to set up, is ways for us to get evaluations, benchmarks, and actual use cases on which we can evaluate our models. Having a mapping of what people are building with our models is also a way for us to make a better generation of new open-source models. So please engage with us: discuss your use cases, we can advertise them, and we can also gather insight on new evaluations we should add to our evaluation suite to verify that our models are getting better over time. On the commercial side, our models are available on our platform — the commercial models actually work better than the open-source ones. They're also available on various cloud providers, which facilitates adoption for enterprises. And customization capabilities like fine-tuning, which really
made the value of the open-source models, are coming very soon.

Wonderful. You touched briefly on the benefits of being in Europe. You're already this global example of the great innovations that can come from — and are coming from — Europe. Talk a little more about the advantages of building a business in France, and building this company from Europe.

The advantages and drawbacks, I guess — both. One advantage is that you have a very strong junior pool of talent. There are a lot of people coming out of master's programs in France, in Poland, in the UK, whom we can train in three months and get up to speed — basically producing as much as a million-dollar engineer in the Bay Area, at ten times less the cost. So that's kind of efficient.

Shh — don't tell them that, or they're all going to hire people in France!

Sure. So the workforce — the engineers and machine-learning engineers — is very good. Generally speaking, we have a lot of support from the state, which actually matters more in Europe than in the US. They tend to over-regulate a bit too fast; we've been telling them not to, but they don't always listen. And then, generally, European companies like to work with us because we are European, and we are better in European languages, as it turns out — like French. Mistral Large is actually probably the strongest French-language model out there. So I guess that's not an advantage per se, but at least there are a lot of geographical opportunities that we're leveraging.

Wonderful. Paint the picture for us five years from now. I know this world is moving so fast — just think of everything you've gone through, and the company is not even two years old. But five years from now, where does Mistral sit?
What do you think you will have achieved? What does this landscape look like?

Our bet is that the platform and the infrastructure of artificial intelligence will be open, and that on top of it we'll be able to create assistants, and then potentially autonomous agents. We believe we can become this platform by being the most open platform out there, and by being independent from cloud providers, et cetera. Five years from now, I have literally no idea what this is going to look like — if you had looked at the field in 2019, I don't think you could have bet on where we are today. But we are evolving toward more and more autonomous agents that can do more and more tasks. I think the way we work is going to change profoundly, and building such agents and assistants is going to get easier and easier. Right now we're focusing on the developer world, but AI technology is so easily controllable through human language that potentially, at some point, the developer becomes the user. We're evolving toward any user being able to create their own assistant or their own autonomous agent. I'm pretty sure that in five years this will be something you learn to do at school.

Awesome. Well, we have about five minutes left, so I want to open it up in case there are any questions from the audience. Don't be shy — Sonya's got a question.

How do you see the future of open-source versus commercial models playing out for your company? You made a huge splash with open source at first; as you mentioned, some of the commercial models are even better now. How do you imagine that plays out over the next couple of years?

Well, the one thing we optimize for is being able to continuously produce open models with a sustainable business model that actually fuels the development of the next generation. As I've said, this is going to evolve with
time, but in order to stay relevant we need to stay the best at producing open-source models, at least on some part of the spectrum — that can be the small models, or it can be the very big models. That sets the constraints on whatever we do. Staying relevant in the open-source world, staying the best solution for developers, is really our mission, and we'll keep doing it.

David — there have got to be questions from more than just the Sequoia partners, come on.

Can you talk to us a little bit about Llama 3 and Facebook, and how you think about competition with them?

Well, they're working on making models, I guess. I'm not sure they will be open source — I have no idea what's going on there. So far I think we've been delivering faster, with smaller models, and we expect to continue doing that. But generally, the good thing about open source is that it's never too much of a competition: if you have several actors, that should benefit everybody. So if they turn out to be very strong, there will be some competition, and we'll welcome it.

One thing that has made you different from other proprietary model providers is the partnerships with Snowflake and Databricks, for example — running natively in their clouds as opposed to just having API connectivity. I'm curious why you did those deals, and what you see as the future of, say, Databricks or Snowflake in the brave new LLM world.

I guess you should ask them, but generally speaking, AI models become very strong when they are connected to data and grounding information. As it turns out, enterprise data is oftentimes either on Snowflake or on Databricks, or sometimes on AWS. So for customers, being able to deploy the technology exactly where their data is, is
I think, quite important. I expect this will continue to be the case, especially as I believe we'll move toward more stateful AI deployments. Today we deploy serverless APIs without much state — it's really like Lambda functions. But as we go forward, as we make models more and more specialized, more tuned to use cases, and self-improving, you will have to manage state, and that could be part of the data cloud. So there's an open question of where you put the AI state, and my understanding is that Snowflake and Databricks would like it to be on their data clouds.

I think there's a question right behind him — Grace.

I'm curious where you draw the line between openness and proprietary. You release the weights; would you also be comfortable sharing more about how you train the models — the recipe, how you collect the data, how you do mixture-of-experts training? Or do you draw the line at "we release the weights and the rest is proprietary"?

That's where we draw the line, and the reason is that it's a very competitive landscape. It's similar to the tension between contributing to the community and keeping some form of revenue to sustain the next generation: there's also a tension between what you actually disclose and staying ahead of the curve — not giving your recipe to your competitors. So again, this is a moving line, and there's also some game theory at stake: if everybody starts doing it, then we could do it too. But for now we are not taking that risk, indeed.

I'm curious: when another company releases weights for a model — Grok, for example — and you only see the weights, what do you do internally to see what you can learn from it?

You can't learn a lot of things from weights. We don't even look at it.
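The kind of inference from released weights that gets discussed here — reading coarse architecture facts off nothing but a checkpoint's tensor names and shapes — can be sketched as follows. This is a minimal illustration, not anyone's actual tooling; the "layers.<i>...", "experts.<j>..." naming convention and all numbers are hypothetical stand-ins modeled on common mixture-of-experts layouts.

```python
import math
import re

def infer_structure(shapes):
    """Guess layer count, expert count, and parameter count from a
    checkpoint listing: a mapping of parameter name -> shape tuple,
    the kind of metadata you can read without loading any weights."""
    layers, experts = set(), set()
    for name in shapes:
        # Layer index appears as a "layers.<i>." prefix (assumed convention).
        if (m := re.match(r"layers\.(\d+)\.", name)):
            layers.add(int(m.group(1)))
        # Expert index appears as "experts.<j>." somewhere in the name.
        if (m := re.search(r"experts\.(\d+)\.", name)):
            experts.add(int(m.group(1)))
    return {
        "n_layers": len(layers),
        "n_experts": len(experts),
        # Element count of every tensor summed = total parameter count.
        "n_params": sum(math.prod(s) for s in shapes.values()),
    }

# Toy listing standing in for a real checkpoint: 2 layers, 4 experts each.
toy = {}
for i in range(2):
    toy[f"layers.{i}.attention.wq"] = (64, 64)
    for j in range(4):
        toy[f"layers.{i}.experts.{j}.w1"] = (64, 128)

print(infer_structure(toy))
# -> {'n_layers': 2, 'n_experts': 4, 'n_params': 73728}
```

Shapes and names can reveal depth, hidden sizes, and expert counts — but, as the answer goes on to say, the weights compress the training process too heavily to recover the data or the recipe itself.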
It's actually too big for us to deploy — Grok is quite big.

Was there any architecture learning?

I guess they are using a mixture of experts — a pretty standard setting, with a couple of tricks that I knew about, actually. There is not a lot to learn about the recipes themselves by looking at the weights. You can try to infer things, but reverse engineering is not that easy: training is basically compressing information, and it compresses information heavily enough that you can't really find out what's going on.

The cube is coming — okay. I'm just curious what you're going to focus on in terms of model sizes. Are you going to keep going small, or go to the larger ones?

Basically, model sizes are set by scaling laws. Depending on the compute you have, and on the inference infrastructure you want to land on, you make some choices: you optimize for training cost and for inference cost, and there's a weighting between the two — the more you amortize the training cost, the more you can compress models. But basically our goal is to be low latency and to be relevant on the reasoning front, so that means having a family of models going from the small ones to the very large ones.

Hi — are there any plans for Mistral to expand into the application stack? For example, OpenAI released custom GPTs and the Assistants API. Is that a direction you think Mistral will take in the future?

Yeah. As I've said, we're really focusing on the developer first, but the frontier between developers and users is pretty thin for this technology. That's the reason why we released an
assistant demonstrator called le Chat — which is "the cat" in English. The point is to expose it to enterprises as well, and make them able to connect their data and their context. That answers a need from our customers: many of the people we've been talking to are willing to adopt the technology, but they need an entry point. If you just give them APIs, they're going to say, "Okay, but I need an integrator" — and oftentimes they don't have one. So it's good to have an off-the-shelf solution; at least you get them into the technology and show them what they could build for their core business. That's why we now have two product offerings: the platform, and le Chat, which should evolve into an enterprise off-the-shelf solution.

More over there.

I'm wondering where you would draw the line between doing more prompt engineering and starting to fine-tune, because a lot of my friends and our customers struggle with when they should stop doing more prompt engineering.

Yeah, I think that's the number-one pain point, and it's hard to solve from a product standpoint. Normally your workflow should be: decide what you should evaluate on, and based on that, have your model find a way of solving your task. Right now this is still a bit manual — you go through several versions of prompting — but this is something AI can actually help solve, and I expect it to become more and more automatic over time. It's something we would love to help enable.

I wanted to ask a more personal question. As a founder at the cutting edge of AI, how do you balance your time between explore and exploit? How do you yourself stay on top of a field
that's rapidly evolving and becoming larger and deeper every day?

We explore on the science part, on the product part, and on the business part, and balancing it is genuinely hard for a startup. You do have to exploit a lot, because you need to ship fast. But on the science part, for instance, we have two or three people working on the next generation of models. Sometimes they lose time — but if you don't do that, you're at risk of becoming irrelevant. This is very true on the product side as well: right now we have a fairly simple product, but being able to try out new features and see how they pick up is something we need to do. And on the business part, you never know who is actually mature enough to use your technology. So the balance between exploitation and exploration is something we master well at the science level, because we've been doing it for years, and somehow it carries over to the product and the business — but I guess we're still learning to do it properly.

One more question from me, and then we're out of time. In the scope of two years: models big and small that have taken the world by storm, killer go-to-market partnerships, tremendous momentum at the center of the AI ecosystem. The pace of what you have achieved is truly extraordinary. What advice would you give to the people here, who are at different stages of starting, running, and building their own businesses in and around the AI opportunity?

I would say it's always day one. We got some mind share, but there are still many proof points that we need to establish. Being a founder is basically waking up every day and figuring
out that you need to build everything from scratch, all the time. It's a bit exhausting, but it's also exhilarating. So I would recommend being quite ambitious — ambition can get you very far. You should dream big. That would be my advice.

Awesome. Thank you, Arthur — thanks for being with us today. [Applause]
Info
Channel: Sequoia Capital
Views: 27,512
Id: yinHx5UnYs0
Length: 26min 14sec (1574 seconds)
Published: Tue Mar 26 2024