GPT4All Story, Fine-Tuning, Model Bias, Centralization Risks, AGI (Andriy Mulyar)

Captions
The secret to training good machine learning models is almost never iterating on the next best mathematical model or the newest deep learning architecture. The secret is having high-quality, clean data.

Welcome, and thank you, Andriy, for joining me. A quick introduction: Andriy is the co-founder of Nomic AI, a startup dedicated to making AI more accessible to everyone through their data interaction tool Atlas and their open-source GPT4All models, version 2 of which was just launched yesterday, and we're definitely going to be talking about that. Welcome, Andriy.

Hey Matt, thank you so much, great to be here.

Cool. So first I would just love to learn a little bit about you. Where were you born, where did you grow up, and how did you get into technology?

Sure. I actually emigrated from Ukraine. I was born in Lviv, a city in western Ukraine, and I moved to the US when I was four; my family won the green card lottery to come here, so I'm an immigrant. We moved straight to Richmond, Virginia, a city about an hour south of DC, and I grew up my entire life there. At one point I actually wanted to be a YouTuber, but then I picked up Python in about seventh or eighth grade and started programming from there. One of my first big projects was writing a Minecraft server from scratch, and that got me really involved in programming. I ended up going to a high school that focused on programming, computer science, and math, fell in love with it, and I've been doing it ever since. In college I got involved with AI: I worked at a natural language processing lab, right before the whole "deep learning meets language" craze started, around 2016, 2017.
I sort of rode that wave, published some research papers, worked at a startup for a little while training large language models on medical data, and then started a PhD doing interpretable machine learning. Then I saw the world moving really fast. I had knowledge of how large language models worked and how to make them work, and I saw that I could make a bigger difference building in the open, building a startup, as opposed to doing my PhD. So I dropped out last May to found Nomic.

You mentioned you were doing some early NLP work. What kind of stuff were you working on at such a young age?

Not necessarily young; a lot of people do work in research labs in their undergrad. Early in my undergrad I was working on structured data extraction from text documents, something called named entity recognition, basically under the mentorship of a professor at my university, learning the ropes. Eventually I got the chance to do an internship at Johns Hopkins under professor Mark Dredze. I learned a lot there; that's where I first learned about language models. That was around 2018, 2019, and it shot me in the direction of artificial intelligence as a career, pretty much.

So you dropped out of your PhD program to found Nomic AI. How did that come about? What problem were you solving, and how did you identify it? I'm an entrepreneur myself, so I love hearing founder stories.

Basically, right when I was finishing undergrad, COVID started, and I was offered this very unique chance to work at an early-stage startup that had a lot of medical data. They were training a large language model to summarize that medical data and presenting it to radiologists; they had a whole product built out. I was put on the team that was actually training the large language model, making it do well at the task at hand. One of the big problems you have when training large language models is that they hallucinate, and one of the core reasons they hallucinate is the data they're trained on: there are spurious correlations in the data. Obviously, if you're building in the medical domain and your large language model hallucinates, you don't have a large-language-model-powered product, because in medicine you cannot hallucinate; you can't make things up that could possibly kill people. So this was a very important problem to work on. This was around 2019, pre-GPT-3. We worked to solve this problem and we were able to, and the way we did it was by doing very exhaustive, intensive data quality and data cleaning operations on top of the datasets we were training the large language models on. So I learned from a very early point in my career that the secret to training good machine learning models is almost never iterating on the next best mathematical model or the newest deep learning architecture; the secret is having high-quality, clean data.
Right. I left that startup after about a year; I learned a lot and met some really great people there, and then I started my PhD. And I saw the world was moving really, really fast, and everyone was starting to realize that large language models were going to be this new foundation that the future of computing gets built on. I knew a lot about them and I wasn't really leveraging that. So I met up with one of my co-workers from the startup and we said: hey, we know the secret that no one else knows, and the secret is the data. Let's build something that allows people to very easily go in and curate data for large language model training.

Let me stop you for a second. You mentioned spurious correlations in the data. First, what does that mean exactly, and what causes them?

A spurious correlation is when you have pieces of data that you train your machine learning model on that allow the model to make connections it shouldn't be making, because they let the model take a shortcut in its reasoning to solve the problem at hand. One common example of a very bad spurious correlation, one that causes model biases, is a task like predicting somebody's credit score. This is an automated process nowadays; the systems that give you a credit score are automated models, and they condition on a bunch of attributes of the person, like their previous payment history and the area they live in. But often there are very simple attributes that, if you don't tell the model not to look at them, it can use as a shortcut for rejecting or accepting people for loans. Zip code, for example, is a very strong razor: the model can learn that accepting everyone in this zip code and denying everyone in that zip code gives much better outcomes for the credit scoring system. But that's obviously not a correlation you want to use, because it's going to marginalize large groups of people. That's the kind of spurious correlation a model can very easily make, and that you don't want it making when it does its end prediction. And when it comes to large language models, they can do the exact same thing, because they're trained on large quantities of text, and spurious correlations exist in that text as well.
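To make the zip-code shortcut described above concrete, here is a minimal illustrative sketch with synthetic data and scikit-learn; the feature names, numbers, and correlation strengths are all invented for illustration and are not taken from the interview.

```python
# Illustrative sketch: a model learning a "shortcut" feature. In this synthetic
# setup a zip-code indicator is strongly correlated with the label, and a
# linear model leans on it far more heavily than on the legitimate signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

payment_history = rng.normal(size=n)        # legitimate signal (toy)
zip_group = rng.integers(0, 2, size=n)      # 0/1 stand-in for a zip code

# The outcome depends weakly on payment history but strongly on the zip-code
# group -- a spurious shortcut baked into this synthetic data.
logits = 0.5 * payment_history + 2.0 * zip_group - 1.0
label = rng.random(n) < 1 / (1 + np.exp(-logits))

X = np.column_stack([payment_history, zip_group])
model = LogisticRegression().fit(X, label)

print("coef on payment history:", model.coef_[0][0])
print("coef on zip-code group: ", model.coef_[0][1])  # much larger
```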
And so Nomic helps to identify those spurious correlations. But how do you tell when you have a really clean, good dataset? What are its qualities?

There are a couple of things. The way people evaluate the quality of a machine learning model is usually by taking some amount of data the model has never seen before, data it doesn't see during training, and evaluating the model on that. There are two forms of evaluation. There's quantitative evaluation, where you compute a metric like accuracy: count up the total number of times the model was right over the total number of times it tried. And there's qualitative evaluation, where you have humans look at the model's outputs, rank them, and then use statistical tests to say one model is better than another because humans found a statistically significant improvement in its qualitative behavior.

Now, the problem with large language models is that they produce text, and to evaluate the quality of a model that produces text, it's very hard to use automated metrics. With accuracy, how do you say the model is right more often than not when it's just generating paragraphs of text? So one of the things we set out to do was give people the ability to look at large collections of text in a very easy manner. That's what this tool Atlas, which we set out to build, is. It's instrumental, number one, to looking at the datasets of large language models, because it immediately shows you the contents of those datasets, so you can very easily find things that shouldn't be in there, which is for a domain expert to decide. That wasn't really possible before: people would literally take a million documents of text, sample a hundred of them, and manually read through them. That used to be the process. The whole point of building Atlas is that you can look at all one million documents of text on one screen, in a pre-organized view that starts giving you insight into what could be wrong with your data, so you can make those changes before a costly model training run, for example.

I love the tech side of things. Atlas must be dealing with an enormous amount of data, and being able to visualize that is super impressive. Can you give a high-level overview of the architecture needed for a product like Atlas?

So the question is: how do you look at tens of millions of documents of text on one screen and make it such that a human can understand what's going on? There are a few components to it. Number one, you need the power of AI to do it, so Atlas on the back end is actually mostly AI-driven. Data gets uploaded into Atlas, and Atlas turns every single data point, using internal machine learning models it has on the back end, into a string of numbers called a vector: it represents every piece of text that comes in as an embedding vector. Then what we have in Atlas is a very complicated, but once implemented very fast, method to take that large collection of embedding vectors, one per document you uploaded, and project it into two dimensions. It takes the high-dimensional space that the neural network views the documents in and shows it to you in two dimensions that a human can interpret. Atlas is, in some sense, a method of looking at your data through the eyes of a neural network, and that's what the two-dimensional map is: any time a neural network thinks two pieces of data are similar, the Atlas map reflects that by putting the two-dimensional points in a similar location. That's how you get the human-interpretable view of a neural network on top of your data, and since you now have the power of a neural network looking at your data, you can make the same kinds of inferences the neural network would allow you to make.
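A minimal sketch of the general "embed every document, then project the embeddings to two dimensions" idea described above. This is not Nomic's actual Atlas pipeline; it assumes the sentence-transformers and umap-learn packages, and the model name and documents are placeholders.

```python
# Illustrative sketch of the embed-then-project idea, not Atlas's internals.
# Assumes `sentence-transformers` and `umap-learn` are installed.
from sentence_transformers import SentenceTransformer
import umap

documents = [
    "The model hallucinated a citation that does not exist.",
    "Cleaning the training data removed most duplicate records.",
    "Our quarterly revenue grew eight percent year over year.",
    "The radiology report summary omitted a key finding.",
    "Zip code should not be used as a feature in credit scoring.",
    "The new release runs on CPU without a dedicated GPU.",
]  # a real corpus would have thousands to millions of documents

# 1. Turn every document into an embedding vector with a neural network.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(documents)          # shape: (n_docs, 384)

# 2. Project the high-dimensional vectors down to 2D for a human-readable map;
#    n_neighbors is tiny here only because the toy corpus is tiny.
coords = umap.UMAP(n_components=2, n_neighbors=2,
                   metric="cosine").fit_transform(embeddings)
print(coords)                                   # one (x, y) point per document
```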
What does a typical user of Atlas look like? Who are they within the company, what title do they usually hold, and what does their day-to-day look like in the product?

Atlas has grown very much through developer-driven growth. Developers find that they have 50,000 documents, or 100,000 documents, that they need to go through because they need them for a model training run, or they have a large collection of data sitting around and don't know what's in it, or they have a giant AWS S3 bucket with a million images and don't know what images they have. They naturally find Atlas as a solution to get a pulse on what's in their dataset. A lot of the growth has really been people uploading small samples of data, showing them to their bosses or product managers, and saying: holy crap, this is extremely useful, we finally know what's in our datasets. Atlas also gives you interaction on top of that: once your data is in, you can manipulate it, slice and dice the dataset, filter it in different ways, and get a level of interaction beyond just exploration. As for the kinds of users: everyone from data scientists who work with a lot of unstructured text or images, to librarians, who actually get a lot of joy out of Atlas because it's the first time they can see their entire digitized archive collections in one place. One of the beauties of Atlas is that the links you make are shareable, so if you upload a million documents you can just copy and paste that link to somebody else and they can see your one million documents in a way they can start exploring themselves.

And can Atlas also be used by more amateur AI builders?

Yeah. The whole point of Atlas is to let you explore datasets without having to put up any money. We offer a very generous free tier, 50,000 data points, and for most amateur builders who are tinkering with their own models, that's more than sufficient to get started. We also offer a pretty big tier for researchers; we grant them large dataset allowances they can use as part of their research projects.

And if anybody watching this video wants to take advantage of that 50,000-data-point tier, how would they do it?

Here's the secret: you sign up and you start using it. It's already there.

Awesome. Okay, so at a certain point, and I think it was released a few weeks ago, you had the idea for GPT4All. I want to learn how, from building Nomic with the Atlas product out in market, you first thought to train your own local model. And we can take it from a higher level too: what is GPT4All? But first I want to hear how this creation came to be.

Sure, I'll start at the high level. GPT4All is a large language model that you can run locally on your computer. You don't need a GPU, you don't even need that fast a computer; it only takes a little bit of RAM, which is the memory your computer has, to get going. It's not as good right now as the cloud offerings like OpenAI's ChatGPT or GPT-4, but it can do a lot of the same things those in-cloud offerings can do, and you don't have to send your data to a third party. Everything runs locally on your computer, so you have privacy, and you can run it inside your internal systems if you wanted to, if you were at a company. So that's what GPT4All is: a large language model that isn't bound by some third party you have to go through to get access to the technology that large language models give you.
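For readers who want to try this from code, here is a hedged sketch of running a GPT4All model locally with the gpt4all Python bindings. The exact package interface and model filename have changed across releases, so treat the class name, argument names, and model name below as assumptions to check against the project's README.

```python
# A hedged sketch of local inference with the `gpt4all` Python bindings.
# The model filename is a placeholder; the package downloads the weights
# (a few gigabytes) on first use. No GPU is required.
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy.bin")   # placeholder model name

response = model.generate(
    "Explain what a large language model is in one sentence.",
    max_tokens=100,
)
print(response)
```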
So how we got started with it: like I said, the whole reason we built Atlas in the first place is that we made this key realization that the secret to making good large language models is high-quality, clean, curated data. We were building out Atlas, and at that point the general public didn't really care about, or know about, large language models as this pivotal new technology that was going to usher in a new era.

When was this? When did people start caring?

I would say October or November of 2022. We were about six months into building Atlas at that point. For those six months we knew this problem existed; we knew that higher-quality, clean datasets contributed to better large language models. The problem was that nobody cared about large language models yet, so we got a lot of: what is this pretty scatter plot you're showing me? It looks so cool, but I don't quite see how it's useful. We were basically early to building this thing out, which turned out to be good, because we had it built when the world realized how important this new foundational technology of the large language model would be to every part of the software stack. Around that time we started thinking: okay, maybe it's time to demonstrate that Atlas is actually the tool you can use to curate high-quality datasets for your large language model training runs.

The issue was that a lot of people still couldn't access the technology, even if we made a model. At that point, and honestly up until March of 2023, about a month before the recording of this video, to run a large language model you typically needed a GPU, which cost a couple thousand dollars to plug into your computer, or you needed to rent one for a couple dollars an hour from an online cloud provider. That was just beyond the reach of most people in the world. And the access you could get through an API, like OpenAI or Anthropic or Cohere, was blocked in a lot of countries: China didn't have access, Russia didn't have access, Italy has banned access, for example. There were just a lot of these blockers around even getting access to this technology.
So we started thinking: what can we do? How can we build something that is not only a demonstration of the power of Atlas and its data cleaning abilities, but also something that can actually be beneficial to the end users of the model?

What we did is... we did nothing. We waited, and Facebook dropped this GPL-licensed model called LLaMA, which is probably the best foundation model in existence whose weights anyone can access. Well, you can access them through a license; they've been leaked on the internet, but you should probably go through Facebook to ask for them, that's the ethical thing to do. That model proved to be a good foundation for fine-tuning large language models into your own chatbot-like systems. And a group of hackers emerged. One guy from Bulgaria, Georgi Gerganov on GitHub, started working on a CPU-compatible version of LLaMA that runs really fast without needing a GPU in your computer. What that does is unlock the use of these models for a gigantic class of people who just didn't have access to the resources to run them before. We also saw the Alpaca model's success in taking LLaMA, fine-tuning it on a dataset, and releasing it to everyone, and we thought: we can do this one step better, because we have this tool, Atlas, to curate the gigantic dataset that's needed to make a strong assistant-type model.

So that's what we did. We were basically in the right place at the right time with the right set of ingredients, put them all together, and released GPT4All a couple of Tuesdays ago. The one thing we did that made a difference from everything else is that we released a one-line command you can run, and then magically you have the model running locally on your computer. That removed the extra friction everyone else had around using these on-CPU models. That one thing, removing the friction by giving a very distinct set of instructions, caused basically an explosion of people realizing that these models are not just magical black boxes that exist in the cloud of OpenAI or whatever other large language model provider: you can run them on your own computer, you can have access to this technology yourself.

Yeah, when I first saw GPT4All I was absolutely blown away, thinking: about four gigabytes of hard drive space is really all I need, plus a somewhat modern-ish computer, and I can have a pretty darn good model running locally on any device. Did you think it was going to take off or be adopted at the rate it was? When I put out the video about it, that video got a ton of views, especially for my relatively small channel, and so many people were into it, downloading it and installing it. Did you expect that?

Yeah, we had everything completely planned out. I'm kidding, of course not. Our intention was basically to do a demonstration that clean data is the recipe for success when you want to train a large language model.
It was just that we did these few extra steps that made it very easy for everyone to have access to the kind of artifact we made. And it opened my eyes to the fact that there's a large collection of people, and companies too, who really don't like the current model for accessing this fundamental technology. They don't like the idea that they have to send their data in API requests and have those companies do whatever they do with that data and store it. They don't like the idea that all of their employees are putting their company's private data into a chat interface where it gets soaked into a few companies' hands. People really want access to this technology, because there's a real paradigm shift happening right now in the developer world: if you're building an app that doesn't have a large language model plugged in, you're going to be behind the curve. But there are many applications that fundamentally don't allow data to be sent externally, or that don't have internet access at all. It really started opening my eyes to the pent-up demand the world had for something people can run without having to jump through a third party's hoops to use it.

And I think a lot of people just like to hack on it, too. They want something local they can play with rather than a walled-off API to hit. I think that's part of its popularity: you have it on your own computer, and you can fine-tune it, mess with it, play around with it, and extend it. At least that's what's exciting to me.

Yeah, and the other big thing is that we made a very strong point of doing everything in the most open way possible. Usually when people release a machine learning model, or a paper about one, they don't release the whole set of things; they release some subset. They'll release the model weights, or they'll release the paper, and hopefully they release the data too. But for most large language models, the data is the secret sauce. The data is what companies hold dear, and it's usually taken from sources the companies don't have the copyright to, so they actually can't release it without putting themselves in a legal gun-pointing situation. So they don't release it. We were very adamant about releasing the entire stack. We truly believe this technology can only grow fast, grow in the right direction, and grow in a safe direction if the whole community has its eyes on it while it's being built, and that's why we released it, data included, all for free to the world.

This feels like the very early days of the internet. I was pretty young at the time, but there was AOL, which essentially was the internet, and then all of a sudden broadband and browsers took over, and that's why we have a vibrant internet now. So I want to say I really appreciate that you put this out and open-sourced everything. And I also want to say congrats: you launched version 2, which is based on GPT-J, and we'll talk about that in a moment.
But I want to talk a little bit about the specifics of GPT4All, and maybe this is actually a good jumping-off point into your new launch, which came out yesterday. First, talk about the differences and what you launched. I put out a video yesterday about GPT4All-J, which we'll call version 2, and a bunch of people had questions; I'd love to go through some of those with you.

Sure. GPT4All, the original model, was trained from an existing large language model; "trained" means fine-tuned in this case. We collected a large dataset of responses from OpenAI's assistant API, and then we continued training the large language model Facebook released, called LLaMA, on this dataset to teach it how to be an assistant. An assistant is the kind of interface you see when you interact with a large language model through chat.openai.com, for example. We paid OpenAI for the data it returned to us. They have a clause in their terms of service that says you can't build a competing product with it, and we weren't trying to build a competing product; we released this for research purposes, for the world to interact with and use. Now, the original GPT4All model can't actually be used for commercial purposes, because it is derived from the GPL-licensed LLaMA model. The weights of the LLaMA model are under the GNU public license, which allows researchers to use them, but if you want to use them for commercial purposes you have to talk to Facebook.

Hold on a second. So you built your fine-tuned model on top of the LLaMA model, meaning that your fine-tuned model inherited that license?

Exactly. A condition of the GNU General Public License is that derived works inherit the licensing, and we wanted to respect that, and we plastered that warning everywhere. It's at your own risk if you use the original model in some setting that breaks the license. For example, if you were to build a big business on top of that model, Facebook could come after you. Who knows whether they would, but that's a risk; you'd be breaking the license.

Now, the very first thing we did when we realized that people cared about large language models that they own, that run on their hardware and with their data, was to set out to fix this licensing issue. There was a whole swath of people who wanted to start building immediately, without relying on third-party dependencies or sending their data to third-party services, and they couldn't, because building was blocked by the fact that they couldn't build anything that might potentially make money under that license. So we set out to build on top of a backbone, a foundation model, that was not GPL-licensed. Just for starters, we chose this model called GPT-J. GPT-J is a large language model trained on a dataset called the Pile. The Pile is a conglomeration of a bunch of the internet scraped down; the Enron emails from the Enron court cases are in there, for example. There's a lot of stuff in it; you can find it on the Hugging Face website if you want to look at its contents, and we also have some Atlas maps of it.
And is the Pile completely open? Anybody can get access to it?

Anyone can get access to the Pile. Calling the Pile open source is difficult; calling data in general open source is difficult. Most of the large language models people have trained are riddled with copyrighted data. Every large language model you've used has likely been trained on data that's copyrighted, and there will probably be gigantic legal cases fought over the coming decade about this, but that's just the reality of things; there's no way to prevent it, people are scraping the internet to train these models. But besides that point: GPT-J was released with an Apache 2.0 license by EleutherAI a few years ago. It's certainly not the best model; qualitatively and quantitatively, in terms of metrics, it's a worse model than LLaMA. But it's also a model the world can build on. We were hoping to ship fast, to get a model out to the world that people could build on and that was openly licensed. The issue is that the base model was of worse quality, so it required a lot more work and iteration rounds on our end to get something people would be happy to interact with and get some utility from. We had to increase the amount of data going into the fine-tuning process, and we had to do a lot more dataset curation. We actually made a bunch of Atlas maps along the way to help us debug the model while we were training, because the model was learning to memorize things rather than learning to generalize like an assistant, due to the data composition. There were a bunch of technical machine learning and data problems we had to solve to get that model out, but we released it yesterday.

So there were technical machine learning and data problems to be solved.

There was also a new element we had access to that allowed us to ship it, and that's the eight-to-nine-thousand-person community on Discord who all believe in what the GPT4All movement is doing and who contribute their time and effort to make this possible. It would have been impossible to release the GPT-J model without certain key members of that community who stood up and donated tons of their time to the effort.

What are they actually doing?

People were contributing everything from suggestions on data, like what they wanted to see the new model be able to do, to their actual developer hours. There are some brilliant developers in the community, and we gave them some rough suggestions about the things we thought should exist. For example, the code that makes the LLaMA model, and therefore the LLaMA-based GPT4All model, run really fast on CPU can't be reused to make GPT-J run fast on CPU. So one developer went in and wrote a completely custom version of the CPU code, all in a very low-level programming language, from scratch.
Well, not exactly from scratch: he derived it from the existing code, but he had to make hundreds of lines of code alterations, and it's very complicated code to write. He did this all for free, for an open LLM future, and it would not have been possible without the community's contributions. So there were several difficulties that had to be worked through to get an Apache 2.0 licensed model out to the world that runs fast on CPU, which is what people need, because if the model doesn't run fast on CPU, its utility to 99.99% of people is basically nil.

Yeah, when I installed the new model a few days ago, that was one of the first things I noticed: it was very, very fast. I already thought the original GPT4All was pretty fast, but the new model certainly seems very fast. So let's go into some questions. I had a bunch of people ask questions when I mentioned I'd be talking to an AI leader, and especially after releasing the video yesterday about GPT4All-J, they had quite a few, so I'd love to dive into those with you.

Please, yeah.

You mentioned just now that it's powered by a CPU. A lot of people wanted to know: how can they actually power it with a GPU? Is that possible, and if not, is it coming?

To get a model running fast on CPU, there's a process called quantization that you have to run. It takes the 25-to-30-gigabyte file that contains all the model's parameters and makes that file a lot smaller, while still preserving the quality of the model weights, so the model is still good when it's smaller. That's how I'd describe the process at a high level. What we did is put the model through that process, and that's what runs on CPU. But we also released the actual end model weights, the 25-to-30-gigabyte file, and you can natively put those onto a GPU and run them, no problem, at high speed. That's something you'd have to do yourself, and you have to be a little technical to do it. But on the short-term roadmap there is definitely the desire from the community, and people in the community are currently working on it, to implement GPU support in the launcher we released, the chat interface you saw with the GPT4All-J launch.
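The quantization step described above can be illustrated with a toy example. This is a generic symmetric int8 scheme, not the actual scheme used for the GPT4All weights; it only shows the core idea of trading precision for file size.

```python
# A minimal sketch of the idea behind weight quantization: store weights as
# low-precision integers plus a scale factor, shrinking the file while
# roughly preserving the values the model computes with.
import numpy as np

weights = np.random.randn(4096, 4096).astype(np.float32)  # toy weight matrix

scale = np.abs(weights).max() / 127.0                      # symmetric int8 range
q = np.round(weights / scale).astype(np.int8)              # quantized weights
dequantized = q.astype(np.float32) * scale                 # what inference uses

print("float32 size:", weights.nbytes / 1e6, "MB")
print("int8 size:   ", q.nbytes / 1e6, "MB")               # ~4x smaller
print("max error:   ", np.abs(weights - dequantized).max())
```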
Awesome. Okay, and during the installation process you get a couple of warnings, unverified developer warnings, and some people said: hey, this looks like a scam, or it's a virus. Obviously we had spoken, so I knew where it was coming from, and I know Nomic as a company. Maybe briefly talk about why it says that and why people should trust the download link.

You're interacting with something on the bleeding edge of what's possible in translating research into a useful object you can interact with, all of human history's knowledge in a single file. Unfortunately, Apple and Microsoft don't work that fast. We are waiting to get the certificates signed by Apple and Microsoft, and in a couple of days you will not get those security warnings when installing the file.

Okay, so it's as simple as you're moving faster than you can get approval.

Yeah, and we bias towards releasing, getting feedback, and iterating early, rather than waiting for all the i's to be dotted and the t's to be crossed.

So, going back to GPU usage, another thing a ton of people want to know about is fine-tuning the models on their own data. I know the original GPT4All had some instructions for doing that, and I know it requires a pretty high-end GPU or a cloud GPU. What's the plan for the new version in terms of fine-tuning? And maybe you can touch briefly, at a high level, on what that flow looks like: if I have a hundred documents and I want to fine-tune a model on them, what does that process look like?

Yes, this is something a lot of people have asked for. Unfortunately it's a complex question to get right, because there's one question of "can I fine-tune this?", another of "can I fine-tune this to get something higher quality?", and another of "can I fine-tune this on my own data and actually get a model that still acts like the previous model but just knows a little more about my data?" There are a lot of layers to it. The best answer is: number one, for GPT4All-J, the instructions to train the model are all in the GPT4All GitHub readme. You can go there; there's a single command you can run, and if you rent a big enough GPU machine from a company like Paperspace, you can train it yourself. Now, there's the question of whether you can do this cheaper and avoid spending a bunch of money on GPUs. Unfortunately, right now the answer is no. You will still need a GPU, or several if you want to train with a lot of data, and you need a lot of data to get a high-quality model out. Training deep neural networks like these is expensive; the models are gigantic. Just to load the raw model weights into your computer's memory for training, you need something like 32 gigabytes of RAM, which is not consumer-grade hardware that most people have. If we hadn't put the raw GPT4All-J model through the whole process that makes it run fast on CPU, you would likely not be able to load it into your Mac's memory at all.

And so when a lot of people think they want to fine-tune a model, maybe they're really just talking about giving the model access to additional context: I have this PDF, or this Excel document with a lot of my personal information in it, and I want to be able to ask questions about it. So rather than fine-tuning the model, maybe what they should be thinking about is how to plug that data in as additional context. Could you speak to how they might do that, with either the old model or the new one?

You're exactly right. If your goal is to get some usefulness out of the model, to solve a task like having it answer questions about some PDF document whose text you have accessible, what you likely want is not actually to fine-tune the model, that is, to keep training it. What you can do instead is something called retrieval-augmented generation. What this means is that you give the model a prefix of information you want it to use when answering, and then you ask your question afterwards. You prompt the model with a bunch of information, then ask it a question about that information, and if the model is good enough, it gives you an answer. Now, if you have a lot of information, imagine a dozen or two dozen PDF documents, you have to do some more clever things. If you're just getting started and want a good tutorial, the best place to go is this large language model ecosystem-building library called LangChain, that's L-A-N-G chain. They give you recipes, with really readable, human-interpretable tutorials, on how to get started. GPT4All is actually integrated into LangChain; you can use it as a model backend, and they give you great tutorials on doing this kind of retrieval-augmented prompting, which gets you some of the benefits of fine-tuning a model without having to spend your money on GPUs and that sort of thing.
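Here is a hedged sketch of the retrieval-augmented pattern described above, wiring a local GPT4All model into LangChain. LangChain's module layout has changed frequently, and the model path and document chunks are placeholders, so adapt the imports and names to the versions you install.

```python
# A hedged sketch of retrieval-augmented prompting with LangChain and a local
# GPT4All model, roughly the pattern the LangChain tutorials walk through.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

# 1. Chunk your PDF text however you like; here it is just a list of strings.
chunks = [
    "Section 1: The warranty covers manufacturing defects for two years.",
    "Section 2: Water damage is explicitly excluded from coverage.",
]

# 2. Embed the chunks and index them so relevant ones can be retrieved later.
index = FAISS.from_texts(chunks, HuggingFaceEmbeddings())

# 3. Point LangChain at a local GPT4All model file (placeholder path).
llm = GPT4All(model="./ggml-gpt4all-j-v1.3-groovy.bin")

# 4. Retrieved chunks are stuffed into the prompt before the model answers.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever())
print(qa.run("Does the warranty cover water damage?"))
```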
Okay. And if somebody did want to fine-tune a model, what would they need to actually produce something better than the base model?

Likely what you would need is to first rent a giant machine with eight GPUs attached to it, called a DGX A100. This machine costs a couple hundred thousand dollars, 100 to 200, to buy outright, but you can rent it for roughly 12 to 25 dollars an hour depending on your provider. I'd recommend Paperspace; the reason I recommend them is that they actually helped sponsor GPT4All and getting the models trained. We would not have been able to move as fast without them, so I'm really thankful for the team there. You rent a machine from them, and then you need data that's properly formatted for continuing to train an assistant. Properly formatting data for assistant training means you need examples of conversations the model can learn from, conversations over top of your own data. So you would take, for example, a section of one of your PDFs, ask a question about it, and then write the answer yourself, and you would need probably a couple thousand examples like that, quite a large number, to effectively fine-tune one of these models and make it work. Now, that's a large technical lift; the everyday person should not be expected to be able to do it. But I can assure you the ecosystem over the next four to six months will develop such that this becomes a more and more accessible thing for everyday people to do.
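As a rough picture of what "a couple thousand conversation examples over your own data" might look like on disk, here is one common JSONL layout. The field names are an assumption for illustration; the actual schema expected by the GPT4All training scripts is documented in the repository.

```python
# A hedged sketch of a fine-tuning dataset on disk: one JSON object per line,
# each pairing a prompt grounded in your own documents with a written answer.
import json

examples = [
    {
        "prompt": "Using the excerpt below, answer the question.\n\n"
                  "Excerpt: The warranty covers manufacturing defects for two years.\n\n"
                  "Question: How long does the warranty last?",
        "response": "The warranty lasts two years and applies to manufacturing defects.",
    },
    # ...a few thousand more pairs like this, written or reviewed by you
]

with open("finetune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```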
Okay, now a couple of quick questions. People are asking about multiple languages: does GPT4All-J handle multiple languages, does it handle them well, which languages does it handle, and if not, what's the plan?

GPT4All should be able to support every language, to be able to support everyone; the world is not just English. Now, there are roughly two parts to the recipe for supporting multiple languages. Part one: you take a foundation model, that base backbone architecture, say GPT-J or LLaMA, and that model needs to be trained to learn about different languages. The way you do that is by making sure the data you collect to pre-train the model, to start teaching it language, has a diversity of languages inside it. The next step is to make sure that when you're fine-tuning the model to be an assistant, or to follow instructions, you have questions and answers that are not just in English. That's how you do it, and these are all, again, on the short-term roadmap for the next model iteration. We're growing the dataset by the hour, and we're growing the diversity of the dataset by the hour as well.

Okay, great. Access for everybody all over the world, regardless of language, is definitely important. Another question I got quite often is: what's the max token length? And maybe first explain what that means.

Max token length means roughly how many words you can put into the model before it no longer knows about the earlier words you gave it when it's producing an answer. The way computers encode text is a little different from splitting a sentence on spaces or periods; the rough equivalent is that one English word is about one and a half to two tokens on average, so a word like "supercalifragilisticexpialidocious" might be something like eight tokens. The limit is 2,048 tokens. That's small compared to a large language model you can use via an API, like OpenAI's or Anthropic's models, but there are plans in the short-to-medium-term future for models that support much longer token lengths.

And does that token length include both the prompt and the response, or are they separate?

That's an important distinction: your prompt plus the response together is 2,048 tokens. The reason is that when you're using the model, it needs to take in all the text you've given it, prompt included, response included, previous questions included, so that it can produce the next most likely best answer.
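A small sketch of checking a prompt against the 2,048-token budget described above. It uses the Hugging Face transformers package with GPT-2's tokenizer as a stand-in; for exact counts you would use the tokenizer of the model you are actually running.

```python
# Counting tokens to see how much of the context window a prompt consumes.
# GPT-2's tokenizer is a stand-in here, not necessarily your model's tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

CONTEXT_WINDOW = 2048          # prompt + response share this budget
prompt = "Summarize the attached report in three bullet points."

prompt_tokens = len(tokenizer.encode(prompt))
print(f"prompt uses {prompt_tokens} tokens")
print(f"room left for the response: {CONTEXT_WINDOW - prompt_tokens} tokens")
```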
So the next topic I want to talk about is something a lot of people had very strong feelings about in the comments section, and that I've heard about in a lot of places: model bias. First, let's talk about what model bias is, whether it exists, and why it occurs, and then we can dive into who gets to decide, and so on. At a high level: is there model bias, and how does it actually happen?

Yes, this is a topic a lot of people have strong opinions on; there are many schools of thought, and a lot of them fight with each other all the time. But I think about it like this: a model's biases are a function of the data the model was trained on. If a dataset is biased and a machine learning model is trained on that dataset, the model is not going to magically learn to ignore those biases; it's going to inherit them. This is why, in the credit scoring example I mentioned, the credit scoring model is very susceptible to picking up biases from its training data, conditioning on spurious correlations to do things humans wouldn't perceive as acceptable, for instance taking one group of people and always rejecting them for a loan compared to another group. Large language models are no different: they are machine learning models trained on large amounts of text, and they learn the biases in the underlying data. There are tons of papers out there that compare, for example, the gender a large language model assigns to different job roles. If you prompt with "what is the name of this nurse?" and let the large language model continue, it's much more likely to produce a woman's name than a man's name, and vice versa for, say, a plumber. That's simply because these are the kinds of biases it has learned from the data it was trained on.
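The kind of bias probe referenced above can be sketched in a few lines. This example uses a masked language model (BERT) as a stand-in rather than a large generative model, and the sentence template is invented for illustration; it assumes the Hugging Face transformers package.

```python
# A hedged sketch of probing occupation-related bias with a masked LM.
# The template and occupations are illustrative, not from any specific paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for job in ["nurse", "plumber"]:
    predictions = fill(f"The {job} said that [MASK] would be late.", top_k=5)
    print(job, [p["token_str"] for p in predictions])
```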
Now, in the context of the GPT4All models: to get bootstrapped, these models needed a lot of data. The APIs we used have a model behind them that we distilled our model from, and those APIs have all the biases baked in by the creators of those models, so the original model we released has those biases baked in as well. But the cool thing is that now that the data is open-sourced and can be manipulated by the community, anyone can take this dataset and engineer the model to have whatever level of filtering, or whatever biases, they want. Now, that can mean bad things happen: a person could engineer the model to have biases that are not aligned with what most humans would want. But you can also do the opposite. This is a difficult, very sensitive topic, and a lot of people disagree with the whole idea of releasing models into the open, because people can train them to output whatever kinds of biases they want, and these models can be plugged into automated systems that, if the models have harmful biases, propagate those biases throughout the world. My one defense is: would you prefer a centralized organization to have all that power? I would personally prefer these models be built out in the open, where everyone can see how they are made, because if they are built in the open, the appropriate defenses can be built in those same places.

We're going to talk about centralization in a few minutes, but I want to touch on biases a little more. When a company like Meta, which originally trained the LLaMA model, or EleutherAI, with GPT-J, trains a model, are the engineers making day-to-day decisions about the biases themselves, or is the bias inherent in the data being put together, simply reflecting the way the world is, or at least the way the data reflects the world?

I mean, you know the answer to this: no one at these organizations is going to purposely inject some sort of bias into a model and then try to hide it. It's a function of the data that exists on the internet, and if you train on parts of the internet that have content people generally find immoral or not good, the model will have those biases as well. The thing that does happen, though, is that training a model to be an assistant, to have assistant-like behavior, requires a very high level of curation of the data, and those models are specifically engineered to have certain types of biases, ones usually viewed as generally good by people and by governments. That's why the models have them. For instance, OpenAI's model will refuse to respond when you ask it to do things that may be illegal, or when it's uncertain about something, for instance giving medical advice. It will refuse, which is arguably a good thing. But you also have no alternative if you have a use case where you need the model to produce, say, fake medical advice because you want to make a game or something like that; then you're completely out of luck.

So it sounds like all models have bias, because the data used to build them has bias. Should people expect more or less bias in a local model versus a cloud model like ChatGPT?

I don't think you can make a blanket statement about whether to expect more or less bias. The biases of a model are the biases its trainers have instilled in it, especially for an assistant-like or instruction-following model, and you just need to be cognizant of the data the model was trained on. This is why we open-sourced the dataset for GPT4All, and why we put it into a viewer that doesn't require you to page through every single data point: you can see all the data points pre-organized in a view that lets you immediately say, hey, that data shouldn't be in there. In our original release we did a lot of work curating the data, and we didn't do a perfect job; there were tons of people looking at the Atlas map and saying, hey, this data shouldn't be in there, that would put a bad bias into the model, or a bias that doesn't align with what we're doing. And that's exactly the kind of criticism that should be happening; it's the whole point of releasing the data.

So help me understand: there's bias in the data, and then there's filtering, and those are two different things, right?

No, they're the same thing. Filtering is just another word for going through and manipulating the bias to be slightly different.
For example, say you wanted to train an AI model that doesn't always say "as an AI language model, I can't...". If you have a dataset like the ones collected from OpenAI, the model behind it has a certain type of bias instilled in it, and any model you train from that dataset will learn that bias too. But you can manipulate the dataset you received from one of these cloud providers so it doesn't have one of those biases, by manipulating the underlying data. If I were to go in and remove all the data points that contain that "as an AI language model" line, the model would be less likely to refuse to respond in certain scenarios.

So in the nurse example you gave, "give me the name of a nurse," it's much more likely to give a female name, Mary for example, right? But what I thought filtering was, and please correct me, is this: with ChatGPT, if you ask how to produce a drug, any drug, it's not going to tell you. Is that different from bias, and what would you call it?

That's the same thing. Filtering is a form of bias in itself: you're filtering to align to a set of moral standards or a set of legal standards, and that's injecting a bias.

Got it. So there can be bias not only in the original data but also applied afterwards by whoever creates or fine-tunes the model; it's layers of bias. And so if somebody asks for an unfiltered version of GPT4All, what are they going to get, and is there still going to be bias in it? We kind of talked about the answer, but I want to hear it from you.

Every model will always have bias. If you'd like a model that responds to more things and isn't hesitant, there's a way to make a model that is less biased towards refusing: remove all the data points during training that say "I can't respond to this query because of X, Y, and Z." That's actually what we just did. We released the model yesterday, and overnight it became clear people didn't like how restrictive it was, so we kicked off a training run, and in a couple of hours there will be a model people can interact with that is a little less restrictive. The big demonstration here is that once things are out in the open, it's very easy to make these sorts of data interventions.
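A minimal sketch of the data intervention described above: dropping training examples whose responses contain boilerplate refusals before fine-tuning. The file names, field name, and marker strings are assumptions for illustration.

```python
# Filtering a JSONL fine-tuning dataset to drop boilerplate refusals.
# File names and the "response" field follow the earlier illustrative layout.
import json

REFUSAL_MARKERS = ["as an ai language model", "i cannot", "i can't assist"]

def keep(example: dict) -> bool:
    """Return True if the example's response contains no refusal boilerplate."""
    response = example["response"].lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)

with open("raw_data.jsonl") as src, open("filtered_data.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        if keep(example):
            dst.write(json.dumps(example) + "\n")
```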
One thing that comes to mind all the time is an idea I call "bring your own bias." For example, maybe a company like Pepsi really wants the world to drink a lot of Pepsi. What they could do is pay some creator of a large language model to replace the word Coke with Pepsi in all the data the model ever trains on. Then whenever you're working with the model and ask it to, I don't know, write an email telling your friend to get you a drink, that drink will always be Pepsi, no matter what. Which is obviously very useful for Pepsi as a company, because they'll sell more Pepsi. This sort of thing will happen in the next decade; it's probably already happening, and it's not something you can prevent. But when the data is open, there are systems that can be built, and researchers currently working on them, to mitigate the harmful effects of these kinds of things.
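As a toy illustration of how cheaply that kind of "bring your own bias" could be injected into training text, here is a minimal sketch; the brand names and file layout are just the hypothetical from the example above, not anything a real provider is known to do.

```python
import re
from pathlib import Path

def inject_brand_bias(text: str) -> str:
    """Swap one brand name for another in raw training text."""
    text = re.sub(r"\bCoke\b", "Pepsi", text)
    text = re.sub(r"\bcoke\b", "pepsi", text)
    return text

def rewrite_corpus(src_dir: str, dst_dir: str) -> None:
    """Copy every .txt file in src_dir to dst_dir with the substitution applied."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.txt"):
        (out / path.name).write_text(inject_brand_bias(path.read_text()))

# Hypothetical directories; any plain-text pre-training shard would work the same way.
rewrite_corpus("corpus_raw", "corpus_biased")
```

A model trained on the rewritten corpus simply never sees the original brand name, which is why open, inspectable datasets are the main defense the conversation keeps returning to.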
Okay, so that dovetails really nicely into the centralization discussion. OpenAI is fully centralized; GPT4All is open source. We've touched on it a lot, but let's summarize: what's the difference, why does it matter, should there be two options, and why is decentralization so important to you?

Look, my objective is not to disparage these large companies. They provide a big value to society. They were the first to demonstrate that you can take this foundational technology and put it into a useful form that people can use as copilots in their everyday lives. There are tons of applications being built on top of that ecosystem, tons of amazing, useful things you can do with the help of models produced by OpenAI, Anthropic, and Cohere. They provide great services. But when these companies were originally formed, they were formed under a different ethos than the one they're operating under now. If you go look at the original OpenAI charter from 2016 or 2017, their goal was to be a company that could safely drive humanity into an era where an AGI exists and make sure it doesn't do bad things, that humanity doesn't get destroyed because somebody builds a technology that gets out of hand. That's one way to summarize it. The issue is that over the years, for obvious financial reasons, they got more and more restrictive, more and more closed source; they started sharing models less and less. The last model they actually released to the public for anyone to use was GPT-2, I believe (don't quote me on that, but I think that's the last large language model they released). After that, GPT-3 was held behind their API, and they would only release, for example, a paper about it: a paper detailing roughly what they did to train it, roughly what the data looked like, how they cleaned it, what the model can do, plus a little endpoint where you can send a request and get the model's output back. Then people started using these systems. ChatGPT came out and gave a human interface layer to these models that made them useful assistants: you can paste in your problems and it's basically like having someone sit next to you and help you answer them. That's what ChatGPT is, a copilot you can use to help solve your everyday problems, whatever they may be; their plugin system is an addition to that.

But then GPT-4 came out, and GPT-4, I think, is when the research community got really scared of what was happening, because that was the first time they didn't publish an academic paper but instead published what was essentially a marketing paper: a 96-page white paper with no dataset section and no methods section. They explicitly said they weren't releasing any of that information for competitive reasons, because there are other large language model companies they're in competition with. That was the thing that gave me the initial jolt: we should reconsider letting this just slide, we should look at alternatives. That's actually the reason I started building GPT4All, and got the team at Nomic behind it, and then eventually the whole community behind it: because of the GPT-4 paper. That's not how the world should operate. That technology should not be sectioned off away from the majority of humans, and you shouldn't need an internet connection to use it. That was just the state of the world.

That makes a lot of sense. One argument for centralization that I've heard, not necessarily one I agree with, is that when everybody has their own models, they're much more easily abused and used for nefarious purposes. Can you steel-man that argument? Do you agree with it?

I mean, that's 100% true. But in the very same vein, it's very clear that these large language model companies are more and more aligning their efforts with capitalistic incentives.
They are allowing economic pressures to bias how they act in the world. And who's to say that in a few years, if the whole system stays centralized, these companies won't be doing things that are generally not good for the world? I don't suspect that they would, but there's nothing to stop them. The only way to stop it is putting these models in everyone's hands.

Yeah, light is the best disinfectant. And there's a world in which centralized models and edge models coexist, competing with each other and keeping each other honest.

Yeah. In their current state, edge LLMs like GPT4All and the various others in the edge-LLM ecosystem that's been built out over the last two months don't really serve as competition to these large language model companies. They are uniformly worse in capabilities than, for instance, the ChatGPT API or GPT-4, and they're uniformly probably harder to use. But they're also uniformly more open, because you can actually use them anywhere: you can put them on a Raspberry Pi or a small computer and take them into a place with no internet, and then you have access to technology you wouldn't otherwise have. You can build a business around these things and not have to worry about OpenAI raising prices in two years, after you've built your whole business around it, and then squeezing you until you can't operate anymore.

That's a really good point: platform risk.

Yeah, there are many reasons why you want to own the hardware your computation runs on. Privacy, for example. Many companies have banned the use of ChatGPT internally because the technology was so useful that employees were sending sensitive company data over to OpenAI. Companies need these technologies, they'll fall behind their competitors if they don't use large language models, but they don't get access to them because of data privacy concerns, which OpenAI is trying to fix, eventually.

All right, so you've mentioned that these local models are not yet comparable to OpenAI's models or other centralized models. What will it take to get there, any predictions on timeline? That leads into the point of data sharing. By the way, folks are already sharing their data with OpenAI: every single time you put in a request, they log it and use it to train future models. So for people who are concerned about giving their data to an open-source project, why shouldn't they be concerned, and what will it take to get local models on par with centralized ones?

The secret sauce to having a model that works as well as GPT-4 but runs on your own computer is a lot of high-quality, curated data: examples of you interacting with an assistant model that then get edited, so that when the model responds well, that response is reinforced, and when it hasn't elaborated enough on something, you change it. That's the internal process OpenAI went through to get the very high-quality, assistant-style dataset they train these models on. That's part of the secret sauce, and it's what the community can help with.
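Here is a minimal sketch of what that kind of edit-and-reinforce record might look like. The schema below is hypothetical, not OpenAI's or GPT4All's actual format; it just illustrates keeping the original model output alongside a human-edited version and a preference signal.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class CuratedExample:
    """One curated interaction: prompt, raw model output, human edit, and a preference flag."""
    prompt: str
    model_response: str    # what the model originally said
    edited_response: str   # what a reviewer decided the model should have said
    reinforce: bool        # True: keep the original; False: train toward the edited version

examples = [
    CuratedExample(
        prompt="Summarize why open datasets matter for local LLMs.",
        model_response="Open datasets matter.",
        edited_response=(
            "Open datasets let anyone inspect what a model was trained on, "
            "spot unwanted biases, and filter or retrain on the data themselves."
        ),
        reinforce=False,  # the original answer was too thin, so train on the edit
    ),
]

# Serialize to JSONL, the same general shape a fine-tuning job could consume.
with open("curated_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(asdict(ex)) + "\n")
```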
The community can help by opting in to share your data whenever you use a GPT4All model. This is not on by default, but you can opt in, and your data will be shared with something we're calling the GPT4All open-source data lake. It's a collection of all the data everyone has opted to share for the improvement of future GPT4All models, and the data in it is constantly accessible, constantly visible to everyone, and constantly downloadable by everyone, so not only people in the GPT4All community but also researchers have access to it to improve their systems. It's the place where many organizations are already sending data.

Is there any way to identify a single individual in the data lake based on their prompt, or is it anonymized?

There are various levels of permissions. If you would like your data to be traceable back to you, because you want credit for what your data contributed to these open models, you can tick that box and have that information associated with you. You can also anonymize it to pretty much every extent, and we're trying our best to adhere to international data privacy laws: if you're in Europe, you can delete the data you've submitted, in line with European law, and so on.

The other piece of the secret sauce is something that's very hard for the community to do, and it's something a company like OpenAI was specifically built from the ground up to be able to do: invest a lot of capital into a large pre-training run of a gigantic model, with hundreds of billions of parameters, over as much data as you can possibly gather from the entire world. They were well capitalized from the very beginning to do this; I think they received over a billion dollars in investment over their lifetime to iterate, to experiment, to push the cutting edge, because this technology did not exist before OpenAI. They contributed fundamentally to the existence of large language models; without them, large language models likely would not have arrived as fast. They did amazing things for the world. But to get a good-quality edge large language model that does useful things for you, maybe even comparable with ChatGPT-3.5 or GPT-4, an organization needs to go and do that same work and release it openly instead of holding it closed source, or OpenAI needs to release it: one of the two.

Let's talk about edge models again. I've seen some incredible implementations of GPT4All; people are sharing them on Twitter, on devices I never would have expected. What's the coolest device you've seen GPT4All installed on?

The model itself requires about four gigabytes of RAM to run, which puts limits on the kind of device you can run it on.
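For reference, here is a minimal sketch of running a GPT4All-style model locally on CPU. It assumes the gpt4all Python bindings, which expose a GPT4All class with a generate method; the model filename is a placeholder, and the exact API may differ between versions, so treat this as illustrative rather than canonical.

```python
# pip install gpt4all   (assumed; check the project's README for the current package name)
from gpt4all import GPT4All

# Placeholder model name; use whichever model file the GPT4All downloads page lists.
model = GPT4All("ggml-gpt4all-j-v1.3-groovy")

# Everything below runs on the local CPU: no network calls, no API keys.
prompt = "Describe a flower with five white petals and a yellow center. Is it likely edible?"
response = model.generate(prompt, max_tokens=200)
print(response)
```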
But what people have done is run the model on a small CPU and then plug it into devices like a TI-84 calculator or a DSi, and use the interface on those devices. So on the TI-84, I suppose you could use it during your calculus exams: you type in questions and the output streams back into the calculator, all running on CPU. Same thing with the DS. To encourage the community to have some fun with this, we actually put up some bounties, offering a couple hundred bucks for the best use of GPT4All on the most insane edge device you can imagine. The kind of things we suggested were toasters and microwaves; unfortunately nobody has figured those out yet, but maybe soon.

So what you're really saying is that basically the entirety of human knowledge can be put on a TI-84 calculator that fits in your pocket, and you can go anywhere you want and just start typing. Really inexpensive edge devices can hold a huge amount of human knowledge directly.

Not something as extreme as the TI-84 calculator itself, which has very little memory; what you can do is hook a cable into the TI-84, run the model on an external CPU, and interact with the model on the calculator. But something as simple as a very old phone without internet access could run this model, and you could take it with you into the jungle and ask it, "Should I eat this flower or not?" and describe the flower.

Amazing. That is so cool, and it has most of human knowledge from the internet baked into it, queryable with human language. I know a lot of this technology has been around and developing for years, but in the last few months it's really insane to think about how much we've progressed with large language models, where they're being installed, where they're being used. It's so impressive, so exciting.

Yeah, we're standing on the shoulders of giants, and all the ingredients have really come together over the last few months to make a real societal impact with these things.

So, last question for you. A couple of weeks ago an open letter came out, signed by Elon Musk and a bunch of other notable artificial intelligence leaders, calling for a six-month pause. I made a video about it. There are obviously reasons that Elon Musk and other AI leaders want OpenAI (really, they're calling out OpenAI) to pause development of LLMs better than GPT-4 for six months, and there are arguments that maybe that makes sense, even if it's not feasible or even possible, because other states aren't going to abide by such an agreement. So where do you fall on the spectrum, from full speed ahead to let's take a second to think about what may happen if we go full speed ahead, and why?

Personally, I've skimmed the letter that was written and signed. I saw a lot of controversy around it, with a lot of people claimed as signatories saying they never actually signed it. Honestly, I don't have too strong an opinion on the letter itself or the whole situation around it. The one thing I do have an opinion on is when people with strong financial incentives attempt to block progress because it serves those incentives.
I think a lot of people in the community perceived it that way, and to be honest, I originally perceived it that way as well. But I don't hold very strong opinions either way, because I'm not knowledgeable enough about the details; I didn't look into it much beyond the letter itself.

Do you think we're on a course for this technology to benefit society in the long run? How big of a risk do you think AGI really poses? How are you thinking about that?

Look, Matt, I think calls to large language models will become like the operating system calls your computer makes to Linux or Windows or whatever the underlying operating system is. They're going to be a fundamental unit of interacting with your computer. These models will be pushed onto more and more constrained edge devices, and they will get better as time progresses, much more rapidly, toward comparable quality with the closed-source APIs that exist, because the whole community is putting its effort behind them. It's not the effort of a couple hundred people at a company; it's the effort of tens of thousands of people who truly believe this technology should not be closed away behind the walls of some company's cloud. So yes, this technology is going to permeate everything. There are tens of thousands of developers right now building companies and apps with this core underlying technology. Right now they're building on the closed-source APIs, and that's okay: they solve business problems for people, they help people do their jobs, they help students learn how to write and learn how to learn; they're doing really good things. But very soon, if I have any say in it, you won't have to use closed-source APIs. There will be models out there that let you do some of these things without having to send your data to a third party.

Well, Andriy, I want to say thank you very much for talking to me, and thank you for your work open-sourcing a GPT model. I think a lot of people are following in your footsteps, and a strong open-source community is going to be critical to not letting a single company, or a handful of companies, own all of this technology and data. If people want to use Atlas, go to nomic.ai and check it out; if you're dealing with large sets of data and want to explore your data and help train your models, please do that. Please also download GPT4All, the new version came out yesterday, and I believe they can find that at gpt4all.io, is that correct?

Yeah, gpt4all.io, and the best place to go is still the GitHub. The ecosystem is very nascent, so it's going to keep growing; I'm sure you won't miss it if you Google it.

Cool. And of course, follow Andriy on Twitter if you want real-time updates on what he's working on. Thank you so much for talking to me.

Yep, thanks everyone for, I guess, caring. All right, see ya. Bye. Thank you.
Info
Channel: Matthew Berman
Views: 11,192
Keywords: gpt4all, gpt4all v2, gpt4all-j, llama, llama model, alpaca, alpaca model, artificial intelligence, large language models, llms, chatgpt, openai, chat gpt, open ai, agi, artificial general intelligence, nomic ai, nomic
Id: 8ZW1E017VEc
Length: 73min 52sec (4432 seconds)
Published: Sat Apr 15 2023