Run AI Locally on Your Computer With These Tools!

Captions
Over the course of the last year or so there's been a divide between the world of open-source AI models and the more proprietary ones, especially because large language models are just so big: they have dozens or hundreds of billions of parameters, and that can require a whole lot of memory to run. The question is, can you run that on mere-mortal, consumer-grade machines? The first generation of this stuff, from the likes of Anthropic or OpenAI, all ran on specialized servers that take advantage of huge infrastructure on a classic cloud-service basis. But many open-source models have been coming out, led by our friends over at Facebook, who made the Llama model, and following that you've got Falcon and Mistral and all sorts of other interesting models. How can we take advantage of them in ways that are different from the strong points of the proprietary models? One key thing you can do with open-source software is download it and run it on whatever machine you see fit.

Until relatively recently that was a big problem, because running these models on mere-mortal machines, like my MacBook here, was quite challenging. But there's a new movement afoot to standardize ways to make those models run on local devices much more smoothly, and that has allowed us to have things like LM Studio (which I've also made a video about), where we can interact with models we download to our own machines. This works on Windows, this works on Macs. It doesn't require anything exotic, just your basic workstation-class hardware: a gaming PC, or even a non-gaming machine like a MacBook (which is not good for gaming), will make use of the silicon it has in order to interpret the model and run it locally.

The reason this is possible is that we now have a packaging solution: we can take these models, which were originally written in PyTorch or stored as tensors (safetensors, etc.), and turn them into a standard format that various client-side software can interpret, organized so it can run on more consumer-grade hardware. This is where a project called GGML really comes into play. It started, I think, around seven months ago, when its creator (Georgi Gerganov) built both llama.cpp and whisper.cpp, originally working with Whisper and then moving on to Llama; the idea is to compress the model so it fits within the memory limitations of a more consumer-grade machine like my MacBook. And this becomes a standard: llama.cpp is a toolchain for taking a model that you could express through PyTorch or what have you and turning it into a new binary standard called GGUF, their universal format, which packages up the metadata describing the model in a standardized way so it can run on consumer hardware. This can run on something like a Raspberry Pi, as he talks about here. But the other really cool thing is that if you download a GGUF file, a file in that format, it can be run by multiple applications, all locally.
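To make that concrete, here's a minimal sketch (my own illustration, not something from the video) of loading a GGUF file through the llama-cpp-python bindings for llama.cpp; the model path is a hypothetical local download, e.g. something fetched via LM Studio or Hugging Face:

```python
# Minimal sketch: run a locally downloaded GGUF model via llama-cpp-python.
# The file path below is a hypothetical example, not a real bundled model.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU / Apple Silicon if available
)

output = llm(
    "Q: What is the GGUF format for? A:",
    max_tokens=64,
    stop=["Q:"],       # stop before the model starts a new question
)
print(output["choices"][0]["text"])
```

The same GGUF file that this script loads could also be opened in LM Studio, GPT4All, or any other client that speaks the format, which is exactly the portability point being made here.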
And that is the idea behind, say, LM Studio: it will search Hugging Face for you and find interesting models that have been converted to this GGUF format. LM Studio is probably one of the friendliest to use, but it's definitely not the only one. There's one called GPT4All that I think came out before LM Studio; it originally supported a separate and, I think, somewhat proprietary binary format, but it has now standardized on GGUF the same way LM Studio has. One of the key differences is that GPT4All also supports a basic RAG structure, where you can download an embedding model, load documents into it, and ask questions of those documents using that structure. In general, GPT4All is a little older than LM Studio, but it has this additional feature set that is definitely worth paying attention to, and it's much leaner in its UI, so it doesn't contain a whole lot of cruft you might not need.

There's also Ollama, which I think I've also done a video about before, which is more of a command-line interface. But the really cool thing about it isn't the command-line interface itself; it's the fact that it will host itself. I think I'm just showing my browser window here, but it's also in my Mac's toolbar, because it creates a little local web server that other applications you might have running can communicate with. It becomes a service connecting other applications with your AI models. An example of that is chatd, a neat little program that makes use of the Llama 2 model: you can upload a file into chatd, it will shred it using fairly naive approaches, and you can just start asking questions about your documents. It works okay; these things all have room for improvement. But what's really cool is that you can start to open them up, because this stuff is open source: the models themselves are open source, and the applications built around them are open source, and that means we can start digging into them to figure out what the good trends are, where we can make improvements, and how we can just do much better with these things.

And really, the key here is packaging: how can we distribute these models in a way that makes it easy for people to download them and get value from them? LM Studio has created a nice little GUI where you can browse and shop for your model, try it out, and download it, as long as you have enough disk storage, because these things are multiple gigabytes each. GPT4All also has a bit of a model shopper, though not quite as extensive. Ollama supports both picking from its own model selection and defining what they call a Modelfile, which lets you use any arbitrary GGUF file, although that requires a little more work.
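Because Ollama hosts that little local web server (on http://localhost:11434 by default), any application, chatd included, can drive your local models over plain HTTP. Here's a minimal sketch, my own example rather than anything shown in the video, assuming you've already pulled the llama2 model with Ollama:

```python
# Minimal sketch: talk to Ollama's local web server from another program.
# Assumes `ollama pull llama2` has already been run so the model is available.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Summarize what the GGUF format is for.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```

This service-style design is what makes Ollama so composable: the model runs once, locally, and anything on your machine can connect to it.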
And then there's an initiative coming from Mozilla called llamafile. Their angle on it is to take the packaging parts of llama.cpp that let you run a model, basically take one of these GGUF models, and jam it right into the executable itself, so that distributing one of these models becomes easy: instead of needing to go get a program and then go get a GGUF file from somewhere, you just download one single executable and run it, either as a local server or straight from the command line. I think this is also really neat, and it just shows how much variety is out there in trying different ways of distributing these interesting models and combining them with applications that let us get more value out of them.

Now, I don't know where all this is going to go. LM Studio looks really gorgeous; Ollama looks super composable into other applications, and chatd is a good example of that for us to look at; the llamafile initiative is interesting in terms of making distribution even simpler, where instead of a central server application on your desktop you just have individual files that you download and run, each as its own application. I don't know which way this is going to go, but the fact that this world of local AI is getting wilder and woolier leaves me really, really excited for what's going to come next.
Info
Channel: State Change
Views: 236
Keywords: Descript
Id: vrDZKgAKUMw
Length: 8min 33sec (513 seconds)
Published: Wed Jan 10 2024