Local RAG using Ollama and Anything LLM

Captions
[Music] This is going to show you how you can run a GPT-like system without the internet on your own system. More impressively, we're going to learn how to do a RAG system on your computer. What is RAG? RAG stands for retrieval-augmented generation. Essentially, it is a way for you to talk to your documents, PDF files and such.

This is based on an open-source program called Ollama. Until recently, Ollama was only available on Linux and on Mac, but it has just now become available for PCs, so if you follow my instructions you should be able to run it on just about any system.

What you want to do is go to the Ollama website. You'll see here that you can now download the program for these systems. It's just like an executable file. I've already downloaded it. You pick the operating system that you use, and it'll give you instructions for how to run these models, and here you can get updates.

Up here you'll see that there are models. You can look at the featured models, the most popular models, and the newest models. I'm going to go briefly over these models to give you an idea of what to play with.

Gemma is brand new. It came out about a week before this was recorded. It's from Google and it's purported to be extremely powerful, scoring almost as well as ChatGPT 3.5 did a year ago, and yet it only takes about 7 billion parameters, so it's very high performance. Llama 2 has been around since July 2023. It came from Meta and it is a very popular model. This is the 7 billion parameter model. Mistral is from France. This is a 7 billion parameter model, and it is very high-performing, probably a bit better than Llama 2. Mixtral is very large. I wouldn't try to run this on most PCs. It's a very big model, and it is basically multiple LLMs combined together. LLaVA is a multimodal LLM. Essentially you can use this to look at pictures and read words (OCR), and, with
the right interface, you could upload a picture and it'll describe the picture. Neural Chat is Intel's large language model. Code Llama is also from Meta. This one is a version that is very good at writing code. Dolphin is basically an Orca derivative, which is to say a training set that has been combined with it. You'll see a lot of OpenOrca and Orca names. These are all training methodologies for getting better performance.

Here is a model that I really like. It's called Llama 2 Uncensored. Most people who work a lot with open-source models prefer the uncensored models because they perform better. We all saw what happened to Google with their debiasing and how it basically forced the LLM into giving answers that were ridiculous. In these smaller models, the uncensored models tend to perform better.

Phi is a 2.7 billion parameter model from Microsoft. It performs very well, and some of these other mini ones do too, so for applications that are quite small you can run these Phi models. There are also versions of Ollama that run on the Raspberry Pi. As I mentioned earlier, it does run on Linux, and those models will run on the Raspberry Pi.

Okay, so you download this, and what you'll need to do to run it is go to your PowerShell and run it. What you'll have to understand is that Ollama is both a server and a client, so it basically sets itself up as a server and responds. You start with the command ollama, then run, and then you list the model that you want to run. First of all, we're going to see what models I have downloaded. By doing ollama list, I can see which models I have. I am going to run, let's say, Llama 2 Uncensored, so I'll do ollama run llama2-uncensored. What it'll do now is load that model up. The model will stay loaded for 5 minutes, and then it'll eject that model. I'll show you. Who was the
16th president of the USA? Abraham Lincoln was the 16th president of the USA. That's correct.

All right, so we now want to have Ollama in server mode. What we do is type /bye to exit from the chat mode, then type ollama serve, and that puts it in server mode.

Now for the next step, we're going to look at another download, which is called AnythingLLM. AnythingLLM used to be available only on Docker, but they've now created it as an exe, so it's very easy to load. We want to go to the download page at useanything.com. As you can see here, you can download for the Mac, either Intel or Apple silicon, and then there's the Windows version. Pick the version that you want and then download AnythingLLM. I have already done that, so I'm going to bring up AnythingLLM.

We're now in AnythingLLM and I need to set this up. First we're going to go up to this wrench here. Here you can give it the message that you're going to give at the start. You can set the appearance and the LLM preference. We want to go into this and go to Ollama, so you want to click this as your model. You can see here there are the choices. You can use OpenAI, which means if you have an API key with OpenAI you can do that. You can use Azure OpenAI, Anthropic Claude 2, Google's Gemini, Hugging Face, LM Studio (which is another local one), LocalAI, Together AI, or Mistral directly. But we're going to be using Ollama, and as you see here, it already knows where the server is, and I can now choose the different models that I want to run. For this one I'm going to pick Llama 2, which was the one that we were using before. Save the changes, and then I'm going to close this.

I'm going to go into my user area and I'm going to ask, "What year is it?" It's having to go through the process of loading the model again.
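Because Ollama runs as a server, a front end such as AnythingLLM talks to it over a local HTTP API, and a short script can do the same. The following is a minimal sketch, assuming the server is listening on Ollama's default address (http://localhost:11434) and that the model has already been downloaded. The function names build_generate_request and ask are made up for this illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full answer.
    return body["response"]


# Example (needs `ollama serve` running and the model already pulled):
# print(ask("llama2", "Who was the 16th president of the USA?"))
```

The first call after a model has been unloaded can be slow, which matches the reloading delay seen in the video, so the timeout here is deliberately generous.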
I wanted to show you that, but we can also have it do things like "write a poem about orbital debris." Okay, not very long. "Write a joke about AI." "Why did the program go to therapy? Because it was feeling insecure and needed help." Okay, not very funny.

All right, so now what I'm going to show you is how the RAG system works here. What I can do is take a document and feed it in here, and it will then be able to read that document. All I have to do is go in and drag a document in, and I'm going to pull this one up. I am now going to find that document up here and move it into the workspace. When I embed it, it's going to take that document and embed it so that the LLM can actually see the document.

Okay, so I asked what the SBIR is, and it basically said the SBIR had passed this February and that there are going to be other opportunities. That is correct. Now I'm going to say, "Please read through the SBIR document and give me an outline for a phase one SBIR." So it's telling me what the phase one is instead of giving me the outline. Okay, so now it's finally given me the outline. Remember, this is running locally. It's not using the internet, and there's no charge for tokens.

Okay, so I'm going to finish out here. I've shown you how you can run Ollama on your system and how you can chat with it without a user interface, and I've given you one option, which is AnythingLLM, that you can use to chat with it as well as have it read documents. This program can also take web pages and summarize them, so this could be a very powerful tool. The more powerful your computer is, the larger and more capable the models you can run. This is a 7 billion parameter model, so it's kind of lightweight, and as I said, Mistral is probably a better model to use than Llama. So anyway, thank you once again for watching Bite-Sized AI, helping you survive in an AI world.
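The embed-then-ask workflow shown in the video can be sketched in a few lines. This is only an illustration of the general retrieval-augmented generation idea, not AnythingLLM's actual implementation: the document is split into chunks, each chunk is turned into a vector, the chunks closest to the question are retrieved, and they are pasted into the prompt that goes to the model. As a stand-in for a real embedding model, the sketch uses plain word counts, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt for the LLM."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The string returned by build_prompt is what would be sent to the local model. Swapping the word-count embed function for a real embedding model changes the quality of retrieval but not the overall flow.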
Info
Channel: GovBotics
Views: 9,088
Id: iB6ZMJyyauM
Length: 15min 6sec (906 seconds)
Published: Thu Feb 29 2024