Unleash the power of Local LLMs with Ollama x AnythingLLM

Video Statistics and Information

Captions
Hey there, my name is Timothy Carambat, founder of Mintplex Labs and creator of AnythingLLM. Today I want to show you possibly the easiest way to run any local LLM on your laptop and get full RAG capabilities, so you can talk to PDFs, MP4s, and regular text documents, scrape entire websites, pull in a whole YouTube video or a GitHub repo; it doesn't matter. I want to show you how to do that with the latest, greatest, and most powerful models out there, using a tool called Ollama and then our tool, AnythingLLM.

Ollama is as easy as it comes: it's an application you can just download and run on your laptop, no GPU required, and it lets you run a whole bunch of LLMs entirely locally on your machine. I'm going to show you what it looks like to download and use Ollama, then how to upgrade it by pairing it with AnythingLLM, another desktop application that works with Ollama to give you full RAG capabilities on PDFs, text documents, video, audio, websites, GitHub repos; the list goes on and on. Both of these projects are totally open source on GitHub, so give us both a star.

I'm going to run this today on my Intel-based MacBook Pro. That said, my MacBook Pro is not the best candidate for these kinds of models. I'll be running a 5-bit quantized Llama 2 model, and it'll actually run pretty well; however, if you're on an M1-series chip, or at least have a GPU in your desktop, you'll go way faster than me, so I may speed up the segments where we're just waiting on inferencing. Performance is basically as good as your machine allows.

I also want to preface this by saying that if you go to ollama.com, the website says Windows support is coming soon; the Ollama team just showcased the Windows app working on a Windows machine, so you can expect it to land probably by the time you watch this video. As for the full RAG capabilities brought by AnythingLLM, we already support Windows. So let's put these two things together and let me show you how powerful this is.

First things first, let's get Ollama set up. Go to ollama.com, click the download button, and install it as a regular application like you normally would. Now that we've downloaded the Ollama application, let me show you how to get it started. I'm on a MacBook, but if you were on Windows it would install on your desktop. I have the app installed here, so I'll click on it, and a little icon pops up in the menu bar. Ollama doesn't ship with a UI, so this is where the only bit of technical work is needed to get a Llama model running.

I'm on their GitHub, which I reached by clicking the GitHub link on their homepage, and there's a list of all the models that are supported; you can expect many more models in the future. There's also a technical note that you should have at least 8 GB of RAM available to run the 7-billion-parameter models, 16 GB for 13 billion, and 32 GB for 33 billion. Obviously I'm not on a massive MacBook Pro; I have about 16 GB of RAM, so I'm going to run a smaller model. What I'd like to do is download the 3.8 GB, 7-billion-parameter Llama 2 model, and all that takes is opening a terminal and running "ollama run llama2". With the terminal open, if we type "ollama", you'll see we get a list of commands back, which means Ollama is installed and everything is working.
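As a quick reference, the terminal steps from this segment look roughly like this (a minimal sketch; "llama2" is the tag the model carries in Ollama's library):

    # print Ollama's available commands to confirm the install worked
    ollama

    # download (on first use) and start an interactive chat with Llama 2 7B
    ollama run llama2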
So let's copy that command, which will download and run the Llama 2 model in our terminal. If you haven't downloaded the model yet, it will download first; depending on your internet speed, that can take a bit. We paste the command into the terminal and wait for the model to boot.

Llama 2 has now fully booted on my MacBook and we're ready to chat. I do want to say that I have OBS running while I'm trying to run inferencing, so if you hear my computer taking off, I'm sorry. Let's send a simple message so we're not sending too many tokens; we'll just say "hello", and you can see the response starts streaming almost instantly, because all we sent was "hello". Memory is retained in the model up to a certain point, right up until the context overflows and it starts dropping your older messages. If you want something more sophisticated, you'll want to hook your running Ollama Llama 2 instance up to AnythingLLM, and that's what I'll show you next.

To exit the Ollama chat, so we can really upgrade and unlock the full power of a local LLM on our laptop, we just run "/bye". Now we're back at the terminal; you can actually keep this open, because we may need it later. We're going to upgrade Ollama and give it all the bells and whistles you want it to have: a private vector database, RAG across a bunch of different document types, a clean chat interface. You want all of that, and you want it for free, and we have it in AnythingLLM. All you do is go to useanything.com and click "Download AnythingLLM for Desktop"; depending on which Mac you're on, or if you're on Windows, just click the appropriate button and download it.

Okay, we have AnythingLLM downloaded, so let's boot it up for the first time. Again, I'm going to go to Launchpad and click the AnythingLLM icon. AnythingLLM boots up and we're brought to an onboarding screen, because we first need to configure the instance. You can change any of these settings at any time, but this is how you get started.

The first question is: what LLM do you want to use? We want Ollama, so scroll, or just type it in here, and select Ollama. The very first thing it asks for is the Ollama base URL. Keep in mind that I'm running AnythingLLM and Ollama on the same machine; that's how lightweight AnythingLLM is. So what is the Ollama base URL? When Ollama boots up, it actually runs a server whose API will feel very familiar if you've used the OpenAI API. All you need to do is run "ollama serve"; you'll see it was already running for me, because I have Ollama configured to start the server when it boots, and it runs on a specific address and port. You just copy that, prefix it with "http://", and paste it in (see the example below). You'll see the chat model selection already shows "llama2:latest"; if I had more Ollama models available here, I could pick between them. And we know the token limit for this model is 4096, or at least that's what I'm going to set it to.
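For reference, here is roughly what talking to that local server directly looks like (a minimal sketch; the default port 11434 and the /api/generate endpoint come from Ollama's documentation rather than anything shown on screen):

    # start the Ollama API server (it may already be running if the menu-bar app is open)
    ollama serve

    # from another terminal, send a one-off prompt to the same local API
    # that AnythingLLM will point at through the base URL
    curl http://127.0.0.1:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Hello",
      "stream": false
    }'

In this setup, the base URL you would paste into AnythingLLM is http://127.0.0.1:11434.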
Next up is embedding. Ollama doesn't support embedding models, but you don't have to worry about that, because AnythingLLM ships with one; just click Next. For the vector database, you can run Chroma locally, maybe you want Qdrant or Milvus or Weaviate, or you can go hosted with Pinecone because you already have that; you can set all of that up here, or just click Next, because we give you a vector database that stays on your computer. On the data handling and privacy screen you can see that your model and chats are only accessible on this machine, and the same goes for your embeddings and your vector database, so not a single piece of our private data is ever going to leave this laptop. We do ask you to fill out a short survey just to help us improve the product, but of course you can skip that.

Now, I want to scrape the useanything.com website and use it as information to teach the model about AnythingLLM, so that when I ask about it, the chatbot is smarter. So I'm going to make a workspace that we'll just call "anything llm". We're now in the "anything llm" workspace inside AnythingLLM (sorry about that), on the default thread. We could start a new thread if we wanted; you can make as many as you like, but let's just use this one. Like I said, we can upload documents: I can click here, and you'll see I have some PDFs, some text documents, and some other random files, but let's use the useanything.com website. We'll go ahead and fetch it, which should only take a few seconds; there we go. Now we'll embed it, and again, this embedding model runs locally on the machine, and it's already done.

I do want to point out that if you have multiple Ollama models available, you can change which model a specific workspace uses. So if you want one workspace on Mistral and another on, say, Code Llama, you can have that kind of granularity, all within AnythingLLM, all within this interface (there's a short sketch of pulling extra models at the end of these captions). Of course, you can also modify the prompt, the snippets of text returned from your vector database, the maximum similarity threshold; we really give you every kind of control you could want.

But now let's do the most important thing, which is ask Llama 2 running on Ollama: what is AnythingLLM? Because we're now giving the model context plus our question plus the history, your machine really has to crunch all those tokens, so the streaming may be a bit slow for me; again, that's because I'm answering this question using only my MacBook's CPU. I'll speed this up a little so we can watch it unfold. Okay, our inferencing is done, and if we sent another message, it would have the history of this chat.

Now, considering that Ollama has already melted a hole in my CPU from running these models, I'm not going to go any further; hopefully you have a more powerful machine for these kinds of models. The same way I just ran Llama 2, you can run a smaller model; this is just the one I chose to use. But you can see that AnythingLLM really levels up the Ollama application and gives you way more control on the application side of your LLM, while Ollama helps you get any local LLM up and running very quickly. Hopefully this short tutorial was useful for figuring out how to run a 100% private, local LLM with full RAG capabilities on your desktop today, in under five minutes. Thank you, and let me know if you have any comments or questions.
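As noted in the captions above, giving each workspace its own model just means pulling those models into Ollama first; a minimal sketch ("mistral" and "codellama" are the tags these models go by in Ollama's library):

    # pull additional models so they appear in AnythingLLM's per-workspace model picker
    ollama pull mistral
    ollama pull codellama

    # confirm which models are installed locally
    ollama list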
Info
Channel: Tim Carambat
Views: 48,921
Id: IJYC6zf86lU
Length: 10min 14sec (614 seconds)
Published: Wed Feb 14 2024