Local LLM with Ollama, LLAMA3 and LM Studio // Private AI Server

Captions
Hey everyone, Brandon Lee here from VirtualizationHowto, and today we're diving into an exciting topic: AI, and specifically how to set up a large language model, or LLM, on your private AI server using WSL (Windows Subsystem for Linux), Ollama, and the Llama 3 model, among others. There are other tools we're going to take a look at as well; one in particular is LM Studio. This is a fantastic way to keep your data secure and under your own control, without any prying eyes on what you are querying. So let's get started.

First up, what exactly is a large language model, or LLM? An LLM is a type of artificial intelligence designed to understand and generate human language. These are the types of AI that have become all the rage over the past couple of years. They are trained on vast amounts of data and can perform a variety of tasks such as answering questions, summarizing information, writing code, and even generating creative content. LLMs use deep learning techniques to predict and generate text based on the input they receive, which makes them incredibly versatile tools for natural language processing.

What are the advantages of hosting your LLM locally instead of hitting public services such as OpenAI's ChatGPT? Running LLM models locally has numerous benefits. The biggest advantage is data security: your sensitive information stays within your own server, on a home lab or home server network that you control, and you're not having to insert API keys for OpenAI or another public AI service to access the language model. You also gain more control over model performance, along with the ability to do things like fine-tuning, which lets you create customized solutions tailored to your needs. That is especially valuable for businesses looking to leverage these large language models without sending their data to public resources.

Now, first up, let's talk about hardware. Do you need anything special or specific to run these large language models on your own hardware? Yes and no. To run LLMs effectively and have a really good experience, you'll want somewhat beefier-than-average hardware: a high-performance, or at least semi-modern, CPU; at least 16 GB of memory; and preferably an NVIDIA GPU for optimal performance. An SSD is of course always nice in 2024, and around 100 GB of free space is recommended so you can download several language models and work with them simultaneously.

So now let's jump right into what we need to install to start working with these LLMs locally. First, we're going to set up Ollama. Ollama is an open-source tool for running local LLM models on Linux or WSL, and you can install it extremely easily with a single command. As a note, you can also download and install it on Windows and macOS; the macOS client is a GA release, while the Windows app is currently a preview release, so for the rest of the video we're going to stick with the Linux installation, simply running Ollama from WSL, which you can easily do on your Windows machine. The install command is a simple curl one-liner that pulls down the install script from ollama.com and executes it from the Bash prompt.
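A minimal sketch of that installer one-liner, matching the standard command from the Ollama docs (the exact command shown on screen may differ):

    # download the install script from ollama.com and run it in the shell
    curl -fsSL https://ollama.com/install.sh | sh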
Now that we have Ollama installed, we can download and run a popular model such as Llama 3, which is heavily downloaded and really a great model, with a simple command: ollama run llama3. Once we initiate that command, it starts downloading the Llama 3 language model and extracting it into a usable form in our Ollama instance running in WSL. This downloads the model locally, which is how the magic happens behind the scenes, and it's the whole purpose of what we're trying to achieve: we want the model to be local, with no internet connectivity required and no dependencies that reach out and hit external APIs. Once it's downloaded, you can start interacting with it directly in your WSL terminal: after the Llama 3 model loads, you can just start asking questions like you would in ChatGPT, right from the WSL prompt.

While we're demonstrating Llama 3 here, it is just one of the many models that have been trained and are publicly available. You can go to a website called Hugging Face, which hosts hundreds if not thousands of openly available AI models that you can download and readily use on your own hardware.

We've seen how easily we can pull down language models and run them directly from the WSL prompt. But let's say you want something a little more user-friendly than a WSL terminal, something you can access from many different devices, not just the Windows host running WSL. There's an awesome open-source project called Open WebUI that does just that for your private AI server, and what's even better is that it's simple to deploy as a Docker container: a single docker run command gets Open WebUI up and running so we can start interacting with the Ollama API serving our language model locally. The docker run command exposes the external port 3000, mapping it to the container's internal port 8080, and sets the Ollama base URL, which points to our WSL instance at 10.1.149.66 over plain HTTP; it then pulls the Open WebUI container image.

Once the Open WebUI container is up and running, we can access it, sign up, and configure our settings if we still need to tweak things, like selecting the model, and we can also reconfigure the address that points to the API. So what we're doing here is spinning up the container, which is the web front end, and simply telling it the address at which to reach the Ollama API running inside Windows Subsystem for Linux.

Now, I do want to mention a hack I needed in my home lab server to make sure the Docker container running the Open WebUI instance could reach the WSL instance of the Ollama API: I had to add a proxy connection that allows WSL to be accessed via the machine's IP address instead of the WSL-only localhost connection, 127.0.0.1. Here's what those two commands look like: first the docker run command for Open WebUI, then the netsh port proxy.
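A sketch of the docker run command, assembled from the flags described in the video plus the image name and data volume from the Open WebUI project's documentation; the OLLAMA_BASE_URL value assumes the WSL host address and default Ollama port from the video, so adjust both for your own network:

    # web UI on host port 3000 -> container port 8080; chat data kept in a named volume
    docker run -d -p 3000:8080 \
      -e OLLAMA_BASE_URL=http://10.1.149.66:11434 \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main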
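And a sketch of the port-proxy command, run from an elevated prompt on the Windows host; the listen port is the 11434 called out in the video, while the connect address placeholder stands in for your WSL instance's address (for example, the output of wsl hostname -I), which isn't shown:

    :: forward the host's port 11434 to the Ollama API inside WSL
    netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=11434 connectaddress=<WSL-address> connectport=11434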
In the port proxy command, we're entering the netsh interface portproxy context and telling it the listen port, which is 11434, where the Ollama API is listening, along with the listen address, the connect port, and the address it needs to connect to. This allowed me to solve an issue with the Open WebUI Docker container, where it was initially unable to connect to the Ollama API address and port; once I put the port proxy in place, everything started working as expected.

Next, we can start chatting with our local large language model like we would with any publicly available LLM such as OpenAI's ChatGPT: we simply select the language model and start asking questions. I think these are such a great way to empower your home lab and accelerate learning. For instance, what if you want to write an Ansible playbook to update your Linux virtual machines? We can do that with a local language model. What if you want to write a PowerCLI script to select all virtual machines that have an ISO connected? We can do that too, with the large language model we've downloaded and are hosting locally.

Another pretty cool solution that doesn't require as many steps to set up is LM Studio. LM Studio is not an open-source solution, but it is free, and it provides a ready-made dashboard with the GUI built in once you install it; you can browse the language models you want and download them right from within the app. It's an all-in-one package, so you don't have to spin up one-off Docker containers for each part of your local LLM application.

These local LLMs are perfect for businesses that don't want to use publicly available solutions like OpenAI's ChatGPT due to privacy or compliance concerns. They're also great for home lab enthusiasts who are already self-hosting solutions and want to do the same with their AI resources. Companies can also fine-tune these generically trained language models for specific tasks, such as a product-specific knowledge base; imagine a chatbot on a support website or in a ticketing system.

While setting up local LLMs is relatively straightforward, you might encounter challenges like hardware requirements and managing model configurations. As we've seen, it can be a bit quirky putting these solutions together, but all in all it's not very difficult to do if you have a few minutes and a guide such as this video. Make sure your setup meets the necessary specifications to enjoy smooth performance; having that discrete NVIDIA GPU will make the experience an enjoyable one when you're working with this technology.

And that's a wrap! Running a local LLM is a powerful way to leverage AI while keeping your data secure and under your own control. If you found this video helpful, please give it a thumbs up and subscribe to the channel for more tech tutorials. Stay safe out there, keep on home labbing, and I will see you in the next video.
Info
Channel: VirtualizationHowto
Views: 6,746
Keywords: artificial intelligence, ai, generative ai, private ai, chatgpt, ollama, llama3, lm studio, home lab
Id: HfyDHm3wqjo
Length: 11min 56sec (716 seconds)
Published: Wed Jun 19 2024