Unlocking The Power Of GPUs For Ollama Made Simple!

Captions
Sometimes you want another GPU, for whatever reason. Maybe you don't have a GPU, or you'd rather not hear your fans running all the time. Maybe you want your local machine to run a local coding model but need access to a different model for other tasks. Maybe you're streaming with OBS and don't want things to lag, or you're running several agents and want them to respond faster. There's probably an infinite number of reasons to need more GPUs, so let's look for a solution.

We need a cloud provider that offers virtual machines with dedicated GPUs, not shared GPUs with generic names where you don't really know what's there, like on Azure. You want to know it's an NVIDIA card for which a CUDA driver actually exists. I have friends who love Paperspace for this, which is now owned by DigitalOcean. Paperspace is pretty awesome, but I seem to have to make a quota request every single time I start a machine. It gets approved 20 minutes later, but it's always a pain. On the plus side, they're pretty much the only reliable source of Windows-based GPU instances I've been able to find. I don't use a lot of Windows, but I did for this channel recently. I know others who adore fly.io, but I felt it was a bit awkward to use; I spent a while with the docs and just couldn't figure it out. I've tried services like Lambda Labs and GCP, but I always ran into issues getting access to a GPU, and sometimes I wouldn't know there was a problem for as long as five minutes. Eventually you see a red exclamation point in the console saying there are no more GPUs in that region, try again tomorrow. So you try a different region and hope for the best. Why can't they just tell you where GPUs exist? I have no idea.

That's what I find really amazing about Brev: they make it super easy to find a GPU somewhere on the planet, across a few different providers. A fraction of a second of latency doesn't really make a difference for this use case, so whether the machine is in Singapore or anywhere else where it's late at night doesn't matter to me. And you pay Brev rather than signing up for AWS, GCP, or the others, though if you do have an account on those platforms you can integrate it. Now, you might be thinking this is going to be very expensive. Sure, maybe, if you needed the machines running 24 hours a day. But how long do you actually use models on any given day? Two, maybe three hours, five days a week? You could do that for about six or seven bucks a month, which I think is pretty reasonable. And the instances are up and running within a minute or so.

Let's check out how this works. I'll log into my brev.dev account and click the New Instance button. Which GPU do I want? I tend to go for a T4: it's cheap and pretty fast, often 40-something tokens per second. Pricing changes depending on what's available, but I'll often choose spot pricing to drop it lower; here's what it costs while recording this video, and the price seems to move around a bit depending on I-don't-know-what. I can give it more or less disk, then set a name (let's call this one remote-ollama) and click Deploy. So far, in my experience, the machine is up and running in about a minute, maybe a little less. Then Ollama takes another four seconds to install, because all the GPU drivers are already there. On GCP, even when I do get an instance and specify a machine-learning-optimized image, I'm often waiting five minutes or more just to install the CUDA drivers. Pulling and running Llama 2, Gemma, or another model takes another 20 seconds.
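As a rough sketch, that first-run sequence on the remote instance looks something like this, assuming the standard Ollama install script for Linux (the exact commands aren't spelled out in the video) and the Llama 2 model used here:

    # On the fresh GPU instance -- the NVIDIA drivers are already in place
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull and run a model; "run" downloads the weights on first use
    ollama run llama2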
So that's pretty quick: fully up and running in about a minute and a half. Did you notice how I logged in? Normally, with most cloud providers, you'd have to provide an SSH key at the beginning, or download one or some other file to connect. With Brev you don't deal with any of that. You install one command, called brev, when you set up your account. I can type brev shell --host remote-ollama and I'm SSHed into the host. Perhaps even cooler, I can type brev open --host remote-ollama and VS Code opens with everything set up to work against that remote machine.

I think that's pretty cool, but I'd like to be able to just run ollama and have my Ollama client access the remote machine. There are a few steps to getting that working: we need to tell the client machine where the Ollama service is running, we need to tell the Ollama server to accept requests from other machines, and we need to enable remote machines to access the Brev server. That last step can be the easiest, and it can be the hardest. On some platforms you might just grant access to all visitors, and that is super dangerous, probably just stupid, because there are search engines out there that make it easy to find open ports all over the world. I tried it once about four months ago and found dozens and dozens of Ollama servers wide open. Don't do that. It's amazing how much free compute you can get if you, one, try, and two, have no morals. Brev doesn't let you just open things up anyway; they offer the ability to expose a service in their UI that you can share with folks, and then you use brev to authenticate and get access to it. If you'd like to see that, I can cover it in a future video.

The approach I like to take is Tailscale. Tailscale is a really secure VPN done really, really simply. It's amazing how quickly you can be up and running, and for three users, even with a custom domain, it's free with 100 devices. I don't know about you, but I don't have 100 devices. Beyond that, it's about six bucks per active user per month. You might think from all this that I'm getting some sort of kickback from Tailscale, but it's just a really cool service. Now, I'm not going to show setting up Tailscale from the beginning, but I can if you want; just ask for that below.

Remember my goal here: I want to add remote-ollama to my Tailscale network. I'll choose Add Device in Tailscale and choose Linux, and here's a shell script to run. Copy that, run brev shell --host remote-ollama, and paste the command. Then, on the remote host, run sudo tailscale up. It gives me a URL to open, and that logs the machine into my network. Depending on the provider actually running this host, the name in Tailscale may be different; I'll rename it to remote-ollama.

We're almost there. Now, on remote-ollama, we need to add an environment variable to tell the Ollama service to take requests from remote machines: we need to set OLLAMA_HOST to 0.0.0.0. The right way to do this is to run sudo systemctl edit ollama.service. The first time we do this, we get a blank file. Add [Service] at the top, then Environment="OLLAMA_HOST=0.0.0.0". That's a little strange, since the second equals sign is inside the double quotes. Save that out, then run sudo systemctl daemon-reload and sudo systemctl restart ollama to restart the service.
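Pulled together, the remote-host setup looks roughly like this. The Tailscale install one-liner is the standard script you're asked to copy from the Add Device page; everything else is the sequence just described, using the remote-ollama instance name from this video:

    # From your local machine, open a shell on the instance
    brev shell --host remote-ollama

    # On the remote host: install Tailscale and join your network
    curl -fsSL https://tailscale.com/install.sh | sh
    sudo tailscale up        # prints a login URL; open it to authorize the machine

    # Tell the Ollama service to accept requests from remote machines
    sudo systemctl edit ollama.service
    # ...and in the editor that opens, add:
    #     [Service]
    #     Environment="OLLAMA_HOST=0.0.0.0"

    sudo systemctl daemon-reload
    sudo systemctl restart ollama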
Okay, we're in the final stretch. When you set up Tailscale, you get this cool icon in the menu bar on the Mac; I assume there's something similar in your Windows task tray or in some Linux command. You should see remote-ollama listed there. So, in the terminal on your local machine, run OLLAMA_HOST=remote-ollama ollama run llama2, and boom, you are up and running Llama 2 on Ollama on a machine somewhere in the world. It just works. When it's time to stop the instance, visit brev.dev and click the Delete button.

Now, you might wonder why I put the environment variable on the same line as the Ollama client command. Well, maybe I'm also running Ollama on this machine for help with coding. If I set the environment variable the way you would for the service, it would screw up the local service. I just want this to take effect for the CLI client, and let VS Code and the local service continue to work fine.

So what else can you do now that you have Ollama running with Tailscale? Maybe you want a web UI that you can run from your phone that'll just work, and maybe you'd do that with your regular machine instead of a hosted server. Maybe a friend has a super powerful machine that you share access to, and that becomes the Ollama server you both use. Usually, if setting up remote networking was easy, you probably did it wrong; but Tailscale makes this and so many other situations super easy, and you did it right. And that's why I like Brev: it makes something hard super easy. I guess that's why I like Ollama so much, too; it makes something that's pretty hard super easy.

Now, I want to go back to something I showed earlier. The command to open a shell to a Brev instance was brev shell --host remote-ollama. If you leave off the --host, it'll log into something else, and it may take a while longer before it works. What we're doing with Brev isn't actually their main product. Their main thing is providing backend instances for your notebooks: they want you to have access to super fast and powerful machines for working with Jupyter. So when you start up a new instance, it spends a while getting a new container good to go so that you can then drop into a new or existing notebook. This is a world I haven't really gotten into, but they have some amazing resources on the brev.dev website for learning all sorts of AI, machine learning, and other topics using their notebooks and instances. You really have to check out the video by Harper Carroll where she goes through fine-tuning a model on her journals. It's amazing.

And that's what I have for you this time. Let me know if you have any more questions; I'm watching all the comments all the time and love to hear what you have to say. Thanks so much for being here. Goodbye.
Info
Channel: Matt Williams
Views: 20,078
Keywords: large language models, artificial intelligence, machine learning, local llm, ai chatbot, deep learning, large language model, local llm model, local llm chatbot, tailscale, secure access, local llm with internet access, tailscale setup, tailscale docker
Id: QRot1WtivqI
Length: 11min 52sec (712 seconds)
Published: Thu Feb 29 2024