How to Turn Your AMD GPU into a Local LLM Beast: A Beginner's Guide with ROCm

Video Statistics and Information

Captions
Those of us with Nvidia GPUs, particularly ones with plenty of VRAM, have been able to run large language models locally for quite a while. I did a guide last year showing you how to run Vicuna locally, but that really only worked with Nvidia GPUs. Support has improved a little since then, including running it all on your CPU instead, but with the launch of AMD's ROCm software it's now not only possible to run large language models locally on AMD cards, it's insanely easy. If you're just here for the guide, skip to this time code; if you want to know how this works so damn well, stick around.

Technically speaking, ROCm (formerly known as the Radeon Open Compute platform) isn't actually new. AMD launched it back in 2016, although the more recent stable releases came within the last year or so. ROCm, similar to Nvidia's CUDA, is a platform of tools that allows your graphics card to act as a general-purpose processor rather than the specialised beast it normally is, and it turns out that having thousands of cores available to do work simultaneously is pretty handy. Nvidia has generally dominated this space, as developers need to integrate CUDA into their applications to get the benefits of offloading work onto the graphics card, and until now AMD hasn't really had a comparable option that developers could integrate. There is, and was, OpenCL, but if you've ever tried using an AMD GPU for compute work, you'll know that it's not great.

ROCm, then, offers a wide set of tools for a bunch of applications, and it's open source and free to use, marking a significant change from Nvidia's proprietary licensing arrangement. ROCm actually comprises a whole bunch of different tools for machine learning, and there are a few interesting ones: MIVisionX is a set of computer vision tools, MIGraphX and Torch-MIGraphX allow PyTorch to run on AMD GPUs, and MIOpen is an open-source deep learning library (there's a quick PyTorch sanity check sketched below if you want to confirm ROCm is working on your card). Together, these tools allow programs like LM Studio to include ROCm support with seemingly relative ease, and that means that you, as the end user, get better compute performance and support for things like running large language models locally.

It also turns out that you don't need a 7900 XTX to run this well. Gigabyte sent over their RX 7600 XT, a card that AMD marketed specifically for use with large language models thanks to its hefty 16 GB of VRAM. This bad boy is currently retailing for around £330, comes with a triple-fan cooler, only needs two 8-pin PCIe power connections, and, as you'll find out in an upcoming video, it's actually a pretty decent gaming card. Plus, it turns out that it's perfect for LLMs.
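A quick aside that isn't in the video: because the ROCm build of PyTorch exposes AMD cards through the familiar torch.cuda API, you can sanity-check that ROCm is actually working on your card with a few lines of Python. This is just a diagnostic sketch, assuming a Linux system with a ROCm build of PyTorch installed; it's not part of the LM Studio setup.

```python
# Minimal sanity check that a ROCm build of PyTorch can see an AMD GPU.
# On ROCm builds, AMD cards are exposed through the torch.cuda API.
import torch

print("PyTorch:", torch.__version__)
print("HIP/ROCm build:", torch.version.hip)       # None on CUDA-only builds
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Tiny matmul on the GPU to confirm compute actually works.
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
    print("Matmul OK, result on:", y.device)
```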
So how do you make this work? It's ridiculously easy. Head to LM Studio's website, specifically lmstudio.ai/rocm, and download LM Studio with beta ROCm support. Once it's downloaded you'll need to let it close itself and then open it again, and then you're about 90% done. Head to the search tab and find some models. If you want to use, say, Llama 2, AMD recommends the Q4_K_M version from TheBloke; just hit download, give it a minute, and then head to the chat tab. On the right-hand side are all of the settings, and the key one to check here is that LM Studio detected your GPU as AMD ROCm. Tick the GPU offload checkbox, set the GPU layers slider to max, select your model at the top, and that's kind of it. Use it like ChatGPT, except here you can even change the system prompt, which works impressively well, actually.

Using this is ridiculously smooth and fast, and I'm genuinely amazed. This is a 7600 XT; it isn't a top-end thousand-pound card, it's a budget card, and yet I'm getting responses faster than ChatGPT and Gemini. It's crazy how fast this is. Even more interestingly, the VRAM usage with this 7-billion-parameter model is insanely low: it's only using around 6 GB, which, I appreciate, is how much VRAM many of you have in total, but in my experience using LLMs on Nvidia cards, VRAM usage seems to climb a lot in use, and the 3060 Ti with 12 GB of VRAM that I often use gets overwhelmed pretty easily.

One of the interesting features of LM Studio, besides its pretty amazing design and ease of use, is the access to the system prompt right in the sidebar. That means you can change how the model responds quite drastically, too. The system prompt, for those that don't know, is basically the bit of text that the software injects before your prompt to set the context for the response. ChatGPT's system prompt is likely incredibly long and details things like the tone of the response, as well as, I'm sure, an awful lot of limitations. DPD, the parcel delivery company, had a bit of an issue with their AI chatbot in January, and during that time someone managed to get it to reveal its own system prompt. I'll put it on the screen so you can pause and read it in full if you want to, but in short it's a wall of text limiting how the LLM should handle the response: things it shouldn't do and things it should. I don't think their prompt was amazing (the bot was caught swearing at people, so clearly not), but it gives you an idea of what you can do to get the sort of responses you want. I've got a pretty simple example here too: I told it to answer as a trusted friend, as kindly as possible. The difference in the tone of voice of the replies is amazing; it goes from being pretty clinical to being a weirdly uncanny-valley sort of friend, and it starts using emojis in its responses. That's a pretty big difference.

I'm still really blown away by the performance here. Not only are we barely touching the 16 GB of VRAM we have on tap, it's responding incredibly quickly regardless of the prompt. Of course, this is the 7-billion-parameter model, but there are higher-parameter-count versions available. I did try Vicuna 13B, and that loaded fine, using around 10 GB of VRAM, which is still pretty great. I also tried loading the 30-billion-parameter version, but that failed to allocate the 20 gigabytes it wanted, so it just wouldn't work, at least on this card; I can't imagine what the 70B models would need. There are hundreds or thousands of different models available, so obviously I can't test them all here.
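Those memory figures line up with some very rough arithmetic, sketched below (this isn't from the video): a roughly 4-bit quantised model needs a little over half a byte per weight, plus headroom for quantisation scales, the KV cache and runtime buffers. The 4.5 bits-per-weight and 1.5 GB overhead figures are ballpark assumptions, not exact numbers for any particular quant format.

```python
# Very rough VRAM estimate for a ~Q4-quantised model: about half a byte per
# parameter for the weights, plus a fudge factor for quantisation scales,
# KV cache and runtime buffers. Illustrative only, not exact.
def rough_vram_gb(params_billions: float, bits_per_weight: float = 4.5,
                  overhead_gb: float = 1.5) -> float:
    weights_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for size in (7, 13, 30, 70):
    print(f"{size}B @ ~Q4: ~{rough_vram_gb(size):.1f} GB")
```

That works out to roughly 5-6 GB for a 7B model, 9-10 GB for 13B, and around 18-20 GB for a 30B model, which matches what the 16 GB card could and couldn't load.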
But here's what you should look for if you're going to give this a try. The primary thing to find is a model that lists "full GPU offload possible". That should mean it'll load on your GPU and into VRAM rather than running on your CPU and in system RAM; although even then it doesn't seem heinously slow, so that could be an option, full offload should run as fast as possible. Otherwise, the higher the quantisation bits (that's the Q number in the file names), the better the quality you generally get. Four bits seems to be pretty common here, and you can try higher or lower depending on what you need; otherwise just experiment with it (there's a rough scripted equivalent of these settings sketched at the end of this guide), and I would love to hear how you get on in the comments down below. I'll leave a link to LM Studio in the description if you want to try this out yourself.

I think the ROCm beta version is technically limited to Radeon 7000-series cards right now, although that may just be a branding thing and it might work on other cards; again, have a play and see how it works. Otherwise, if you want to see more videos like this one, more tech reviews and that sort of stuff, hit the subscribe button and turn on the bell notification icon. Feel free to check out plenty of other videos on the end cards, including a review of this monitor and maybe the other AI guide that I did with the Text Generation Web UI. And yeah, otherwise, thanks for watching, hope you enjoyed it, and we'll see you in the next video.
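A closing footnote on the model-selection advice above. LM Studio runs on the llama.cpp engine, so if you'd rather script this workflow than click through the GUI, a rough equivalent of the same settings (a Q4-quantised GGUF model, full GPU offload, a custom system prompt) can be put together with llama-cpp-python. This isn't something the video covers: the library needs a build compiled with ROCm/hipBLAS support, and the model path below is hypothetical, so treat it as a sketch rather than a recipe.

```python
# Rough scripted equivalent of the LM Studio settings used in the guide:
# a Q4-quantised GGUF model, every layer offloaded to the GPU, and a custom
# system prompt. Requires llama-cpp-python built with ROCm/hipBLAS support.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical path; point at your GGUF
    n_gpu_layers=-1,   # -1 = offload all layers ("full GPU offload")
    n_ctx=4096,        # context window
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer as a trusted friend, as kindly as possible."},
        {"role": "user", "content": "How should I back up my photos?"},
    ],
    max_tokens=256,
)

print(response["choices"][0]["message"]["content"])
```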
Info
Channel: TechteamGB
Views: 9,194
Keywords: techteamgb, Tech teamGB, GB, tech, Tech team GB, machine learning, deep learning, amd gpu, llm, amd llm, amd gpu ai, amd ml, ml with amd gpu, lm studio, rocm, lm studio rocm, how to use amd gpu, how to use amd gpu for machine learning, amd gpu ai guide, text generation webui, llama amd gpu, llama 2 amd, llama 2 amd gpu, how to run llm models, how to run llm models locally, llm amd, llm amd gpu, llm amd gpu guide, rocm amd, rocm ai, amd rocm
Id: VXHryjPu52k
Length: 9min 20sec (560 seconds)
Published: Fri Mar 22 2024