Llama-3 🦙 with LocalGPT: Chat with YOUR Documents in Private

Captions
Let me show you how to get started with Llama 3 within LocalGPT. LocalGPT is my own project that lets you chat with your documents locally, in a secure environment: everything is 100% private and nothing leaves your system. The project already has more than 19,000 stars on GitHub, so let's get started.

First we need to clone the repo, so click on the Code button and copy the URL to the clipboard. Then open a terminal and type git clone followed by that URL. I usually like to have dedicated folders, so I'm going to call this one local-gpt-llama so I know I'm working with the Llama 3 model; this will download all the files into that folder. Next we simply change directory into the folder we just created: type cd followed by the directory name. If you type ls, it will list all the files available in the folder.

Next we need to create a virtual environment. For that we're going to use conda; the command is conda create -n followed by the name of the environment. Let's call it local3 to indicate we're working with Llama 3, and you also want to provide the Python version you want to work with. I like Python 3.10. It will ask whether you want to install all the packages; say yes, and the installation starts. Once the virtual environment is created, we need to activate it, so I'm just going to copy that command, and you can see we're now in the local3 virtual environment.

Now I want to install requirements.txt, because it has all the packages we need, so we run pip install -r requirements.txt. This downloads everything needed to run LocalGPT except one package: llama-cpp-python, the Python bindings for llama.cpp. Depending on whether you're on an Nvidia GPU or on Apple silicon, you need two different commands. I'm running this on an M2 Max, which is Apple silicon, so I copy that section and paste the command to install llama-cpp-python. If everything goes well, you'll have llama.cpp installed on your local machine.

Next I open the folder in Visual Studio Code and open a new terminal. We first need to activate the virtual environment we just created, so we type conda activate local3, and you can see we're in the new virtual environment.
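For reference, here is the whole setup sequence as a sketch. The repo URL and the llama-cpp-python build flags follow the LocalGPT README as I recall it; the folder and environment names are just the ones chosen in this walkthrough:

```bash
# Clone the repo into a dedicated folder and enter it
git clone https://github.com/PromtEngineer/localGPT.git local-gpt-llama
cd local-gpt-llama

# Create and activate a Python 3.10 virtual environment
conda create -n local3 python=3.10
conda activate local3

# Install everything except llama-cpp-python
pip install -r requirements.txt

# llama-cpp-python is installed separately; pick the command for your hardware.
# Apple silicon (Metal), which is what I'm using on the M2 Max:
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
# Nvidia GPU (cuBLAS) instead:
# CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```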
If you're new to LocalGPT, the way you change your model is through the constants.py file. In there you'll see MODEL_ID and MODEL_BASENAME. If you're using an unquantized model, you only need to provide the model ID, which is basically the address of the Hugging Face repo; for example, if you go to Hugging Face for the Llama 3 8B model, that's the repo ID, so you just copy it, and you can keep the basename as None. If you want to use a quantized model, for example one of the quantized models from QuantFactory, then you also provide the GGUF file name of the specific quantization level you want to use. In my case I'm going to use the unquantized model from Meta.

Now, if you're using the Meta version of Llama 3, you'll also need to log in to your Hugging Face account, because it's a gated model. When you go to the model page, it asks whether you accept the terms and conditions; I've already accepted them, but to use the model I also need to log in to my Hugging Face account in the terminal where I'm running it. For that we use the Hugging Face CLI: the command is huggingface-cli login. It asks for an access token, and to get one, go to your Hugging Face account, click on Settings, and create a new access token if you don't have one. I already have a bunch of tokens, so I'll copy one, come back to the terminal, right-click, and paste. It won't display the token, but it's pasted, so just hit Enter. When it asks whether to add the token as git credentials, type no and hit Enter, and if everything goes well, you'll be logged in to your Hugging Face account.

Okay, we're all set; now we need to ingest our files. LocalGPT ships with the Orca paper as an example document. To ingest files, run python ingest.py. On a system with an Nvidia GPU it uses CUDA by default, but you can also choose the device with the --device_type flag; I'm going to run this on MPS, so I hit Enter and the ingestion process starts. The first time you run this command, it has to download the embedding model LocalGPT uses; you can see which embedding model that is in constants.py, which I'll come back to in a moment. The model settings and the login and ingestion commands are sketched below.
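As a sketch, the constants.py entries for the two cases look roughly like this; the Meta repo ID is the real gated repo, but the QuantFactory file name below is illustrative, so check the repo for the exact GGUF names:

```python
# constants.py (excerpt) -- unquantized model straight from Meta's gated repo
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_BASENAME = None  # None = not a quantized GGUF file

# Alternative: a community GGUF quantization, e.g. from QuantFactory.
# The file name is illustrative; pick the quantization level you want.
# MODEL_ID = "QuantFactory/Meta-Llama-3-8B-Instruct-GGUF"
# MODEL_BASENAME = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
```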
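And the login and ingestion commands, assuming an Apple-silicon machine; swap mps for cuda or cpu as needed:

```bash
# Log in so the gated Meta repo can be downloaded (paste your access token when prompted)
huggingface-cli login

# Ingest the bundled example document; the embedding model downloads on the first run
python ingest.py --device_type mps
```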
On the embedding side: by default we use the instructor-large model, which is a pretty good embedding model, and there's a whole bunch of other options listed that you can choose from. If you're looking for multilingual embedding models, we've listed some of those as well, or you can go to the Hugging Face leaderboard of embedding models and pick one from there.

Okay, the ingestion is complete; it split the document into 1,993 chunks, so I'll just type clear. Next, if you want to start chatting with the document, you can go ahead and run python run_localGPT.py. Again, you can provide a device type if you want; by default it first tries CUDA, and if it doesn't find it, it tries MPS, and otherwise it falls back to CPU. We run that and it loads the model; right now it's downloading the model, because we're running it for the first time.

While that downloads, let me show you a couple of other things. If you go to the prompt_template_utils.py file, I have added a prompt template for Llama 3, so you can specify which model you're using and we select the prompt template based on that. For Llama 2 or Llama 1 I was using one template, but for Llama 3 they changed the prompt format, and I've observed that if you don't provide the proper prompt template, you run into a lot of issues, so we take care of that for you. Similarly, if you're using Mistral, there's a specific prompt template for it, and if you don't specify anything, a default template is used, so you do need to tell LocalGPT which prompt template to use. The acceptable options are defined in the run_localGPT.py file: the list of acceptable prompt, or model, types is llama3, which is the new addition, then llama, mistral, and non_llama, which is the default used if you don't pass any of the other three.

Another thing I wanted to mention: at the moment, based on my experiments, it seems the GGUF quantized versions of the model don't really follow the prompt template. I don't know exactly what's going on there; I think there's an issue with the end-of-sequence token, and some of the quantizations seem not to have implemented it correctly. That's why I'm using the unquantized weights provided by Meta rather than the quantized versions created by the community.

The model download is complete. Just to confirm, we're using instructor-large as our embedding model, it downloaded the Meta Llama 3 8B Instruct model, and we're running it on MPS.
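Here's the run command as a sketch; omit --device_type to get the CUDA, then MPS, then CPU fallback described above:

```bash
# Start an interactive chat session over the ingested documents
python run_localGPT.py --device_type mps
```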
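And for context, this is a minimal sketch of the Llama 3 Instruct chat format that the new prompt template has to produce. The special tokens are Meta's published format; the helper function itself is my illustration, not LocalGPT's actual code:

```python
def llama3_prompt(system_prompt: str, user_message: str) -> str:
    """Illustrative Llama 3 Instruct chat format; not LocalGPT's real helper."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```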
Now we're going to use the Orca paper as our knowledge base, so let's ask a few questions related to this specific paper. There's a specific section on instruction tuning, so let's ask the model what instruction tuning is and see what kind of response we get. My question is: what is instruction tuning? Here's the response LocalGPT produces: according to the provided context, instruction tuning is a technique that allows pre-trained language models to learn from input that consists of natural-language descriptions of tasks and response pairs; this means the model learns to perform specific tasks or functions by being trained on examples of inputs and their corresponding desired outputs. That seems to be correct.

Let's ask a more specific question. This time I'm asking how Orca's performance compares to ChatGPT; there's a small comparison provided in the paper. According to the response, Orca performs at par with text-davinci-003 on an aggregate across all tasks and retains 88% of ChatGPT quality, and here is the passage the information comes from, so the response seems to be accurate.

So this was a quick video on how to get started with Llama 3 within LocalGPT. As for the LocalGPT codebase, I'm going to be creating a lot more content in the near future. I'm actually rewriting most of the codebase, and we're going to add a lot more advanced RAG techniques, such as query expansion, context expansion, and reranking; those techniques are not currently available within LocalGPT, but a major update is coming. I'm also putting together a course on advanced techniques for RAG, so if you're interested in that, make sure to sign up; the link is in the video description. In a subsequent video, I'll show you how to use the Groq version of Llama 3, which is available through the Groq API, within LocalGPT, so if you're interested in something like that, make sure to subscribe to the channel so you don't miss it. I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
Info
Channel: Prompt Engineering
Views: 8,780
Keywords: prompt engineering, Prompt Engineer, LLMs, AI, artificial Intelligence, Llama, GPT-4, fine-tuning LLMs, LocalGPT, llama3, llama 3 rag, prompt engineering full course, chatgpt edureka, chatgpt prompts, prompt engineering training, prompt tips
Id: S6PdFPoteBU
Length: 12min 24sec (744 seconds)
Published: Sat May 04 2024