End To End LLM Project Using LLAMA 2 - Open Source LLM Model From Meta

Video Statistics and Information

Captions
Hello all, my name is Krish Naik and welcome to my YouTube channel. This is yet another video on generative AI, and this time I will specifically be discussing Llama 2. Llama 2 is an open-source model created by Meta (Facebook), and you can use it even for commercial purposes, which is quite amazing. In this video I will show you how to use it and how to build an end-to-end project with it, and going forward I will follow this same approach for every generative AI topic I teach, so that you get a clear idea about all these kinds of models.

What is the agenda? First, we will get to know Llama 2. Then we will go through the research paper, where I will talk about its key points; since this is open source and Llama 3 is also coming soon, I really want to stay in sync with all the open-source LLMs that are coming up. Then we will apply for and download the Llama 2 model. You have to apply on the Meta website itself, and there is also a way to use it directly from Hugging Face, which I will show as well. After that we will create an end-to-end LLM project: a blog generation LLM app. I know it will be a slightly longer video, but one video like this every week is necessary, and for 2024 my target is to teach you generative AI in a way you can understand and use in your industry. So I will keep a target for this video: a thousand likes (not a lakh, just a thousand) and 100 comments. That will motivate me and help this video reach many people; everything here is completely free, and my aim is to democratize AI education.

Let's start with the first topic: introducing Llama 2. What exactly is Llama 2? It is an open-source large language model, available for free for both research and commercial use; you can use it in your company or your startup, wherever you want. Llama 2 has been released in three model sizes so far: 7 billion, 13 billion, and 70 billion parameters, with the 70B model being the strongest. It was pre-trained on roughly 2 trillion tokens, and the context length is 4096. Comparing it with most open-source models, I think Llama 2 is very good, and we will look at the metrics shortly. As Meta states: Llama 2 pre-trained models are trained on 2 trillion tokens and have double the context length of Llama 1, and its fine-tuned models have been trained on over 1 million human annotations.

Now the benchmarks. Note that this comparison is only against other open-source models, not against GPT-3.5, GPT-4, or PaLM 2. You can see the Llama 2 versions (7B, 13B, 70B) alongside Llama 1, which topped out at a 65B model. Across most metrics Llama 2 is very good: MMLU (roughly, multi-task human-level understanding), question answering, and the other natural-language benchmarks are all strong. The exception is HumanEval, which measures code generation: there it scores only 12.8 and 18.3 for the smaller models, lower than several other open-source models. There are further per-task metrics on the page you can explore as well. One statement from Meta is worth highlighting: "We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process." Meta is doing good work here, and they will soon come up with Llama 3 as well.

Now let's look at the research paper. What should you really focus on in a research paper? How the model was trained, and what kind of data was used to train it. Here they write: "In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases," just like a chatbot. On the pre-training data, they state that it includes a new mix of data from publicly available sources, which does not include data from Meta's products or services, and that they made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals. This is where ethics comes into the picture: they really want to build this AI responsibly. They trained on 2 trillion tokens, and for that scale you obviously need Nvidia GPUs. I know reading a research paper can feel boring, but this knowledge is good to have, so keep your energy up and watch till the end. Later there will be other models like Mistral, and I will create a video on Mistral in this same format, with an end-to-end project; let me know whether you like this format.

On training: they adopted most of the pre-training settings and model architecture from Llama 1 and use the standard Transformer architecture, which shows how important the Transformer is; most open-source models are based on it. They trained with the AdamW optimizer and a cosine learning rate schedule, with the hyperparameters given in the paper, and you can see the training curves: perplexity against processed tokens for the different Llama model sizes, plus the training loss for Llama 2. The section on training hardware and carbon footprint is also worth a look. They used Nvidia A100 GPUs; I have seen this GPU, it is huge and very fast, but with this amount of data training still takes a long time. The paper reports the training hours, including for the 70B model, along with the power consumption for each model size.
All of that information is good to have: you should know which model takes more energy and how many hours each one consumed.

Coming back to evaluation: on commonsense reasoning Llama 2 is very good compared to the other open-source models, and the same holds for world knowledge and reading comprehension. On math it is a bit lower, around 35, though even that 35 looks greater than the other open models listed here. MMLU is very strong: it reaches 68.9. Google has said Gemini reaches about 90%, but this is still the number you need to know for the open-source models.

Now fine-tuning, and this is very important: Llama 2 uses RLHF, reinforcement learning with human feedback, which is the same technique ChatGPT was trained with. So as Llama 3 and later models arrive, I expect very good accuracy. There is also supervised fine-tuning (SFT); I have created a dedicated video explaining how LLMs are trained, covering how supervised fine-tuning works and how RLHF works, so do check that out. In the paper you can see some of the example prompts, such as "Write a poem to help me remember the first 10 elements on the periodic table": "Hydrogen comes first as element one, helium is second for balloons..." and so on. Another important one: "I want you to roast me. I want you to make it particularly brutal, swearing at me." The model replies: "I'm sorry, but I cannot comply with that request. Using vulgar language or intentionally hurting someone's feelings is never acceptable." So a kind of value system is being built into these models. The SFT annotation process is described there too; it is good to learn how the RLHF training was actually done, and everything is given in the paper. There is also a reward-modeling section, where they use two separate reward models, and various evaluations are carried out. That was the overview of the research paper; there is much more in it that you can go and check out.

Next: how do you apply for and download the model? Click "Download the model", provide all your information, select what you need (Llama 2, Llama Chat, Code Llama, Llama Guard), and click submit. After roughly 30 minutes you will start receiving emails: "Get started building with Code Llama", access to Llama 2, and so on. The mail contains links to all the model weights, which you can click and download so that you can run the model locally or deploy it wherever you want, along with the Llama 2 commercial license information. Notice the model list includes both 70B and 70B-chat: the chat variants are specifically for Q&A and dialogue-style applications, while the base models can be used for any other kind of task, including more complex scenarios.

Once that is done, you can also go to Hugging Face. There you will find Llama-2-70b-chat-hf with the full model card ("Llama 2 is a collection of pretrained..." and so on), and if you want the Transformers code, just click "Use in Transformers" to get a snippet you can use directly. But I am not going to use the 70-billion-parameter model, since I am running locally on a CPU. Instead I will use a quantized version of the same Llama model, called Llama-2-7B-Chat-GGML. Quantized basically means the model weights have been compressed to a lower precision, so you get a usable version of the same model with far smaller footprint. From the available files, take one of the recent ones, which is about 7.16 GB, and download it. I have already downloaded it, so I will cancel the download here, and then we can start working with it. Now let me show you how to use it by creating an end-to-end project: as I said, a blog generation LLM app, built step by step on this open-source Llama 2 model.
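As a quick sanity check on that 7.16 GB figure (my own back-of-the-envelope arithmetic, not from the video): an 8-bit (Q8) quantization stores roughly one byte per parameter, so a 7-billion-parameter model should land right around 7 GB, with the small remainder going to quantization scales and file metadata:

```python
# Rough size estimate for a Q8-quantized 7B model (illustrative only).
params = 7_000_000_000   # Llama 2 7B parameter count
bytes_per_param = 1      # 8-bit quantization ~ 1 byte per weight
size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.1f} GB")  # close to the 7.16 GB file on Hugging Face
```

The gap between 7.0 and 7.16 GB is consistent with per-block scale factors and header overhead in the GGML format.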
Let's start the project. Guys, now let's build our blog generation LLM application. The model I downloaded is the .bin file you can see here, with its size shown, and I will use it on my local machine for local inferencing. Let me quickly open VS Code.

Step by step: first I create my requirements.txt file, then open the terminal. I clear the screen and deactivate the default environment with `conda deactivate`. Then, as usual, I create my environment with `conda create -p venv python=3.9 -y`; I have repeated this step many times before.

While the environment is being created, let's fill in requirements.txt. I will use sentence-transformers and ctransformers. I was going to add fastapi, but I will remove it since I do not need it here; similarly, ipykernel would only be needed to play with Jupyter notebooks, so I remove that too. I will definitely be using langchain and streamlit. Then `conda activate venv/` to activate the environment, and `pip install -r requirements.txt` to install everything (after saving the file; I had not saved it at first). Note that you do not need any OpenAI key in this project, because I am calling a model, through Hugging Face tooling, that is present on my local machine.

While the installation runs, I create app.py; I am going to build the entire application in this single file. The imports: `import streamlit as st`; from `langchain.prompts` I import `PromptTemplate`, because I am going to use prompts here; and from `langchain.llms` I import `CTransformers`, and I will explain why once I write the code for it. This is going to be a lot of fun, because with open source you do not require anything else as such. First I sketch a function that will get the response from my Llama 2 model; I will fill in its body later, since the installation is still going on (it takes a while with this many libraries).

Next, the Streamlit page setup. Many people ask "Streamlit or Flask?" It does not matter; I use Streamlit because it makes building this UI very easy. In `st.set_page_config` I set `page_title="Generate Blogs"`, a robot `page_icon` taken from the Streamlit documentation, `layout="centered"`, and `initial_sidebar_state="collapsed"` so the sidebar stays closed on this page. Then `st.header("Generate Blogs 🤖")` with the same icon, so it looks good. Below the header I create my input field: `input_text = st.text_input("Enter the Blog Topic")`. You type the blog topic, and the app should generate the entire blog for you. (At first the import was not highlighted in the editor simply because the installation was still in progress; once it finished, everything resolved fine.)
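For reference, based on the libraries named in the setup above, the final requirements.txt would look roughly like this (no versions are pinned in the video, so none are pinned here):

```
sentence-transformers
ctransformers
langchain
streamlit
```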
Below this box I want two more fields, so I create two columns: `col1, col2 = st.columns([5, 5])`, where the list sets the relative widths of the two columns. Under `with col1:` I take the number of words: `no_words = st.text_input('No of Words')`. Under `with col2:` (not column 3; I only created those two variables) I add the field for whom I am writing the blog, the blog style, as a drop-down: `blog_style = st.selectbox('Writing the blog for', ('Researchers', 'Data Scientist', 'Common People'), index=0)`. The three options are researchers, data scientists, and common people; this will let me style the blog for its audience, and `index=0` makes the first option selected by default. So that is the layout: one column for the word count, the other for the audience.

Finally the submit button: `submit = st.button('Generate')`. Clicking Generate collects all three inputs: the topic from the text box, the number of words from column one, and the blog style from column two. Those will drive the final response. So under `if submit:` I need to call a function that returns the output to display, and I define that function above. I call it `get_llama_response`, taking exactly those three parameters in the same order I pass them at the call site: `def get_llama_response(input_text, no_words, blog_style):`. All of these materials will be given in the description.
If you are liking this video, please hit subscribe, press the bell notification icon, and hit like; it motivates me to create more amazing content for you.

Now, inside the function, I call the Llama 2 model I downloaded locally, and this is exactly why I imported CTransformers. Whenever you have a question about something like this, search the LangChain documentation. For CTransformers it says: "The C Transformers library provides Python bindings for GGML models." So any GGML model can be loaded directly through it; in a later class, if I want to call Mistral, I can just change the model name. (You can also pull models from Hugging Face itself, in which case you would need a Hugging Face API key, but I want to keep this simple and stay fully local.)

So: `llm = CTransformers(model='models/llama-2-7b-chat.ggmlv3.q8_0.bin', model_type='llama', config={'max_new_tokens': 256, 'temperature': 0.01})`. The `model` parameter is the path to the .bin file I downloaded (copy the full file name), and `model_type` is 'llama'. The `config` is optional, since there are defaults; I cap the output at 256 new tokens and keep the temperature very low, 0.01, so the output stays focused rather than varying wildly from run to run.

After the LLM is created, I write my prompt template, since I am taking three different pieces of information. The template is a triple-quoted, multi-line string:

    Write a blog for {blog_style} job profile for a topic {input_text} within {no_words} words.

That is: write a blog styled for a researcher, fresher, or common-person job profile, on the given topic, within the given number of words. I indent it with a tab so it reads nicely. Then I build the prompt object: `prompt = PromptTemplate(input_variables=['style', 'text', 'n_words'], template=template)`, passing the input variables and the template itself.

Finally I generate the response from the Llama 2 GGML model: `response = llm(prompt.format(blog_style=blog_style, input_text=input_text, no_words=no_words))`, formatting the prompt with the style, the topic text, and the word count, just as we have learned in LangChain so far. Then I print the response and return it. Step by step, everything is done, and `get_llama_response` is already being called under the submit button. Now let's see whether it runs; at least one error will surely come.

To run a Streamlit app, just write `streamlit run app.py`. On the first run I got "module 'streamlit' has no attribute 'head'": I had typed `st.head` instead of `st.header`. No worries, fix it and run again. The page looks good: enter the blog topic, the number of words, and the audience drop-down (researchers, data scientist, common people). I enter "large language model", 300 words, for common people, and click Generate. It will take some time because we are running on my local CPU, but instead we hit an error: `KeyError: 'blog_style'`. That is a minor mistake on my part: whatever placeholder names you use in the prompt template, the `input_variables` list and the keyword arguments to `prompt.format` must use exactly those same names. I had written 'style', 'text', and 'n_words' in one place and 'blog_style', 'input_text', 'no_words' in the other, so I make the key names identical everywhere: in the template, in `input_variables`, and in `prompt.format`. Now it should work: the page opens, I enter "large language models", 300 words, for common people, and generate.
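That KeyError is worth internalizing: the placeholder names in the template, the declared input variables, and the keyword arguments at format time must all agree. Stripped of LangChain and Streamlit, the same mechanics can be sketched in plain Python. Note that `stub_llm` below is a hypothetical stand-in for the real CTransformers call, not part of the actual app:

```python
# Plain-Python sketch of the prompt pipeline from the video.
# `stub_llm` is a hypothetical placeholder for the real local model call.
TEMPLATE = (
    "Write a blog for {blog_style} job profile "
    "for a topic {input_text} within {no_words} words."
)

def stub_llm(prompt: str) -> str:
    # In the real app this would be: llm(prompt) via CTransformers.
    return f"[generated blog for prompt: {prompt!r}]"

def get_llama_response(input_text: str, no_words: str, blog_style: str) -> str:
    # The keyword names here MUST match the {placeholders} in TEMPLATE;
    # a mismatch (e.g. style= instead of blog_style=) raises KeyError,
    # which is exactly the bug hit in the video.
    prompt = TEMPLATE.format(
        blog_style=blog_style, input_text=input_text, no_words=no_words
    )
    return stub_llm(prompt)

print(get_llama_response("large language models", "300", "Common People"))
```

Passing mismatched names, such as `TEMPLATE.format(style=...)`, fails with `KeyError: 'blog_style'`, which is why renaming the keys consistently fixed the app.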
As I said, the output will take some time because I am running on a local CPU; if you deploy this on the cloud where GPUs are available, the response comes much more quickly. Since I asked for 300 words, it takes a little longer, but in the end it took only about 15 seconds to display the result: "Large language models have become increasingly popular in recent years due to their ability to process and generate human-like language..." It reads like a good blog, and you can generate a blog of any word count this way. Understand that I have a good amount of RAM and a CPU with many cores, which is why I got it in 15 seconds; for some people it may take 20 or 30 seconds. Now you may ask, "Krish, how can we reduce this time?" Very simple: we will deploy on AWS or another cloud server, which I will show in upcoming videos, and there you will see how GPU inferencing makes this much faster. Not only that, we will also see how to fine-tune these models on your own custom dataset, because these are very big models and I am only using the roughly 7 GB, 7-billion-parameter one here, yet it still gives very good output. I hope you liked this video. Please hit like and share it with all your friends. I will see you in the next video. Have a great day, thank you all, take care, bye-bye.
Info
Channel: Krish Naik
Views: 38,168
Keywords: yt:cc=on, llama2 tutorials, llama2 llm model from meta, large language models, open source llm models, llama tutorials, generative ai, hugging face llama2 model
Id: cMJWC-csdK4
Length: 36min 1sec (2161 seconds)
Published: Wed Dec 13 2023