Run Llama 2 Web UI on Colab or LOCALLY!

Video Statistics and Information

Captions
This video is all you need if you want to run the Llama 2 model on the free version of Google Colab. If, like me, you don't have a really big GPU, you can use this tutorial to run Llama 2 (the 4-bit quantized version, of course) on free Google Colab. This is quite amazing: the model was launched yesterday, and today we already have this Colab version, thanks once again to camenduru, who has done a tremendous job getting all these models onto Google Colab. I'll link the repository in the YouTube description, below the like button. You can go there, scroll down, and click "Open in Colab" for the 7 billion chat version, which is the quantized model. That model comes from 4bit, an organization on Hugging Face that uploads a lot of models. The safetensors files are available there, which means the total download is less than about 12 to 15 GB, and it does pretty well on free Google Colab: as you can see, my GPU RAM is around 13 GB and it runs quite fine. I'm going to show you how to do that.

Before I do, a quick shout-out once again to camenduru. If you appreciate this work, or you use it anywhere for commercial purposes and make money, please try to sponsor him: camenduru has Patreon, Ko-fi, and GitHub Sponsors, and I would really appreciate it if you went ahead and supported him. At a minimum, you can star the GitHub repository as well.

Anyway, this is the interface we get, the GUI (graphical user interface), and you can ask questions here. "Who are you?" is a very bad question to ask an AI, but it's the question I like to ask to make sure I'm actually chatting with Llama. You can start with basic questions, for example: "Can you calculate what is two plus two?" I'm not sure it will do a good job, because this is the smallest model, the 7-billion-parameter one, and a quantized version at that, but let's try it out. It says two plus two equals four. I'm not evaluating the model here; I've got a separate video for that.

This graphical user interface also lets you clear the history, stop generation, and change the model parameters, and you have settings like what you want the system context to be. For example, it says "This is a conversation with your assistant; this is your duty." I can go in and say "You have to be sarcastic," and then it will be sarcastic. You can also use custom characters if you have an example. Overall, this is quite amazing.

How do you run this? It's very, very simple. Like I said, go to the repository and click "Open in Colab," and after the notebook opens, all you have to do is click Runtime and then "Run all." I would also encourage you to duplicate the notebook, save it in your own Drive, and run it from there; that's the other way of running it, in case you're worried the creator of the notebook could collect some of your data. That's up to you, but you can simply click "Run all" and it will run. Before you do, it's always good practice to check that a GPU is enabled; in this case a T4 GPU is enabled.

After you run everything, you can watch what it does. First it sets up the web UI, then it downloads the model from the location we just discussed: the safetensors files, the tokenizer, and the model configuration. Once that is done, it activates the Gradio link. When the Gradio link is active, you get a simple link, and that link is good enough for you to go and play with the model. Let me check whether it's available for me to share with you. Yes, you get a Gradio link like this; click it, and that should ideally work for you.

If you want to do this on your local machine, you can still download the notebook: click File, then download the notebook as a .ipynb file. If you have CUDA and all the GPU-related stuff properly set up, you can open this notebook locally and run it. The only catch is that the apt-get step will only work on Linux; if you are on Linux this is quite straightforward, but if you are on Windows you'll need to figure out how to install those dependencies yourself. The rest is very similar: get the web UI, download the 4-bit model, go into the web UI directory, and run server.py, which will start Gradio. In that case you don't need the external Gradio link; all you need is the local localhost Gradio link, and once you click that you will be greeted with this interface, where you can go ahead and start chatting with Llama.

You can ask any sort of question you like. One of the things people love to do is "Write a poem in David Bowie's style about a starman" (I'm not sure I'm pronouncing David Bowie right). Once you click generate, it takes a little bit of time. If you want to see how it's doing, you can check the output generation stats and how long it takes. We are currently running on a T4 GPU, and you can see it generates about 6.75 tokens per second; the whole operation here took about two seconds, though that depends on what you ask it to generate and how much it has to generate. It says: "Sure, here's the poem in the style of David Bowie: Starman, shining bright in the night sky, you light... ignite... a celestial being born of the stars above, with a twinkle in your eye you dance and love." I'm not a big fan of poems, and I don't really appreciate their artistic sense, but it still looks like a really good poem, to be honest; it rhymes, and when you read it, it feels good.

As you can see, though, it stopped a bit after this, because of the limitation of the context window, or rather the limit on the number of tokens it can generate. For the parameters you want to play with, click Parameters and you can see how many output tokens it's set to generate; in this case it says 200. You can play with the temperature: keep it low if you want more accurate output, or high if you want more creative output, and you can also play with the other parameters. This is a really good interface: it lets you play with the model, with the interface, and with the model settings themselves, like the system context and other aspects. And all it takes is one free Google Colab notebook; in just a couple of minutes you are up and running.

If you want to reset, click "Clear history" and it will clear everything for you, so you can run it again from scratch. The only catch is that if you are running this in a Google Colab notebook, every time you do this it repeats the entire process: it downloads the model again and runs everything again. On your local machine the model would already be downloaded, and all you have to do is run it. That's a bit of a difference, but again, if you do not have a GPU, especially one with 16 GB of VRAM, then Google Colab is definitely the savior you can turn to.

That's it. Quickly summarizing: go to the GitHub repository, which I'll link in the YouTube description (thanks to camenduru), click "Open in Colab" for the 7 billion chat model, and in the Colab notebook click "Run all." Just make sure you have a GPU enabled before you click "Run all," or, if you prefer, copy the notebook into your own Drive first and run it from there. After that you'll get a Gradio link somewhere in the logs; click the Gradio link and you will be greeted with this kind of interface. That's it: you can start chatting with the Llama 2 model that just got launched yesterday. Quite amazing. Thanks to everybody who has contributed to this. I hope this was helpful; all the required links will be in the YouTube description. See you in another video, happy prompting!
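As a quick sanity check on the speed discussed above: the UI in the video reports roughly 6.75 tokens per second on the free T4, and the Parameters tab shows a 200-token output limit. A minimal sketch of the arithmetic (the function name is my own; the 6.75 and 200 figures are the ones read out in the video):

```python
def estimated_generation_seconds(n_tokens: int, tokens_per_second: float = 6.75) -> float:
    """Approximate wall-clock seconds to generate n_tokens at a steady rate.

    6.75 tokens/second is the T4 throughput shown in the video's UI;
    real throughput varies with prompt length and model settings.
    """
    return n_tokens / tokens_per_second

# The short poem above finished in about two seconds, i.e. only a dozen or so
# tokens; a reply that uses the full 200-token budget would take much longer:
print(f"{estimated_generation_seconds(200):.1f}")  # → 29.6
```

So when a full-length answer takes close to half a minute on the free tier, that is expected, not a hang.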
Info
Channel: 1littlecoder
Views: 29,419
Keywords: ai, machine learning, artificial intelligence
Id: 9Wf9zmdms0k
Length: 8min 32sec (512 seconds)
Published: Wed Jul 19 2023