Run Llama 3 on Windows | Build with Meta Llama

Video Statistics and Information

Captions
Hi everyone, I'm Nava, and I'm a Developer Advocate at Meta. This Build with Meta Llama video series demonstrates the capabilities and practical applications of Llama models for developers like you, so that you can leverage what Llama has to offer and incorporate it into your own applications. In this series we'll discuss some of the various ways in which you can run Meta Llama models and go over the resources available to help you get started. In this second video, we will learn how to run Llama models on Windows using Hugging Face APIs, with another step-by-step tutorial to help you follow along. So let's dive in.

For this demo I'll be using a machine running Windows with an RTX 4090 GPU. Since we will be using the Hugging Face Transformers library for this setup, the same steps can also be used on other operating systems that the library supports, such as Linux or macOS.

Let's take a look at the steps. To allow easy access to Meta Llama models, we are providing the downloads on Hugging Face, where you can download the models in both Transformers and native Llama 3 formats. To download the weights, visit one of our repos containing the model you'd like to use; for example, we will use the Meta Llama 3 8B Instruct model. Read the license agreement, fill in your details, accept the license, and click Submit. Once your request is approved, you will be granted access to all the Llama 3 models. Our request was approved, and we now have access to the models.

For this tutorial we will be using Meta Llama models that are already converted to the Hugging Face format. However, if you would like to download the original native weights, click on the Files and versions tab and download the contents of the original folder. If you prefer, you can also download them from the command line by typing pip install huggingface_hub, then huggingface-cli download followed by arguments as shown in this example.

We will showcase how you can use Meta Llama models already converted to the Hugging Face format using Transformers. To use it with Transformers, we will be using the pipeline class from Hugging Face. It is recommended to use a Python virtual environment for running this demo; here we are using Miniconda, but you can use any virtual environment of your choice. Make sure you have the latest version of Transformers: open your terminal and type pip install -U transformers to upgrade. We will also use the Accelerate library, which enables our code to run across any distributed configuration; in your terminal, type pip install accelerate.

We will be using Python for our demo script. To install Python, visit the Python website, where you can choose your OS and download the version of Python you like. We will also be using PyTorch for our demo, so we'll need to make sure we have it installed in our setup. To install PyTorch, visit the PyTorch website and choose your OS and configuration to get the command you need, then paste that command in your terminal and press Enter.

Next, open the editor of your choice and create a Python script. In our script, we'll first import transformers, import torch, and from transformers import AutoTokenizer. Next we'll define the model we'd like to use; in this case we will use the 8B Instruct model, which is fine-tuned for chat. To do that, we'll type model = followed by the model name. We will also define the tokenizer, which can be derived from AutoTokenizer based on the model and loaded using the from_pretrained method; this will download and cache the pretrained tokenizer and return an instance of the appropriate tokenizer class. To do that, let's type tokenizer = AutoTokenizer.from_pretrained(model).
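As a recap, the opening of the script described so far might look like the following minimal sketch; the repo id meta-llama/Meta-Llama-3-8B-Instruct is assumed here as the Hugging Face name for the 8B Instruct model used in this demo:

```python
import transformers
import torch
from transformers import AutoTokenizer

# Hugging Face repo id for the chat-tuned 8B Instruct model
# (assumed name; requires approved access to the gated repo)
model = "meta-llama/Meta-Llama-3-8B-Instruct"

# Download and cache the pretrained tokenizer for this model
tokenizer = AutoTokenizer.from_pretrained(model)
```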
Now we need a way to use our model for inference. Let's type pipeline = transformers.pipeline("text-generation", model=model, torch_dtype=torch.float16, device_map="auto"). Pipeline allows us to specify which type of task the pipeline needs to run, in our case text generation; the model that the pipeline should use to make the predictions, specified by model; the precision to use for this model, torch.float16; and the device on which the pipeline should run, defined by device_map, among various other options. We set device_map to "auto", which means the pipeline will automatically use a GPU if one is available.

Now that we have our pipeline defined, we need to provide some text prompts as inputs for it to use when it runs to generate responses. Let's define this as sequences: type sequences = pipeline(...) with the prompt "I have tomatoes, basil and cheese at home, what can I cook for dinner?", do_sample=True, top_k=10, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id, truncation=True, and max_length=400. Setting do_sample to True allows us to specify the decoding strategy used to select the next token from the probability distribution over the entire vocabulary; in our example we are using top-k sampling. By changing max_length you can specify how long you'd like the generated responses to be, and setting num_return_sequences to greater than one will let you generate more than one output. Finally, add a for loop over the sequences that prints each generated result. Now we have our script ready; let's save it and go back to our terminal.

We're ready to run the script, but before we do, let's make sure we can access and interact with Hugging Face directly from the command line. First, install the Hugging Face CLI using pip install -U "huggingface_hub[cli]" as shown, then type huggingface-cli login and press Enter. It will ask for our access token, which we can get from our Hugging Face account under Settings; copy it and provide it on the command line. We're now all set. To run the script, type python followed by the name of the script, which in our case is llama3_hf_demo.py, and press Enter.
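Putting the remaining pieces together, the rest of the script as described might look like this sketch, continuing from the imports, model, and tokenizer defined above (the prompt and sampling parameters follow the values given in this demo):

```python
# Text-generation pipeline; device_map="auto" places the model on a GPU if available
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a response using top-k sampling, capped at 400 tokens
sequences = pipeline(
    "I have tomatoes, basil and cheese at home, what can I cook for dinner?",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    truncation=True,
    max_length=400,
)

# Print each generated result
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```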
The script downloads the model, shows us the progress of the pipeline, and then prints our question along with the generated answer. In this case the answer goes over the list of ingredients needed to make the suggested dish, with steps on how to make it.

In this example we saw how to run Llama models on Windows with the Meta Llama 3 8B Instruct model using the Hugging Face Transformers library. If you'd like to check out the full example and run it on your own local machine, our team has put together a detailed sample notebook that you can refer to, found in the llama-recipes repo. There you will find an example of how to run Llama 3 models using already-converted Hugging Face weights, as well as an example that goes over how you can convert the original weights into the Hugging Face format and run the model using those. A link to this notebook can also be found in the description box below.

We've also created various other demos and examples to provide guidance and references to help you get started with Llama models and make it easier to integrate them into your own use cases. To try these examples, check out our llama-recipes GitHub repo, where you'll find complete getting-started walkthroughs with installation instructions and dependencies, and recipes with several examples covering inference, fine-tuning, and training on custom datasets, as well as demos that showcase Llama deployments, basic interactions, and specialized use cases.

So here we are: we saw how to set up Llama models locally using Hugging Face APIs, with a step-by-step demonstration of how you can set it up yourself. If you prefer to consume information in a written format, we've also published a blog that goes over everything we discussed in this video and more; it can be found on our website and will also be linked in the description box below. In our next video we'll look at some of the other ways in which you can run Llama models and go over the resources available to help you get started. We hope this video gave you the insights and resources you need to get started with Llama models. Thank you, and see you in the next video.
Info
Channel: AI at Meta
Views: 6,294
Keywords: Artificial Intelligence, Meta Llama, Llama 3, AI tutorial, Llama Hugging Face, Llama Python, Run Llama locally, Llama LLM, AI on Windows, Local AI on Windows, Local LLM Windows, Local Llama Windows, Windows AI tutorial, Local PC AI, Hugging Face API, Hugging Face Transformers
Id: a_HHryXoDjM
Length: 10min 28sec (628 seconds)
Published: Thu May 23 2024