Running a Hugging Face LLM on your laptop

Captions
Hugging Face has become the home of open-source large language models, and in this video we're going to learn how to download one to our machine and ask it some questions. Let's open up a Jupyter notebook. We're going to start by manually downloading a model to our machine, so we're going to import the hf_hub_download function from huggingface_hub. Now, if you don't have a Hugging Face key, you'll need to go to the website to generate one: go to your profile, click on Access Tokens, and then click on New Token. Put in a name and a role (I found I could get away with "read"), click Generate Token, and copy that token to your clipboard. I'd advise you to then put that in an environment variable (I'm using HUGGING_FACE_API_KEY) and use os.environ.get to load it into a variable, but if you're just playing around you could just hard-code it.

Next we need to choose a model to download. The general advice is to pick one with a lower number of parameters, and that's usually included in the name: in fastchat-t5-3b, the "3b" is the number of parameters, so three billion. The suggestion is that seven billion or fewer is supposed to work well on consumer hardware such as my laptop, so three billion should be pretty good, but you can browse around and pick another model if you don't like this one. We can then click through to the model's files to see which ones we need to download. You can see there are a bunch of files: the PyTorch weights are the main one, but there are a bunch of configuration files as well. We're going to grab a copy of all of those file names, put the model ID in a variable, put all the file names we need to download in an array, and then iterate through them, calling the hf_hub_download function and passing in the model ID, the file name, and our key each time. It downloads those files and prints out where they go: into the Hugging Face cache folder, under the name of the model. If you downloaded other models, they would go in there under a different directory.

Now let's try running the model. But before we do that, we're going to disable the Wi-Fi on my machine, just so you can see that it is genuinely using my machine and not going out to the internet. I've got a couple of functions for this (I'll include all the code in the description): we check our connectivity, toggle the Wi-Fi off, sleep for a little bit, and then check the connectivity again. If we run that, you can see it says I've got an IP address on my local network; I then disabled the Wi-Fi, and now I don't have an address any more. So it's just our machine sitting on its own.

Next we need to initialize the model. We're going to import some classes from the Transformers library and create a tokenizer and a model. Now, when you're creating the model, the class you use can be AutoModelForSeq2SeqLM or it can be AutoModelForCausalLM, and which one depends on the model's type. You can find that by looking just underneath the model's name on the Hugging Face website; for us it's Text2Text Generation. (Sketches of the download, Wi-Fi toggle, and initialization steps follow below.)
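A minimal sketch of the token and download steps. The repo ID (lmsys/fastchat-t5-3b-v1.0) and the file list are assumptions based on what the video describes; check the model's "Files and versions" tab on the Hugging Face website for the exact names:

```python
import os
from huggingface_hub import hf_hub_download

# Env var name from the video; you could hard-code the token instead
# if you're just playing around.
HUGGING_FACE_API_KEY = os.environ.get("HUGGING_FACE_API_KEY")

# Assumed repo ID and file names; verify them against the model page.
model_id = "lmsys/fastchat-t5-3b-v1.0"
filenames = [
    "pytorch_model.bin",        # the main weights file
    "added_tokens.json",
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer_config.json",
]

for filename in filenames:
    downloaded_path = hf_hub_download(
        repo_id=model_id,
        filename=filename,
        token=HUGGING_FACE_API_KEY,
    )
    # Files land in the Hugging Face cache folder, under the model's name.
    print(downloaded_path)
```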
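The connectivity helpers aren't shown in the captions (the video points to the description for the code), so here is one possible version. It assumes macOS and a Wi-Fi interface named en0; both are assumptions:

```python
import socket
import subprocess
import time

def check_connectivity():
    # Discover our LAN IP by opening a UDP socket; no packets are sent.
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            print("IP address:", s.getsockname()[0])
    except OSError:
        print("No network address")

def toggle_wifi(state):
    # macOS-specific; "en0" is the usual Wi-Fi interface but may differ.
    subprocess.run(["networksetup", "-setairportpower", "en0", state], check=True)

check_connectivity()   # shows an IP address on the local network
toggle_wifi("off")
time.sleep(2)          # give the interface a moment to drop
check_connectivity()   # no address any more
```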
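Initializing the tokenizer and model might look like this, reusing model_id from the download sketch. AutoModelForSeq2SeqLM matches the Text2Text Generation tag; a model tagged Text Generation would use AutoModelForCausalLM instead:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Loads from the local Hugging Face cache populated by the download step.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```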
Once we've done that, we can create our pipeline (sketched below). This will take a little bit of time, so we're going to speed things up. You can see the only output we got here was an HTTP HEAD request to the config file. I haven't looked at the code, but I assume it was checking whether we've got the latest version; the pipeline will continue to work even if that check fails, so we can ignore it for now.

Right, now we can actually give this model a try, so let's ask it a question: what are the competitors to Apache Kafka? Give it a few seconds and it gives back a response. The first sentence is pretty good: a popular open-source message broker used for streaming and aggregating data from multiple sources. As for the competitors, I would have said something like Redpanda or Pulsar; it in fact said Spark, Storm, Flume, and Flink, as if perhaps it needs to have a little bit more up-to-date information for that one. But that's not a bad answer.

Where I think this could be really useful is if you want to ask questions of some of your own data: you put your own data into the context, rather than sending it out to an API where the data could potentially be looked at by someone who works for one of the LLM companies. For example, I've made up some data: my name is Mark (I guess that's correct), and let's say I have some imaginary brothers and a best friend called Michael. Using just this context, I asked whether I have a sister, and it says no, I do not have a sister. But you could imagine putting in different types of data, like a bunch of data with "hey, summarize this" or something similar.

If you liked this video, you might like this other one that I did, showing how to get a consistent JSON response when using OpenAI.
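Creating the pipeline and asking the Kafka question might look like this, reusing the model and tokenizer from the earlier sketch. The "text2text-generation" task name matches the Seq2Seq model class, and max_new_tokens is an assumed knob, not something specified in the video:

```python
from transformers import pipeline

# Build a local pipeline from the already-loaded model and tokenizer.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

result = pipe("What are the competitors to Apache Kafka?", max_new_tokens=256)
print(result[0]["generated_text"])
```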
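And a sketch of the context-based question. The brothers' names and the prompt format are made up for illustration (the video only says the data is invented); everything stays on the local machine:

```python
# Hypothetical personal data prepended as context, so we can ask
# questions about it without sending it to an external API.
context = (
    "My name is Mark. I have two brothers, David and John, "
    "and a best friend called Michael."
)
question = "Using only this context, do you know if I have a sister?"

result = pipe(f"{context}\n\n{question}", max_new_tokens=64)
print(result[0]["generated_text"])  # e.g. "No, I do not have a sister."
```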
Info
Channel: Learn Data with Mark
Views: 5,160
Id: Ay5K4tog5NQ
Length: 4min 35sec (275 seconds)
Published: Fri Aug 04 2023