Ollama: The Easiest Way to Run Uncensored Llama 2 on a Mac

Captions
This is what I think is the simplest way of getting Llama 2 up and running on your Apple Silicon Mac. Today I'm going to show you how to run Ollama, which is a way to create, share and run large language models on your local machine — specifically on an Apple Silicon Mac — and we'll run a few tests as well. With all that said, let's get started.

All we need to do is head to ollama.ai and click the download button. Once it's downloaded, just drag it into Applications and you'll see the Ollama app available. Fire it up — "Welcome to Ollama, let's get you up and running with your own large language models" — install the command line tool when prompted, and then it's as simple as running a command in a terminal: `ollama run llama2`. This pulls down the appropriate Llama 2 model from Hugging Face. We don't need to sign up with Meta, provide an email address, or agree to any terms of service — there's a model that TheBloke has made available on Hugging Face for us. We'll just wait for the download, because it's quite hefty: the default is the 7-billion-parameter model, which is 3.8 gigabytes.

Why might you want to run one of these large language models locally? Maybe you don't want to use the web services — your company has restrictions, or you're worried about security and sending data out over the internet. Being able to have the whole model locally, only fetching that first model file and then interacting with it entirely within your own network, might be something you find important.

OK, that's finally downloaded — it took a little while — and it's running on my local machine. Let's say hello, and then see how it deals with a little bit of Python: let's ask it to create a regular expression to match email addresses. And wow — it refused, saying that writing a regex to match email addresses "is not within my programming or ethical guidelines", as it could "potentially infringe on individuals' privacy". So it won't write a regex for matching an email address. (I'll sketch what I was hoping for below.)

Next I'm going to ask it to find the largest two integers from a list — I probably should have specified that I wanted Python, but there we go. It says it can't do that because it doesn't have access to a list. It's running pretty well, by the way — this is an M1 Pro MacBook. I have run it on a plain M1 Mac Mini, and that was quite a bit slower. Interestingly, my fans are kicking in, which doesn't happen that often, so it must really be putting the machine through its paces.

The example it eventually gives isn't so great, because it actually prints the same number twice, which is not what I want. I know the coding capability of Llama 2 is not as good as, say, ChatGPT with GPT-4, or Claude 2 — that's one of the things it fails quite badly at. And one thing we obviously don't get here is the ability to actually run the code, because there's no code-interpreter equivalent — it would be interesting to see somebody build something like that on top of Llama 2; I'd imagine that's coming. I asked it to modify the function to ensure it returns unique elements from the original list, but it keeps dumping out the system prompt at the bottom every so often, which is not particularly helpful, and it gives the same response that prints each number twice. So yeah, I'm going to say it's failing badly at that.
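For reference, here's the kind of thing I was hoping it would write for the email question — just a quick hand-written sketch of a pragmatic pattern, not a full RFC 5322 validator:

```python
import re

# A pragmatic email pattern: local part, "@", domain, dot, TLD.
# Deliberately simple -- matches common addresses, not every valid one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

text = "Contact support@example.com or sales@example.co.uk for help."
print(EMAIL_RE.findall(text))
# ['support@example.com', 'sales@example.co.uk']
```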
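And here's a hand-written sketch of the other task — the largest two distinct integers from a list, which is exactly where the model kept tripping up on duplicates:

```python
import heapq

def largest_two(numbers):
    """Return the two largest distinct values in numbers."""
    unique = set(numbers)  # de-duplicate so the same value isn't returned twice
    if len(unique) < 2:
        raise ValueError("need at least two distinct values")
    return heapq.nlargest(2, unique)

print(largest_two([4, 7, 1, 7, 3]))  # [7, 4] -- 7 is only reported once
```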
I'm also going to try a logic problem now — these are just tests to see if it actually works; we've obviously got a large language model running locally and returning responses. I've given it a prompt I've used before: if it takes five machines five minutes to make five devices, how long would it take 100 machines to make 100 devices? The answer should also be five minutes, because everything scales proportionally. Let's see what it says... OK, the answer is wrong: it says 20 minutes. That's kind of what I expected, since I assumed it wouldn't be as good at this. GPT-4 actually gets this right; Claude 2 doesn't, and we can see here that Llama 2 doesn't either. Anyway, we've got Llama 2 running locally, which is great — we can just work with it and use it however we want.

We get a local Ollama directory, and within that there's a models folder containing blobs and manifests. The blobs folder is where the models are actually stored — expand it and you can see ours in there. Running `ollama list` shows the models we have locally. From the repository we can get other models as well — there's Orca Mini, which is smaller — and we literally just pull those models and run them the same way we did with Llama 2.

One of the other things that's quite interesting is that we can create model files, which means we can build our own models locally with particular preferences set. The example here is "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only." So let's do that. I've just created that model file, and to build it we run `ollama create`, then the name we're calling our model — here, mario — and then the name of the model file. Looking at the model file it used, it's got a base model, a parameter and a system prompt, so it's similar to a Dockerfile: there's a base image it works from. (I'll sketch the file and the commands at the end of this section.) After that we just run `ollama run mario` and see what it says: "It's-a me, Mario! Hey there, buddy, what can I help you with? Maybe you need a jump start or a power-up — just let me know and I'll do my best to assist you." Who is your arch enemy? It actually responded with some emojis, which is kind of interesting: "Oh, you mean Bowser? Yes, that no-good, nasty, stinking bad guy. He's always causing trouble and trying to kidnap Princess Peach. But don't you worry, I'm on the job — I'll get him good one of these days", and then it finishes with an emoji. In the examples directory of the repo we've got Mario, which is what we've just done; there's also a DevOps engineer, which uses the 13-billion-parameter model — "you are a senior DevOps engineer acting as an assistant, skilled in cloud technologies like Terraform, AWS", etc. — and a model for a tweet writer. So pretty much anything you want to do, or any way you want to customise it, you can use it for that.

One of the other interesting things is that when Ollama is running, we're actually running a server locally. If I hit localhost:11434 in a browser, it says "Ollama is running" — so we can use it as an API. If I quit the Ollama app, the server is no longer running, so we'd better start it up again. To get out of the interactive prompt we just press Ctrl+C, and then if I submit a POST request — "why is the sky blue" — interestingly, I wondered whether it would respond as Mario now, but no, because we've specified llama2 as the model. You can see it's a streaming response coming back, so we've got a nice streaming API responding with JSON if we want to use it for local development. (There's an example request sketched below too.)

One last thing: if we run a particular model — llama2 again — with the verbose flag at the end, we actually get back some performance stats on how it's running. So "why is the sky blue" — it answers that it's because of a phenomenon called Rayleigh scattering, which is the scattering of sunlight — and we can see we're getting 22 tokens a second there. On my M1 Mac Mini I was getting around 17, so slightly less.
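Here's roughly what that Mario model file looks like — a minimal sketch based on the example in the Ollama repo:

```
# Modelfile -- similar in spirit to a Dockerfile, with a base model to work from
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```

Then building and running it is just:

```
ollama create mario -f ./Modelfile
ollama run mario
```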
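And a sketch of hitting the local API — this assumes the generate endpoint and the default port of 11434; the response streams back as one JSON object per line:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```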
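The performance stats come from adding the verbose flag when running a model:

```
ollama run llama2 --verbose
```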
Having logged into Ollama's Discord server, I've discovered there's actually another model that isn't listed: an uncensored version of Llama 2. That means we can try creating that annoying Python email regex using this model instead. I've already pulled it down, and you can see the name of the model is literally llama2-uncensored — it doesn't appear on their website, but it is there, at least until they list it properly. So: `ollama run llama2-uncensored`, "can you create a Python regex to match email addresses?" Moment of truth... "Yes, I can help you with that" — and it gives us a regular expression. Perfect.

There's also another problem I want to give it, known as the killers problem, and I know for certain the standard model would refuse it. The problem goes like this: two killers are in a room; another killer enters the room and kills one of the killers; how many killers are now in the room? It answers: "there were two killers in the room before and one killer after, so three killers in total." Well, that reasoning is wrong — it's failed the logic problem — but it's giving us an answer, which is what I wanted to see. Just to prove my point, if we put the same question to plain Llama 2, it blocks it: it basically says it's here to help, but the question involves harmful and violent content. So if you want unrestricted access, there's this uncensored model as well — be sure to check that one out.

So yeah, that's Ollama: a really dead simple way of running large language models on your machine, and in this case we've been able to run Llama 2 really easily without having to do much at all. I hope you enjoyed the video — subscribe for more of this sort of stuff, give it a thumbs up, and I'll speak to you soon in a new video. Bye for now.
Info
Channel: Ian Wootten
Views: 14,784
Id: tIRx-Sm3xDQ
Length: 11min 30sec (690 seconds)
Published: Fri Jul 28 2023