Easiest Mistral 7B Installation on Windows Locally

Video Statistics and Information

Captions
Hello guys, in this video I'm going to show you the easiest, quickest, and most feasible way to install Mistral 7B on Windows, on your local laptop. On your screen you can see that I am on the Mistral 7B model card. If you are not aware of what Mistral 7B is, I have various videos where I go into this model in detail, but just to give you a quick overview: Mistral 7B is by far the best 7-billion-parameter model to date. It has already shown that it outperforms Llama 2 13B on all benchmarks, and it has also outperformed Llama 1 34B on many benchmarks. It uses grouped-query attention (GQA) for faster inference, plus sliding window attention to handle longer sequences at small cost, and while using this model I have seen in various benchmarks that it is in fact quite cost efficient.

Now that we know what Mistral 7B is, let's see the easiest way to get it installed locally on Windows, on your laptop, or anywhere in the cloud or on-prem, wherever you like. The tool I'm going to use for this is called LM Studio. I have already done a few videos around it, and I'll drop the link in the video's description. Simply go to the LM Studio website.
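As a quick aside before the install: the sliding window attention mentioned above restricts each token to attending only to a fixed number of recent tokens. The toy sketch below is illustrative only (Mistral's actual window is 4096 tokens, not 3) and shows the kind of attention mask such a scheme produces:

```python
def sliding_window_mask(seq_len, window):
    """Boolean mask: token i may attend to token j only when
    i - window < j <= i (causal, limited to the last `window` tokens)."""
    return [[i - window < j <= i for j in range(seq_len)]
            for i in range(seq_len)]

# With 6 tokens and a window of 3, each token sees at most
# itself plus the 2 preceding tokens.
mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("x" if ok else "." for ok in row))
```

Because each row allows at most `window` positions, attention cost grows linearly with sequence length instead of quadratically, which is where the "longer sequences at small cost" claim comes from.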
From there, click on Download LM Studio for Windows and it will start downloading. As you can see at the top right, the size of LM Studio is around 400 MB, which is very small, and the beauty of LM Studio is that with it you can very easily download and then run models on your laptop. It uses the quantized versions of the models and gives you a drop-down, which I will show you shortly, that you can easily use. Let's wait for the download to finish; it shouldn't take too long. The download is finished, and in order to run it, simply click on Open file and it will open on your local system. I think I have opened two windows, so let me close the other one. Okay, there you go: this is what LM Studio looks like.

Now all you need to do is search for Mistral 7B in the text box and press Enter, and you can see there are a lot of Mistral 7B variants, like Instruct and 7B OpenOrca, and the list goes on and on. I'm just searching for something with GPTQ or GGUF, so let me go through the results. Let's go with TheBloke's quantized version of Mistral 7B. Once you select it, on the right-hand side you can see the available files for it, and there are a lot of quantized variants here. I normally try to go with Q5, but as you can see it is greyed out, so it might not be supported; it says "possibly supported", so it's not sure and we need to check it out. Let's go with Q5_K_S to see if it works or not. In order to download it, all you need to do is click the download button, and as you click it you can see, in the bottom section of the page, that it has started downloading. The file size is around five gigabytes, so let's wait for it to download.
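As a rough sanity check on that five-gigabyte figure: a quantized model file's size is approximately parameters times bits per weight. The sketch below assumes Mistral 7B has roughly 7.2 billion parameters and that a Q5_K_S-style quant averages about 5.5 bits per weight; both numbers are ballpark assumptions, and real GGUF files also carry tokenizer and metadata overhead:

```python
def approx_quantized_size_gb(n_params, bits_per_weight):
    """Very rough quantized-file-size estimate: params * bits / 8 bytes.
    Ignores metadata and tensors kept at higher precision."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed figures: ~7.2B parameters, ~5.5 bits per weight.
size = approx_quantized_size_gb(7.2e9, 5.5)
print(f"{size:.1f} GB")  # in the ~5 GB ballpark, matching the download
```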
The Mistral model has finished downloading; it took around 10 minutes on my system. Once it is downloaded, go to the chat icon on the left-hand side, click it, and at the top middle click on "Select a model to load" and choose the model. You can see that the model is now being loaded on our local system. Once the model is loaded, just drag the bottom section downward, and there you go: now you can chat with your Mistral model. It is downloaded and installed.

Let me ask it: what is the capital of Australia? The response time will depend on the capacity of your system, how much memory you have, and all that stuff. The cool thing is that at the top left it shows you the CPU consumption and the memory usage; you can see my CPU consumption is as high as 320%, so I definitely need a beefier system, but it still hasn't crashed and it was able to give me a result which is spot on and correct.

On the right-hand side you can also play around with the model configuration. There are some inference parameters, like what sort of repeat penalty you want to give, which is 1.1, which is fine, and then randomness, which is temperature, and a lot of other things like the prompt format, model initialization, and so on. I'm using mlock to keep the entire model in RAM, which is the default, and you can even set the context size, which is again at its default. You can play around with all of these parameters easily; I have another video where I go into much more detail about what these parameters mean.

Now let's give it another run. Let me copy a prompt from my system and paste it to save time. I'm going to ask it to play the role of a bathroom renovator, and I'm giving it a scenario: I have a 24-year-old house with an old bathroom and I want to renovate it, so the model needs to think
it through step by step and give me the steps to renovate it, and then also give me the cost in Australian dollars. Let's see how it goes. I'm not expecting it to give me the latest costs, because it's not fine-tuned on the latest data and I'm not sure how old the corpus of data for this model is, but I just want to see if it is able to do the inference here, because when I tried it out on my notebook on a Linux instance it was able to do it quite nicely. So let's wait for it to come back.

Here you can see that it is processing, and generation is about to start. You can see there's some lag: because of the way my system is configured, it is still stuck with the previous one, so I'm just going to click "Stop generating" and then "Continue", and let's see what it does. I'm not sure if this is a bug or what, because sometimes it just answers the previous question, so we need to stop it and regenerate, or click "Continue" so that it moves on to the next one. Anyway, you can see that it has started spitting out the response, quite slowly because of the limitations of my system; that is not the fault of the model or this software, I just need a better laptop. Let's wait for it to print some more, to see if it is able to give me the cost of the steps or not. You can see that it is successfully able to decipher my question, and it is giving me not only the steps but also the costs. Let's click on "Stop generating".

So this is it, guys: this is how easily you can download, install, and then run Mistral 7B. Before I close this video, let me tell you about another great feature. If you click on the double-headed arrow on the left-hand side, you can even start a local inference server and then share the endpoint with your users, so that instead of logging into this server to use it, they can simply make an API call like this. This is a RESTful API where they are just making a call to this endpoint and
then passing it the prompt, and it will return them the answer. How cool is that? You can build any application on top of this, so I'm very impressed by it. I have various videos where I discuss lots of other things related to this, so please feel free to search the channel or see the video description. Thank you very much; I hope you enjoyed it. If you like the content, please consider subscribing to the channel.
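To sketch what that API call might look like from a client's side, the snippet below posts an OpenAI-style chat completion request to a local server. The endpoint URL and port are assumptions based on common LM Studio defaults, and the response shape assumed here is the standard OpenAI one, so check the local server tab in LM Studio for the actual address before using this:

```python
import json
import urllib.request

# Assumed default address of LM Studio's local inference server --
# verify it in the Local Server tab before relying on it.
ENDPOINT = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt, temperature=0.7):
    """Construct an OpenAI-style chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt):
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server running with a model loaded):
# print(ask("What is the capital of Australia?"))
```

Because the payload is plain JSON over HTTP, any language with an HTTP client can call the endpoint the same way.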
Info
Channel: Fahd Mirza
Views: 4,838
Id: -49iDIL7lz4
Length: 10min 16sec (616 seconds)
Published: Sat Oct 21 2023