Replace GitHub Copilot with a Local LLM (CPU, GPU, individual, teams)

Video Statistics and Information

Captions
Hello everyone, welcome to this new video. Today I will show you how to create your own Copilot. What exactly is Copilot? That's the first question. Copilot is this new and extremely helpful tool created by GitHub, based on GPT-4. It's an AI that autocompletes a lot of your code, which means you can have the kind of use case you see here, where you start typing a function and the whole function is written by the AI for you. So it speeds up your coding, and you also have a chat on the side where you can ask smaller things about the code: maybe you want to understand some part of the code, debug it, write tests for it, and much more. Today we will not be using GitHub Copilot, because it's actually expensive, for you as an individual and for a business if you have a lot of people, and it's also not a great solution when you want to keep your code to yourself. If you use GitHub Copilot Individual, for example, your data is not excluded from the training data, so you are paying for it but they are still using your code to train their models. If you want to avoid those two things, paying GitHub for this service and getting your code into the training data of these models, you can use the solution we will present in this video. So this video is made for you; let's get started. I will show you exactly the tools we'll use, the process we'll follow, and how we will solve this.

First, the tools we'll be using. Ollama: I think I've presented Ollama many times on this channel. Ollama is a pretty interesting tool that helps you run models locally, so you can run a lot of open-source models including Llama 2, Code Llama, DeepSeek Coder, Star Coder and many others. We will be using Ollama to run an open-source model made purely for coding, and we will connect it to our VS Code. To connect it to VS Code we need an extension, and for that we have this extension called Continue. This extension is pretty interesting as well: you can install it into VS Code and connect it to GPT-4, to the Google API, to Anthropic and a lot of other providers, including local models; the same goes for other tools like CodeGPT. But we will just be using Continue for the moment, and we'll be connecting it to a local API serving a local model. That's the stack we're using, and we have one last tool, Open WebUI. Open WebUI is a UI for Ollama, so we can pull our models and try them out before choosing the model that suits our needs best. We can browse different models and check whether the content generated by a model is good for what we want to do, before integrating it into our VS Code extension and using it in our code. We have a lot of models here: we can use the latest model from Google, Gemma (I actually made a video last week showing how to run it locally, you can check it out here), and we have many more choices when it comes to models. We can use Star Coder if you like Star Coder, and we have DeepSeek Coder as well, which is a really great model for coding. The good part is that there is a 1-billion-parameter DeepSeek Coder model, which we use when we only have a CPU and cannot use the GPU of our computer; the 1-billion-parameter model is the useful one when we just have a CPU and no GPU. OK, so that's the stack.
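For reference, here is a minimal sketch of the Ollama command-line workflow mentioned above, assuming Ollama is installed directly on your machine; the model tags come from the Ollama library and may have changed since this video was made:

# Pull a small coding model that still runs reasonably on CPU only
ollama pull deepseek-coder:1.3b

# Larger models are worth pulling if you have a GPU
ollama pull gemma:2b
ollama pull codellama:7b

# Quick one-off test from the terminal
ollama run gemma:2b "Write a binary search function in Python"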
Let's get started with the code itself. Let's create a new folder called copilot, open it with VS Code, and create a few small files. We'll be using Docker for what we are trying to do, so let's create a docker-compose.yaml file. In this file we will paste quite a few things; I will provide the file at the end of the video. We need the docker-compose content that we have here, so let's copy all of it and paste it into our file. Besides that, we need the API as well: we go to the API part here and copy it, the part that we will be exposing so that Continue can connect to our local model. We also need this part if you have a GPU; if you don't have one you don't really need it, but if you do, you can copy this deploy part and paste it here as well. This count we can just set to 1 to avoid issues, because we don't have access to a lot of environment variables here. Otherwise you can use the script called run-compose: it sets those variables for you, you can pass --enable-gpu with a specified count, and it runs the compose file for you as well. But in our case this is enough for us.

Running this is pretty simple and straightforward: we go to the terminal and do a docker compose up in detached mode, and it pulls the images. It may take more time for you; I already have these images in my Docker, so it's not that long. Now if we look at Docker, we'll see that these two containers are running. This is Ollama, and this port is the port we will be using to connect. Ollama gives you an API that's compatible with the OpenAI APIs, so you can use it directly in your application, or you can use it as we will right now, coupled to external tools like Continue and others. For that we need to install the Continue extension, so let's search for Continue here and install it. In the meantime, we need to test our web UI and download some models. OK, perfect, I have an account and I have these two models already downloaded, but I will show you how to download a model as well. You just come here, go to models, and choose the model that you want to pull. Let's say we want to pull this 2-billion-parameter model, for example: we copy the part after "run" and paste it into this section here. You can also just download the model directly on the command line, that's not a problem; if you install Ollama you can use the ollama command to run and pull these models if you want. But we wanted this GUI so that we can test and play around with different possibilities and choose the best model, maybe between DeepSeek Coder and Code Llama. You can do all of that without Open WebUI: in that case you can just delete this service here, and the volume as well, and then you just have Ollama, which you can run with Docker, and you will only have the API access point.
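As a rough sketch of the docker-compose.yaml described above, assuming the standard ollama/ollama and ghcr.io/open-webui/open-webui:main images; ports, volume names and the GPU block may need adjusting for your machine:

# docker-compose.yaml -- Ollama plus Open WebUI; the GPU section is optional
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"            # Ollama API, used later by the Continue extension
    volumes:
      - ollama:/root/.ollama     # keep pulled models between restarts
    deploy:                      # only if you have an NVIDIA GPU; remove otherwise
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"              # browse to http://localhost:3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:

Start everything with docker compose up -d and check that both containers are running before moving on.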
OK, so when the image is downloaded and everything is fine, I hope everything is fine... yes, perfect, it's verifying, and... perfect, everything's OK. Now we can come here and choose the model that we want. Let's say we want to use Gemma, and we want it to write, not a sorting algorithm, a search algorithm in Python in O(n log n). Hopefully it understands the complexity; actually it should be less than that, O(n log n) is the complexity of a sorting algorithm, a search is just O(n), and it can even be less than that, O(log n). So that's perfect: it says exactly that, and it's extremely fast, that's amazing. We can try other models: we take the same prompt, open this again, choose Code Llama, and run the same prompt. We will see which model gives the best result and use that one in Continue.

In the meantime, we have Continue installed here. I think, yes, Continue is installed, and now we can open it and do a lot of things with it: we can show the Continue UI and ask whatever questions we have. I had configured this a while ago, so let me just delete the models and I will show you how to configure them again. You come here, you click on this add button, it shows you this list, and you choose Ollama directly here. Here you can choose the model that you want to use directly, or you can choose Autodetect: it detects the list of models that you have and adds them to the available models. As you can see here, you have Gemma, you have DeepSeek, you have Code Llama 7B. So, this is just the algorithm; I'm not sure which one is efficient enough, so maybe we try them out and see the difference between those two models. I feel like the Google model is faster than the Llama model, so let's see which one is more interesting. This one actually explains what it is doing, so I think Gemma is actually better than Code Llama, so we will be using Gemma. You just come here, you choose Gemma, and maybe you ask something pretty simple, like the question we had before; I ask the same question here and we get the response from the model. The other thing we can do is pretty nice: we have kind of the same thing here, and it's pretty simple, so you have this useful assistant directly in your Visual Studio Code, and it's extremely fast, that's the best part. You also have the possibility to fine-tune this specific small model on your code base and get a model that gives much more relevant responses based on your code. So let's test something else: we create a file here, let's call it test.py, and we start writing something like, let's say, create a Flask server running on port 5000, something pretty specific. OK, let's try something that specific.
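For reference, something along these lines is what we are hoping the model produces; this is a hand-written sketch, not the model's actual output, and it assumes Flask is installed with pip install flask:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Single route just to prove the server is up
    return "Hello from the local Copilot demo!"

if __name__ == "__main__":
    # debug=True enables auto-reload while experimenting locally
    app.run(host="0.0.0.0", port=5000, debug=True)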
We can just do this: we do a Continue quick edit and say write this code. Look at this, this is pretty amazing, extremely fast. We could try it out, but we would need to pip install Flask and a few more things; it should work normally, this main function is just creating the app, and it should run once you have the whole setup for your code. The good part is the speed of the Gemma model, but this actually depends on the model you use, because if we choose Code Llama and ask the same thing, just create this Flask server, and we do a Continue quick edit, write this code, you will see that it becomes a bit slow and the response is not that good. So Gemma 2B is actually much more interesting than Code Llama 7B, I think, even for coding. Also, look, this is pretty slow and it's configuring a lot of things; I just want a simple server, I don't know why I have all these routes, they're not relevant for me right now. Anyway, I think Gemma is much better than Code Llama in this specific use case. Afterwards you have this small button that you can use to accept or reject the code generated by the model, and you can follow up here and ask more things about the context that you're in.

So that's it. I hope this video was interesting for you and that you learned something new. In upcoming videos we'll do more complicated things, like fine-tuning models and using these models in real AI applications. If you're interested in this topic, don't forget to subscribe, and if you have any question, any problem with the code or whatever, just comment down below; I try my best to answer all the comments and help people use AI tools. Thank you for watching and see you in the next video. Ciao.
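For reference, once the local models are added, the Continue configuration behind the steps described above ends up looking roughly like this; this is a sketch assuming the config.json schema Continue used around the time of the video (fields may differ in newer versions), and the "title" values are just labels you choose:

{
  "models": [
    {
      "title": "Gemma 2B (local)",
      "provider": "ollama",
      "model": "gemma:2b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Code Llama 7B (local)",
      "provider": "ollama",
      "model": "codellama:7b"
    }
  ]
}

With the ollama provider, Continue talks only to the local API exposed by the container, so no code ever leaves your machine.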
Info
Channel: LTPhen von Ulife
Views: 884
Id: YB-FgwaTul0
Length: 16min 8sec (968 seconds)
Published: Tue Feb 27 2024