Using Llama Coder As Your AI Assistant

Video Statistics and Information

Captions
Hi. Recently on this channel there was a video about using an offline coding assistant in VS Code. There are a number of them out there, but I've had the most luck with Continue and with Llama Coder. Llama Coder works in the code file, completing what you're typing, whereas Continue gives you a chat interface that uses your code as context. One of the common questions about that video was, basically, what languages can I use with these assistants? To answer that, we should look at what they're actually doing.

You can break the process down into a few steps. First, you, the developer, start writing some code. Maybe you add a comment and want the assistant to write whatever you asked for, or maybe you want it to fill in the blank in the middle of some code. The assistant copies the code on the page, and maybe some other pages for context. It then formats that code in the way a model expects to see it.

So what is the format? Well, many of the models use their own format with their own keywords. Figuring out how to build these models takes a long time, and often the keywords and formats for the other models weren't published when the researchers started building the one you're using. The model developers aren't trying to make our lives hell; at least I think so. I hope so.

What defines the format? The researchers go through a process called training, which formats the inputs in a special way, feeds that to the model, and, if the model answers the right way, adjusts the parameters of the model to try to ensure it answers that way every time. Then it repeats that with a huge number of inputs and outputs, and hopefully at the end we have a smart model. That's a super simple way to look at it; I'll cover training and what it means in another video. The point is that the model expects all future input to look just like the training input, so we have to stick with that special format. For models that use the DeepSeek format, that looks like fim begin, then your input, then fim hole, which indicates where you want the answer to go, then more input, and finally fim end. Those angle brackets and pipe characters that you see are really important to the format, but hard to say, so I skipped them. Having that fim hole command is pretty cool, and filling in the middle is often referred to as infilling.

Once it has formatted the prompt, the assistant hands it off to the model. But tools like this often don't actually run the models themselves. In the case of Llama Coder, it uses Ollama to run the model, so you need to have Ollama installed. Ollama has a special endpoint it listens on for requests; it gets the specially formatted prompt and then outputs the answer one token at a time. A token is roughly a word, or a common part of a word. Serving this back a token at a time is referred to as streaming. The assistant takes that stream and outputs the answer to the code window.

So let's use Llama Coder to see this in action. In the settings for Llama Coder, we start with an Ollama server endpoint; if you're running Ollama on the same machine as VS Code, this will be blank. Next is the model to use. You can see that, as of this recording at the end of January 2024, there's Stable Code, Code Llama, and DeepSeek Coder. These are all special models that focus on writing code, and they all have formats that allow for filling in the blanks in the middle of a code block. Next is temperature, which is often associated with creativity, though that's not really a great term for it. Models work by guessing the first word and then figuring out the most likely next word; a higher temperature means the most likely next word is not always going to be the same, and you may get something very different. If you want to use a different model, you can specify it in Custom, along with the format to use. This is great if you have a fine-tune based on DeepSeek, Stable Code, or Code Llama, which will use the same format as those models. The final two options limit how much can be written in one swoop.

So now you can start writing code, or add a comment. When you press the space key and nothing else for a second, it'll feed what you're working on to the model, and the gray text that appears is what it suggests you might want. You can accept the full suggestion with Tab, or just the next word of the suggestion with Command+arrow; I assume that's Ctrl+arrow on Windows and Linux.

But when we did that, we didn't see the special format the assistant sent to the model. Is that just a black box? Well, it used to be. Let me show you something that'll probably come out in version 0.1.23, or another future version. If I quit the Ollama server, then build Ollama myself and run it with the OLLAMA_DEBUG=1 environment variable, I get extra stuff in the logs. Now I'll start ollama run in another window and ask my favorite prompt. I'm pretty sure I was the first one to use this in this space, and now it's pretty common; though if you find someone else doing it first, I will ignore that and keep thinking that I am special. "Why is the sky blue?" And we see my answer. Now go back to the server logs and scroll up, and we can see the prompt in the logs along with the full answer. This is great because you can see how the template was applied to the prompt. Now let's try using VS Code, and if we look at the logs, we can see the fim begin, fim hole, and fim end, just as I described. There are a lot more entries here, because it's running every time I press the space key and then pause for a moment, but the concept is the same.

So that's how this stuff works, but I still didn't really tell you what languages are supported. The best way to figure that out is to go back to the ollama.ai page for the model, find the link to the Hugging Face repo for the model, and read their docs. For DeepSeek Coder, that points us to the GitHub repo for the model, which shows a list of supported programming languages. It looks like it's pretty much every language you can think of, plus a few I'd never heard of: Idris, Isabelle, Bluespec, Augeas. So yeah, your language is probably there; it's a full page of languages. For Stable Code, the list of languages is actually on the ollama.ai model page: a shorter list of about 18 languages. Code Llama seems to support a much shorter list, but I couldn't be certain what the list actually is.

Now, all the models reference various benchmarks that all prove that that model is the best one, but if you've seen my videos before, you know my opinion on benchmarks: they all suck wind. To understand what the best model is for you, you're going to have to try them all. Also try the different sizes of the models, though I tend to choose the smallest that gives a good answer. Having to wait for a model to generate an answer means I won't wait and will just keep typing, and I really don't want to peg my GPU constantly, especially when on battery, because then my all-day M1 battery will be closer to my Intel MacBook Pro's 45-minute battery, and I don't want to go back to that.

So that's a little bit about how the VS Code assistants work and what languages they support. If you have any questions, or any ideas for future videos, let me know in the comments below. This video came about because a few people left a comment asking what languages are supported. Thanks so much for watching, and have a great day. Bye for now.
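The DeepSeek-style fill-in-the-middle format described above can be sketched in a few lines of Python. The special tokens use the angle brackets and pipe characters mentioned in the video; the exact spellings (including the separator character inside the tokens) follow my reading of DeepSeek Coder's documentation and should be treated as an illustration, not a guarantee for every model; Code Llama and Stable Code use different keywords.

```python
# Sketch of how an assistant like Llama Coder might assemble a
# DeepSeek-style fill-in-the-middle (FIM / infilling) prompt.
# Token spellings are an assumption based on DeepSeek Coder's format.

FIM_BEGIN = "<|fim▁begin|>"  # marks the code before the cursor
FIM_HOLE = "<|fim▁hole|>"    # marks where the model should write its answer
FIM_END = "<|fim▁end|>"      # marks the end of the code after the cursor

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Format the code around the cursor the way the model was trained to see it."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    # return the sum of a and b\n    ",
    suffix="\n",
)
```

The model's job is then to predict the text that belongs where the hole token sits, which is why the same format has to be used at inference time as was used during training.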
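The streaming behavior described above, where Ollama serves the answer back one token at a time, can be sketched as a small collector over newline-delimited JSON chunks. The sample lines here are fabricated for illustration; in real use they would come from a request to a running Ollama server's generate endpoint (by default on localhost:11434).

```python
import json

def collect_stream(lines):
    """Concatenate the 'response' field of each streamed JSON chunk
    until a chunk arrives with 'done' set to true."""
    answer = []
    for line in lines:
        chunk = json.loads(line)
        answer.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(answer)

# Fabricated sample of what a streamed reply might look like,
# roughly one token per line:
sample = [
    '{"response": "The", "done": false}',
    '{"response": " sky", "done": false}',
    '{"response": " is blue.", "done": true}',
]
print(collect_stream(sample))  # The sky is blue.
```

An assistant like Llama Coder does essentially this, except it writes each arriving token into the editor as gray suggestion text instead of printing it.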
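The temperature setting mentioned above can be illustrated with a tiny softmax sketch: dividing the model's raw scores (logits) by a higher temperature flattens the probability distribution over next tokens, so the most likely word gets picked less consistently. The numbers here are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into next-token probabilities.
    Higher temperature -> flatter distribution -> more varied picks."""
    scaled = [score / temperature for score in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]   # made-up scores for three candidate tokens
cold = softmax(logits, temperature=0.5)
hot = softmax(logits, temperature=2.0)
# At low temperature the top token dominates; at high temperature
# the probabilities move closer together, so sampling varies more.
```

This is why a low temperature gives near-deterministic completions, while a higher one produces the "very different" outputs the video mentions.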
Info
Channel: Matt Williams
Views: 53,772
Keywords: ai, llm, llama, llama coder, vscode, golang, tab completion, code generation, software engineering, software development, llama 2, github copilot, software development life cycle
Id: fT-sUUq48Xk
Length: 9min 18sec (558 seconds)
Published: Wed Jan 31 2024