NEOVIM CONF 2023 - Introducing nvim-llama

Video Statistics and Information

Captions
Hello, NeovimConf 2023! I hope you're having a wonderful conference. My name is John, and in this quick lightning talk I want to discuss the state of large language model and AI generation technology within Neovim. I also want to show off a small proof-of-concept plugin I've built that brings some additional functionality to the Neovim ecosystem, which I hope people find useful. So let's dive into it a little bit.

If you're not familiar with large language model technology, you're not a data scientist, or you're just not in this world day-to-day, you can think of it as roughly "ChatGPT"; that's the quick mental shortcut for LLM stuff. Probably the most well-known tool is GitHub Copilot, and there is a GitHub Copilot extension for Vim written by none other than Tim Pope, the legend himself. It's very good: it brings a lot of that Copilot technology into Neovim, getting you chunks of generated code right within your editor. There's also one for Codeium, which brings a lot of that same functionality into the editor. You can see where I'm going with this. One thing I personally found really annoying is that I didn't want to rely on a third-party service from within Neovim to get this same really great code-generation functionality from LLMs.

Now, you may be thinking: "John, there is good technology today for running large language models locally, even on consumer hardware without tons of GPUs, just on this MacBook, for example." That's where llama.cpp comes in. llama.cpp is an interface for LLaMA models, and really LLaMA-like models, written in pure C and C++, and it's a fantastic way to run large language models directly on consumer hardware. If you're unfamiliar with what LLaMA is, it's an open-sourced large language model that came out of Meta, and it's very good; just about as good as something like GPT-3.5 from OpenAI.

Here I have llama.cpp built, with all the necessary dependencies. If I look in the models directory, I have CodeLlama, the 13-billion-parameter one, as a GGUF file. If you're not into data science or building LLMs, that file is the whole shebang: the entire trained model, with all of its parameters, that llama.cpp can use to actually start doing generation. From the root of the repository I run the main C/C++ program I've compiled, give it that CodeLlama model we just talked about, and pass a bunch of parameters that I've tweaked to work well for this. That gives me a chat prompt, and I can just tell it something, for example "write a Rust program that prints hello world", and it starts doing actual code generation. There's some Rust, and it's pretty fast, actually; compared to what you'd see in ChatGPT, it's maybe just about as fast. That's llama.cpp in a nutshell: we can run local models that we've downloaded and installed, on our own hardware, from our own terminal. Very exciting.
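(For reference, the kind of invocation shown in the demo looks roughly like the following. This is a sketch rather than the speaker's exact command: the model filename, context size, thread count, and temperature are illustrative, and exact flag names vary between llama.cpp versions.)

    # Build llama.cpp from source (produces the ./main example binary).
    make

    # Start an interactive, instruction-style chat session against a local
    # CodeLlama GGUF model. Filenames and parameter values are illustrative:
    #   -m       path to the GGUF model file
    #   -c       context window size
    #   -t       CPU threads
    #   -i/-ins  interactive, instruction-following chat mode
    ./main -m ./models/codellama-13b.Q4_K_M.gguf \
      --color -i -ins -c 4096 -t 8 --temp 0.7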
That brings me to the proof-of-concept project I've been working on, called nvim-llama. What it is, essentially, is an integration with llama.cpp. I have it installed via Packer, nothing really exciting going on there, but with it we can start a similar chat prompt within Neovim. You can think of it as getting access to a ChatGPT-like prompt, but right inside Neovim. I run the one Llama command the plugin needs; it handles the startup of llama.cpp that we've just seen and gives us the same prompt as before. So we can do something very similar, "write a hello world program in Rust", and it starts generating code right from within the editor, very much like before. When it's finished I can hit Ctrl-C, quit out of that buffer, and I'm back to editing whatever files I was working on, back in my typical Neovim workflow.

You'll already notice a few problems with this proof of concept, and essentially what makes it not ready for wider adoption: building llama.cpp, or really any C or C++ program, means the dependency tree for getting it to work on any number of systems is very large, and as the maintainer of this plugin I don't really want to have to handle or maintain all of that.

Enter Ollama, which is basically going to be the future of the nvim-llama plugin project. Ollama attempts to abstract a lot of that away by giving you a command-line interface to download and install LLM models, GGUF files and all of that, plus an interface for running everything in Docker containers, or locally if you prefer. Under the hood it is also using llama.cpp, but it abstracts away having to build it for different architectures, having to run it, and having to pass it the optimization tweaks you saw in that one command I ran. So let's look at what running Ollama as a service, from a container, would look like: something the future of the nvim-llama plugin could use almost as a service, so it wouldn't have to build llama.cpp and create a bunch of that cruft itself. We can do that with Docker and the container that Ollama provides. I give it a docker run command, it boots up, and it's ready to start accepting requests. Now I can make ordinary HTTP requests against it on my local machine, just like any other service; remember, this is all local, and that's the whole point. So we curl a POST to localhost:11434, which again is the Ollama container, at /api/generate, with a payload that names the model, llama2, just the 7-billion-parameter one, which will be just fine, and gives it the prompt "why is the sky blue", and we watch the output.
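(For reference, the container-plus-API workflow described here looks roughly like the following, based on Ollama's documented Docker usage. The llama2:7b tag matches the model mentioned in the talk; exact flags, tags, and defaults may differ across Ollama versions.)

    # Run Ollama as a local service in a container, persisting models in a
    # named volume and exposing its HTTP API on the default port 11434.
    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

    # Pull the model into the container if it isn't already present.
    docker exec -it ollama ollama pull llama2:7b

    # Ask the service to generate a completion. The response comes back as a
    # stream of small JSON objects, each carrying a piece of the generated text.
    curl http://localhost:11434/api/generate -d '{
      "model": "llama2:7b",
      "prompt": "Why is the sky blue?"
    }'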
Fast-forwarding a little, what we see is a streamed payload from the Ollama service giving us the generated text. If we put all of those little response chunks together, we get the whole tokenized response. This is all great, but there's another problem you may be thinking of: we have a service running a model-agnostic API, and we can hit it with a plain HTTP request, but Lua, Neovim, and a lot of these plugins are not, in my opinion, very well suited to interfacing with a service like this. People have done it, and you can do just about anything in Lua, but in my opinion there are better ways to integrate with HTTP services like this one.

That's where I want to talk about another proof-of-concept library, written in Rust, called ollama-rs. It's very rough, but essentially it's that same interface we saw in the nvim-llama project, the chat-style prompt like you'd get in ChatGPT, except it interacts with Ollama as a service instead. If I just do cargo run, it starts up and generates that same response. I've hardcoded the "why is the sky blue" request inside the code as a proof of concept, but you can see the stream, and consuming it this way makes much more sense. We can then use this within nvim-llama.

It's a very exciting time to be involved in the Neovim community and to be looking at large language model technology. If you're curious or want to get involved, I could always use really any help at all: please check out the nvim-llama repository under jpmcb, and look forward to these changes coming soon for the Ollama integration and getting a lot of this power into the plugin. Hope you have a great conference. Thanks, everybody. Bye!
Info
Channel: John McBride
Views: 8,404
Keywords: software engineer, coding, nvim, neovim, neovim conf, vim conf, nvim llama, nvim-llama, large language model, large language model nvim, llm nvim, llm neovim, vim, nvim plugin, neovim plugin
Id: Jx4IEJRlPQw
Length: 9min 46sec (586 seconds)
Published: Sat Dec 09 2023