Deepseek Coder vs CodeLlama vs Claude vs OpenAI

Video Statistics and Information

Captions
DeepSeek Coder is a new model released by researchers in Beijing. It's outperforming CodeLlama on a number of benchmarks, but perhaps more importantly it's a very permissively licensed model, which means it can be used in open-source and commercial ways without limitations like those in Meta's Llama license, which does not allow you to train other models with it. For the agenda: I'll give you an overview of the model, which comes in 1.3 billion, 6.7 billion and 33 billion parameter sizes; I'll talk through the prompt format, which is a little different from the Llama models; then we'll go through inference using RunPod, which is inference on a GPU; I'll briefly show you chat-ui, a chat interface; then I'll get into comparisons with CodeLlama, where I particularly want to look at long-context inference; to finish, I'll talk you through function calling and some of the models that Trelis has fine-tuned if you want to use the DeepSeek model for function calling; and last of all I'll review resources.

DeepSeek Coder comes in three sizes: 33 billion parameters, which is very similar in size to CodeLlama 34B; a 7 billion size, which is actually 6.7 billion; and a 1.3 billion parameter model. That's a really small model, smaller than the base Llama 7B and fairly similar in size to TinyLlama (you can check out the earlier video on TinyLlama). I think the very small model is interesting if you want to run on edge devices or if you need very quick inference, so it's pretty cool they've made a model that small.

As I mentioned in the introduction, DeepSeek Coder is very permissively licensed. There aren't many restrictions on commercial use, and there's no restriction on training other models using the DeepSeek Coder model, so this is an improvement over the Llama license. Even if it performed just the same as the Meta model, for this licensing reason alone it would probably have an advantage.

Before I get started with inference, a quick note on the prompt format for this model. Those of you familiar with Llama will know that you often put a system message wrapped in <<SYS>> tags and then wrap that, along with the user input, in [INST] and [/INST]; that's how you might format a prompt for a Llama or Llama 2 model. It's a bit different for DeepSeek Coder: they allow you to use tokenizer.apply_chat_template, which applies the chat template to an array of system, user and assistant messages. When this is applied, it automatically injects a default system prompt in addition to whatever system prompt you add: it says that you're an AI programming assistant, that you'll only answer questions related to computer science, and that you should not answer politically sensitive questions, security and privacy questions, or other non-computer-science questions. It then uses a triple-hash "### Instruction:" marker before the question and "### Response:" before the point where the assistant's reply is generated by the model. Of course, you can customise this yourself: if you don't want to use apply_chat_template you can simply formulate the prompt just as we do for the Llama style, and indeed that's what I'm going to do for the rest of the tutorial.
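As a rough illustration of that prompt-format point, here is a minimal sketch of apply_chat_template using the Hugging Face transformers library; the model ID and the example message are my own assumptions for illustration, not something shown on screen.

```python
from transformers import AutoTokenizer

# Any of the three instruct sizes should work the same way; the 1.3B tokenizer is quickest to download.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-instruct")

messages = [
    {"role": "user", "content": "Write a Python function that returns the first n Fibonacci numbers."},
]

# apply_chat_template injects DeepSeek's default system prompt and the
# "### Instruction:" / "### Response:" markers automatically.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```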
If you want to serve the DeepSeek Coder model, one easy way to do so on a GPU is using RunPod. I'll provide a link in the description of this video; if you follow that link it will take you to a template, "DeepSeek Coder 33B API" by Trelis Research, and you can run this model quite easily. I recommend an A6000, which has very high availability. You can make API calls to it directly; by clicking on the readme you can check out a little more. Just add your pod ID into the URL, and that gives you a URL you can send queries to, asking questions like "what is deep learning?". So this is a very quick way to get an API started, one that also allows you to make API requests in parallel.

If you're interested in doing inference on chat-ui, the chat interface, you can use the code provided in the Llama server setup repo to set up inference. Here is the model setup, and when you go into chat-ui you can see I'm running deepseek-coder-33b-instruct-AWQ, the activation-aware weight quantization version from TheBloke. We can just have a short chat here: say hello, and then write some short code to add two numbers. You can also check out TheBloke's repositories on Hugging Face, where you'll find inference instructions for the different formats: AWQ, GPTQ, or GGUF, which is the format to use if you want to run inference on your laptop.

Next up, I want to show you some comparisons between DeepSeek Coder and CodeLlama. What I'm going to do is load both models: for model name A I've got the DeepSeek AI model, which is 33 billion parameters, and for model B I have CodeLlama, which is 34 billion parameters, so two models that are very similar in size and probably quite similar in architecture as well. There are some small differences in the grouping of attention, which affects speed, but primarily it's the datasets, and supposedly the improved quality of the dataset in DeepSeek Coder, that allow it to outperform CodeLlama on a number of benchmarks. Of course, we're going to see for ourselves in the benchmarks I'll test today.

After doing some installation, we'll load each of these models, model A and model B. I'm going to load them quantized using bitsandbytes and NF4, which is probably the most accurate form of quantization available right now. It's not the fastest compared to AWQ, but we're not too concerned with speed, which is why I'm going with this option today; it also lets me do quantization on the fly, which is a nice attribute if you're trying to speed up development. Once these two models are loaded, we'll move on to set up the tokenizers.

When I do any testing I like to use a really simple example, so I'm going to start off just by asking each model to list the planets in our solar system, and I'm going to let them generate text without any stop token. Here you can see the DeepSeek model indeed lists the eight planets in our solar system (is it eight? yep, eight) and then keeps on talking afterwards because I've got no stop token set up, and CodeLlama also lists out the eight planets. So that's my little check, and everything seems to be working fine.
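As a rough sketch of that loading step (the exact notebook isn't shown here, so the variable names and generation settings below are my own assumptions), loading both models in 4-bit NF4 with bitsandbytes and running the quick planets check could look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name_a = "deepseek-ai/deepseek-coder-33b-instruct"
model_name_b = "codellama/CodeLlama-34b-Instruct-hf"

# On-the-fly 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, quantization_config=bnb_config, device_map="auto"
    )
    return tok, model

tok_a, model_a = load(model_name_a)
tok_b, model_b = load(model_name_b)

# Quick sanity check: no stop-token logic, just a short fixed-length generation.
prompt = "List the planets in our solar system."
for tok, model in [(tok_a, model_a), (tok_b, model_b)]:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tok.decode(out[0], skip_special_tokens=True))
```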
Now we can move on to the first test, which is to return a sequence in reverse. The idea of this test is simple: you provide some letters, like "abc" or "a12x", and you ask the model to return those letters in reverse. It's quite a difficult task. As we'll see here, the DeepSeek model is able to return the first few sequences correctly in reverse; it basically gets five in a row correct, but it does not manage to return the longer sixth sequence correctly. Now we can look at CodeLlama, and you can see that CodeLlama actually struggles in this case even with the short sequences, so there does seem to be some improvement for DeepSeek just on this simple token-reversal test when we compare the two models.

What you can also see is that this challenge is just very difficult. You can try it with another AI; let's try ChatGPT, for example. Here we are in ChatGPT: a short sequence works fine, and a longer one works fine too, though I think as the sequences get longer the model will slowly have more difficulty. The next one looks good as well, but when you get to a longer sequence length you can see there are some errors; for example, one sub-sequence that should have been reversed is, according to GPT-4, actually still forwards instead of backwards. So this is indeed quite a difficult test, but GPT-4 does way, way better than the code models do. We can quickly look at GPT-3.5 as well: a shorter sequence still looks correct, but on a longer one you can see there's an error, where the reversed output leaves out an "f". So GPT-3.5 is weaker than GPT-4 on this metric, although it also appears to be somewhat stronger than CodeLlama or indeed the DeepSeek Coder model.

The next test we'll be doing is pass-key retrieval, which is where I embed a random pass key right in the middle of a long text, in this case in the middle of a Berkshire Hathaway transcript, and then see whether the model is able to retrieve it. More specifically, I ask the model to respond only with the pass key contained within the text: the prompt says "respond only with the pass key contained within the below text", then gives the text, then says "respond only with the pass key contained within the above text", and then the assistant should respond with "The pass key is" and continue that phrase.
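To make that setup concrete, here is a minimal sketch of how such a pass-key prompt can be assembled, with a random pass key dropped into the middle of a long document; the file name, pass-key format and exact wording are assumptions for illustration rather than the video's exact script.

```python
import random

# Load a long reference document (e.g. an earnings-call transcript saved locally).
with open("berkshire_2023_transcript.txt") as f:
    transcript = f.read()

# Insert a random pass key roughly in the middle of the text.
pass_key = f"The pass key is {random.randint(10000, 99999)}."
midpoint = len(transcript) // 2
text_with_key = transcript[:midpoint] + "\n" + pass_key + "\n" + transcript[midpoint:]

prompt = (
    "Respond only with the pass key contained within the below text.\n\n"
    + text_with_key
    + "\n\nRespond only with the pass key contained within the above text."
)

# The assistant turn is then seeded with "The pass key is" so the model only has to complete it.
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": "The pass key is"},
]
```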
Before doing the actual pass-key retrieval, I earlier ran a check where I give a long piece of text and then ask for a summary, and I've done that with both models. We have a really long transcript of the Berkshire Hathaway 2023 meeting; how long is it? It's about 29,000 tokens in length, so almost 30,000 tokens, and at the very end I ask for a summary to be given. Let's scroll down (wow, this is quite a long piece); here's the summary from the assistant with the DeepSeek Coder model: the text discusses the Berkshire Hathaway annual meeting, including Warren Buffett, Charlie Munger and Ajit Jain, and they discuss the company's earnings and the impact of AI and robotics on the stock market. That is actually in line with the content of the passage. Let's take a look at CodeLlama: for CodeLlama, we don't quite have a meaningful response here.

One thing that's interesting is that both of these models are stated to work for 16,000 tokens (although it's known CodeLlama can work for a bit longer). I have clearly run these models on roughly 30,000 tokens, and they still seem to give sensible (although, in the case of CodeLlama, not useful) responses. This is pretty interesting because I haven't used rope scaling. Commonly, rope scaling is used when the model is loaded; I'll show you that right up here. Right back at the start, when we loaded both models, there is an option to include rope scaling with a factor of two, which in principle should allow you to get performance at twice the trained length by interpolating the positional encodings. But I haven't even used that, and I'm still running on a context length that's far longer than the trained number of tokens. You can see I'm actually getting a warning when I run this: it says the token indices sequence length is longer than the specified maximum, 29,000 instead of 16,000, and yet I'm getting a pretty sensible response, so this is pretty interesting.
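For reference, here is a minimal sketch of the rope-scaling option mentioned above, assuming a Llama-architecture checkpoint loaded through transformers; whether a factor of two is the right setting for a given model is not something established in the video.

```python
from transformers import AutoModelForCausalLM

# Linear RoPE scaling with a factor of 2 stretches the positional encodings,
# in principle allowing inference at roughly twice the trained context length.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",
    rope_scaling={"type": "linear", "factor": 2.0},
    device_map="auto",
)
```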
In fact, I was able to run the pass-key retrieval as well, and we're going to run that live right now so we can see the performance of both models retrieving a pass key. In the meantime, I'll show you a run from earlier, at around 50,000 tokens, which I did only with the DeepSeek model: I ran it for a context length, which we should see here, of about 43,000 tokens, and here you have the really long transcript snippet; right down at the bottom you can see that DeepSeek is actually able to get the pass key. So the pass-key retrieval is pretty phenomenal for these models.

It's so good that I even looked at it in Claude. What I did is I copied all of this text, pass key included of course, straight into Claude and clicked enter, and Claude says: "I do not see a pass key explicitly stated in the text provided. The text appears to be a transcript of a discussion between various people without any pass key included." So this is pretty interesting: it seems that for pass-key retrieval we're getting better performance from the open-source models than from the Claude chat model. Of course, I can't test ChatGPT at that kind of length, because I don't have access to the 32k model and I'm actually testing even longer than 32k, at 43k input tokens, so there's no model from OpenAI right now that offers that long a context. I'm not sure exactly why Claude doesn't get this; it's possible that different tricks are being used for the attention, resulting in a loss of information in the middle of the text, or it's possible it hasn't had the same amount of training on long-context data as the open-source models.

In the meantime we've been waiting to run our test directly comparing DeepSeek and CodeLlama, so let's scroll down. This test, just as a reminder, is only on 29,000 tokens, so nearly 32,000; I'm doing it a bit smaller so I can fit both models on the same GPU. Let's scroll down and we should see the pass key coming up... wow, this is pretty long... okay, here we go: the pass key is correctly retrieved by the CodeLlama model, and we have to scroll up to find the DeepSeek output; here, in the DeepSeek model, the pass key is also found. So basically we get very strong performance from both of the coding models on these pass-key retrieval tasks.

Now, the very last comparison I want to do here is a website development task. We're going to ask the model not only to create a website, but to create an sh script that creates a website: when I run the sh script, it should create the folder structure and the files needed, and the website should allow a user to find the first n prime Fibonacci numbers by entering a number n on the website. The script should set up everything necessary and create all necessary code and logic, and lastly it should give me the command to run the website on localhost. So I'm basically asking the model, in one shot, to create a script that lets me deploy a website to my localhost. Now, I'm going to need more tokens here, because the script will be longer than just 50 tokens, so I'll go right back up earlier in my notebook and adjust the inference code: it's currently set to 50 tokens, and I'm going to allow it to generate 1,000 tokens. Then, going back down to the bottom, I run the prompt that sets the coder off to create our website.
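As a rough sketch of that one-shot generation step (the prompt wording below is paraphrased from the video, the generation settings are my own assumptions, and tok_a/model_a are reused from the loading sketch earlier), it can look something like this:

```python
# Reuses tok_a and model_a from the 4-bit loading sketch above.
website_prompt = (
    "Write an sh script that creates a website. Running the script should create the "
    "folder structure and all files needed. The website should allow a user to find the "
    "first n prime Fibonacci numbers by entering a number n. The script should set up "
    "everything necessary and create all necessary code and logic, and finally give me "
    "the command to run the website on localhost."
)

messages = [{"role": "user", "content": website_prompt}]
inputs = tok_a.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model_a.device)

# 50 new tokens is far too short for a full script, so raise the budget to 1,000.
output = model_a.generate(inputs, max_new_tokens=1000)
print(tok_a.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```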
All right, we have a response now from DeepSeek. We have our simple bash script that creates a website: you can see it creates the project directory structure and files, it provides some code, including some Pug code here, and it looks like a basic website layout. It does say that there isn't any code included for actually finding the first n prime Fibonacci numbers, which is definitely not what we asked for, because we specifically asked it to include everything and create all necessary code and logic. Nonetheless, let's quickly test this code. What I'm going to do is copy all of this sh script; you should be careful running sh scripts, by the way, so make sure you read them first so that you don't run up against any issues. Here I'm in a folder, and I'm going to create a file I'll call deepseek.sh, then chmod +x deepseek.sh just to give it permission to run, then nano deepseek.sh, paste in the code, Ctrl-X, Y and Enter. Now I can just run it with ./deepseek.sh, so let's see what this brings. Okay, it looks like it's created the website, so let's move into the folder it created, called prime-fibonacci-website, run npm start, and we can now presumably view it on localhost:3000. So it's giving a reasonable website, but it's not able to post to find the prime Fibonacci numbers, which makes sense because it has not provided those scripts. It's good at making the initial website, but it's not providing the full scripts, unfortunately. If you prompted it further for the scripts it might give you them, but I'm testing the one-shot performance here.

Now let's take a look at CodeLlama. Here it's not sure how to create the file structure, which is interesting; you can see CodeLlama is not able to stay on track with the problem quite as well, and it's kind of refusing to produce an sh script, so I would say in this case that the DeepSeek model is probably a bit stronger. It did a good job, and I feel that with further discussion it probably would be able to give me the code for the Fibonacci problem, since we already know from the earlier coding problem that it can do that.

Just for comparison, let's also compare to ChatGPT: let's clear the chat and compare to GPT-3.5. Well, it does create an index.sh, so let's see. Okay, this looks good, so let's follow the instructions. It's not exactly one script, but it's something I can actionably do, so let's run this and copy the code; again, just make sure that you skim through the code so you're not executing anything malicious. (Oops, that was a mistake.) Here I need to create index.sh, chmod +x index.sh, nano, is that correct? Yeah. Run index.sh, that makes sense, then run the start script, and it says it's running on localhost, so let's see. Okay, let's try entering 10... all right, that did not work, interesting. If you iterate with ChatGPT you would eventually, and I'm sure also with DeepSeek Coder or CodeLlama, be able to get a functioning website, but this is a one-shot test, and as you can see the website does display but it doesn't actually do anything. So I would say, all in all, if you measure it very strictly, the performance is maybe somewhat similar between ChatGPT and DeepSeek Coder, while the CodeLlama version is not able to provide a fully ready website as easily.

Very briefly, before we touch on overall resources, I want to show you function calling. There is a set of function-calling models available now for the DeepSeek Coder model: I've made a 33B model available for purchase, and the 7B (really 6.7B) model and the 1.3 billion model are also available on Hugging Face. Just to give you a sample of how those perform: they allow you to input function metadata. Here I have two functions, delete_file and get_current_weather, and when they are provided to this fine-tuned model as metadata, if somebody asks "how hot is it in Berlin? I'm wondering if I should go out cycling", the response will be a formatted JSON object that calls the function and gives the location and the unit relevant to that location. This can be very helpful if you want to connect a DeepSeek model to APIs. Of course, the Llama models are also available in a fine-tuned function-calling format, as is Mistral 7B.
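To give a flavour of what that looks like, here is a hedged sketch of function metadata and the kind of structured response described above; the exact schema expected by the Trelis fine-tunes isn't shown in this transcript, so treat the field names and output format below as illustrative assumptions.

```python
import json

# Hypothetical function metadata passed to a function-calling fine-tune.
functions = [
    {
        "name": "delete_file",
        "description": "Delete a file at the given path.",
        "parameters": {"path": {"type": "string", "description": "Path of the file to delete"}},
    },
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "location": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
    },
]

user_question = "How hot is it in Berlin? I'm wondering if I should go out cycling."

# The fine-tuned model is expected to answer with a structured call rather than prose,
# along these illustrative lines:
expected_call = {"name": "get_current_weather", "arguments": {"location": "Berlin", "unit": "celsius"}}
print(json.dumps(expected_call, indent=2))
```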
We've jumped around a lot in this video on DeepSeek Coder, so let me give you a brief summary and then lay out the resources that might be helpful if you want to use this model. In brief, I would say this model seems to be performing at least as well as CodeLlama, and probably a bit better: we could see that in how it generated a full website, where it was a little more complete in following the instructions than CodeLlama, and it also seems maybe a little better on longer contexts when providing summaries.

In terms of resources (and I'd say this for most open-source models), if you want to mess around with this model, head over to Hugging Face and check out TheBloke's repositories; there are guides there on doing inference that you can follow. You can follow a GGUF guide if you want to run it on your laptop; I recommend a MacBook for that, as it's going to be slow on an ordinary computer, and if you do want to run on a laptop it's probably best to use the 1.3 billion model, which is the smallest and quickest. If you want to run in a Google Colab notebook, which is quite a handy way to go, you might want to use AWQ, the activation-aware quantization method; it provides the fastest model for inference and reduces the model size as well, so you should be able to run the 6.7B model very easily on a free Colab notebook. If you want to move to higher levels and run inference on a server, maybe a server you're using to generate prompts or to process data for training language models, one of the easiest ways to get started is the one-click RunPod template, which I'll link below. If you want to get more into the formatting, you can check out and purchase the inference repository, which I'll also link below. Last of all, I haven't talked much about fine-tuning in this video, but it's possible to fine-tune the DeepSeek models in the same way as the Llama models, so they will be compatible with all of the scripts provided in the fine-tuning repository, available for purchase below as well.

Let me know your questions on the DeepSeek model. Looking forward to the next video. Cheers!
Info
Channel: Trelis Research
Views: 4,385
Keywords: deepseek, deepseek coder, deepseek 33b, deepseek inference, deepseek llm, deepseek ai, codellama, deepseek tutorial, deepseek language model, claude, openai, deepseek vs codellama, deepseek v codellama, claude vs chatgpt, claude ai
Id: g4UFDb3ySmY
Length: 27min 29sec (1649 seconds)
Published: Mon Nov 06 2023