GPT4All V2 Upgrade: Commercial License, 1-Click Install, New UI, New Base Model, V1 vs V2

Video Statistics and Information

Captions
Well, we have a new version of GPT4All, now trained on GPT-J. Models are released so quickly in the open-source community lately that it's hard to keep up and try all of them. This one is particularly interesting because it now has a license that allows commercial use. It's also interesting that they published a paper describing how they trained the model and how they obtained the data. It seems they no longer use the data obtained with GPT-3.5 Turbo, which is really good. They also seem to have changed the data they originally used to train GPT4All: in the document they state that they augmented the original 400k GPT4All examples with new ones, now using multi-turn Q&A samples and creative writing such as poetry, rap, and short stories. That's very interesting. They're also very open about how they cleaned and created the data; most companies offering commercial large language models are not nearly that open about how they did it.

There is also documentation about how to reproduce the training. That gives you the opportunity to first download the training data they used to tune the GPT-J model, and then add your own training data to extend or further tune the model if that makes sense for your use case (a rough sketch of what loading that dataset could look like follows below).

What's really great about this release is that it's very easy to install on Mac, Ubuntu, or Windows, especially for people who are not that into tech. So let's see how the installation works on my Mac. If you click on this link, a file is downloaded; you just double-click it to install. If you get a warning like this, cancel it, then right-click the file, choose Open, confirm Open again, and the setup starts. You have to specify where you want to install it and which component (currently there is only this one); it's going to take about four gigabytes of space. Accept the license and start the installation. This takes a little while because, as you can see, it downloads everything from the internet. Okay, it looks like we're almost there, so let's click Finish.

Now it's installed, but what is not exactly clear is how to start it. I don't think this is really in the documentation, or at least I haven't been able to find it. If you use Spotlight search and look for GPT4All, you find the folder where it was installed. Inside the bin folder there is a chat application, so let's start that one. It's taking a little while; this is a Mac with M1 Apple silicon, so I hope that's not a problem. All right, it popped up; it took about 30 seconds.

Maybe the first thing I'd like to try: if you've seen my previous video where I tested GPT4All against the vanilla LLaMA model, let's take some of those questions and see how this one performs. If you remember, one of the questions was to summarize a text I took from the Berkshire Hathaway 10-K report, so let's see how that works here. Paste it in; it's thinking. Let's check the memory and CPU consumption: it's definitely CPU intensive and currently uses about four and a half gigabytes of memory. All right, it starts the summarization; this took about 20 to 30 seconds. It's definitely not one sentence, it elaborates a lot, but it's good. So let's compare this with the previous response.
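Before the next tests, here is a minimal sketch of what downloading that published training data and extending it with your own examples (mentioned above) could look like. It assumes the data is hosted on Hugging Face under the nomic-ai organization and that the columns are named prompt and response; the exact dataset id, column names, and the example rows are assumptions, so verify them against the GPT4All repository and paper before relying on them.

```python
# Hedged sketch: load the published GPT4All-J training data, inspect it,
# and append your own examples before further fine-tuning GPT-J.
# The dataset id and column names are assumptions -- verify them against
# the nomic-ai repository.
from datasets import Dataset, concatenate_datasets, load_dataset

base = load_dataset("nomic-ai/gpt4all-j-prompt-generations", split="train")
print(base)     # row count and column names
print(base[0])  # one prompt/response training example

# Hypothetical domain-specific examples of your own.
extra = Dataset.from_list([
    {"prompt": "What is our refund policy?",
     "response": "Refunds are issued within 30 days of purchase."},
])

# Keep only the assumed prompt/response columns so the schemas match.
combined = concatenate_datasets(
    [base.select_columns(["prompt", "response"]), extra]
)
print(len(base), "->", len(combined))
```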
Maybe in the next video I'll try several models and put the outputs side by side to see how the different models perform. The next question is one where both LLaMA and the original GPT4All model struggled a lot: write a Python function that detects prime numbers. Let's try this again and see how well it works. All right, this is actually starting really well; now it even writes comments inside the function, which was not the case before. It correctly realizes that it should reject numbers of one and below. Very interesting; it looks way better than the previous version of the model. I'm not sure why everything is rendered so big, or why the resolution looks like this; maybe that's something we can fix later. It also starts describing what the function is doing. I know this is a simple question, but that output is definitely big progress compared to what I've seen before.

The output is slow, though: it takes more than a minute, almost two, to complete. Looking at the utilization, memory has increased and the CPU is still very busy, and it's still producing content for that question. It's probably a bit too much now; a lot of what it produced uses a method I haven't seen before, and I'm not sure it really makes sense, so that's something I need to validate. I wonder if this would run faster on a machine with an Nvidia GPU. It has been producing content for three minutes or more; that's a lot of time. On the positive side, this time it gives a very elaborate description, all from just asking it to create a Python function that detects whether a number is prime. All right, it's finally done after about three minutes, and look how much content it generated. The function with the first loop looks probably okay, definitely better than before, and there's a lot of explanation, which is great, but I need to double-check later whether it really makes sense.

Let's try another question and see how it answers: is 15 a prime number? The previous version of GPT4All, trained on LLaMA, actually gave a good explanation but still claimed that 15 is a prime number, which is obviously wrong. Let's see how it performs this time. Oh no, it says again that 15 is a prime number. That's a disappointment; not good. Let's double-check the answer: is 15 a prime number? No. Is 31 a prime number? Yes, but definitely not for the reason given here; it says 31 is a prime number because it's divisible only by 2, 3, and 31. So why is 31 a prime number, then? Because it's divisible only by 1 and the number itself; that's what a prime number is. So the reasoning here is very wrong, which is sad, because the previous version tuned on LLaMA at least gave the right explanation of why a prime number is prime. All right, let's move on and see if it can give me the first five prime numbers. The previous version actually listed the first prime numbers correctly, so let's see what happens here. Okay, it's not able to generate prime numbers; that's a disappointment again.
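For reference, here is a plain trial-division primality check, a minimal sketch of the kind of function the prompt asks for, which is handy for sanity-checking the model's answers (15 is not prime because 3 × 5 = 15, while 31 is prime).

```python
# Trial division up to sqrt(n): reject values below 2, handle 2 and 3,
# skip even numbers, then test odd divisors only.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n < 4:          # 2 and 3 are prime
        return True
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

print([n for n in range(2, 14) if is_prime(n)])  # first five primes: 2, 3, 5, 7, 11
print(is_prime(15), is_prime(31))                # False True
```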
The last thing I want to try is the error message I tested against the previous version of GPT4All, the one where they tuned the LLaMA model. The question now is whether the LLaMA base model is simply better than GPT-J, and whether that's why this tuning didn't work as well. What would be interesting to try is to take the same dataset, which is publicly available, apply it to LLaMA, and compare the results. Technically that wouldn't be something you could use commercially because of the LLaMA license, but at least you would have some ability to compare and see whether the underlying base model, in this case GPT-J, was the limitation.

Okay, so we see here it looks like it's providing a good answer: it tells me that something is outdated, but it's not showing me how to fix it. Let's ask about the Python part and see if it gives the command, because the previous version responded that it was outdated and that I had to update, but it also showed how to do the update, and that's not happening right now. Okay, now it will probably give me the command. What's nice is that the output now has formatting, so theoretically a future version of this UI could render code a little better, the way ChatGPT does when it produces code. Okay, so this is good, it's kind of okay. It's great that it has a license and that the data was obtained in a way that likely allows commercial use, but the responses, especially on the prime number questions, were kind of disappointing.

That's interesting here; that part is okay, but what is this tokenizer? Is there some Python package like that, "tokenizers" spelled with a double i? I just wanted to check if such a version exists, but there is nothing like this, so that's actually very interesting: where did it come from, did it hallucinate it? All right, I think that's it for today. I hope you learned something, and see you next time.
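As a quick way to check that kind of suspected hallucination, a short sketch like the following tests whether a module name can actually be resolved in your environment. The misspelled name here is a hypothetical stand-in for whatever the model produced; a local miss doesn't prove the package doesn't exist anywhere, but it's a fast first check.

```python
# Check whether module names mentioned by the model can actually be imported.
# "tokeniizers" is a hypothetical stand-in for the suspicious name from the chat.
import importlib.util

for name in ("tokenizers", "tokeniizers"):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {'found locally' if spec else 'not found locally'}")
```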
Info
Channel: Lyudmil Pelov
Views: 9,416
Id: oV_NyulVqXg
Length: 8min 50sec (530 seconds)
Published: Fri Apr 14 2023