OpenAI's NEW Embedding Models

Video Statistics and Information

Captions
Way back in December 2022 we had the biggest shift in how we approach AI ever, thanks to OpenAI releasing ChatGPT at the very end of November. ChatGPT quickly caught a lot of people's attention, and it was in the month of December that the interest in ChatGPT and AI really exploded. But right in the middle of December, OpenAI released another model that also changed the entire landscape of AI, though it didn't get noticed the way ChatGPT did: text-embedding-ada-002. Very creative naming, but behind that name is a model that completely changed the way we do information retrieval for natural language, which covers RAG, FAQ systems, and basically any use case where you're retrieving text information.

Since then, despite a huge explosion in the number of people using RAG and the really cool things you can do with it, OpenAI remained pretty quiet on embedding models. Embedding models are what you need for RAG, and there had been no new models since December 2022, until now: OpenAI has just released two new embedding models, and a ton of other things as well. Those two embedding models are called text-embedding-3-small and text-embedding-3-large. Looking at the results OpenAI is sharing, we can see a fairly decent improvement on English-language embeddings on the MTEB benchmark, but perhaps more impressively, a massive improvement in the quality of multilingual embeddings, measured using the MIRACL benchmark. ada-002 was state-of-the-art when it was released, and for a very long time afterwards, and it's still a top-performing embedding model; it had an average score of 31.4 on MIRACL. The new text-embedding-3-large has an average score of 54.9 on MIRACL. That's a massive difference.

One of the other things you notice looking at these new models is that the max context window, the maximum number of tokens you can feed into the model, has not increased. That makes a lot of sense for embedding models, because what you're trying to do with embeddings is compress the meaning of some text into a single point, and a larger chunk of text usually contains many meanings, so those two things don't really go together. It always makes sense to use smaller chunks, and clearly OpenAI are aware of that, since they're not increasing the maximum number of tokens you can embed with these models.

The other thing, which is maybe not as clear to me, is that they have not trained on more recent data: the knowledge cutoff is still September 2021, which is a fair while ago now. Okay, for embedding models maybe that isn't quite as important as it is for LLMs, but it's still important; it's good to have some context of recent events when you're trying to embed meaning. If you ask a question about something recent, like COVID, I imagine these models are probably not going to perform as well as, say, Cohere's embedding models, which have been trained on more recent data. Nonetheless, this is still very impressive.

And the one thing which I think is probably the most impressive I've seen so far is that we're now able to decide how many dimensions we'd like in our vectors. There is a trade-off: reduce the number of dimensions and you're going to get reduced-quality embeddings.
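In the API this shows up as a dimensions parameter on the embeddings endpoint. Here's a minimal sketch using the openai Python client (the query text and key handling are just placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="OPENAI_API_KEY")  # or set the OPENAI_API_KEY env var

# Ask the large model for a truncated, 256-dimensional vector.
res = client.embeddings.create(
    model="text-embedding-3-large",
    input="why might I want to use llama 2?",
    dimensions=256,  # supported by the v3 embedding models only
)
vec = res.data[0].embedding
print(len(vec))  # -> 256
```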
But what is incredibly interesting, and I almost don't quite believe it yet (I still need to test this), is that they're saying you can cut the large model, text-embedding-3-large, down from 3072 dimensions, which is larger than the previous models, to just 256 dimensions and still outperform ada-002, which is a 1536-dimension embedding model. Compressing all of that performance into 256 floating-point numbers is insane. So I'm going to test that, not right now, but I'm going to test it and prove to myself that it's possible. I'm a little bit skeptical, but if it holds up, incredible. Okay, with that out of the way, let's jump into how we might use these new models.

Jumping right into it, we have this notebook; I'm going to share a link in the description, and I'll try to get a link added to the video as well. First, pip install, then download the dataset. I'm using this AI arXiv dataset; I've used it a million times before, but it is a good dataset for testing. I'm going to remove all of the columns I don't care about and keep just id, text, and metadata, the typical format. Then I'm going to take my OpenAI API key, that's platform.openai.com if you need one, and put it in here. And this is how you create your new embeddings: exactly the same as before, you just change the model ID (we'll see those in a moment). So that is our embedding function.

Then we jump down and initialize the connection to Pinecone serverless. You get $100 of free credit, and you can create multiple indexes, which is what we need here because I want to test multiple models with different dimensionalities; that's why I'm using serverless, alongside all the other benefits you get from it. These are the models we're going to look at, using the default dimensions for now (we'll try the others pretty soon). There's the original model, well, kind of original: the v2 embedding model from OpenAI, the one they released in December 2022, with a dimensionality of 1536; most of us will be very familiar with that number by now. The new small model uses the same dimensionality, and you can also decrease it down to 512, which is a nice little option. The other embedding model, the large one with the insane performance gains, is text-embedding-3-large. It has a higher dimensionality, meaning you can pack more meaning into that single vector, so it makes sense that it performs better. But what is very cool is that you can compress it down to 256 dimensions and apparently still outperform ada-002, which is 100% unheard of in vector embeddings; 256 dimensions at this level of performance is insane. Let's see; I don't know, but they say it's true. Then I'm going to create three different indexes, one for each of the models, roughly as sketched below, and then just index everything.

Indexing everything takes a little bit of time, so while we're waiting we can have a quick look at how long it's taking, because this is also something to consider when choosing embedding models. Straight away, the APIs right now are, I think, pretty slow, because everything has just been released, so I expect that in normal times these numbers will be smaller.
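Roughly, the index setup and upsert look like this; a minimal sketch using the openai and pinecone Python clients, where the cloud region and the docs variable (the id/text/metadata records described above) are assumptions:

```python
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

client = OpenAI(api_key="OPENAI_API_KEY")
pc = Pinecone(api_key="PINECONE_API_KEY")

model, dim = "text-embedding-3-small", 1536  # repeat per model/dimensionality

# One serverless index per model, sized to that model's dimensionality.
if model not in pc.list_indexes().names():
    pc.create_index(
        name=model,
        dimension=dim,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),  # assumed region
    )
index = pc.Index(model)

def embed(texts: list[str]) -> list[list[float]]:
    # Same call as with ada-002; only the model ID changes between tests.
    res = client.embeddings.create(model=model, input=texts)
    return [r.embedding for r in res.data]

# docs is assumed to be the list of dataset records described above,
# each with "id", "text", and "metadata" fields.
batch = docs[:100]
index.upsert(vectors=[
    {"id": d["id"], "values": v, "metadata": d["metadata"]}
    for d, v in zip(batch, embed([d["text"] for d in batch]))
])
```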
For ada-002 I'm getting 15 and a half minutes to embed everything, that is, to embed and then push everything into Pinecone. It's going slightly slower for the small model, which maybe hasn't been optimized as much as ada-002, and maybe more people are using it right now, but it's a pretty comparable speed, as we might expect. Embedding with the large model is definitely slower: right now we're on track for about 24 minutes for the whole thing. That also means your embedding latency will be higher. Looking at the numbers, ada-002 is around 2 seconds per iteration, and that includes network latency and the round trip to Pinecone, so there are multiple things in there and it's not a 100% fair comparison; but the large model is maybe 1.5 to 2 seconds slower per iteration. So it will clearly slow down a RAG pipeline a little, probably not much compared to the LLM generation component, but still something to consider. I'm going to wait for this to finish and skip ahead to when it has.

Okay, we're done; it took just about 24 minutes for that final model. I've created a function that goes through the index and returns documents for a query (sketched below). Let's try it with ada-002. The query keeps talking about red teaming for Llama 2; what do we get? We get red teaming of ChatGPT, so no, not quite there. Let's try the new small model: do we mention Llama 2 in here? No, no Llama 2, so also not quite there. This is a pretty hard one; I haven't seen a model get it yet. With the large model, it's talking about red-teaming exercises, this and that, but I don't see Llama 2 anywhere. So okay, maybe that question is too hard for any model, apparently.

Let's just go with: "can you tell me why I might want to use Llama 2?" The models can usually get relevant results here, and straight away you can see it: Llama 2 scales up to such-and-such, and its helpfulness and safety are pretty good, better than existing open-source models. Cool, good; I would hope they'd get this one. ada-002 gets the same result, which I think is probably the most relevant, or one of the most relevant. And the large model... is it the same? Same result. Okay, cool, that's fine.

Let's try another question, comparing Llama 2 to GPT-4, and see how many of the matches actually mention either GPT-4 or Llama. For ada-002, four of five results seem relevant at first glance. Are they actually talking about GPT-4? This one is; you can see GPT-4 in here. In this one I don't actually see GPT-4; I see GPT-J... no, wait: it's the effect of instruction tuning using GPT-4, but not necessarily a comparison to GPT-4. In this one I don't see them talking about Llama either, so these two are not relevant. This one compares chatbots and the instruction tuning of Llama: Llama fine-tuned on GPT-4 outputs outperforms this one and that one, but there's still a gap, so there is a comparison there, fine. And here, again, that's a Llama fine-tuned on GPT-4 outputs, and there is a comparison, so okay.
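The retrieval helper I'm using looks roughly like this; a sketch assuming the chunk text was stored in each record's metadata under a "text" key, and reusing the client and index objects from the earlier snippet:

```python
def get_docs(query: str, model: str, index, top_k: int = 5) -> list[str]:
    # Embed the query with the same model that built the index.
    res = client.embeddings.create(model=model, input=query)
    xq = res.data[0].embedding
    # Retrieve the most similar chunks from Pinecone.
    out = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [m.metadata["text"] for m in out.matches]

docs = get_docs("can you tell me why I might want to use llama 2?",
                "text-embedding-3-small", index)
```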
So that's roughly three results from ada-002 that are accurate comparisons. Now the small model; let's compare these. The first one is relevant, I would say; interesting. The second one, not relevant. The third, "all chatbots against GPT-4, comparisons run by a reward model indicate that all chatbots are compared against...", yes, that's relevant, so two out of three so far. In the next one I don't see anything comparing to GPT-4, so I think that's a no: two out of four now. And then here it's talking, kind of, about the comparisons, so three out of five. But then the other model was slightly... oh, it was the same, okay.

Now let's go with the best model. We'd expect to see more Llama, and I think I do: this one has Llama in four of those answers. "We compare..." okay, they're comparing in this one, so that's accurate. This one, no. This one, they're comparing again. And in the final one, do we have GPT-4? I think so: they have Bard, ChatGPT, GPT-4, and some others. It's a table, so it's kind of hard to read, but it does seem to be a comparison as well. So the large model got four out of five, the best performance, which correlates with what we'd expect.

Cool, those are the new embedding models from OpenAI. It's kind of hard to see the performance difference in these quick tests; maybe you can see a little with the large model, but given the differences we saw in that table at the start, at least on multilingual, there's a massive leap, which is insane. I'm looking forward to trying the very small dimensionality and comparing that to ada-002; if it holds, that is very impressive, and I'll definitely try it soon. I also want to try the other models OpenAI have released; there are a few. For now I'm going to leave it there. I hope this has been interesting and useful; thank you very much for watching, and I'll see you again in the next one. Bye.
Info
Channel: James Briggs
Views: 27,457
Keywords: python, machine learning, artificial intelligence, natural language processing, nlp, semantic search, similarity search, vector similarity search, vector search, text-embedding-ada-002, ada 002, openai ada, openai embeddings, new openai embeddings, new openai, text-embedding-3-small, text-embedding-3-large, openai embed 3, openai embedding 3, openai embed models, new ada 002, openai text embeddings v3, how to use new openai embed model, new openai models, james briggs, pinecone
Id: cUyw5eG-VtM
Length: 16min 31sec (991 seconds)
Published: Fri Jan 26 2024