Generate LLM Embeddings On Your Local Machine

Captions
What is going on guys, welcome back. In this video today we're going to learn how to use large language models locally to create embeddings, and to store these embeddings in a vector store so that we can do similarity search, for example for recommender systems. So let us get right into it.

All right, so we're going to learn how to use large language models locally to create embeddings. Embeddings are basically just representations of some given data in vector space. For example, you can have a collection of news articles, or titles of news articles, and you can embed them into vector space, meaning that each title is then represented as a vector of a certain size. All the vectors have the same size, and all of them can be stored in a vector database or vector store. You can then perform similarity search to find the most similar articles or article titles given a new one, which is very useful for recommender systems.

And you can do this not just with data that is naturally text data; you can do it with everything that you can somehow put into text form. For example, you might have different items in a store. You can represent each item as a text prompt that says: this is the title of the item, this is the price, this is the category, and so on. Everything that you can somehow put into text, you can embed into vector space. So the idea in a recommender system would be: you have an item, you say "I like this item", and you get the five most similar items based on the vector embeddings. The intelligent part, of course, is finding good embeddings, embeddings such that when you do a similarity search in vector space you actually find similar things. And that intelligence is already contained in large language models; that's the basic idea here.

Now to do that, we're going to need a couple of things. First of all, you're going to need Ollama on your system. I have a video on Ollama already; it's basically a very convenient way to run large language models locally. The installation is quite easy: you can find the instructions on the GitHub page. On Windows you currently have to use the Windows Subsystem for Linux, on Linux there is a single install command, and on Mac you download it directly. It's a very simple installation process, and once you have it, it runs on localhost when you start it. So you open up your terminal, and the first time you run it you do "ollama serve" so the server is running; then you can do "ollama run llama2", or Mistral, or any other model available with Ollama. It will download the model if you don't have it yet, and then you can use it like a normal LLM in the command line and ask it questions. But that is not what we're going to do here; we're going to send requests to the API from Python.

For this we need a couple of packages: first of all, obviously, requests; then numpy; and then FAISS. FAISS is the vector store I chose here. You can also choose a different one if you know how to use it; this is just one vector database, there are multiple out there, so you don't have to use FAISS. I'm going to use FAISS for this video because it's quite simple. So we start by saying import requests, import numpy as np, and then also import faiss.
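A minimal sketch of the setup just described, assuming the FAISS Python bindings are installed via the faiss-cpu package:

```python
# Shell setup (run once in a terminal):
#   ollama serve          # start the local Ollama server
#   ollama run llama2     # download the llama 2 model if not present
#
# Python packages:
#   pip install requests numpy faiss-cpu

import requests
import numpy as np
import faiss
```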
We're not going to import anything else; instead, we're going to send a request to the embeddings API to see how it works. The basic idea is the following: you send a POST request to a certain endpoint, and that endpoint is running locally on your system, at http://localhost, by default on port 11434. In particular we want to target the embeddings API, so the path is /api/embeddings, because we're not asking for a response to a prompt, we're asking for embeddings. What we send is a JSON object, so a dictionary, and this dictionary contains two fields. First, the model that we want to use; I'm going to go with llama 2, but you can also use Mistral or any of the other models available with Ollama. Second, the prompt, which is the actual data that we want to embed. Again, everything that can be represented as text can be put in here. It doesn't have to be actual text data like a news article; it can also be something like a sequence of user interactions in a session. If you represent it as text, like "action one, action two, action three" and so on, it can also be used as a prompt. You just have to keep the format consistent to be able to generate meaningful embeddings.

Let's just go with a "Hello World" prompt to see how it works, nothing too fancy: we're going to embed the text "Hello World" into vector space. We do that by running the request, and then from the response we take the JSON object and extract the embedding field. This embedding is just a vector. When I run this, you can see it's a vector that contains numbers, and the dimension is 4,096; that is the dimension of the vectors we get from llama 2, at least. So we can define d, the dimension, to be 4,096.

What we want now is a collection of titles, for example titles of news articles. If you don't want to write your own titles, you can go to ChatGPT and ask it to give you a list of 100 different titles to play around with. I prepared five titles myself: "mind-blowing new AI model revolutionizes biology", "Silicon Valley startup finds new way to predict protein folding", "team XYZ wins World Cup in rugby" (I don't even know if there is a World Cup in rugby), "biology: one of the most underrated college majors", and "novel machine learning algorithm changes neuroscience forever". So we have four titles related to biology, just to keep them somewhat similar, and one about sports. The idea now is to embed all of these titles into vector space so that we have a collection of already embedded titles, and then, when we get a new title, we can compare it to the existing embeddings and find the most similar articles.
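Here is a sketch of that request and the title collection, following the steps above (the exact title strings are paraphrased from the video):

```python
# Ollama's embeddings endpoint, running locally on the default port
URL = "http://localhost:11434/api/embeddings"

def embed(prompt):
    # Ask the local model to embed the given text
    res = requests.post(URL, json={"model": "llama2", "prompt": prompt})
    return res.json()["embedding"]

print(embed("Hello World"))  # a plain list of 4,096 floats

d = 4096  # embedding dimension of llama 2

titles = [
    "Mind-blowing new AI model revolutionizes biology",
    "Silicon Valley startup finds new way to predict protein folding",
    "Team XYZ wins World Cup in rugby",
    "Biology: one of the most underrated college majors",
    "Novel machine learning algorithm changes neuroscience forever",
]
```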
What we want to do now is first create an index, basically a vector database index. We create it with faiss.IndexFlatL2, where the dimension of the vectors is d, which is 4,096. Next we create an array full of zeros, np.zeros, with the shape (number of titles, dimension), so five by 4,096 in this case, and the data type float32. This is our array, currently all zeros. Then we say: for i and title in enumerate(titles), we take the request from before, but instead of embedding "Hello World" we embed the actual title. The embedding itself is the embedding field of the response JSON, and we place it into the array, so x at position i becomes np.array of this embedding. We do it this way so that in the end we can add everything to the index at once with index.add(X). That's basically it, and it runs without any problems.

Now we can take a new prompt, a new title, for example: "recent progress in AI shows potential for manipulating brain chemistry". We embed this new prompt the same way, and with this embedding we call index.search, getting back D and I, D being the distances and I being the indices of the neighbors. Two things to watch out for: the embedding has to be turned into an np.array, and it's important to specify the dtype manually as float32. We ask for the five nearest neighbors, so all of our titles, sorted from most to least similar, and then we print np.array(titles) indexed with I flattened. At first this gave me a "not enough values to unpack" error; the problem was that index.search expects a two-dimensional array, so we have to wrap the embedding in a list before turning it into an np.array.
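A sketch of the index construction and the nearest-neighbor query, reusing the embed helper from the previous snippet:

```python
# Flat (exact) L2-distance index over d-dimensional vectors
index = faiss.IndexFlatL2(d)

# One row per title; FAISS expects float32
X = np.zeros((len(titles), d), dtype="float32")
for i, title in enumerate(titles):
    X[i] = np.array(embed(title))
index.add(X)

new_prompt = "Recent progress in AI shows potential for manipulating brain chemistry"

# Wrap in a list so the query has shape (1, d), as index.search requires
embedding = np.array([embed(new_prompt)], dtype="float32")

# D: squared L2 distances, I: indices of the neighbors, nearest first
D, I = index.search(embedding, 5)
print(np.array(titles)[I.flatten()])
```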
There you go: our prompt, "recent progress in AI shows potential for manipulating brain chemistry", is most similar to "novel machine learning algorithm changes neuroscience forever", which of course makes a lot of sense, because neuroscience and brain chemistry are closely related. And you can see that the sports title is the least similar.

Now let's change the prompt to something like "fighter wins UFC title", or actually, since "wins" is already contained in one of the titles, let's use a different word: "fighter gets UFC title". When I run this, you can see that the most similar title is "team XYZ wins World Cup in rugby", even though not a single word is shared between the two. The model recognizes that these are similar topics, so it matches them. There is quite some intelligence involved here: it doesn't just focus on similar words, it focuses on similar content, similar meaning. We can also try something like "California companies on the rise", hoping to match the Silicon Valley title. Interestingly, it now tells me this is the least fitting title, which I wouldn't have expected; that result is a bit surprising.

In practice, what you would want to do is fill up your database with lots and lots of whatever you're trying to recommend: a lot of items from a store, a lot of user sessions, a lot of news articles (not just the titles but the full articles), whatever you want to embed. Make sure you have a lot of embeddings, and then try to build a recommender system around them. In the case of news articles, you could recommend the articles most similar to the ones a user liked. For store items, instead of titles you could embed structured text like "item name: toothbrush XYZ", "price: $20", "category: hygiene", and so on, using the same structure for all the different items, as sketched after the end of this walkthrough. Then, when you have an item that a user likes, you can search your large database full of items and hopefully get similar items recommended.

So this is how you can use an LLM locally to generate embeddings. That's it for today's video. I hope you enjoyed it and learned something; if so, let me know by hitting the like button and leaving a comment in the comment section down below. And of course, don't forget to subscribe to this channel and hit the notification bell to not miss a single future video, for free. Other than that, thank you very much for watching, see you in the next video, and bye!
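As a follow-up to the item-as-text idea mentioned above, here is a hypothetical sketch of how store items might be formatted into consistent embedding prompts (the field names and example values are made up for illustration):

```python
def item_to_prompt(name, price, category):
    # Keep the structure identical for every item so the
    # embeddings stay comparable across the whole catalog
    return f"item name: {name}\nprice: ${price}\ncategory: {category}"

print(item_to_prompt("toothbrush XYZ", 20, "hygiene"))
# The resulting string would then be passed to embed() and
# added to the FAISS index just like the titles above.
```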
Info
Channel: NeuralNine
Views: 7,478
Keywords: llama 2, llama, llm, ollama, generate embeddings, llm embeddings, python
Id: 8L3tGcYc774
Length: 13min 53sec (833 seconds)
Published: Sat Jan 13 2024