Building a RAG System With Google Gemma, Hugging Face and MongoDB

Captions
Hey guys, welcome to this video. We're going to be talking about the new Google open model, Gemma, and I'll also be showing you how you can use it within a RAG system, leveraging Hugging Face as well. I'm not going to waste any time and I'm going to get straight into it, but for those of you who would rather just read, this video is also available in article format; the link is in the description and in the comments. Basically, within seven steps, I show you how you can leverage Gemma, open-source models from Hugging Face, and MongoDB as your vector database to build a RAG system, and I'll be showing you the step-by-step process in this video as well. There is also a GitHub repository where you can download the notebook I'll be presenting. Don't forget to star and watch the repository, because I'll be updating it regularly.

So let's go over to the news. Yesterday, Google released Gemma, a family of open models. They released two sizes, with four variants in total: Gemma with two billion parameters and Gemma with seven billion parameters, each available as a base model and as an instruction-tuned model. You can access the models directly from Hugging Face, which is quite cool, and we're going to be leveraging Hugging Face to pull the instruction-tuned version into our development environment and use it within our RAG pipeline as the base model.

Now, straight into the code. What we're building today is a very simple RAG system that acts as a movie recommender. We're going to use MongoDB as the vector database and as the operational database, and we're going to use Hugging Face to access open-source models: the GTE-large embedding model and the Gemma 2B instruction-tuned model.

We start by installing a few libraries: datasets, to get the dataset we'll be using for this demo; pandas, to convert the dataset into DataFrames and conduct some data manipulation and processing; pymongo, to establish the connection to our database, access our collection, and conduct database operations; sentence-transformers, the Hugging Face library that gives us access to the GTE embedding model; and transformers, to access the Gemma models specifically. One thing to know: if you're having problems with the Gemma models, one solution to some of the issues that have been reported is simply to upgrade the version of the transformers library on your machine. I'll also be using a GPU, specifically an A100 with high RAM, along with Hugging Face accelerate, which allows you to run the models and conduct inference using the hardware accelerator on your machine.

As usual, we start by loading our dataset into our development environment. We're using the embedded movies dataset, which you can access at the link in the article. This is a dataset of movie records that contain the genre, the director and, most importantly, the full plot of each movie.
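If you want to follow along outside the notebook, a minimal sketch of this setup looks something like the following (the dataset ID MongoDB/embedded_movies is an assumption based on the companion article; swap in whichever ID the repository uses):

```python
# In a notebook cell, install the libraries mentioned above:
# !pip install datasets pandas pymongo sentence-transformers transformers accelerate

import pandas as pd
from datasets import load_dataset

# Load the movie dataset from the Hugging Face Hub
# (dataset ID is an assumption; see the companion article/repository)
dataset = load_dataset("MongoDB/embedded_movies")

# Convert the training split into a DataFrame for cleaning and manipulation
dataset_df = pd.DataFrame(dataset["train"])
```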
One thing to note is that this dataset already comes with embeddings, generated using OpenAI's text-embedding-ada-002 model. So if you don't want to spend any time or money creating your own embeddings, you can use this dataset as-is for whatever experimental pipelines you want to build. We're not going to do that, though; we're going to create new embeddings from the full plot. In this line here, I'm dropping any movie records without a plot, and I'm also dropping the plot embedding column, because we're not going to use it. We're going to generate a new column, which means every data point within our dataset is going to have a fresh embedding.

This is one of the key parts: we use the SentenceTransformer constructor to access the GTE-large embedding model. You can access a bunch of embedding models from Hugging Face, and to quickly show you, this is a leaderboard of embedding models. I'm interested in the retrieval ranking, and we can see GTE-large ranked at number 19, which is not bad at all for an open-source model. The OpenAI model is ranked around number six, with an average retrieval score of 55.44 versus 52.22 for GTE-large, so not too bad either. There are a few variants of the GTE models; they are embedding models trained on English text, created by a research group over at Alibaba, and you can access all the variants on Hugging Face.

Back in the code, what I did in this line is load the embedding model, and then I created a function that takes an input text and checks whether it's empty. If it's not empty, we encode it and assign the numerical representation to the variable embedding, then convert it into a Python list, because I believe it comes back as a NumPy array. Next, we apply this get_embedding function, defined up here, to every single data point within the dataset; more specifically, we pass in the full plots, which contain the text we're generating embeddings from. This process takes a few minutes or so, and when it's done we have a new embedding column, with a different embedding for each data point.
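A sketch of that cleaning and embedding step, assuming the column names fullplot and plot_embedding from the dataset and the Hugging Face model ID thenlper/gte-large for GTE-large:

```python
from sentence_transformers import SentenceTransformer

# Drop records without a plot, and drop the pre-computed OpenAI embeddings
dataset_df = dataset_df.dropna(subset=["fullplot"])
dataset_df = dataset_df.drop(columns=["plot_embedding"])

# Load the GTE-large embedding model (produces 1024-dimensional vectors)
embedding_model = SentenceTransformer("thenlper/gte-large")

def get_embedding(text: str) -> list:
    """Encode text into an embedding vector; return [] for empty input."""
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []
    embedding = embedding_model.encode(text)
    return embedding.tolist()  # encode() returns a NumPy array

# Generate a new embedding for every record from its full plot
dataset_df["embedding"] = dataset_df["fullplot"].apply(get_embedding)
```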
Next, we can connect to our MongoDB database, and I can quickly show you what you need to do for this step. You need a MongoDB database and collection set up, and the steps to achieve this are very straightforward. Go over to MongoDB and sign in, registering for a free account if you need to, then click the Create button, which allows you to create a new cluster. You can create a free shared cluster, which gives you the ability to create a database without having to pay for it, or a dedicated cluster if you want; the free tier is going to be enough for this tutorial. You can change the settings of your free cluster as well: if you want more storage, you can go up a tier and increase the storage capacity of your cluster. No additional settings are needed, and you can just click to create your cluster. I'm not going to do that here, because I've already created mine.

Once you're in your cluster, you can browse your collections. What you want to do is create a new database; in this case we'll call ours movies, and we'll give the collection a name too (I believe I called mine movie_collection_2, but you can give it whatever name you want). Once that's done, you'll be able to see the created database and your collection inside it. I've already gone through these steps, so my collection is already populated with some data.

This next part is an important step: we need to create our vector search index, because what we're doing is a semantic search that uses embeddings to retrieve similar records from our database. To create a vector search index, you click the Create Search Index button and then create an Atlas Vector Search index with the JSON editor. We're going to name our index; I'm going to stick with vector_index here. One thing you definitely have to do is assign it to your desired collection. The type is going to be vector, and the path, remember, is going to be embedding; this corresponds to the field where the embeddings are stored within the documents in our database. The number of dimensions in this case is going to be 1024, because that's the dimensionality of the embeddings the GTE-large model creates; GTE-base produces 768-dimensional embeddings and GTE-small 384-dimensional ones, so if you have limited database storage capacity you can always go for a smaller embedding model to create your embedding vectors. The similarity function we're going to use is cosine. Once this is done you can click Next (it's not going to let me, because I've already done this step; the button will be green once all the fields are completed), and that takes you to where your vector index has been created and is active. So that's how quickly you can create a database with MongoDB, create your collection, and create your vector search index.

Let's go back to the code, where this will now make more sense. We're going to actually connect to our MongoDB database. From your MongoDB Atlas instance you can get a connection URI, which you should place in the environment variables of your development area. I'm using the Google Colab secrets; within the secrets I have a key called URI, and the value is my connection string, which you can get from your Atlas instance. We get our URI from the secrets, and if there's no URI in the environment variables, we just print a message. Once we have a URI, we call the function we defined up here, which creates a pymongo client; that is essentially a connection to our database. With this client object we can access the database by name, and then access the collection by the name we gave it earlier, so this object is a reference to our collection, and we have a successful connection.
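Here's a sketch of both the index definition and the connection step; the index JSON matches the settings described above, and the secret key URI and the names movies / movie_collection_2 are my best guess at the ones used in the notebook:

```python
import pymongo
from google.colab import userdata  # Colab's secrets store

# Atlas Vector Search index definition, pasted into the JSON editor
# (index name: vector_index):
#
# {
#   "fields": [
#     {
#       "type": "vector",
#       "path": "embedding",
#       "numDimensions": 1024,
#       "similarity": "cosine"
#     }
#   ]
# }

def get_mongo_client(mongo_uri):
    """Establish and validate a connection to MongoDB Atlas."""
    try:
        client = pymongo.MongoClient(mongo_uri)
        print("Connection to MongoDB successful")
        return client
    except pymongo.errors.ConnectionFailure as e:
        print(f"Connection failed: {e}")
        return None

# Read the connection string from Colab's secrets
# (userdata.get raises an error if the key is missing)
mongo_uri = userdata.get("URI")
if not mongo_uri:
    print("URI not set in environment variables")

mongo_client = get_mongo_client(mongo_uri)

# Reference the database and collection created in Atlas
db = mongo_client["movies"]
collection = db["movie_collection_2"]
```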
One thing I like to do is clean out the collection, because I do have some records in there from running this several times, so I delete the existing records to make sure I'm starting with an empty collection. Next, we convert the dataset we prepared earlier into a list of dictionaries, where each row in the DataFrame becomes a single record; we assign that to a variable called documents, and then insert it in a batch into the collection using the insert_many method available on the collection object we created earlier. Ingestion is complete in a few seconds; it's very quick.

Now we can get to the actual vector search (semantic search) functionality we're trying to build. We have a function called vector_search, and it takes in a user query, which is where the input from the user comes in, and the collection. The user query is converted to an embedding using the same function we used earlier to generate the embeddings for our dataset, and we assign it to the variable query_embedding. Then we define a pipeline with two stages: the first stage conducts the vector search, and the second stage projects the information we want, excluding and including specific fields. The first stage is where the vector search actually happens, using the query embedding we created up here; we define the path to where the embeddings for each record are located, which is the embedding field, we consider 150 candidate embeddings, but we limit the results returned by the database to four. Once that stage is completed, we project out the fields we don't want: we don't want the _id field, and you can also project out the embedding field, just to reduce the amount of data returned from the database, if you're not using the embeddings in any downstream processes. We also include a vector search score, to see the similarity score returned by the database. Then we execute the pipeline we've defined up here using the aggregate call, which executes the multiple stages within the pipeline, convert our results into a list, and return it.

Another function we define is get_search_result. This uses the vector_search function from earlier: we pass in the query from the user and the collection we want to conduct the vector search on. We create an empty string called search_result, and this line here formats what is returned from the vector search; I only want the title and the full plot from the actual results. This is where tools like Pydantic would be useful, actually; I might do a tutorial on that as well, to show how you can use Pydantic to handle data validation within RAG systems. But anyway, back to this particular RAG system.

In the next line we have the query: "What is the best romantic movie to watch and why?" I don't know why I chose this query; maybe because it's February and Valentine's Day just went by. We're going to use this query to get some movie recommendations from our records, calling get_search_result to fetch additional context, as sketched below.
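A sketch of the ingestion step and the two functions, under the assumptions already noted (index name vector_index, embedding path embedding, 150 candidates, four results):

```python
# Ingest the prepared records into MongoDB in one batch
documents = dataset_df.to_dict("records")
collection.insert_many(documents)

def vector_search(user_query, collection):
    """Perform a semantic search in the collection via Atlas Vector Search."""
    query_embedding = get_embedding(user_query)
    if query_embedding == []:
        print("Invalid query or embedding generation failed.")
        return []

    pipeline = [
        {
            # Stage 1: the vector search itself, against the embedding field
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "numCandidates": 150,  # candidate embeddings to consider
                "limit": 4,            # results returned by the database
            }
        },
        {
            # Stage 2: project only the fields needed downstream
            "$project": {
                "_id": 0,  # exclude the document ID
                "title": 1,
                "fullplot": 1,
                "score": {"$meta": "vectorSearchScore"},  # similarity score
            }
        },
    ]

    # aggregate() executes the multi-stage pipeline
    return list(collection.aggregate(pipeline))

def get_search_result(query, collection):
    """Format the vector search results into context for the LLM."""
    results = vector_search(query, collection)
    search_result = ""
    for result in results:
        search_result += (
            f"Title: {result.get('title', 'N/A')}, "
            f"Plot: {result.get('fullplot', 'N/A')}\n"
        )
    return search_result
```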
One thing I do is create another string combining the query and the source information, and this is what I give to Gemma. The print statement shows our query, which is "What is the best romantic movie to watch and why?", followed by the records returned from our vector search; I'm going to tell Gemma to recommend one of these movies.

Accessing Gemma through Hugging Face is very, very simple: in less than two lines you have access to one of the most powerful open models out there. We use the AutoTokenizer from Hugging Face, and we also use AutoModelForCausalLM to access Gemma's functionality. One thing I'm doing here is using the GPU as well; if you're using a GPU, you have to specify the device_map argument, and if not, you can uncomment the CPU line here and comment out the GPU one. This gives us access to the tokenizer for Gemma, which converts text into a sequence of numerical values, and can decode them back as well. Using the tokenizer, we pass in the query plus the search results, and this returns a tensor; we move that tensor onto the GPU, since we're using the GPU in this Google Colab environment. Then, very simply, we call model.generate, unpack the tensor object as the input, and set max_new_tokens to 500, which allows Gemma to generate a decent-length response to the query it's given. We now have the response in this variable. model.generate returns relatively quickly on a GPU; on a CPU it might take a bit of time, maybe a minute or so. Once you get your response, you can decode it using the tokenizer and see its text representation. For me, the response, based on the search results, selected "Shut Up and Kiss Me!". I've never watched that before, and probably never will, but those are the very simple steps to use Gemma within your RAG system, using Hugging Face to access the model and MongoDB as your vector database.
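For reference, a sketch of that final generation step, assuming the instruction-tuned 2B checkpoint is google/gemma-2b-it and that the prompt is a simple concatenation of the query and the retrieved context:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the instruction-tuned Gemma 2B model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# GPU path: accelerate maps the model onto the available device(s)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
# CPU alternative:
# model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

query = "What is the best romantic movie to watch and why?"
source_information = get_search_result(query, collection)

# Combine the user query with the retrieved context (prompt wording is illustrative)
combined_information = (
    f"Query: {query}\n"
    f"Continue to answer the query by using these search results:\n"
    f"{source_information}"
)

# Tokenize the prompt and move the tensors to the GPU
input_ids = tokenizer(combined_information, return_tensors="pt").to("cuda")

# Generate up to 500 new tokens in response
response = model.generate(**input_ids, max_new_tokens=500)

# Decode the generated token IDs back into text
print(tokenizer.decode(response[0]))
```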
There are more things we could cover, such as reranking, chunking strategies, and, as I mentioned, Pydantic for data validation. There's a lot going on in this space, and I'm going to try to keep up with it and create these videos, articles and code as quickly as I can for you guys. See you in the next video. Thanks for watching. Oh wait, don't forget to subscribe! Like and subscribe as well.

Info
Channel: Richmond Alake
Views: 2,069
Id: BNUpRW-Dk90
Length: 19min 40sec (1180 seconds)
Published: Thu Feb 22 2024