Personalizing LLMs: Step-by-Step with LangChain

Video Statistics and Information

Captions
When we're building machine learning pipelines for consumer-facing businesses, it typically looks like the following: customer or user data is collected from a touchpoint, for instance a web page, using a JavaScript-based tracking solution such as Google Tag Manager. This data is then sent back to the data lake or data warehouse, where the preprocessing, transformation, and modeling take place. From there, the preprocessed feature data is sent to a so-called online store like Redis, where it serves as the foundation for the machine learning models in production. This is what we call our business-ready data. Redis is typically used as the online store because it allows us to fetch the feature data with millisecond latency, avoiding costly delays on the consumer-facing front end.

Now contrast this with how large language modeling pipelines and agents are currently being developed. Here we take a language model and we give it a tool, that is, the ability to call functions and retrieve any data the language model might need to do its job. Then we might add retrieval capabilities that let the agent fetch vector data, making this a RAG agent. We then tell the language model: this is your retriever, here are your tools, now go do your job. But this hands off a lot of freedom and responsibility to a language model, and I would argue that it's too much freedom for a consumer-facing application.

In this video we're going to look at a different approach: we're going to set up a pipeline that mimics the approach we use for machine learning. That means we're going to let the language model consume vector data and contextual data from the same database, in this case Redis, and the application-critical information about customers is going to be injected into prompt templates, where it can be used for prompt engineering. That means we have one database, we get lightning-fast contextualization, and we get to use the same infrastructure for both machine learning and language modeling.

In the example I'm going to show you, I'll be using Shopify product data for the retriever, LangChain and OpenAI for the language modeling, Google BigQuery as our back end, and the open-source feature framework Feast to build the pipeline from BigQuery to Redis. The video has four parts. First, I'll give you a brief overview of how Feast works with Redis and BigQuery. After that, I'll extract Shopify products from the API, transform the data into vector data, and load it into Redis, enabling vector similarity search and making the products available for retrievers. Then I'll show you how to set up Feast with Redis as the online store and BigQuery as the offline back end. Finally, I'll give you two examples of personalized chains that could be used for marketing purposes: the first is a personalized load_summarize_chain for personalizing entire emails or email phrases, and the second is a personalized conversational chain for building personalized chatbots.

The first thing I'm going to do is set up a database on redis.com. It only takes a minute and you can get started for free. Once you're signed up and logged in, you choose a subscription and set up a database with a few clicks. Once the database is created, we'll need the host, port, and password in order to connect to Redis, and you get those by clicking Connect. It's fairly straightforward to set up, but if you want a bit more detail, I'll put a link to our Redis intro video below.
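That host-port-password combination is all the notebook needs to talk to the database. As a minimal sketch with redis-py (the host, port, and password below are placeholders for whatever your own Connect dialog shows):

```python
import redis

# Credentials from the Redis Cloud "Connect" dialog (placeholder values)
r = redis.Redis(
    host="your-db.redis-cloud.com",  # hypothetical host
    port=12345,                      # hypothetical port
    password="your-password",
    decode_responses=True,
)

print(r.ping())     # True if the connection works
print(r.keys("*"))  # an empty list on a fresh database
```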
Let's have a look at Feast, so head over to feast.dev. Feast is a feature store, or feature framework, for production machine learning systems, and it's widely used and open source. Feast will help you transform data in a back end like BigQuery and transfer that data to an online database that's used for feeding machine learning models in production, a database like Redis. I'm going to show you how to set up Feast with BigQuery as the back end and Redis as the online store. To install Feast you can simply run pip install feast, but if you want to follow along with what I'm doing here, hold off on installing anything, because we need a specific configuration of Feast that allows us to connect to BigQuery and Redis.

Let's head over to BigQuery and have a look at the dataset we need for this. Here I am in the Google Cloud console, in the BigQuery SQL workspace, with an overview of the different datasets. I've created the dataset that we're going to use to personalize the language models; the code for generating it is also included in the Colab notebook. We're going to be injecting incentive recommendations into LangChain prompt templates, as this allows the language model to tailor the copy used for hooking customers based on what we think each customer might respond to. In this dataset we have email as the primary key, a first name and last name, the recommended incentive, and two timestamps: one for when the record was created and one for when the feature was recorded.

In order to create this dataset and to connect Feast to BigQuery, we need a key that gives us access, so I'm going to create a service account. To do that, go to the hamburger menu in the upper left corner, click APIs & Services and then Credentials. In here we click Create Credentials and select Service Account. We give it a name so we'll remember what it's for, click Create and Continue, select BigQuery Admin as the role, and then Continue. That's it; no need to fill out anything more, just click Done. Now, to get the key, we select the service account and click Keys, Add Key, Create New Key, and select the JSON option. This gives us the JSON key needed to access BigQuery, and to have Feast fetch the features from BigQuery and materialize them into Redis.

All right, the cool thing about this is that we can do everything from a Colab notebook. There's a bunch of libraries at the top here, and we don't need them all in this video. You don't need the Klaviyo API unless you want to try feeding the output of the language model to Klaviyo, and you don't need Shopify either, because I'm going to provide you with the product data in a JSON file, so you can just load the product data into a pandas DataFrame and feed it to Redis from there. Note that I'm not pip installing Feast yet; I'll do that further down in the notebook. I have my environment file uploaded with the needed API keys, as you can see on the left, and I also have the JSON key that we just created on GCP that allows us to connect to BigQuery. I'll start off by connecting to the database we set up on redis.com using redis-py, and we need the host, port, and password for that. I'll ping it to see if there's a connection, and as you can see, there are no keys in the database yet.

If you want to connect to Shopify, which is something you'll need if you want to put something like this into production, we have the code for fetching the product data from Shopify and transforming it into a useful DataFrame. I'm using the get_data function to call the Shopify REST API, and here I use the product attribute to get the product data, then transform it into a DataFrame. From the DataFrame I'll create a list with the product titles, and I'll add the rest of the product data as metadata. Here you can see the product DataFrame that we use as a basis for this; you can fetch additional fields from Shopify if you want to, I'm just going to keep this light.
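The get_data helper itself isn't shown in full in the video, but a minimal version of that fetch-and-transform step might look roughly like this; the shop name, access token, and API version are placeholders:

```python
import requests
import pandas as pd

SHOP = "your-store"   # hypothetical shop name
TOKEN = "shpat_..."   # hypothetical Admin API access token
URL = f"https://{SHOP}.myshopify.com/admin/api/2023-10/products.json"

# Fetch the product list from the Shopify REST Admin API
resp = requests.get(URL, headers={"X-Shopify-Access-Token": TOKEN})
resp.raise_for_status()
products = resp.json()["products"]

# Flatten into a DataFrame: titles become the texts to embed,
# everything else rides along as metadata
df = pd.DataFrame(products)
titles = df["title"].tolist()
metadata = df.drop(columns=["title"]).to_dict(orient="records")
```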
Next up, I'm going to use the OpenAI embeddings, because we need those when we want to store this data as vector data in Redis, and then I'm going to use the Redis connector that's built into LangChain, which lets us write this data into Redis as vector data. I've included a dictionary with a vector schema here that we're going to pass to the method, because that's what you need if you want to configure the Redis vector database from LangChain. I'm using the HNSW algorithm here; we could have used a flat index instead, and you can set that in the vector schema. Now we have the product data stored in Redis and we can do vector similarity search over the products. In the next cell I've added code that you can use to delete these products again; don't run it if you want to keep the products in there. As you can see, we can now run a vector similarity search for adidas shoes and get the top five results, and it works as intended. If you want to connect to this Redis database again from LangChain without pushing products, you'll need the schema of the database, so write the schema using the Redis instance and the write_schema method, save it as a Redis schema file, and remember to store it locally as well, so that you don't lose it when your online Colab notebook is reset.
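Putting those pieces together, the embed-and-store step might look roughly like this with the LangChain Redis integration as it existed around the time of the video; the index name, URL, and schema values are illustrative, and titles and metadata are the lists built above:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()  # expects OPENAI_API_KEY in the environment

# Vector schema for the index; "FLAT" would work here as well
vector_schema = {"algorithm": "HNSW"}

rds = Redis.from_texts(
    texts=titles,
    embedding=embeddings,
    metadatas=metadata,
    redis_url="redis://:your-password@your-host:12345",  # hypothetical URL
    index_name="products",                               # hypothetical index name
    vector_schema=vector_schema,
)

# Vector similarity search over the product titles
print(rds.similarity_search("adidas shoes", k=5))

# Persist the index schema so you can reconnect later without re-writing data
rds.write_schema("redis_schema.yaml")
```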
So that was one source of data; in Redis we now have everything we need for building the retrievers. Now we need the personalization data. Here I have the code that I used to create the dataset that you saw in BigQuery; these are the incentive recommendations I talked about. I'm not going to go into how these are created with machine learning, I'll save that for a later video; here I'm just simulating a dataset that we can use for the generative AI part. I'm using the Faker library to create the names, and I'm using my own domain; you don't really need that, but I want to be able to test the personalization pipelines down the line. This creates a DataFrame with customer features that you can push to BigQuery using pandas-gbq and the JSON key we downloaded from GCP. To do that you'll need to set the credentials, and you'll need a table ID, a project ID, and of course your JSON key. This simply creates a new table in the personalization dataset in BigQuery, and if we head over to BigQuery, to the personalization dataset, you'll see that we now have a new table, with the exact same format as the other dataset in there.

All right, let's get to the most important part of the video: setting up Feast with BigQuery and Redis and using that with LangChain. We're going to pip install Feast with the GCP and Redis dependencies, and when that's done, we'll run feast init with the name of the repo we want to create, in this case langchain-klaviyo, and tell Feast that we're going to connect to GCP. You'll then see a folder called langchain-klaviyo on the left; we can open it, and it has a feature repo inside with the two files we're going to configure: example_repo.py and feature_store.yaml.

If we open the feature_store.yaml file, you can see that this is where we configure the connections to BigQuery and to Redis; I'll show you in a bit what that looks like. The example_repo.py file is where the features and the sources of the features are defined, and we're going to configure it to fetch the incentive features that we have in BigQuery. I have a feature_store.yaml file here on my local machine, and you can see how it's set up: the provider is set to gcp, the offline store is set to BigQuery with the dataset called personalization, and then we have the online store, which is Redis, where the connection string is simply the host with the port at the end, followed by the password, in the same string but separated from the host and port by a comma. In the example_repo.py file, where we define the features, there are three things we need to set. First the entity, which is the customer in this case, with email as the join key. Then the feature source, a BigQuery source in our case: the name of the feature source, the table (including the project ID and dataset name), and the two timestamps we had in the table. Lastly, we need to define a feature view. That also needs a name; we'll call it incentives. We'll use the entity we've defined, customer, and then we define the schema of the feature view: we'll get the incentive and the first and last name of the customer, and of course we set the source equal to the feature source, which in our case is BigQuery. I'm going to copy the content of these two files into the files in the folder in the Colab notebook.

Now we're almost set up to transfer features from BigQuery to Redis. I'm going to change the directory so I'm in the feature repo folder, move the JSON key I downloaded from GCP into the same folder, and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key. After that we can run feast apply, which applies the configurations made in the YAML file and in the example_repo.py file and creates a registry.db file with all the configurations; I'll show you that in a bit. But first let's materialize the features and send them from BigQuery to Redis. We do that with feast materialize-incremental, passing the date, and this sends the changes that have happened in the offline feature store to the online feature store; it's sending the deltas. All right, the features have been written to Redis.
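For orientation, the two files might look roughly like the following; the project, table, and column names are placeholders based on what's shown in the video, with feature_store.yaml summarized in the comment at the top:

```python
# feature_store.yaml (roughly):
#   project: langchain_klaviyo
#   provider: gcp
#   offline_store:
#     type: bigquery
#     dataset: personalization
#   online_store:
#     type: redis
#     connection_string: "your-host:12345,password=your-password"  # hypothetical

from datetime import timedelta

from feast import BigQuerySource, Entity, FeatureView, Field
from feast.types import String

# The entity: a customer, joined on email
customer = Entity(name="customer", join_keys=["email"])

# The feature source: the BigQuery table with the incentive recommendations
incentive_source = BigQuerySource(
    name="incentive_source",
    table="your-project.personalization.customer_incentives",  # hypothetical table
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# The feature view: what gets materialized to Redis and served online
incentives = FeatureView(
    name="incentives",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="incentive", dtype=String),
        Field(name="first_name", dtype=String),
        Field(name="last_name", dtype=String),
    ],
    source=incentive_source,
)
```

With those files in place, feast apply registers the definitions and feast materialize-incremental pushes the deltas from BigQuery to Redis.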
Let's have a look at the registry.db file. In the folder menu on the left you can see there's a new folder called data, and in there you'll find the registry.db file. This can also be put on GCP; I'm not going to do that here, but you would in a production setting. If we take the redis-py instance and check the keys, you can see that we now have additional data written to the database: besides the Shopify products, we also have the incentive data. If you're taking this to production, you'd probably want two different databases, one for the features and one for the vector data.

Now we can extract these features from Redis and inject them into LangChain prompt templates. To do that, I'm going to import FeatureStore from feast and set the repo path; remember, we're in the feature repo right now. Once we have the feature store and the repo path, we can use the get_online_features method, which extracts the features from Redis using the feature definitions. Here I've hardcoded a customer name I know exists in the database; you'd want to change that to fetch any customer's data given an ID. If I run get_features and extract the features for David Hill, you can see that I get the name and the recommended incentive, free shipping.

Next up, we're going to inject this into LangChain prompt templates and create chains out of it. The first example is the chain we can use for email personalization. I'm going to import ChatOpenAI, and I'm going to import PromptTemplate and StringPromptTemplate from LangChain, because we need to customize the standard prompt template. Then I'm going to import load_summarize_chain, because that's the chain type we're going to use for the email personalization. Here we have the first template I'm going to use for email personalization, and I'm going to inject the customer features into the customer data tags, using the get_features function we just defined. I do that by using the StringPromptTemplate imported from LangChain prompts to define a Feast prompt template, which allows us to inject the features as keyword arguments into the base prompt template. If you call the Feast prompt template with David Hill and adidas shoes as arguments, you can see that we're injecting the features and referencing the adidas shoes as the relevant product.

I'm going to put this in a load_summarize_chain now and feed GPT-4 with this template and the products we retrieved with vector similarity search. I'll feed the chain the input documents and the email that's needed to fetch the features, and if I run this, you'll see GPT-4, which is the chat model we're using, formulate an email. With this email the language model is trying to hook the customer with free shipping: it says "enjoy your favorite products delivered straight to your doorstep without any extra charges", and that's because we recommended free shipping as the incentive for David Hill. Of course, the language model also mentions the adidas Superstars, one of the products we retrieved with vector similarity search. Instead of having GPT-4 write an entire email, we can also ask it to write just a few lines; I've done that here, and I've included the base prompt template for that as well in the notebook.
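A condensed sketch of that email chain, along the lines of LangChain's Feast integration example; the template text, feature names, and customer email are illustrative, and store and rds are the feature store and Redis vector store from the earlier sketches:

```python
from feast import FeatureStore
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate, StringPromptTemplate

store = FeatureStore(repo_path=".")  # run from inside the feature repo

template = """<customer_data>
Name: {first_name} {last_name}
Recommended incentive: {incentive}
</customer_data>

Write a short marketing email that uses the incentive above to hook the
customer, referencing these relevant products:

"{text}"

MARKETING EMAIL:"""
base_prompt = PromptTemplate.from_template(template)

class FeastPromptTemplate(StringPromptTemplate):
    """Looks up the customer's features in the online store and injects them."""

    def format(self, **kwargs) -> str:
        email = kwargs.pop("email")
        fv = store.get_online_features(
            features=[
                "incentives:incentive",
                "incentives:first_name",
                "incentives:last_name",
            ],
            entity_rows=[{"email": email}],
        ).to_dict()
        kwargs["incentive"] = fv["incentive"][0]
        kwargs["first_name"] = fv["first_name"][0]
        kwargs["last_name"] = fv["last_name"][0]
        return base_prompt.format(**kwargs)

prompt = FeastPromptTemplate(input_variables=["email", "text"])

chain = load_summarize_chain(
    ChatOpenAI(model="gpt-4"), chain_type="stuff", prompt=prompt
)
docs = rds.similarity_search("adidas shoes", k=5)
print(chain.run(input_documents=docs, email="david.hill@example.com"))  # hypothetical email
```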
Let's move on to the conversational chain, the chatbot. It's very similar to the load_summarize_chain. The base template for the conversational chain will also have the customer data tag where we can inject the features, the incentives in this case. Then we'll have a question posed by a human, the user, and then the relevant products, which is the context we get from the retriever. We also define a prompt template that condenses the question and the chat history into a standalone question, and that's basically what we need to set up the chatbot. I'll build it with ConversationalRetrievalChain. I'll import an async callback manager and a streaming callback handler to be able to stream the output back to the user, and I'm also using a separate LLM to condense the question. We don't really need all of that here; I'm just showing you how to set this up with all the bells and whistles and still personalize the chain. The important thing to note is that we pass the Feast prompt template in a dictionary as keyword arguments to the chain.

Let's run this and see what we get. First, let's check that the Feast prompt template looks the way we expect, and everything looks fine. Now I'm going to call the chatbot. I know we have some toddler sneakers in the product catalog, so I'm going to tell the chatbot that I'm looking for some cool kids' sneakers, and I'll call the chatbot with the query, the email of David Hill, and the chat history. And here we go: the chatbot gives us a response, and as you can see, it's referencing the Nike toddler shoes and some other kids' items, and it's offering free shipping on all items. So the personalization with Feast and Redis also works for chatbots, and of course you can build much more advanced examples if you enrich the feature store, but I'll leave that for a future video. That's it for now; if you enjoyed this video, like and subscribe. Thanks for watching!
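For completeness, here is a rough sketch of the conversational chain described above, stripped of the streaming callbacks; it reuses the store and rds objects from the earlier sketches, keeps LangChain's default condense-question prompt, and the template and email are again illustrative:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate, StringPromptTemplate

qa_template = """<customer_data>
Name: {first_name} {last_name}
Recommended incentive: {incentive}
</customer_data>

Answer the customer's question using the product context below, and work the
incentive into your answer where it fits naturally.

Context: {context}
Question: {question}
Answer:"""
qa_base = PromptTemplate.from_template(qa_template)

class FeastQAPromptTemplate(StringPromptTemplate):
    """Same pattern as before: fetch the features from Redis, then format."""

    def format(self, **kwargs) -> str:
        fv = store.get_online_features(
            features=[
                "incentives:incentive",
                "incentives:first_name",
                "incentives:last_name",
            ],
            entity_rows=[{"email": kwargs["email"]}],
        ).to_dict()
        return qa_base.format(
            context=kwargs["context"],
            question=kwargs["question"],
            incentive=fv["incentive"][0],
            first_name=fv["first_name"][0],
            last_name=fv["last_name"][0],
        )

qa_prompt = FeastQAPromptTemplate(input_variables=["email", "context", "question"])

chatbot = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-4"),                            # answers the question
    condense_question_llm=ChatOpenAI(model="gpt-3.5-turbo"),  # condenses the history
    retriever=rds.as_retriever(),
    combine_docs_chain_kwargs={"prompt": qa_prompt},          # the personalized prompt
)

result = chatbot({
    "question": "I'm looking for some cool kids' sneakers",
    "chat_history": [],
    "email": "david.hill@example.com",  # hypothetical email
})
print(result["answer"])
```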
Info
Channel: Rabbitmetrics
Views: 2,179
Keywords: large language models, langchain prompt, vectore store, machine learning, langchain chains, langchain chatbot, langchain model, personalization, personalization shopify, segmentation, customer data platform strategy, shopify, ecommerce personalization, targeting customers, langchain, redis tutorial, llm pipeline, mlops, google bigquery, feast feature store, retrieval augmented generation, langchain chatgpt, llm langchain, langchain prompt template, prompt engineering, llm
Id: IhEofeKXm7o
Length: 20min 4sec (1204 seconds)
Published: Thu Dec 14 2023