LangChain In Action: Real-World Use Case With Step-by-Step Tutorial

Captions
At the heart of the language model revolution and the LangChain framework lies the concept of a text embedding. A text embedding is a learned representation of text that takes the form of a vector of numbers. These vectors allow us to efficiently store and retrieve context from vector storage, extract relevant pieces of information, enhance the language model's memory and capabilities, and ultimately take the actions we want in order to generate value. In this video we're going to take a closer look at this process by means of a real-world, practical application: we are going to use LangChain to extract information and value from Amazon review data.

One of the most slam-dunk applications of LangChain is customer experience analytics. I'm going to show you how you can take unstructured review data and map the reviews into themes and a structure that allows you to act on the data. I'm also going to demonstrate how the review embeddings can serve as inputs to other machine learning models, and just how packed these reviews are with signal that can be used to further enhance the capabilities of language models, an avenue we're going to explore in more detail going forward on this channel. By the end of this video you'll have an idea of how you can generate value for businesses with LangChain and how to combine vector stores with large language models, and you'll have a small POC that you can build on and put to good use.

To get started, we'll pip install the needed libraries and drop the needed API keys and variables into an environment file. Links to the code and the data will be available below the video. Let's go.

So here we have the Amazon data that we're going to use. These are reviews for products in the fashion category. What's important to note here is that there's an overall rating, a review text with the actual review from the customer, and an ID that allows us to tie each review to a specific product. Then we have a complementary data set called meta Amazon Fashion with all the product information and an ID that allows us to join it with the review data set.

In our notebook, the first thing we're going to do is import the utilities needed for loading the API keys from the environment file and for extracting the Amazon data. We're not going to extract the JSON files, as they are quite big, but read the data directly from the zipped files. Then I'm going to load both data sets into pandas data frames, and I'm removing reviews without a review text, as we need the text for the work in this video. Here we have the review data in a data frame, and this is the metadata, also in a data frame.

Next, I'm going to truncate the reviews so that we don't process reviews that are too long; you can play around with the number of characters you want to keep. Then I'm going to find a product that has a good number of reviews for the sake of this video. I had a look at the data, and it looks like the second one from the bottom is a good fit. If we extract that product, you can see the name: it's called Powerstep Pinnacle Orthotic Shoe Insoles. I guess that's fashion, but it's good for our purpose.

So let's create the embedding vectors from these reviews. We're going to work on just a slice of the data frame with the Powerstep Pinnacle insoles, and for embeddings I'm going to use Hugging Face embeddings, just to show that we don't have to use OpenAI for this. I'm going to create a new column in the data frame with the embedding vectors by using the apply function on the data frame. A word of caution here: another reason I'm not using the OpenAI embeddings is the OpenAI pricing model. When you run apply with an embedding function on a large data frame, you risk incurring significant costs with OpenAI. Here we're only working on a slice of the original data frame, but the full data frame has more than 800,000 rows, so please don't run the apply function with an OpenAI embedding unless you know what you're doing.

All right, here we have the embedding vectors as a new column in the data frame. What I'm going to do now is show you the richness of this review data, and I'll do that by training a simple random forest machine learning model with the embedding vectors as features and the overall rating as the target. I'll use scikit-learn to divide the data into a training sample and a test sample, and then I'm going to import a random forest regressor. So I'm treating this as a regression problem, meaning that the prediction is on a continuum even though we know the rating is an integer, but this is fine for demonstration. Then I'm training the model on the training part of the data set. Here you can also play around with the number of estimators; I found that 150 was enough to make this point. Once the fitting is done, we can evaluate the model using the mean absolute error on the test part of the data set, and we achieve a mean absolute error of 0.53, which means that on average the prediction is off by around half a point. And this is with a simple, non-optimized machine learning model that takes five minutes to run. If you spend some time optimizing it, or build a more advanced model in PyTorch, you could probably get to 0.3 or below. So there is significant signal in this data.

But why would you want to predict the rating? Well, you wouldn't, unless you're missing the rating for some of the reviews. What you want to do is switch out the target variable and use the signal in the review data to build machine learning models that will actually help you generate value: product recommendation models, churn or retention models, propensity models, uplift models, and so on.

I'm now going to load the review embeddings into the vector database and show you how we can have GPT-4 access the data and give us a summary of the reviews. I'm using GPT-4 as it's currently the most powerful language model we have to work with, but you can switch out the language model as you see fit. To upload the review embeddings, we import and initialize Pinecone and then transform the truncated review text column into a list of reviews. Then we upload the reviews with the built-in from_texts method using Hugging Face embeddings. Once the upload is done, we can head over to Pinecone and check that the vectors have been uploaded, and here we have all the review vectors in Pinecone. Then we can do a basic vector similarity search to check that everything works.

What I want to do now is have the language model access the data in the vector store. So I'm going to import RetrievalQA, which retrieves the most relevant reviews given a prompt and feeds them to the language model, and then I'm going to import ChatOpenAI, which I'm going to use as a wrapper around GPT-4. Then we're going to define a chain, let's call it review chain, using RetrievalQA, which takes the language model, the vector store (as a retriever), and the chain type as arguments. Here we use the chain type "stuff", which means that we stuff all the related data into the prompt as context and pass it to the language model. Then we simply write the query as we usually do when working with ChatGPT and run the chain with this query. Here I'm asking GPT-4 to give us an overall impression of the reviews, list the most prevalent examples in bullet points, and also give us suggestions for improvement. And remember, when you're working with chat models like GPT-4, you can send system messages that allow you to calibrate the behavior of the model, which can significantly improve the quality of the output.

All right, so here we have GPT-4's impression of the reviews: some examples of good reviews, mixed reviews, and some negative reviews, and then, based on those reviews, some potential areas to focus on for improvement. So pretty nice, and I'm sure you can improve on that by calibrating the system message. What's really powerful about this is that you can turn it into a weekly digest and send it out to an internal email list within a company, and this can also be done with LangChain.

All right, so the last thing we're going to look at is filtered vector similarity search. What I'd like to do is get all the reviews of a given rating that match a specific theme, so we need to filter on metadata. To do that, I'm going to use Pinecone directly, as LangChain doesn't seem to support this out of the box yet. To upload the data with the Pinecone Python client, I'm going to add a metadata field to the data frame and create two versions of the data frame: one for uploading and a local version for extracting the actual reviews. I'm going to create the Pinecone index directly with the Pinecone Python client, and while the index is initializing, we can head over to the Pinecone documentation and have a look at what the upsert should look like. We can see that we need an ID field and a values field with the vector values, and then we're going to add the metadata fields used for filtering. Once the index has been initialized, we can upload the data in batches; here I'm using batches of 50.
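The upsert format and batching described above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the notebook's code: it assumes each record is a dict with `id`, `values`, and a `rating` metadata field, uses toy vectors in place of the Hugging Face embeddings, and leaves the actual Pinecone call as a comment because the exact `upsert` signature depends on the client version. The helper names `build_records` and `batched` are made up for this example.

```python
from itertools import islice
from typing import Iterable, Iterator


def build_records(ids, vectors, ratings):
    """Build upsert records: an id, the vector values, and a rating metadata field."""
    return [
        {"id": str(i), "values": list(v), "metadata": {"rating": int(r)}}
        for i, v, r in zip(ids, vectors, ratings)
    ]


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Toy data standing in for the review embeddings (hypothetical values).
ids = range(120)
vectors = [[0.1 * k, 0.2, 0.3] for k in range(120)]
ratings = [1 + k % 5 for k in range(120)]

records = build_records(ids, vectors, ratings)
batches = list(batched(records, 50))

for batch in batches:
    # With a real Pinecone index you would send each batch here, for example
    # index.upsert(vectors=batch); check the docs for your client version.
    pass

print([len(b) for b in batches])  # [50, 50, 20]
```

Batching keeps each request small; with 120 toy records and a batch size of 50, the last batch simply holds the remaining 20.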
All right, now that the upload is done, let's try to run a filtered query. We do that by using a query string like we do without the filter, and then adding a filter that has a MongoDB-like syntax. This query will give us the top 100 matching reviews with a rating of four, and the result is a list of IDs, each with a score indicating how well it matches. To get the actual reviews, we'll need to use the local version of the data. So I'm going to wrap the query in a Python function that executes a filtered vector similarity search and then loops through the local version of the data to extract the matching reviews.

Now, if I call this function with the query "would purchase again", filtering for 5-star reviews, we get all the happy customers who should be put in a repeat-purchase flow. We can do something similar to get all the disappointed customers who gave a one-star review (which works even though I misspelled "disappointed"), and we can use this in a win-back flow. Being able to filter reviews like this is very powerful, as it allows you to divide reviews into themes and get the angles you need for your remarketing campaigns. In the example we just saw in the notebook, we have two different themes: the disappointed customers and the customers who want to purchase again. If you have a customer ID and an email tied to each review, you can use LangChain's integration with Zapier to create two lists, a win-back list and a repurchase list, and then use your email service provider to run two different campaigns. That's a topic for another video.

So that's it for now. If you enjoyed this video, give it a like and subscribe. Thanks for watching!
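The filtered query and local lookup described above can be illustrated with a small in-memory stand-in for Pinecone. This sketch is a toy built on assumptions, not the notebook's code or Pinecone's implementation: it supports only `$eq`, `$in`, and bare-value equality from the MongoDB-like filter syntax, ranks matches by cosine similarity, and uses a dict as the "local version" of the data. The function names and the two-dimensional vectors are hypothetical; in the video, the query string is embedded with Hugging Face embeddings and sent to a Pinecone index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(metadata, flt):
    """Check metadata against a tiny subset of a MongoDB-like filter ($eq, $in, or a bare value)."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            if "$eq" in cond and value != cond["$eq"]:
                return False
            if "$in" in cond and value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True


def filtered_search(query_vec, records, flt, top_k=100):
    """Return (id, score) pairs for records that pass the filter, best match first."""
    scored = [
        (r["id"], cosine(query_vec, r["values"]))
        for r in records
        if matches(r["metadata"], flt)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def reviews_for(query_vec, records, local_reviews, flt, top_k=100):
    """Run the filtered search, then pull the review text from the local copy of the data."""
    return [local_reviews[i] for i, _ in filtered_search(query_vec, records, flt, top_k)]


# Hypothetical toy index and local review store.
records = [
    {"id": "a", "values": [1.0, 0.0], "metadata": {"rating": 5}},
    {"id": "b", "values": [0.9, 0.1], "metadata": {"rating": 5}},
    {"id": "c", "values": [1.0, 0.0], "metadata": {"rating": 1}},
    {"id": "d", "values": [0.0, 1.0], "metadata": {"rating": 5}},
]
local_reviews = {
    "a": "Will buy again!",
    "b": "Great insoles, ordering another pair.",
    "c": "Fell apart after a week.",
    "d": "Arrived on time.",
}

query = [1.0, 0.0]  # stand-in for the embedded query "would purchase again"
print(reviews_for(query, records, local_reviews, {"rating": {"$eq": 5}}, top_k=2))
# ['Will buy again!', 'Great insoles, ordering another pair.']
```

Review "c" is as close to the query as "a" but is excluded by the rating filter, which is the point of filtering on metadata. Keeping the text in a local dict mirrors the video's split between an upload copy and a local copy of the data frame.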
Info
Channel: Rabbitmetrics
Views: 56,481
Keywords: langchain, langchain chatgpt, langchain own data, hugging face, gpt 4, cx analytics, langchain openai, large language models, pinecone, hugging face models, llm explained, langchain in python, langchain ai, langchain prompt, langchain agent, embeddings, vector store, customer experience analytics, langchain tutorial, marketing analytics
Id: UO699Szp82M
Length: 12min 16sec (736 seconds)
Published: Sun May 07 2023