LangChain in Production - Microservice Architecture (incl. FastAPI and Docker)

Video Statistics and Information

Captions
Hi, in this video I want to talk about using LangChain in production. Most of the videos you will find out there are about the library itself, not about what it takes to create a production-ready application.

This is what a typical LangChain application looks like, at least as far as I have seen: many examples use a front-end library like Flask or Streamlit, hold a conversation with memory, and use some kind of vector database to retrieve documents. Often you see high-level chains, for example the ConversationalRetrievalQA chain, which directly takes in a retriever and also memory. The whole application runs on a single server, and for prototyping this is perfectly fine. But for an application that many users may use, and where you are working on a team, this might not be the greatest solution.

An established standard for enterprise applications is the microservice architecture: you take the logical parts out of a monolithic application and create a single application for each part. For a chatbot application that might be a front end, a service for storing and holding conversations, and a service for making requests to OpenAI and sending the answer back. You also don't want to store everything in memory, which might blow up your service at some point, so you want to store your conversations in a database. A very good solution for this is Redis; it's very performant and easy to use. If you've got your own knowledge base, you will want another database for your embeddings; I like pgvector, but of course you can also use other databases like Pinecone or Weaviate.

So now you've got five different services, but they don't talk to each other yet. The data flow looks as follows: you've got your front end, you type something in the UI and then make a POST request to service 2. Service 2 retrieves and stores the conversation and makes another request to service 3. Service 3 makes a request to OpenAI and returns the answer back to service 2.
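To make that data flow concrete, here is a minimal sketch of what the conversation service (service 2) could look like. It is not the code from the video: the route, hostnames (`redis`, `service3`), ports and field names are assumptions for illustration, assuming FastAPI, the redis-py client, and containers that resolve each other by their docker-compose service names.

```python
# Hypothetical sketch of service 2: stores conversations in Redis and
# forwards them to service 3, which talks to OpenAI.
import json

import redis
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# "redis" is assumed to be the service name from docker-compose,
# so containers can reach the database by hostname.
cache = redis.Redis(host="redis", port=6379, db=0)


class Conversation(BaseModel):
    # Each message has the shape the OpenAI chat API expects:
    # {"role": "user" | "assistant" | "system", "content": "..."}
    conversation: list[dict]


@app.post("/service2/{conversation_id}")
def handle_message(conversation_id: str, body: Conversation):
    messages = body.conversation

    # Persist the incoming state of the conversation.
    cache.set(conversation_id, json.dumps(messages))

    # Forward the whole conversation to service 3, which calls OpenAI.
    response = requests.post(
        f"http://service3:80/service3/{conversation_id}",
        json={"conversation": messages},
    )
    assistant_message = response.json()

    # Append the assistant reply, store the updated conversation,
    # and return it to the front end.
    messages.append({"role": "assistant", "content": assistant_message})
    cache.set(conversation_id, json.dumps(messages))
    return {"conversation": messages}
```

The key point of the design is visible here: the front end only ever talks to this service, and only this service talks to Redis and to service 3.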
Service 2 now stores the updated conversation and sends the whole new conversation back to the front end. That's the data flow, and very important: the front end should never communicate with the database directly.

Okay, now we've got five services, but isn't that really hard to deploy? No. Modern services use containers, and most of the time that will be Docker, so each of the services gets its own operating system, resources and environment. When working with containers you also want some kind of orchestrator, which starts, stops and keeps the applications in a healthy state. Most enterprises use Kubernetes for this; locally, or just for small applications, you can also use Docker Compose as your orchestrator, since it's easier to set up and use and it also runs on a single server. Okay, that's how I would approach the architecture of a modern chat application.

After the architecture I also want to take a look at the code perspective, and I've got some rules of thumb for working with LangChain. First: once you leave the prototyping stage, don't use memory, but always use some kind of database to store conversations. Second: use low-level chains and control everything yourself instead of using high-level chains like the ConversationalRetrievalQA chain, which is hard to customize, and it's kind of hard to change the interface if you want to; one example is that you can't integrate the ConversationalRetrievalQA chain with Redis. The third point is to use a vector store that allows hybrid search. But what is hybrid search and why would you use it? Let's say we want to build a chatbot for an automotive manufacturer, and the manufacturer has multiple handbooks for different cars. If a user now asks "what happens if I press the red button?", the user might get too much information, or maybe just the wrong information. With hybrid search you can pre-filter the data set: for example, if you know that the user bought Model A, you can filter for Model A and get back the handbook for the model the user actually bought. So if you want to scale your application, always think about hybrid search (a small sketch of such a metadata filter follows below).

Okay, I'm now in VS Code and I want to walk you through the code at a high level; you don't have to understand everything bit by bit, just get the idea behind it. We've got multiple folders here, and the first one I want to show you is the front-end application. This is a React application, and everything is built in app.jsx. That is of course not a perfectly clean setup, because normally you would want to split the application into multiple components, but since it's a small application I think this is fine. The most important part is this handleSubmit function: when you press the button, it creates the conversation and makes a POST request with the fetch API to service 2. It creates a conversation ID and passes it to the service 2 endpoint. As you can see, we make a POST request with content type application/json, and as the body we post the whole conversation, which is a list of objects with a role and a content, the way the OpenAI API expects it.

So now, after getting the data from the POST request, we work with it inside service 2. Here we've got our app.py, and in app.py we first create a connection to Redis, which we set up in another Docker container. We've got two endpoints, a GET endpoint and a POST endpoint, and they take the conversation ID as a path parameter.
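As referenced above, here is a rough sketch of how such a metadata pre-filter could look with LangChain's PGVector wrapper (classic mid-2023 API). The connection string, collection name and the metadata field `model` are assumptions for illustration; check the filter syntax of whichever vector store you actually use.

```python
# Hypothetical sketch: pre-filtering a pgvector store by metadata,
# so a "Model A" owner only gets chunks from the Model A handbook.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector

CONNECTION_STRING = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"

store = PGVector(
    connection_string=CONNECTION_STRING,
    embedding_function=OpenAIEmbeddings(),
    collection_name="handbooks",
)

# Restrict the similarity search to documents whose metadata marks them
# as belonging to Model A (assumed metadata field "model").
retriever = store.as_retriever(
    search_kwargs={"k": 3, "filter": {"model": "model_a"}}
)
docs = retriever.get_relevant_documents("What happens if I press the red button?")
```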
The GET endpoint is there to retrieve the initial conversation and send it back to the front end: it takes the conversation ID and tries to retrieve it, so we make a request to Redis and try to get the conversation by its conversation ID. In the POST endpoint, if the conversation exists we convert it from JSON to a Python dictionary, append a system message, and at the end send the whole conversation to another service. So we make another POST request and wait for the response: service 3 makes a request to OpenAI and sends back the new reply. This reply is stored as the assistant message and appended, with role "assistant" and its content, to the whole conversation. We store it with the set method under the current conversation ID, so the updated version now sits in Redis, and at the end we just return the conversation.

But let's look at what actually happens in service 3. Here we've got another endpoint, and this one is built with LangChain. We import LangChain at the top, along with some database interfaces like PGVector, and we also import AIMessage, HumanMessage and SystemMessage; most of you who have already worked with LangChain know these. We've got a prompt template to give the bot some kind of identity. We build a connection string to the vector database via psycopg2; this is Postgres, since pgvector is just an extension of Postgres, and we store everything under a collection name. Then we set up a contract for service 2 and service 3: this contract, or model as it's called in Pydantic, tells the API how the request body should be structured. Now we take the store and make it a retriever, and the setup is done. Here is our prompt template: we say, as an FAQ chatbot for a restaurant you have the following information about the restaurant, and we give it the context, where the context is the information we retrieve from the vector store. Then we create a system message prompt, and we also define some helper functions to make it look nice in the final prompt.

Okay, now let's look at the actual endpoint. We've also got this conversation ID, which gets passed from the front end to the second service and on to the third service. We get the latest message, which is the new question typed in in the front end, and we want to retrieve the relevant documents from the vector store; these are the docs. We format the docs so we don't get the class names but just the text: we take the text of each document and join it with newlines, so we only get the plain text back. At the end we create the final prompt, which is the prompt we built with our system message plus the context, and then we send it together with the whole conversation to the chat model. We just use the chat functionality here, so in this application we use ChatOpenAI: it takes a list of messages, we get back a response, and we send back the content of the result, which is the new AI message. And again, the result of service 3 is used in service 2.
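Pulling that walkthrough together, a stripped-down version of such a service 3 endpoint might look roughly like the sketch below. It uses the classic mid-2023 LangChain API; the restaurant prompt wording, collection name, route and connection details are placeholders, not the exact code from the video.

```python
# Hypothetical sketch of service 3: retrieve context from pgvector,
# build the prompt, and call the OpenAI chat model via LangChain.
from fastapi import FastAPI
from pydantic import BaseModel

from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.vectorstores.pgvector import PGVector

app = FastAPI()

CONNECTION_STRING = "postgresql+psycopg2://postgres:postgres@postgres:5432/postgres"
store = PGVector(
    connection_string=CONNECTION_STRING,
    embedding_function=OpenAIEmbeddings(),
    collection_name="faq",
)
retriever = store.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI()

SYSTEM_TEMPLATE = (
    "You are an FAQ chatbot for a restaurant. "
    "Use the following context to answer the question:\n\n{context}"
)


class Conversation(BaseModel):
    conversation: list[dict]


def format_docs(docs) -> str:
    # Keep only the raw text of each retrieved chunk, joined by newlines,
    # so the prompt contains no class names or metadata.
    return "\n".join(doc.page_content for doc in docs)


@app.post("/service3/{conversation_id}")
def generate_reply(conversation_id: str, body: Conversation):
    messages = body.conversation
    question = messages[-1]["content"]  # latest user message from the front end

    docs = retriever.get_relevant_documents(question)
    system = SystemMessage(content=SYSTEM_TEMPLATE.format(context=format_docs(docs)))

    # Rebuild the history as LangChain message objects and call the model.
    history = [
        HumanMessage(content=m["content"]) if m["role"] == "user"
        else AIMessage(content=m["content"])
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    result = llm([system, *history])
    return result.content  # the new assistant message, consumed by service 2
```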
Back in service 2, this is the result of service 3, and now we send everything back to the front end. Okay, that's the logic, and now we want to create our Dockerfiles to turn each service into a container. We just use a base image, Python 3.10, install our dependencies and use uvicorn as the entry point for our application. We run it on port 80 inside the container and map it to port 8000 on the outside, and for the other application to port 5000; so internally the container uses port 80, but outside we use port 5000. This is done for every application: we have three Dockerfiles, one for the front end and two for the back-end services. For the databases we don't need our own Dockerfiles, because we just pull the images from Docker Hub. That's actually another advantage: we don't have to install Redis and Postgres on our local machine, we can just download the images from Docker Hub and use them directly.

This is what the docker-compose file looks like; it is our orchestrator to run the applications on your own machine. If you have Docker, you can just type docker compose up --build. This downloads the images from Docker Hub and also builds the images from the local Dockerfiles. If you run it for the first time it might take a little longer, because you have to download the images first and won't have them in the cache. If everything starts up correctly and we've got no errors here, you can just visit localhost on port 3000, where the front end is running, and check whether the chatbot works as expected.

Before we can actually use the database, we first have to run a script with python insert_data.py. What is happening here? We first create an instance of the OpenAIEmbeddings, loop through all of the directories via the DirectoryLoader, load the documents in this FAQ folder, split them with a chunk size of 1000 (we can make it a little bit smaller, let's say 250), and then store the documents in the vector store (a rough sketch of this ingestion script follows below the captions). That's done with this function, and now we can see it finished three of three files. Now we can take a look and should be able to ask: when does the restaurant open on Sunday? Okay, we get an answer. Not sure if this is 100% correct, I guess for Sunday it might not be, but this video is more about the microservice architecture in general; if you want to improve it, you may want to spend a little more time on the prompt. In general, as you can see, this works: we've got five microservices, all in isolated containers and orchestrated with Docker Compose. Okay, that's it. Let me know in the comments what you think about this architecture, and if you liked the video, feel free to subscribe to my channel and like the video, of course.
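As referenced in the captions, here is a rough sketch of what the insert_data.py ingestion step could look like, assuming an FAQ folder of plain-text files; the loader glob, chunk overlap, collection name and connection string are assumptions for illustration, not the exact script from the video.

```python
# Hypothetical sketch of insert_data.py: load the FAQ files, split them
# into small chunks, embed them, and store them in pgvector.
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.pgvector import PGVector

CONNECTION_STRING = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"

# Load every text file in the FAQ folder.
loader = DirectoryLoader("FAQ", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# The video starts with a chunk size of 1000 and then reduces it to 250.
splitter = CharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed the chunks and write them into the pgvector collection.
PGVector.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    collection_name="faq",
    connection_string=CONNECTION_STRING,
)
```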
Info
Channel: Coding Crashcourses
Views: 7,572
Keywords: fastapi, langchain, restapi, production, docker, rest, openai
Id: I_4jEnDwGwI
Length: 12min 3sec (723 seconds)
Published: Fri Jul 14 2023