LangChain Streaming - stream, astream, astream_events API & FastAPI Integration

Captions
Hi everyone, today I will show you how to perform data streaming with LangChain. Streaming is critical in making applications based on LLMs feel responsive to end users. We will explore stream and astream, the default implementations of streaming, and also astream_events, a new method that allows you to stream intermediate steps as well as the final output from a chain. We will also explore how to use streaming with LangChain in combination with FastAPI and stream data from a FastAPI backend to a frontend.

Okay, I'm currently in VS Code, and you can find the link to this repository in the description. On the left you can see multiple files; we will start with the basics IPython notebook, and I will walk you through the streaming interface of LangChain. First you have to make sure that you actually installed LangChain. I use langchain-openai because I use an OpenAI model, which supports streaming, but of course you can also use any other model that supports streaming. When you create your model you have to set streaming to True, otherwise the response will not be streamed.

First we import our environment variables; the .env file includes the OpenAI API key. Then we create our first chain. Our chain is based on a model, a prompt, and an output parser, and this is how we construct the chain with the LangChain Expression Language. So let's first create that, and then we're going to explore the normal stream method.

If you want to stream, you use the stream method, and if you want to see the streaming inside the print statement, you have to set the flush argument to True. We've got a single variable in our chat prompt template: we want the LLM to write a joke about a parrot with 200 words. The LLM creates the response by predicting each new token based on the input and the previously predicted tokens, and we receive the response token by token, or here chunk by chunk; that's why we use a for loop. So let's now run this, and you can see how streaming works: the answer gets created chunk by chunk.

If you want to stream in combination with an API, it makes sense to use async, otherwise you block the main thread and make the application very slow. Streaming also supports async operations: instead of stream the method is called astream, and you iterate with async for chunk in chain.astream(...). This works like before, but it is now an asynchronous operation, and it is the preferred way when you use streaming in combination with FastAPI.
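A minimal sketch of the notebook code described above, assuming a parrot-joke prompt and the default OpenAI chat model (the exact prompt wording and model name are not shown in the video):

```python
import asyncio

from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

load_dotenv()  # loads OPENAI_API_KEY from the .env file

# Prompt wording is an assumption based on the walkthrough.
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic} in 200 words.")
model = ChatOpenAI(streaming=True)  # streaming=True so the response is streamed
chain = prompt | model | StrOutputParser()  # LangChain Expression Language

# Synchronous streaming: chunks arrive as the model predicts tokens.
for chunk in chain.stream({"topic": "parrot"}):
    print(chunk, end="", flush=True)

# Asynchronous streaming: preferred when serving requests, e.g. from FastAPI.
async def main():
    async for chunk in chain.astream({"topic": "parrot"}):
        print(chunk, end="", flush=True)

asyncio.run(main())
```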
A new way to perform streaming with LangChain is astream_events. Instead of astream we call astream_events, pass in the input again, and also pass in the version; currently the version is v1. I'm going to show you what that looks like: you get a much more complicated object. As you can see, each item is a dictionary which has an event. The first event is on_chat_model_start; it has a run ID, the name of the model, some metadata, and also the input data. In the second step, where the streaming actually happens, the event is on_chat_model_stream, and the data contains an instance of AIMessageChunk with the content ("A", "parrot", "is", ...), created token by token and saved inside that AIMessageChunk object. This object is of course far too complex to work with directly, so we need a slightly more involved syntax.

The first step is to have a look at all the available events: I create an empty list, append the events while streaming, and then create a set from that list. You can now see that we've got three events: on_chat_model_start, which is when streaming starts; on_chat_model_stream, which is where we get our tokens; and on_chat_model_end, which is when the model has finished generating. Now we can use them like this: we stream, and if the event is on_chat_model_start we print "stream started"; if the event is on_chat_model_stream we access the data key of the event, then the chunk key, which is an AIMessageChunk, and then the content property of that class, and print it. We now see the following: "stream started", and then the answer gets created token by token. This is how astream_events works: a little more complex than the astream method, but you get more control over what happens in the stream.

Now let's have a look at how this works in combination with FastAPI. We will use a RAG pipeline, as many of you requested, and stream the results. Let's go to app.py; here is the code. First we import some classes from LangChain and also from FastAPI. Very important: from FastAPI we have to import the StreamingResponse class, which is needed to actually stream the results. Then we load our OpenAI API key and create an instance of OpenAIEmbeddings to embed documents. We create two very simple documents (the dog loves to eat pizza, ...) with some fake metadata, create a vector store, pass in the embedding function to create the embeddings, and turn the vector store into a retriever to get a standardized retrieval interface.

The next step is to create a prompt template with a context variable, which is standard for passing in the retrieved documents, and a question variable. This is the complete chain: the context is the documents we get back from the retriever, the question is passed through unchanged as a RunnablePassthrough, and both go into the prompt, which goes into the model, which goes into an output parser. So a very standard RAG pipeline.

Now let's create the application. We create an instance of FastAPI and then the function which generates the streaming response. Instead of just printing the chunk, we do the following. First we replace each newline with a <br> tag, because a plain newline doesn't create a line break in HTML. Then we yield the content as "data: ..." followed by two newlines. This is because we use the server-sent events (SSE) protocol, and this is how the protocol expects the data to look: the "data:" prefix followed by two newlines at the end indicates a complete event message to the client. Next we create a FileResponse that serves index.html from a static folder; this is just for displaying our HTML file. And this is where the magic actually happens: we create the chat_stream endpoint with a path parameter for the message, which is just a simple string. The message gets passed to the generator function, and we return a StreamingResponse, in which we provide the function and set the media_type argument to text/event-stream to actually make this work. Then we just start our application on port 8000, and this should now work.
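A condensed sketch of app.py as described above. The vector store class (FAISS), the second document's content, and the file paths are assumptions; the video does not name them:

```python
from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.responses import FileResponse, StreamingResponse
from langchain_community.vectorstores import FAISS  # any LangChain vector store works
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()  # loads the OpenAI API key

# Two simple documents with fake metadata; the second one is an assumption.
docs = [
    Document(page_content="The dog loves to eat pizza", metadata={"source": "animal.txt"}),
    Document(page_content="The cat loves to eat lasagna", metadata={"source": "animal.txt"}),
]
vectorstore = FAISS.from_documents(docs, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()  # standardized retrieval interface

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(streaming=True)

# Context comes from the retriever; the question passes through unchanged.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

app = FastAPI()

async def generate_chat_responses(message: str):
    async for chunk in chain.astream(message):
        content = chunk.replace("\n", "<br>")  # plain newlines don't render in HTML
        yield f"data: {content}\n\n"  # SSE: "data:" prefix, blank line ends the event

@app.get("/")
async def root():
    return FileResponse("static/index.html")

@app.get("/chat_stream/{message}")
async def chat_stream(message: str):
    return StreamingResponse(generate_chat_responses(message),
                             media_type="text/event-stream")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, port=8000)
```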
Now we have to receive this message in index.html, and this is the important part: we create a new instance of EventSource in JavaScript and point it at our chat_stream endpoint. Then we have the onmessage handler: we receive an event, and inside that event object there is a data attribute, and we append that data to the innerHTML of the data container. The data container is here; when we start typing it's empty, but then the innerHTML gets appended to it. So this is how it works.

Now let's actually try this out. I also added a little bit of Material CSS to make it look a little nicer. Let's go to app.py and run python app.py; if everything works we can see Uvicorn running on port 8000. Now let's open a browser. This is our simple chat interface, and if you look at the application you can see the two documents (the dog loves to eat pizza), so we can ask: what does the dog like to eat? And we should now see that streamed. Okay, that was a little too short; let's add "write 200 words" to make it longer, and now we can see this actually works: the answer is streamed token by token and displayed in HTML.

So this works with the async astream method from LangChain, but we can also do it with the new events API. This is a little more complex again, but the functionality is pretty much the same (a sketch follows after these captions). We stream the events and check each one: if it's on_chat_model_stream, which is where the tokens are actually streamed, we access the AIMessageChunk (I showed you that in the notebook before), read its content attribute, save it in a variable, replace the newline with the HTML <br> tag again, and yield that event. We can also do something like this: if the event is on_chat_model_end, we print that the chat model completed its response; we could send an email or do whatever we want when we receive that end event. So let's stop the server and run python app_events.py. It serves the same file, so: what does the dog like to eat, write 200 words, and we can see this also works, same as before.

Great, you now know how to perform streaming with LangChain, even in combination with a full-stack application. If you liked this video, please like and comment. Thank you very much for watching, see you, bye-bye.
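For reference, a minimal sketch of the events-API generator from app_events.py as described in the walkthrough. The function and endpoint names are assumptions, and chain and app are reused from the previous sketch:

```python
from fastapi.responses import StreamingResponse

# Assumes `chain` and `app` from the app.py sketch above.
async def generate_chat_events(message: str):
    async for event in chain.astream_events(message, version="v1"):
        if event["event"] == "on_chat_model_stream":
            # event["data"]["chunk"] is an AIMessageChunk; read its content.
            content = event["data"]["chunk"].content.replace("\n", "<br>")
            yield f"data: {content}\n\n"
        elif event["event"] == "on_chat_model_end":
            # React to the end of the stream, e.g. log it or send a notification.
            print("Chat model has completed its response.")

@app.get("/chat_stream_events/{message}")
async def chat_stream_events(message: str):
    return StreamingResponse(generate_chat_events(message),
                             media_type="text/event-stream")
```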
Info
Channel: Coding Crash Courses
Views: 2,365
Keywords: fastapi, langchain, streaming, llm, openai
Id: juzD9h9ewV8
Length: 10min 48sec (648 seconds)
Published: Mon Apr 15 2024