Build an AI Chatbot with Python (FastAPI) and Expo (React Native) with Response Streaming

Captions
Welcome back, everyone, to the channel. Gordon Jessic here, good to see you all. In the previous video we talked about how we can communicate with the backend API we built using FastAPI, running all the AI-related work on the backend and talking to it from our React Native application. We were working with a scraper in the Python code, asking questions in the frontend chatbot and getting answers back from the backend. What we did then was simply get the whole response and display it to the user in its entirety. In this video we're going to talk about streaming the response back from our Python backend to our React Native frontend.

Let me start with a quick demo. The question we're going to ask is about HuggingGPT, which is described on the web page we have here: "How does HuggingGPT work? Explain the four steps in detail." This is the original version, which gives us the entire response at once and then displays it on the frontend. That didn't take much time here; the response was only a few lines, so it was fine, and you could still get the whole thing back from your backend and display it at once. But if you're expecting long text to be returned to your users, a better way may be to stream it back.

Now I'm going to ask the same question again, but this time with streaming in place, and we'll see how that works. The one thing you really want to pay attention to is whether streaming is worth it, because streaming might take a little longer to process on the backend before the text reaches you. Do your own due diligence and make sure the approach you pick actually provides the better user experience for your users. During the whole process we show a loading indicator so the user knows what's happening, but if it takes longer to get the streamed response back, you'd probably go with showing the whole response at once. I'm showing you how to do streaming; the decision of which route to take is entirely up to you.

So I have the exact same question, I send it, and it starts giving me the response chunk by chunk, word by word. You saw it streaming in, and once the stream finishes I have the entire message here. We're going to talk through how this works, so without further ado, let's get started.

As before, the Python backend code with the AI logic is not really the focus of this video, but I'll walk you through the most important pieces of the backend: how we go from sending the whole thing back to the frontend to sending it chunk by chunk, token by token, so we can stream on the user's side as well. This was the state of the backend code at the end of the last video: we simply invoked the chain, and that took care of the whole thing for us. With the new streaming process we're going to make some changes to that.
Here's what happens now. When we hit the chatbot endpoint with our POST request, we send the question over, call a generate_chunks function, and stream the result we get back to the frontend using StreamingResponse from FastAPI's responses module; note the media type of text/event-stream that we need to add there as well, and that's what we return.

About this generate_chunks function: we introduce an AsyncIteratorCallbackHandler for our LLM, the OpenAI model we already had. As I said, I'm not going to go into all the details. The important things are that we set streaming to true and attach the callback handler we've defined. The rest is mostly the same: we still hit that web page and scrape its data, and the text splitter, vector store, and the chain itself are all similar to what we did before. The only difference is that our LLM is now configured for streaming, with the callback attached.

Next we define a task that invokes our RAG chain asynchronously, as an asyncio task. I've also added an index because I'm printing things out, so as the tokens or chunks are being streamed we can see them in the terminal. Then, inside a try block, we iterate asynchronously over the callback handler, get the tokens, and yield them, which sends each one to our frontend. Once everything has been sent, the generator stops. That's mainly what we do to stream the results to the frontend; that's the streaming part of the backend. Again, this wasn't the focus of this video, but you'll have access to the code: I'll put a link to the GitHub repo so you can check it out. The focus is the frontend side, and that's what we'll look at now. A sketch of the backend pattern follows below.
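To make the walkthrough concrete, here is a minimal sketch of that backend pattern, assuming LangChain's AsyncIteratorCallbackHandler as described. The Question model and request shape are illustrative, and the retrieval chain (scraper, splitter, vector store) is elided here, with the LLM invoked directly; this is not the repo's exact code.

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain_openai import ChatOpenAI

app = FastAPI()


class Question(BaseModel):
    text: str  # request body shape is an assumption


async def generate_chunks(question: str):
    callback = AsyncIteratorCallbackHandler()
    # streaming=True makes the model emit tokens as they are produced;
    # the callback handler exposes them as an async iterator.
    llm = ChatOpenAI(streaming=True, callbacks=[callback])
    # The video invokes a RAG chain here; the LLM is called directly
    # in this sketch to keep it self-contained.
    task = asyncio.create_task(llm.ainvoke(question))
    try:
        async for token in callback.aiter():
            yield token  # each token is flushed to the client as it arrives
    finally:
        await task


@app.post("/chatbot")
async def chatbot(question: Question):
    return StreamingResponse(
        generate_chunks(question.text),
        media_type="text/event-stream",
    )
```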
All right, so I have my backend Python code over here and the React Native frontend on the left side. The reason for this split is that I'm printing the tokens as they're being sent back to the frontend: I keep a count of the tokens, the token itself, and the time each one is sent, because I'm going to do some checks at the end to make sure the process is working fine. We'll get to that later.

For now, back to our frontend React Native code. We have the handleSubmit function, which is called after hitting the send button, and it sends a fetch POST request to our backend. The response is now going to be different, because we're streaming back and the format is totally different, so we have to get rid of this whole response-handling block. Next, the fetch call needs a new parameter: reactNative with textStreaming set to true, so that fetch knows we're going to be streaming text back and gives the response the proper format. Right now, though, TypeScript complains that the RequestInit type does not have this reactNative property, so we have to assign our own type to the options we pass to fetch. We define a new interface, CustomRequestInit, which extends RequestInit and adds the reactNative property with textStreaming as a boolean. With that, the TypeScript complaint goes away, and the streamed value is available through response.body.
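A sketch of that typing fix and the fetch call, assuming it runs inside handleSubmit; the endpoint URL, the question variable, and the JSON body shape are placeholders rather than the video's exact code:

```ts
// Extend RequestInit with React Native's non-standard streaming option.
interface CustomRequestInit extends RequestInit {
  reactNative?: { textStreaming: boolean };
}

const init: CustomRequestInit = {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: question }), // body shape is an assumption
  // Tell React Native's fetch to expose response.body as a stream.
  reactNative: { textStreaming: true },
};

const response = await fetch('http://localhost:8000/chatbot', init);
const stream = response.body; // a ReadableStream once the polyfill is applied
```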
Let's save the state of the code and console.log the stream to see what we get back. I'll use the same question, which comes from the web page we're scraping on the backend: how does HuggingGPT work. I send it, it processes, and on the backend you can see the message that was sent; then, one by one, token by token, the response comes back, 82 or 83 tokens in total, each logged with the time it was sent. But what we see on the frontend side is that the console.log for the stream is undefined. This doesn't mean the response lacks the body property; it's a React Native shortcoming, or whatever you want to call it, and we basically have to add a polyfill to work around it.

For that there's a package called react-native-polyfill-globals. Its page has a lot of explanation of the idea and motivation behind it, and as you can see, readable streams are one of the items: response.body is not implemented, which is exactly why we need this. So I'm going to grab the install command and run npm install react-native-polyfill-globals in a new terminal (you could have done npx expo install as well), and then close that extra terminal. The only thing left to do is go back to the website's usage section and grab the import that applies all the polyfills automatically. We add it at the very top of our layout.tsx, which is the layout for our tabs navigation, since we're using Expo Router; if you don't know what this structure is, I've put links to the previous videos where we set the whole thing up. A sketch of that import is below.
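Assuming the package's auto entry point that applies every polyfill (check the README for your version; the package also expects its peer polyfill packages to be installed), the import looks something like this:

```ts
// app/_layout.tsx, at the very top, before anything uses fetch streaming.
// Polyfills ReadableStream, response.body, TextEncoder/TextDecoder, and more.
import 'react-native-polyfill-globals/auto';
```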
With that saved, if we ask the same question again, clear the terminals, and send it, the stream should now show something. And yes, it now logs a ReadableStream. Obviously we have more to do: we need to grab everything that's being streamed over and display it on the screen.

Now that the response body is stored in our stream variable, and it's a ReadableStream, we use its getReader() method and read the stream's contents. I'll paste in a little bit of code and explain what it does. We read through the stream with a self-calling function, because we want to get all the chunks, or tokens, that are being sent over, and the callback we pass to reader.read() is where we'll take care of processing whatever we receive and displaying it on the screen. For now I'm just going to print out the result we get, and since this is a self-calling loop, I'll only grab the first 90 reads: we saw before that about 85 or 86 tokens come back, so by read number 90 everything should have been sent. (Keep in mind that without that return it would loop forever, and you'd probably have to stop your server or something to end it.)

Let's run it: I ask the same question, we see it printed on the backend, and the results get printed here as well. We get a series of objects whose done property is false for most of them and becomes true at some point; that is our clue that everything has been sent over, so once result.done is true we stop the whole thing and return. The second thing is result.value: as you can see, it's a Uint8Array, so these are just numbers being sent over, and we need to decode them. We'll use a TextDecoder with utf-8 decoding to convert these arrays into human-readable text.

So first we bring in the decoder. Then I get rid of the i counter and wrap the whole reader.read loop in a promise, so I can resolve it once we've gone through all the tokens and chunks from the backend: if the result is done, we resolve and return; otherwise we decode result.value and save it as our token, which is now a human-readable string, and we can console.log it so you can see what's inside each one. Then I call that streaming promise. Using the same question again, clearing the terminals, and sending it, we see that as the tokens are generated on the backend, similar tokens appear here at the same time: "Hugging GPT works by using" and so on. The streaming is happening, and everything comes over as expected.

This token is what we need in order to display the message on the screen. We already have a state that keeps track of our conversation, but that's for the whole conversation; since we're streaming a single message token by token, we need a separate state for that specific message. I'll call it lastMessage, and it's just a string, the text we receive. All we need to do at this point is add each token to that state: I set lastMessage to the value it had previously plus the token, so the tokens accumulate and the whole message is shown as they come through. We also need to show it at the end of the messaging area, so right below where the conversation is rendered, if lastMessage is not an empty string, meaning something is coming through from the backend, I show it in the UI: the chatbot icon and, I believe, a gray background bubble with the text coming through. That's where the last message is displayed as the tokens arrive and form it. Let's test again: I ask the same question, and you see the tokens being printed out while, at the same time, the whole message streams onto the screen for the user. The full read loop is sketched below.
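A sketch of that loop, assuming a const [lastMessage, setLastMessage] = useState('') hook as described; the 90-read cap from the experiment is replaced by the done flag:

```ts
if (!stream) return;                      // response.body can be null
const reader = stream.getReader();
const decoder = new TextDecoder('utf-8'); // turns Uint8Array chunks into text

// Wrap the self-calling read loop in a promise that resolves once the
// backend has sent everything (result.done === true).
const streamingPromise = new Promise<void>((resolve) => {
  const readChunk = () => {
    reader.read().then((result) => {
      if (result.done) {
        resolve();
        return;
      }
      const token = decoder.decode(result.value);
      console.log(token); // watch each token arrive
      // Accumulate so the message takes shape on screen as it streams.
      setLastMessage((previous) => previous + token);
      readChunk(); // keep reading until done
    });
  };
  readChunk();
});

await streamingPromise;
```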
If we continue our conversation, though, this streamed message goes away, because we never add lastMessage to the conversation, which is tracked by the conversation state. To fix that, I'm going to add the final message, the accumulated tokens, to the conversation state as well. I define a new variable, accumulatedMessage, to keep track of all the tokens coming through: right after I set lastMessage, I add the token to that variable too. Then, once the tokens are completely streamed and displayed in the UI, we add what we received to our conversation state. To do that we do some quick JavaScript: we know this is the AI message coming through, and we clean it up a little. You can do more cleaning up depending on your case; this is just a simple RAG setup looking at a website, but for a case I worked on with a RAG over my own data stored in MongoDB, I had to go through more cleanup. You might want to trim the start, and there may be unwanted double quotes and the like that you'll want to remove from the beginning or end of the tokens coming through, so for future reference, you may need some cleanup before displaying the whole thing.

One more thing: once the last message is done streaming, we also want to set lastMessage back to an empty string, because we don't want it displayed twice. It's going to be added to our conversation, and we're also rendering lastMessage whenever it's non-empty. So after the streaming promise resolves, I set lastMessage to an empty string, and it's immediately replaced by the same message we just added to the conversation. Again, you might need some cleanup here; for example, there could be an empty line that gets carried into the conversation when this logic runs. With that, I can also move this call inside the try block, since we already have a catch for it, and save. That should be basically all we need to do; a sketch of this completion step is below.
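Sketching that completion handling, with the accumulation happening inside the read loop above; the conversation entries' { role, content } shape and the setConversation setter are assumptions, not the video's exact state:

```ts
// Declared before starting the read loop:
let accumulatedMessage = '';

// Inside the read loop, right after setLastMessage(...):
//   accumulatedMessage += token;

// Once the streaming promise resolves:
await streamingPromise;

// Light cleanup: trim whitespace and strip stray double quotes at the ends.
const cleaned = accumulatedMessage.trim().replace(/^"+|"+$/g, '');

// Commit the finished message to the conversation history...
setConversation((previous) => [...previous, { role: 'ai', content: cleaned }]);
// ...and clear the temporary bubble so the message isn't shown twice.
setLastMessage('');
```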
I can test it again, so let me reload the app. Actually, I'm first going to add a bit of similar printing and logging on the React Native side, because I want to make sure the streaming is really happening on the frontend as the tokens are generated and sent over. There could be a case where everything streams on the backend, but the frontend still receives the whole thing as one full response and only then breaks it back into tokens and shows it to the user. We want to make sure the streaming happens on the frontend at roughly the same time the tokens are generated on the backend; not exactly the same time, since it takes a moment for each token to travel over, but, for example, the first token streamed on the frontend should not arrive after the last token generated on the backend, because that would mean it isn't really streaming. Let me add a few console.log lines and I'll be right back.

Okay, I added a variable i as an index, and I'm logging the hours, minutes, and seconds, however you want to do it; it's maybe not the prettiest JavaScript, but it console.logs the time each token reaches my frontend UI and is streamed to the screen, so I can compare it against what I have on the backend. I'll show you in a second what that means. I'm going to send the same question again, and we should see the streaming happen on both sides.

Here's where the comparison happens. The streaming on my frontend ran from about 3:33:41 all the way to 3:33:42 p.m., and the backend also started at 3:33:41: the backend's first token went out at around 576 milliseconds and arrived on the frontend at around 605 milliseconds, so the first token we received on the frontend came very soon after it was generated on the backend. The last token was generated on the backend at 3:33:42 and roughly 295 milliseconds, and displayed at roughly 303 milliseconds. So the frontend really was streaming as the tokens were being generated on the backend; it did not wait for the whole thing to finish and then start streaming on the frontend, which is not what we want. What we want is exactly what's happening here.

I think that covers everything I wanted to show you. Hopefully this was helpful. Please like this video and subscribe to the channel; I'll be spending more time on a lot of other kinds of videos about React Native, and maybe more AI content as well, so please subscribe and turn the notification bell on so you're notified as soon as I upload. All right, see you later.
Info
Channel: React Native Journey
Views: 322
Id: nt4UQ_mzDeI
Length: 32min 9sec (1929 seconds)
Published: Sat May 11 2024