Build a Perplexity-Style RAG App with LangChain in Next.js and Supabase Realtime

Captions
In this video I'm going to show you how to build a Perplexity-style RAG (retrieval-augmented generation) app in Next.js, using Supabase and LangChain on the back end. The flow of the application is very similar to Perplexity, if you've used it. Perplexity is an app I use all the time, one of my favorite implementations of an LLM-style app, and I thought it would be interesting to take a shot at building something like it.

This was also the first time I used Supabase Realtime. It's very similar to the realtime database popularized by Firebase, so if you've used Firebase for something like a chat application, it will feel very familiar.

I've set everything up so you can do all of this for free: the free tier of Supabase, the Brave Search API (which gives you a generous number of search queries every month, I think around 2,000), and the ability to swap in whichever LLM you want by leveraging LangChain. If you want to use something like Ollama, you can, just by swapping out the chat portion we use on the back end for whatever model you want. I'm going to use OpenAI for simplicity's sake in this example, and if you haven't used the OpenAI API before, you can experiment with $5 worth of free credits. It does take a little setup: a Brave account, an OpenAI account, and a Supabase account, if you don't already have them.

Everything we need lives primarily in two files: page.js for the front end and route.js for the back end. After this video, if you want to break it out into components in separate files and do whatever you want with it, feel free; I'll post a link to the GitHub repository. For the tutorial's sake, I'll go through the front end first, then the back end, and show you everything you need to set up.

Like I mentioned, you'll need a few API keys. Set up an account on OpenAI, where the API keys are easy to find, and do the same for the Brave Search API: make an account, go into settings, and you'll find the keys easily. For Supabase I'll point it out, because if you haven't used it before you might have to click around a bit. Once you make a new project, go into the project settings and click API: you'll need your URL and your service role secret, which go into your environment variables, plus the public (anon) key, which goes into the front end of the application, where it is safe to expose. The service role secret is not safe to expose, so it stays in your .env.

One thing I didn't mention: because we're using Next.js, if you're not familiar, go ahead and run npx create-next-app@latest. This project is set up with Next.js 14, which just came out. We won't leverage some of the new features like server actions, but everything here is compatible if you run the latest version of Next.
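For reference, a sketch of what the resulting env file might contain; these variable names are assumptions on my part, so match them to whatever your code actually reads:

```
# .env.local: names are illustrative, not the repo's exact keys
OPENAI_API_KEY=sk-...
BRAVE_SEARCH_API_KEY=...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=...          # server-side only, never ship to the browser
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=...      # the public key, safe on the front end
```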
Once you've run that command, make your .env and scaffold out the project like I have here. It's a simple structure: inside src/app we have our api directory, and I just called the route folder backend. I'm using plain JavaScript for this; the point is mainly to show one approach to building an application like this, so if you want to swap it out for TypeScript, go ahead. Like I said, there's a GitHub repo, so feel free to do whatever you'd like with it. Everything else lives in page.js.

Once your API keys are all plugged in, you can close out the .env file. Everything we need to install is in package.json, so go ahead and npm install, or use Bun. I've been using Bun a lot lately to install and actually run everything instead of Node, and it's worked great; it's really quick, and I'd encourage you to check it out if you haven't. Some of these packages are for the front end, some for the back end, and I'll go through them in more detail as we reach the parts of the code where they're used.

Like I mentioned, I'll start on the front end and run through all the steps. It looks like a lot of steps, but it's only a few hundred lines of code across the front end and back end. There are a ton of ways you could approach this; this is the approach I took to explore Supabase Realtime, but you could also use WebSockets with Socket.IO, or even a Bun WebSocket server if you wanted.

The first thing we do in page.js is declare that this is the client side with "use client", then import the core things we'll use within React. Phosphor Icons provides the icons in the application; if I flip back to the app, you can see a handful of them, just nice-to-haves. We also use react-markdown: someone asked in a comment on a recent video how to use a Markdown renderer in these applications, so hopefully this answers that. It's a very simple implementation, but you can look up react-markdown and find configurations that make it look really polished, like ChatGPT.

Finally, we include Supabase on the front end. The first thing we do is initialize the Supabase client, and this is where the public key from the Supabase dashboard goes. That one is safe to expose on the front end; the service role secret is not, and needs to stay in your environment variables. With that done, we wrap the main page in our Home component and declare a number of things.
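Here's a minimal sketch of that client-side setup, assuming the env names from above; the hook and ref names are mine, not necessarily the repo's:

```js
"use client";
import { useState, useRef, useEffect } from "react";
import { createClient } from "@supabase/supabase-js";

// Only the public anon key is used here; the service role secret stays server-side.
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY
);

export default function Home() {
  const messagesEndRef = useRef(null);                       // anchor for auto-scrolling
  const [inputValue, setInputValue] = useState("");          // the text box contents
  const [messageHistory, setMessageHistory] = useState([]);  // mirrors the Supabase table
  // ...effects, sendMessage, and the JSX come next
}
```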
We use a ref to scroll down the page: as messages load, that ref acts as a marker, sort of like an anchor, to continually scroll to the bottom. Then we set up hooks for our input and our message history. Essentially, all the message history goes straight into Supabase, and Supabase is then how we manage the state of the application and the order of all our messages; we map it out once we get to the JSX portion.

There's a simple auto-scroll to the last message inside a useEffect. The next useEffect is what handles the inserts, depending on the payload type. There's a little bit of code in there to check whether the payload is of type GPT, which is how we handle streaming in the GPT response in particular. It doesn't necessarily need to be GPT; it could be Mistral or whatever you want to use. But if you're using streaming responses, you have to set it up so the latest chunk replaces the last message in the array rather than appending a new one.

This is also where we subscribe to the Supabase channel and listen for INSERT events on our message_history table, and it's what loads the application initially and sets all of its history. From there we have the function that actually sends the message, and we clear the input afterward so you get that familiar behavior where you don't have to backspace. Our endpoint is very straightforward; you know, naming is hard in programming, so I just called it backend, but you could call it something more appropriate. We essentially just send the input to the back end, and that's it: the only thing that goes across is the input, or a follow-up question, which is essentially an input as well.
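A sketch of that subscription and insert-handling effect, building on the client above and assuming the table is named message_history with a JSON payload column carrying a type field; the exact type strings are assumptions:

```js
useEffect(() => {
  const channel = supabase
    .channel("message_history")
    .on(
      "postgres_changes",
      { event: "INSERT", schema: "public", table: "message_history" },
      (payload) => {
        const row = payload.new;
        setMessageHistory((prev) => {
          const last = prev[prev.length - 1];
          // Streamed GPT rows replace the previous partial row instead of
          // appending, so the answer grows in place rather than duplicating.
          if (row.payload?.type === "GPT" && last?.payload?.type === "GPT") {
            return [...prev.slice(0, -1), row];
          }
          return [...prev, row];
        });
      }
    )
    .subscribe();
  return () => supabase.removeChannel(channel); // clean up on unmount
}, []);
```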
From there we render out the Home component. I'll go over the message handler in just a second, but essentially, to render the components we keep a hashmap of all the different component types: we look at the type of each payload that comes in, and depending on the payload we map out the matching component. This is useful if you want to render UI components that aren't just streaming or simple text. You can see I render boxes for the sources component type, and there's the streaming text response, but you could just as well render a map or a chart with this implementation. I'll probably do a follow-up video eventually showing how to do that with more involved UI components. We also add our ref below where the messages map out and below the input component; I'll go through all of these components in a moment.

The first component outside the main Home component is the input component. Essentially it listens for changes: as someone types on the keyboard, we set the input hook, and if the key pressed is Enter, it sends the message. There's also a button with the arrow from Phosphor Icons that sends it as well.

Next is the query component, which is about the simplest one: it just renders what we send across as our initial message. Then our sources. You can swap these out and get more involved with the logic, or even do this on the back end if you wanted, but essentially we truncate the text so a large piece of text doesn't blow out the source bars, and we show only the site name in a clean format with an extractSiteName helper. From there we grab a nice little icon, put up a "Sources" title, and map out all the different sources using that normalization.
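Those two little helpers could look something like this; the names and the truncation length are hypothetical:

```js
// Shorten long titles/snippets so they don't blow out the source boxes.
function truncateText(text, maxLength = 40) {
  return text.length > maxLength ? text.slice(0, maxLength).trim() + "…" : text;
}

// "https://www.example.com/some/path" -> "example.com"
function extractSiteName(url) {
  try {
    return new URL(url).hostname.replace(/^www\./, "");
  } catch {
    return url; // fall back to the raw string if it isn't a valid URL
  }
}
```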
Next is the vector creation component. This one only shows temporarily, as each vector gets created. I'll go over it more on the back end, but here's how the application works: it takes your text input, extracts a good search engine query from it, passes that query to Brave, and the Brave API responds with the top results. Once we have the top results, we scrape those pages and embed them, and once they're embedded we run a similarity search against them. You'll be able to swap in more than four results once we get there, and I'll walk through all of that when we cross that bridge. So this component just shows that initial status message and then clears it, so it doesn't hang around.

Next is the heading component, which is where we render the answer's heading; if you wanted other components to render headings separately, you could reference it for those as well. Then there's our simple implementation of the react-markdown viewer: if there are links or annotations in the response, they render out and are clickable. This is similar to how a lot of applications render Markdown, but like I said, it's a very simple implementation; all we really pass in the configuration is to make links blue. There are some much bigger examples out there, but I wanted to keep it clear and concise, and this is essentially all you need for a basic Markdown renderer.

From there we set up the follow-up component. Nested within it is another ref to scroll down: when I first set this up with only the outer scroll ref, this component did not act kindly. It would load the messages and stop partway down the page, so I nested a ref in this component as well, in addition to the others.

Then we do a simple check to make sure the content is JSON. There are more sophisticated checks you could do, but because on the back end we ask the LLM to generate JSON, it can sometimes fail, and this is a safeguard so it doesn't break the UI. We also handle clicks on the follow-up messages: if someone clicks one of the follow-up questions, it performs the same action as setting the input and sending that message, just as if you had typed it out. Then we map out the follow-up component: our icon, our heading, a loop through all the follow-up questions generated by the LLM response, and a scroll down.

Finally, the message handler. This is where we discern which component to use depending on the type of the payload coming from the Supabase Realtime instance. We have a simple map of query, sources, vector creation, heading, GPT, and follow-up; those components map to the same values that appear under the type key in our Supabase table. If the type exists in the map, we render that component and its contents, passing down any functions it might need, like sendMessage. That's it for the front end; like I mentioned, not a ton of code, about 240 lines, and the back end will hopefully be just as straightforward.
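Before moving to the back end, a sketch of that handler; the component names and type strings are assumptions, and they just need to match whatever the back end writes into the payload:

```js
// Map each payload type to the component that renders it.
const COMPONENT_MAP = {
  Query,
  Sources,
  VectorCreation,
  Heading,
  GPT,
  FollowUp,
};

function MessageHandler({ message, sendMessage }) {
  const Component = COMPONENT_MAP[message.payload?.type];
  if (!Component) return null; // unknown types simply aren't rendered
  return <Component payload={message.payload} sendMessage={sendMessage} />;
}
```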
All right, on to the back end. If you haven't already, make an api/backend directory and create a route.js inside it. Then we can go through the comments I've set up.

First, we require the RecursiveCharacterTextSplitter from LangChain; we'll use a handful of things from LangChain. You could take this a step further and pull everything we use from OpenAI through LangChain as well, but in this example I use both the OpenAI v4 SDK and LangChain. And, like on the front end, we include Supabase. I'll go through each of these in more detail: Brave search is pretty self-explanatory, but things like the memory vector store and the OpenAI embeddings will make more sense once we get to the code that uses them.

We initialize both the OpenAI and Supabase clients with the environment variables we put in our env file at the beginning of the video. Then we set up a simple function that sends a payload to the Supabase table. Essentially, depending on where the back end is in its execution, it adds different messages to our message history, based on the type and on the responses from the LLM or the initial query. That function is our handler for adding things to the database, and everything that reaches the front end comes from Supabase; we're not set up to also send results back directly in the POST response or over a WebSocket. You very well could do that, though, as I mentioned earlier: use something like Socket.IO or Bun WebSockets, send it to the front end, then log it to the database. A couple of different approaches.

The first step of the pipeline is to rephrase the input we receive and clean it up a little for the initial Brave query. All I'm saying in the prompt is: "You are a rephraser and always respond with a rephrased version of the input that is given to a search engine API. Always be succinct and use the same words as the input." So if someone types something that doesn't entirely make sense, this adds a little padding and hopefully hands Brave a more relevant, well-formed search query. We then return the content from the GPT-3.5 Turbo response.

From there we set up the function that gets all the sources; on the front end this feeds the sources row, and it's where we parse out the number of sources we want and set up our embeddings. We load up Brave with our API key, call the rephrase function, and pass the rephrased message into the loader, which calls the Brave Search API and loads the docs it gets back into an array. By default I believe you get 10 results back from Brave, and in this example we parse out four. Obviously, the more you load, the more you have to sift through and the longer it takes, though we do run some things asynchronously later to hopefully speed up the embedding. We call a normalizeData function, which I put inline so it's easy to see: all we grab from Brave is the title and the link, and we exclude results that include brave.com. The reason for that exclusion is just an example of filtering: when I asked for news, it would return Brave's own results when I was looking for outside news organizations. So if you want to filter out different things, that's where you'd do it. We return the four normalized links, then send the sources to the front end, normalized and clean, ready to render.
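Here's a hedged sketch of this first stage of the back end. One substitution to flag: it calls Brave's REST endpoint directly with fetch rather than the LangChain loader used in the video, just to keep the sketch self-contained; the prompt follows the video, while the type strings and helper names are assumptions:

```js
const OpenAI = require("openai");
const { createClient } = require("@supabase/supabase-js");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// Every payload inserted here is fanned out to the front end by Supabase Realtime.
async function sendPayload(content) {
  await supabase.from("message_history").insert([{ payload: content }]);
}

// Clean the raw input up into a good search-engine query with GPT-3.5 Turbo.
async function rephraseInput(input) {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content:
          "You are a rephraser and always respond with a rephrased version of the " +
          "input that is given to a search engine API. Always be succinct and use " +
          "the same words as the input.",
      },
      { role: "user", content: input },
    ],
  });
  return completion.choices[0].message.content;
}

// Search Brave, normalize to { title, link }, and push the sources to the UI.
async function getSources(message, numberOfResults = 4) {
  const query = await rephraseInput(message);
  const res = await fetch(
    `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(query)}`,
    { headers: { "X-Subscription-Token": process.env.BRAVE_SEARCH_API_KEY } }
  );
  const { web } = await res.json();
  const normalized = (web?.results ?? [])
    .filter((r) => !r.url.includes("brave.com")) // the example filter from the video
    .slice(0, numberOfResults)
    .map((r) => ({ title: r.title, link: r.url }));
  await sendPayload({ type: "Sources", content: normalized });
  return normalized;
}
```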
Next we initialize our vector count. One thing you could do for the document-scanning portion is send back a response per scanned document, so the UI is more responsive than the single "scanning documents" bar that comes across the screen.

We wrap the fetching in a try/catch and also set a timeout. The reason is that we're making plain GET requests to each web page; we're not using Puppeteer or anything like that. A lot of websites out there require something like a headless Chromium browser to get at the content: Puppeteer would wait for the page to load and for the JavaScript to execute, and then you could parse it. That's both more computationally expensive and an extra step, but if you want better coverage and the ability to parse more of the internet, I'd encourage you to look at Puppeteer; I'll cover it in a future video, since it's a bit more involved to set up.

So we send in the link and get the HTML contents. Promise.all is how we send all these requests out in tandem: instead of awaiting each page scan and embedding one at a time, we fire them all off, wait for them together, and let them all resolve. First we check that the HTML has a sufficient length, so if a page responds with "cannot parse this page", a 400 error, or "enable JavaScript to use this page", we hopefully skip over that content, since it isn't relevant to what we're doing.

Next we split the text of the web page into chunks. We split into chunks because we embed each chunk to perform our RAG against the LLM: we chunk the page, embed the chunks into vectors, then take the user's query and see what's similar within the page. Essentially the query goes in, looks through that page, and comes back with the top result. You can also ask for more than one result: the similarity search responds with however many top results you specify, and depending on those you can pass more along to the LLM, as we'll see in a moment.

So we create our vector store from the split text, passing the link in alongside the embeddings, and we loop through this function, creating the vectors for the data we want to parse. It skips a page if the request times out, and it also skips, without splitting and embedding, if the content is too short. Lastly, we log an error when pages are skipped.
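A sketch of that per-page fetch/chunk/embed/search step, building on the clients above. The LangChain import paths match the JS releases from around when this video was published and may have moved since; extractMainContent is the Cheerio helper sketched a bit further down, and the length cutoff, chunk sizes, and timeout are assumptions:

```js
const { RecursiveCharacterTextSplitter } = require("langchain/text_splitter");
const { MemoryVectorStore } = require("langchain/vectorstores/memory");
const { OpenAIEmbeddings } = require("langchain/embeddings/openai");

const embeddings = new OpenAIEmbeddings(); // reads OPENAI_API_KEY from the env

// Fetch one page with a timeout, chunk it, embed it, and return the chunk(s)
// most similar to the user's query; null means the page was skipped.
async function processAndVectorize(link, query, timeoutMs = 10000) {
  try {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    const html = await fetch(link, { signal: controller.signal }).then((r) => r.text());
    clearTimeout(timer);

    const text = extractMainContent(html);
    if (text.length < 250) return null; // skip error pages / "enable JavaScript" stubs

    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
    const chunks = await splitter.splitText(text);
    const store = await MemoryVectorStore.fromTexts(
      chunks,
      chunks.map(() => ({ link })), // keep the source link as metadata
      embeddings
    );
    return await store.similaritySearch(query, 1); // raise k for more context per page
  } catch (err) {
    console.error(`Skipped ${link}: ${err.message}`); // timeouts land here too
    return null;
  }
}

// Fan the pages out in parallel and keep only the successful results.
async function getTopResults(sources, query) {
  const settled = await Promise.all(
    sources.map((s) => processAndVectorize(s.link, query))
  );
  return settled.filter(Boolean).flat();
}
```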
So if you're testing this out and there's a certain page you really want parsed, but for whatever reason it isn't loading, the log will indicate that the link was skipped. If something you query keeps failing, that's the point where I'd encourage you to look at Puppeteer and implement something more sophisticated to actually parse and scrape those pages.

We wait for all the fetch requests and promises to be processed, then loop through the four results we specified. If you want eight, do eight; if you want five, do five; whatever works best for you. Obviously, the higher the number, the more times you hit the embedding endpoint and the more context you pass to the LLM as the payload. So more results means more expensive, but arguably more accurate. Once we have that, we filter out the unsuccessful results, take our top four, and send a payload to Supabase with the type vectorCreation. That's what briefly shows the "finished scanning sources" message on the front end, like we set up in page.js.

Once that's all done, we can trigger our LLM and get our follow-up message. We trigger the LLM with the user's message plus the top results. I just stringify the results; you could also parse the payload down to plain text, but I found stringifying works well, because the LLMs understand the context of each source grouped together within that JSON structure.

Next is a simple fetchPageContent function, where we extract the text. Once we've fetched the page, we use the Cheerio library we included and clean out the tags we don't want in the results, like the script, style, and head tags. You could be more sophisticated here and remove other things too; the goal is to end up with mostly natural-language text, so you're not embedding JavaScript and then asking questions about the scripts executing on the page. This is a simple example; you could really expand this function into a much more thorough, across-the-board implementation of page parsing.
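A minimal sketch of that extraction. The video strips script, style, and head; a couple of extra tags are removed here as an example of going slightly further:

```js
const cheerio = require("cheerio");

// Load the HTML, drop non-content tags, and return the visible text,
// collapsed to single spaces so the splitter gets clean natural language.
function extractMainContent(html) {
  const $ = cheerio.load(html);
  $("script, style, head, nav, footer, noscript").remove();
  return $("body").text().replace(/\s+/g, " ").trim();
}
```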
From there we write the triggerLLMAndFollowUp function: it gets the GPT results from the input, then gets the follow-up results, and sends them across. It waits for the GPT generation to finish before it sends the follow-up payload to Supabase, and it also returns a response for the actual Next.js endpoint to show that it's processing the request.

Within getGPTResults, we accumulate the content of the stream, concatenating it as it streams in and writing it to Supabase. Accumulating makes sure the text stays in order: when I first tried this, if you send chunks without waiting properly and managing the stream state effectively, the output can appear out of order, so be mindful of why we accumulate here. We create an initial row in the database for the GPT stream, then send over our heading, so it appears right as the answer is about to start, and then we start iterating through the stream from our GPT-3.5 Turbo response. One thing I'll note: in each of these functions where I call the chat completion from OpenAI, you can swap in a different model. If you don't want to use GPT and would rather use a local model like Mistral, Llama, or Code Llama, you can swap out each of these individual functions; there may be some syntax you have to change a little, but you can do it. As the stream comes in, we check that content exists in the message, accumulate it into our string at the top, and update the row with the GPT response.

Next we generate a unique stream ID. This is sort of optional; you don't strictly need it for this implementation to work, but if you want to scale this out to different users and different threads, you can start putting richer payloads into Supabase with unique keys on everything. (One thing I want to show at the end is the Supabase configuration you'll need for this realtime streaming to work in your database, so I'll remember to show that.) So we generate the unique stream ID, create our payload for GPT, insert it into our message_history table, and return.

We're almost there. Next we update the row with the GPT response: we delete the initial row and insert the new message. There are a number of different ways you could do this; this being the first time I used Supabase Realtime, this is the approach I explored. One thing I will note: I tested this with WebSockets, and if you want faster streaming responses, you will get them by sending the results directly from the LLM to the front end and then putting them in the database. But the nice thing with this approach is that you can send the query and it keeps running on the back end, and if you have the app loaded on multiple devices, the response comes back in real time across all of them. It's sort of neat; Realtime is generally used for chat applications, but that's just an aside.
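Pulling the streaming pieces together, here's a sketch of getGPTResults that uses the delete-then-insert strategy described above, so every step reaches the front end as a fresh INSERT event, which is all the realtime subscription listens for. The system prompt and type strings are assumptions:

```js
async function getGPTResults(input, topResults) {
  let accumulated = ""; // accumulate so the text can never appear out of order

  // Initial placeholder row for the streamed answer.
  let { data } = await supabase
    .from("message_history")
    .insert([{ payload: { type: "GPT", content: "" } }])
    .select();
  let rowId = data[0].id;

  await sendPayload({ type: "Heading", content: "Answer" });

  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo", // swappable for a local model with minor syntax changes
    stream: true,
    messages: [
      { role: "system", content: "Answer the query using the provided sources." },
      { role: "user", content: `Query: ${input}\nSources: ${JSON.stringify(topResults)}` },
    ],
  });

  for await (const part of stream) {
    const chunk = part.choices[0]?.delta?.content;
    if (!chunk) continue;
    accumulated += chunk;
    // Replace the old partial row with a longer one; the new INSERT carries
    // the whole answer so far, and the front end swaps it in place.
    await supabase.from("message_history").delete().eq("id", rowId);
    ({ data } = await supabase
      .from("message_history")
      .insert([{ payload: { type: "GPT", content: accumulated } }])
      .select());
    rowId = data[0].id;
  }
  return accumulated;
}
```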
Next we set up the function for generating the follow-up, which is where we lean a little on the OpenAI LLM to generate a nice payload for us. For this one I actually chose GPT-4, and I ask it specifically to respond in a particular JSON format. It's generally not a huge amount of text you send for the follow-up, and not a huge amount you get back as follow-up questions, so it isn't a huge cost to use GPT-4 for this particular function.

Once that's all set up, we define the POST function for our API. Once we get the message parsed, we send that initial query message straight back to Supabase, so it saves within state, and once it's in Supabase it paints to the front end. And finally, the POST function is what actually starts the whole process every time: it kicks off the search engine, returns the sources, et cetera.

Just like I mentioned, once you're in Supabase, I'll show you what the table looks like in my database. Within my project you can see the message_history table, just like in the back end and front end, with all the different responses. If I edit the table, you can see I set it up with a payload column of type JSON. Once you're further along and closer to production, I'd encourage you not to dump everything into JSON and instead leverage the database as it's more intended: put type and content into separate columns. I found JSON nice because it gives you flexibility without constantly touching Supabase during the initial development of a project; you can send over whatever payloads you want, so long as they're valid JSON. But closer to production, use columns for the different fields: one for type, one for content, et cetera.

The other thing you'll see is that I have Realtime enabled for the table; you have to turn that on for the realtime functionality to work. I've also turned off RLS, the row-level security, purely for demonstration's sake. You'd want to tighten that up if you're doing something like this in production, but in this example, running locally and stepping through the tutorial, I just turned it off.

That's pretty much it. If you found this video useful, please like, comment, share, and subscribe, consider a subscription on YouTube or Patreon, and otherwise I'll look forward to seeing you in the next one. Take care.
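As a closing reference, here's a sketch of how the route.js entry point described above ties the pipeline together; the helper names come from the walkthrough and the sketches above rather than the exact repo code:

```js
// POST /api/backend: receives the input and kicks off the whole flow.
export async function POST(req) {
  const { message } = await req.json();

  await sendPayload({ type: "Query", content: message });   // paint the query to the UI

  const sources = await getSources(message);                // rephrase + Brave search
  const topResults = await getTopResults(sources, message); // scrape, embed, rank
  await sendPayload({ type: "VectorCreation", content: "Finished scanning sources" });
  await triggerLLMAndFollowUp(message, topResults);         // stream answer, then follow-ups

  return new Response(JSON.stringify({ message: "Processing request" }), {
    headers: { "Content-Type": "application/json" },
  });
}
```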
Info
Channel: Developers Digest
Views: 1,092
Keywords: Perplexity, AI, RAG, NEXT, NEXTJS, SUPABASE, REALTIME, SUPABASE REALTIME, GPT-4, GPT 3.5
Id: KXiefN1kW2Q
Length: 38min 27sec (2307 seconds)
Published: Mon Oct 30 2023