Build Your RAG-based ChatGPT Web App with Azure: LawGPT Use Case Tutorial

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone welcome to AI anytime channel in this video we are going to develop a web application powered by llm so we're going to use Azure open Ai and cognitive search to build an app called chat with your data okay so it's basically how you can use your Enterprise data you know uh with chat GPT inside Microsoft Azure Cloud okay so they have given this GitHub repository we will Fork the GitHub will clone basically the GitHub repository and then we will have our data you know we'll create the knowledge base out of it and then we'll push it to Microsoft Azure and that everything will happen automatically now we'll just have few clicks okay we just have to run few commands from our CLI and that will automatically create all the resources and instances on Microsoft Azure to you know develop this application so if you currently see on my screen I have something called chat with your data it has two options called chat and ask question and I have named it called law GPT at the moment but you can name it anything you want okay let me first show you the data that I have taken so I have three PDF files that I have used as a source data or the source documents okay so if you see this is a crime analysis case studies you know and all of its data are related to law okay so they are you know related to law legal legality and all okay so first of all for example if you are a lawyer if you are an aspiring lawyer or if you you are a aspiring judge or anything in the law enforcement uh a field right if you want to have a conversational interface to retrieve some kind of information this app will be very helpful in that case so what we are going to do is that we can we have a data folder and inside this data folder you can keep on adding documents okay you have five documents 10 documents and then we'll use something called cognitive search that will create the index of those documents so we have options of vector search and the tech search and a hybrid search both vector and text combined I will show you when I when we push this up so you can see it says Azure open AI plus cognitive search law GPT and then we have two options here called chat and ask a question so if you see this is the one document that I can I can go into I'm going to open this document as well risk assessment case studies for the legal profession anti-money laundering and terrorist financing working group and if I go to other document then I have a Supreme Court case studies you know so there are 32 Pages or something in that document and you can see all the case studies over here you can see this case study so I have splitted this PDF it was around more than 200 pages but I splitted it just to save some tokens there right so if you see we have this case studies like Dartmouth College versus Woodward D bounce versus Ogden Worcester versus Georgia Etc and all this case studies over here if you come down case study one then some questionaries are there then case studies too some questions Etc right so how how and aspiring lawyer or a judge can use this tool that we have gonna develop using Azure okay uh completely as you cloud and all of their AO AI Services okay so if you see here it says chat with your data ask anything or try an example so if you click and click on this it has some default questions like why is the Dartmouth case considered to be important in the economic history of the United States it also has something called ask the question so if you go to ask the question here you can ask one question so this section is mainly a QA section question answering section and this is more like a conversational interface like where you can ask follow-up questions you know it's more like a chat board because if I click on this why is the dark mouth case considered to be important in the economic history of the United States now what I'm expecting is that's going to go inside that knowledge base the index that is created the vectors that it has right it will give me the response from there and you can see the streaming responses it has a streaming response just like chat GPT what it says is the dark mouth case is considered to be important in the economic history of the US because it established the principle that states were not permitted to take over private so it it gives me an answer and it also gives me citations the source document and if you click on this Supreme Court case studies the file that we have over here you can see this is the file that we have Supreme kit code case studies and now what what it's showing is it's a PDF viewer that helps you understand that from which document right from which page you can see it redirects us to this page once we click on the citations it you can also see the questions over here Supreme Court decisions have on Dartmouth College so this is the dark mouth case that we are talking about in this uh you know uh this question that I have asked now it also has couple of other things that's very interesting by the way you know it has something called supporting content it gives you the content because now we also have number of K that we pass it as a parameter right the number of K rate how many kids would look at so for example K is equal to 3 it will look at three samples or something like that right as a source document and you can see it over here now it also has a thought process that how it searched you can see the you know the system prompt the user prompt Etc and how it has combined and retrieved the information all together and how it has given us this concise answer that that's what chat GPT is good at it right all the llms in general so it has a supporting content it helps you explain the output completely right so you can see we have thought process that will help you understand okay this answer has been retrieved by this information now if you have the citations now you can see the citations over here right uh the complete PDF so what we are doing we have developed this low GPT with the help of azure uh search uh so we said that we have as your cognitive search and Azure open Ai and I'm I'm calling it law GPT but you can build anything that you want you know if you have your medical documents if you have your Law data if you have your financial data if you have your insurance policies data if you have any other kind of documents you can bring everything all together and you can build a domain specific application that will that will retrieve information for you guys that's what we are going to develop in this video is going to be very easy okay we're not going to write a lot of a code because it's the GitHub repository is already there you're going to clone that repository make some changes and then little bit of customization on the front end and then we'll have this you know uh app ready here okay so now let's let's build this application guys okay so to develop this uh law GPT application guys or any chat with your data application we basically we can find that GitHub repository here that called Azure search open AI demo which says a sample app for rag which is retrieval augmented generation you know the community is going Gaga over rag right the selling point nowadays in the community or the industry basically and pattern running in Azure using cognitive search for retrieval and GPD models for as an llm right to basically power your chat GPD style question and answer and chatbot experiences right so this is the GitHub repository you can see it's very famous have more than 3.5 K Stars has some setup thingy some high level architecture that how you combine cognitive search with uh llm on your data sources now your data sources can be in Cosmos DB can be on blob storage can be anywhere else but we are going to have this locally because I don't want to occur that cost but if you already have your data on Azure don't worry about anything else just use GPT models okay which is on Azure you are okay with that okay your data will be completely private and it will not get leaked out okay if your data is already on Microsoft Azure okay and then you have cognitive search and combine with open Ai and then some orchestrator right an app server or something like that then we have prompt knowledge Etc that we'll see and then you have your some features like this is you can see the screen grab here and they have some project setup might look little overwhelming for the first time you know you might get a little confused that how you can set it up but let's see you know with very few lines of you know command we will be able to set this up in our system and then we'll put it there to as your Cloud okay to do that what you have to first do it that you should have you must have a Microsoft Azure of course account and I have a pay as you go subscription and if I'm not wrong I maybe I'm wrong but I think you need a a business email ID or something like that to create your account on Azure open Ai and correct me if I'm wrong because where I created months ago it was the the criteria was that that you need a business email you know to have Azure open AI so you should have all those I assume that you have all the access on Microsoft Azure if you have all the access you are good to go with that okay and then now what I have is I have these three data that you already have seen three PDF files you know I have one crime analysis case studies then I have federations of law societies which by law society which is anti-mile laundering and terrorist financing working group risk assessment case studies for the legal profession and then I have Supreme Court case studies you know so I am developing developing this for lawyers and judges to have some recommendations find out some you know Insight from previous case studies that can help them you know in their Journey throughout okay of using a generative AI That's what the huge case that we are trying to you know tackle in this video so what we have to do is that there are few criteria and some free for few prerequisites that I'm going to discuss you can see I have a folder called Data where I have all the three files now what I'm going to do is I'm going to open this project folder in my terminal which is a power cell by the way by default and you can see I have power sales 7.3.6 the first criteria is by kind of by the way I'm using carbon dot now.sh I use this to basically create some documentation Etc you can also use that I can give the link in the description this is what you have to do pretty much this what you have to do you first need Power Cell 7 plus version you know to have all these things up and running so if you don't have power cell 7 plus version then you first have to upgrade it you know your power sales version there okay so please go ahead and have power cell 7 plus once you have that you know instead you can learn PW with all those PW powershell.exe from a Powershell terminal if this fails you likely need to upgrade your power cell as I said and then you need node.js as well so if you don't have node.js I think you need node.js install for Windows you know you have here if you go to this Repository you can find out one of the criteria is because it's a react uh front end and it it needs node to run those uh the entire web app there guys if you see it over here they say that if you come to requirement it says node.js 14 plus and Power Cell 7 plus that's what you need you know for Windows users okay and so let's do that so here you can find out node I can give you this link in description if you don't have you can just download the MSI and double click install Windows set to path that's it so that's what you have to do and then you can see then we have the something like that PWC says something then you so you can see I am inside this project now what I'm going to do is I'm going to first run this command here so you can see winget install Microsoft HDD I already have this developer CLI but I'm not going to run this now if you're running it for the first time if you don't have a zoo developer CLI then please run this win gate install Microsoft dot ajd okay you have to run this command now what I'm going to do is I'm going to login to Azure so let me copy this and I'm just going to paste it over here you can see I'm just writing a j8 AGT auth login so what this command will do it authenticate with your Azure portal so when I click on this it will basically open a new tab in the browser and you can see if you already have sign in you can see I am signed in with that you can say it says authentication complete you can return to the application feel free to close this browser tab that's what I'm going to do here so I'm going to do that and come back you can see this is I only have one resource here AI anytime now once I run the upcoming command it will create a new Resource Group and have everything running inside that Resource Group I will explain that quickly so now if you see it says logged into Azu now I have already logged into Azure now so what I have to do next is now I have to initialize and clone this repository and I could automatically do that for us don't worry so what I'm gonna do is I'm going to copy this here come over here and you can see this is what the GitHub repository name is also that Azure search open AI demo so if you come over here and this is what we have written here AZ initialize right in it you do get init and all right so similarly we have a z Azure developer CLI that's a z a z init and then hyphen T which is your repository name and just clone it for you okay so that's what I'm going to do here so let me just do a control V I'm just going to hit enter it says continue initializing an app here this will also it's like a new guitar I'll say okay yes so let's just do NES and hit enter now what it does it's it's downloading all the code from that GitHub reposit so basically it's cloning inside that folder into a new environment name you know I'll just do dot V and V here so let's just do that enter a new and it's a success new project in a slide you can view the template code in your directory so let's go back to this and now you can see that you know we have uh inside project we have all of this thing that over here okay so let me just go back inside project and then you can see in docs these are some photos and some images then we have data and these are the data folder that we have so what we are going to do is we need to have this data over here so let me just do that so what I'm going to do let me just go back to desktop excuse me another stop osc project and inside this data I'm going to create that I'm going to delete all this data over here first and let's delete that data I have deleted all the data which is which comes by default when you clone it for the first time and I'm going to download all this data over here over there that inside that folder so let me just go inside osc my project and the data folder I'm going to just download all of these three files so let's see files inside that you can see this is what I'm going to do and that's it okay and you can see that now I'm gonna just hit excuse me you can just do a refresh project data I don't know why I didn't download let me just do a Control G it says fader okay so let me just do that what I'm going to do now is let me just go to Once recycled band and let me just copy this is not this Supreme Court case studies and fls case studies that's it and I'm going to just click on restore now once I restore and go back to C project you can see now I have all the files basically once you clone in that folder the project folder I already have a folder called data which was replaced by the default data folder so I have to bring it again from the recycle bin now if I click on this let me just delete all these three files just to show you that we have the right set of data over there let me just uh just click on this okay now once I open this you can see all those files are there inside that data folder perfect now we have our data you can put any other data that you have right with you and mainly the PDFs okay so you have PDF data you can put it over there inside the data folder now let me also do one thing let me just open this in code dot I'm just going to open this in vs code and you can see I have opened this entire folder in vs code and you can see find out the data inside there then we have you know if you go inside app basically let me just remove this if you go inside app you can see front end and backend separately so in front end will go there later then we have to we can go inside uh in the front end you have this go to app folder and after going to app folder you can go to front-end folder and then you can open index HTML now once you open index HTML you can make some changes which we say GPT plus Enterprise data sample now I'm going to replace this I'm making some changes on the front end I'm going to call this as low GPT you can name it any anything else you want okay and then you can also hash this you know line number five so that's what I'm going to do because I have line number five I don't need that icon on the uh the browser the tab that we have okay but you can put your icon but I don't have an icon handy but you can do that as well and I'm going to save this here so let's save this and you can come out of this index.html ha now if I click on the SRC which is Source now you can see we have something called pages so now once you click on pages let's click on the pages once you click on the pages then we can go inside layout and inside layout we have layout ps6 now why I'm opening this because we also have to make some changes over there inside this layout TSX now what you have to do is we have to change this title guys GPT plus Enterprise on the header the earlier that we changed was on the browser tab now here I'm going to change this so let's again call this law GPT okay that's it now we have changed this to low GPT and we can also do one thing there's a GitHub icon that I have to remove so if you see uh GitHub repo here here this I found so what we can do we can remove this list here from 27 to 38 we can just do a has so you can see that I remove a basically it has this this code is basically in react okay so this is how you have seen reacting uh or any other uh language as well that's what you do JavaScript or python control and the question mark on your keyboard now this is done as well now what I'm going to do is go inside this chat folder and inside this chat folder you have something called chat TCA TSX sorry excuse me of typescript Etc then you have TSX and here what we have to do is uh let me see let me just do an ALT key first setting button set is config panel what we can also do we can hash this line 167 that we don't need so let me just do that here control has okay now this is done as well and now let me just do a Ctrl s now inside SRC you know we have this components and inside this components we have this example and if you see this example we have that three default questions that we can replace it over here you can see it's by default this is the question that we have currently what is included in North Wind Hill something something now what I'm going to do here is I'm going to replace this uh with my questions okay so let me just do that I already have that question handy with me here in notepad and I'll just do that so let me just copy Ctrl C so let me just do a Ctrl C and just do a control V and you can see now I have done that it's my question that I have I have shown that once I was showing you in the beginning of this video where we have this question that is a key value pair so that takes and a value the default question where you can you know give some idea to the end user that what kind of question the questions they can ask so this these are the questions that they can ask tell me about us tell me about uh let me just remove this uh still tell us about so tell us about project Centurion so tell us let me just do that tell us about project Centurion and the other question was what to do if management of an existing trust that may contain criminal property and something like that okay you can change these questions depending on the data that you have so that's all so these are the four five that you have to change guys you know on very high level you can change your prompts Etc right I can show that so you have index HTML layout TSX chat TSX and example TSX that's that's where you have to make some changes let me just close all this because we have done that Ctrl s and now what I'm going to do is I'm just going to basically minimize this front-end thingy now inside app what you have to go to backend you have a little app.pi this is where all the magic happens guys right the app.pi where you have a lot of things like config Etc which automatically creates the instances and all of those things okay for you and the back end if you have data your blog storage you can also go to blob storage Etc to you know get that now if you come to main.pi you can see that's how it start that app from app import create app then you have text.pi new lines Etc you have requirement these are requirements that automatically will get installed once you do AGD up as you developer up the other but if you are doing it manually or something like that locally then you have to install all of this and then it has LinkedIn adapters callback handle Etc so the other thing that you have when you have data you know employee EnV something like lcsb okay uh that's it guys so now let's do one thing uh let's go back to yes you can see on the terminal basically the power cell and now what we have to do is we have to just do is set up okay so now let's let's do that here so us here and this this might take a bit of a time so I can pause uh the video here because it will take a little time to set it up and everything so let me just pause the video here and I will start the video once it gets completed okay so let me just do that okay guys so you can see now it says your application was provisioned and deployed to Azure in 37 minutes so it took me more than half an hour to set this up and deploy it on Microsoft Azure okay you can see it has given us a URL as well so we'll copy this URL now so this is in the end point so let me just copy this I'll copy this I'll open this over here and I'll just hit enter I will come back on this okay we will try it out now if you see here a lot of thing that has happened and I will show you if you go up it's uploading block for page so what it does guys for each of the PDF that you upload it splits each pages in all of those PDFs and then create the embeddings out of it and you can see it says uploading blog for page 0 page one page two so Azure blob storage then you have you know a document intelligence search service Etc I will go to the resources one by one you can see for all of this document we can see the pages splitted individually right each Pages has been splitted there okay you can see it over here for each of the documents so let's just go up and you can see you know it does a lot of things blah blah blah install all the required dependencies and libraries and then it took more than half an hour to create all these instances within the resource Group whatever Services you need and automatically set it up for you once you authenticate with Az AZ authentication that we have so now if you see here uh your application was provisioned you can view the resources created under the resource Group so my Resource Group name is RG law gptv MB so if you come over here I already have opened for you you can see it says RG law gptvnb and these are the resources sorry either the services or the instances that Azure has created for you all automatically set it up for you inside this Resource Group now if you see it's using services like of course an app service that gives us an option to basically interact with that chat bot or the application the web app so web app has been deployed within this app service our application then we have document intelligence then we have Azure open AI okay so Azure open AI is to utilize the gbd 3.5 model and then we have search service we have cognitive search service for indexing And all I'll go there and then you have a storage account for restoring all of the blocks that you saw for the PDFs if you go inside this search survey so let's go inside these guys so service now if you come here you can see you know a lot of uh things let me just you can see the gptk GPT KB index the knowledge base and you can see the storage is now 8.7 to MB and there are 321 documents okay for each of the PDFs pages that we have okay so the documents and you know you can go to field you can see the field ID content embedding category Source page so we are retrieving The Source document then we have the embedding the vector data the vectors for you okay and then we have document intelligence that provide us that uh Vector store capabilities okay so if you come over here you know search Explorer you can also search over here the query string it will basically uh go inside that in uh basically the created index and will try to retrieve information for you that's how cognitive search works but now we have combined this cognitive search with Azure GPT uh models okay now if you go back to this Resource Group we have Azure open Ai and all of these are in East US you don't have to do anything if you are doing it as a POC or an MVP of course you can you know if you come inside this vs code you can do a lot of changes here in the if you just expand app if you go to backend you can do a lot of things over here you know uh message Builder model helper you know you can see the GPD model the gpt4 models are also there whatever you want to you know spin up you can also spin up that you can make some changes here in the back end depending on that it will you know it will act guys okay approaches then how are you fetching the approaches and this is very intuitive so let's go back to this uh GPD and ask this question why is the Dartmouth case considered to be imported in the economic history of the United States and I'm expecting some kind of response you know from uh the llms that have been combined with the cognitive surge now you can see here the Dartmouth case is considered important in the economic history I mean once you click on this citations which gives me one document now it also have and that's how our rag works great right retrieval augmented generation it gives you the supporting content the number of keys that we have three over here but you can make it 2 5 10 whatever you want to do thought process you can see the thought process here that how it's searching you know inside that knowledge base that you have created so you can see there are two things that I would like like to highlight the one is role system the system Rule now if you see here it says assistant helps the company employees with their healthcare plan now what I have to do I miss that part but you can do that once we first clone this repository the data was for healthcare but you now we are doing it for law right lawyers you have to make this for law experts or lawyers or something like that okay and even it will retrieve better responses in that case and you can see it over here all the documents I know and then supporting content and citations we are getting the exact citation that from which part of that document we are getting that respond the exact up path of course it generates synthetic sentences because that's how it works it creates synthetic data out of it maybe you can do temperature equal to 0 or 0.1 if you want to reduce the creativity and Randomness out there you can do that from back end okay there's no problem inside that now you have approach dot pi you know you have your retreat uh if you read this chat read retrieval read dot Pi you have all of the prompt if this is where you have to make the prompt changes okay their healthcare plan you have to make it lawyer expert or something like that let me do an ALT G and show you so you can see it over either healthcare plan questions blah blah blah follow-up questions prompt content query prompt template you can make all of these changes here the system from the user prompt Etc okay you can make all the changes over here and then you have your model and the config etc etc temperature you can see Zero Max tokens blah blah blah you can increase all these values guys depending on you know what you're trying to do now I'll say okay if I said tell me more about it let's see what it gives now I said tell me more about it okay let's see what what it responds so I'm expecting that it should give me the response for that mouth case and you can see it gives me that okay it says a dark mouth case and fantastic right this is amazing because I'm not asking about that mouth case anywhere in my prompt or query I say just tell me more about it it automatically understand the semantics behind my question that I'm asking wait understand what what my query is okay from follower question that I have having so it's a follow-up question on my previous question it's level two basically elaborate the previous response about the dark mouth case and you can see it over here now let's open that data guys quickly okay so I'll go inside data and I'll go some case study something like that okay let's see if we can ask some other question and see if it's able to give any response terms of fund without such necessary legal I don't know what to ask Okay purchase and sale of real estate property investigating uh potential proceeds of crime let me see is to purchase property criminal uses elderly parents to learn the longer proceeds of crime let me copy this here okay and I'll just copy this will come over here tell me about criminal uses elderly parent to learn the process of crying tell me what to do when let me just phrase this question in a better way tell me what to do when criminal uses elderly parent to learn the process of crime that's it let's hit enter okay now I'm expecting that it should give me some responses and you can see if you suspect that a criminal is using elderly parents to learn the proceeds of crime here and you can see the citations that I've I got my citations fantastic right so this is this is how it happens you have your Enterprise data on Microsoft Azure you are having your data on blob storage or you know Cosmos DBS or anywhere else guys I have my data locally and I push it from my local folder but you can keep that already on blocks or anywhere you want right but this is good from an accuracy standpoint as well it gives you the response that you are looking for and you can see that it such a such a beautiful answer that is given for this tell me what to do when criminal uses elderly parents to learn the process now suppose for example if I am an aspiring lawyer you know I am just in starting my career in the field of Law and this legal Industries and all right and I want to understand different case studies I want to find out you know insights and some uh best practices and recommendations basically you can build this tool now you can take this extend this for the you know put some guard rails put a payment Gateway and a login authentication something like that and build a SAS application no it completely depends what you want to do with this right now this is on the chat it also has an option called ask the question and let's let's open this citations as well now once I open this citation for this question that tell me what to do when criminal uses elderly parents to learn a process of crime it takes me to that page number seven here you can see it over here right this page and it generates the data out of that the output data that we see this is fantastic if you go to content it gives you again the number of K that we have three K's that we have then thought process what was the thought process behind this information that we have retrieved the augmented generation of that information now this is a chat interface where you can ask different follow-up questions okay and but here's the problem now if I ask who is Bill Gates now if I ask who is Bill Gates let's see what it answers based on the provided Source there is no information about Bill Gates that's the Improvement that you know uh it provides because what we are doing now we are using cognitive search to only look at the index that we have created so it only gives the information from that knowledge base it does not go outside of that context or the knowledge base that we have created and you can see when I ask who is Bill Gates it says based on the provided sources there is no information about Bill Gates this is fantastic right then and this is the problem that we want to solve with the open source and length because they had I've not said that's hallucination because the model also has 7 billion or 30 billion or Falcon 180 billion those are the floating points the data points that we have and they have their own understanding and knowledge right you cannot stop them by putting a prompt over there prompt of course help you know to limit it on some extent that okay the information is not there but I know who Bill Gates is that's what you know open source llms are currently very struggling at some extent but with this it goes to that indexed that we have created and retrieve the information out of it this is fantastic right now if you come over here in the after question here you can only ask a single question you cannot interact with it so if I say okay tell me about us tell me about uh I think that the prompt is wrong you know that we have to make some changes but it's fine so let's see what it does okay so you can see it's only giving me one response and I cannot ask a follow-up question on this now okay project Centurion is a multi-agency interviewer that focuses on using problem oriented policy in pop approaches to reduce alcohol related crime and disorder on the prominent areas in dog loss the capital of Isle of Man blah blah blah it also has citations same thing over here right but it says ask your data so it's a question answering section and this is a chat board where you can ask interact and ask follower question so this is what you know we have built uh like today guys I'll not use the word built we haven't built it Microsoft has done that okay we have this is just a wrapper right around it and this is what we have done we have made some changes on the front end a bit of customization and then just use the GitHub repository over here you can see and I don't take any credit to be honest on this video that you are seeing or the repo what I want to show you is that how you can take it it's an MIT license you can use it for commercial purposes as well don't worry about it of course go ahead and read the code of conduct but you can use it in Enterprises for your clients there is nothing wrong about it okay now how you can just use this with few lines of command I will give this command list of command that you can just take it even if you don't have much technical expertise how you can just build this up guys right how you can develop it set it up put a custom domain also you know how you can put a custom domain if you go to app service you can also set this up okay all those domains and all you can set it up over here but you can find that on you know YouTube videos or anywhere on the internet but this is what I wanted to do you know and I hope you learned something in this video this is more little to law and legal you can build it for healthcare insurance Finance banking etc etc right and if you have any thoughts or feedback please let me know you can also reach out to me through my social media channels please find that on the YouTube channel banner and also about YouTube channel you can find all those information and if you haven't subscribed the channel yet please do subscribe it share the video and Channel with your friends and peer that's all for this video guys thank you so much for watching see you in the next one
Info
Channel: AI Anytime
Views: 13,887
Rating: undefined out of 5
Keywords: falcon, falcon model, siraj raval, enterprise app, enterprise azure chat app, azure openai, azure cognitive search, llm, generative ai, large language models, chatgpt, openai, gpt4, langchain, tech, python, coding, law, legal, meta ai, llama, llama2, llama 2, chatbot, chat bot, how to build a chatbot, how to develop a chatbot, azure open ai models, rag
Id: wmfAJWwyaQA
Channel Id: undefined
Length: 38min 1sec (2281 seconds)
Published: Thu Sep 07 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.