Chat with your PDF Chatbot: All OPEN SOURCE (Runs on CPU)

Captions
Hello everyone, welcome to the AI Anytime channel. In today's video we are going to develop a chatbot powered by generative AI. We are going to look at an open-source stack: LangChain, which is a framework for generative AI; then Sentence Transformers, to create the embeddings out of your documents; then a vector store or vector database, ChromaDB; and then a large language model that uses those embeddings to retrieve information and generate responses to your prompts or queries. That's what we are going to develop in this video, guys. There are already a few videos that I have created previously on search your PDF and chat with your data, but in those videos we did not have an input mechanism on the Streamlit interface. What we used to do in those videos was create a data ingestion .py file where we ingested the data in the back end, saved the embeddings in a persist directory, for example DB, and then used the LLM for inference. In this video we'll also have an input option on the Streamlit app, where a user can upload their PDF and chat with it, so everything will happen on the Streamlit UI. You upload a file, the embeddings are created, and you also have a chatbot interface within the app so you can chat with it. There were some requests from a few of my subscribers asking whether we can create something like this, where everything happens on the interface. There are applications available currently like Chat with PDF, Chat Your PDF, Ask PDF; if you search on GitHub you'll find a whole lot of applications like this. So how can we also build our own application that also runs on a CPU machine,
which is very easy to deploy? This will probably be a two-video series: in this video I will create the chatbot end to end, and in the next video I will deploy it either on AWS or on Azure, maybe through a Docker container or something like that. We will definitely do that in the next video, guys. So let's get started. For this video there are prerequisites. I'll request that you go through this video here, Search Your PDF, where we built a search-your-PDF app using LangChain, ChromaDB and an open-source LLM, and you can also look at the video on creating a private GPT for your data that runs on CPU. These two videos are the prerequisites for this one; it will be helpful to go through them, because I have covered things in a lot of detail there. You can see the Search Your PDF app video is more than an hour long, so you have to invest time to grasp the concepts. Please watch those first, and then watch the video that I'm currently creating. We will use the same folder: on my desktop I have this local LLM search folder. I will use the same directory. Let me first open it in a terminal and activate the deep learning environment with conda activate. That is done, and now I'm going to open this in VS Code. Once I open it in VS Code, you will see there are a few files. There is ingest.py; we will use this function. Of course we will not write a lot of code in this video, because we already have these code snippets from the previous two videos; we'll just write a few snippets for Streamlit. So, for this video, guys, we
have this DB folder. This DB folder is nothing but the persist directory; if you come over here you can see the persist directory has been defined. Whatever embeddings we create are stored through DuckDB plus Parquet, which is the storage format of the embeddings, and they get stored in this DB folder. It will have an index and a couple of Parquet files: the index will contain the metadata and the UUID, the unique identifier, and the Parquet files will contain the embeddings. So persist directory equals DB; that's where we are going to store them. Currently you see there's nothing inside this DB folder, because we are going to create the embeddings through the Streamlit interface. We'll use this file, and we will also use this chatbot.py, guys, the file that I created previously. We are using streamlit-chat, a Python library that helps you with the chatbot interface in Streamlit. Somebody from the community has created it, so kudos to them; credit goes to them for creating this library. And we are going to use the LaMini-T5 738 million parameter model. LaMini is a language model that is not that large, but it works really well for multiple use cases, and that's what we are going to look at in this video, guys: LaMini-T5-738M. Here I'm going to use LangChain, Sentence Transformers, etc. So this is the file that we're going to use. Let's create a file first in this directory, something like a chatbot Streamlit file; let's call it chatbot_app.py. This is the file that I've created. Now I'll first import Streamlit, so import streamlit as st. Then maybe we will not write st.title. What I'm going to do here is that
we will first focus on the interface and how we can create it. I'm going to focus more on the markdown. Let's first create a function with def, so def main, and then if __name__ equals __main__, call main. Now we'll start writing our code in this main function for the Streamlit interface. I'm going to use st.markdown, not a markdown file, and in this markdown I'm going to define an H1. In the H1 we'll have a style, with text-align center, so the title of this chatbot is centered, and then a color for the title text of the app; let's keep it blue. The H1 is done, so now I'm going to write the title: Chat with your PDF. Now this looks good: text-align center, color blue, and now it makes sense. This is the first markdown, so let's run it and see. I'm going to come here and run streamlit run chatbot_app.py. When I open it, you can see it shows the raw h1 style text, because we have to pass unsafe_allow_html=True. So let me do that, and let me also wrap the line in the editor so you can see the next line. Now I have to do a rerun, and when I do a rerun you can see Chat with your
PDF. Let me just change the theme; you can also do it from config.toml, but I'm just going to do it from here. Let's keep it light. Now you can see this is our heading, Chat with your PDF. I also need a couple of emojis, so parrot emojis. I'm going to search for parrot emoji and take the emoji from there, guys. Let it load, and meanwhile we write the next line. I'm going to copy this complete markdown line and paste it here: markdown, text-align center, and here let's keep the color as grey. On this line I'm not going to write Chat with your PDF; I'll write Built by. For Built by we have an a tag with an href, and in this href we can pass our GitHub repository, guys. Let me come over here and look at the AI Anytime GitHub repository, copy the link, come back and paste it inside the href. After that, inside the tag, I can write AI Anytime, and I'll also use an emoji here. But first let's close the tag. Let's not have this as H1, because that will be too big; let's make it an H3, with unsafe_allow_html as well. This looks nice, so let me come back and do a rerun and see what we are getting. You can see Built by AI Anytime. We'll use one more emoji, so let me search for a heart emoji or something like this and take it. I don't know why it's not loading; maybe we'll take it from another link. The parrot has been loaded, so let me use it. We are using the parrot for LangChain, guys, because without LangChain it is very difficult to create
this kind of application, because LangChain does all the heavy lifting for generative AI applications. Let me do a rerun, and you can see we have a new emoji there. It looks good. Now I'm also going to add something like a document emoji, so I'll search for document emoji and take this page facing up emoji. Meanwhile, let me see how I can copy the heart; okay, this has been copied. Now if you come over here you'll see Built by AI Anytime with love. This looks good, so now we have the title and its subheader. Now we can also take the page facing up emoji. Where is the page facing up emoji? Let me go back; here it is. I've copied it, so let's paste it here and save. Once we save and come back to the Streamlit app, you'll see that we are done with this part, guys. Now let's add one more markdown. I'll write st.markdown, and inside it let me copy this entire line with Ctrl+C, Ctrl+V and make it an H2 style. Should we keep a subheader or a markdown? Let's try a subheader and see, with Upload your PDF below. If I come back and rerun: no, let's keep markdown, which is better. So markdown and H3; this looks nice. You don't need the entire href part here, so just write Upload your PDF below. Let's give this the color red. H3 looks nice, but let's make it an H2 and see how big that header is. I'm going to come back here and rerun, and now you can see this looks
nice. We'll have one more emoji: we need a thumbs down emoji, not up. Sorry, not thumbs down; the finger pointing down emoji, the white down pointing backhand index. That's what it's called, backhand index pointing down. Let's copy this, come back over here and paste it. This looks okay, so let's come back and rerun. Now we are done with our headings and subheadings, guys; this is the title of the chatbot. Next we will divide the layout that you see into three or maybe two columns. Let's divide it into two columns, guys. But first let's write the Streamlit code for the upload. I'm going to have uploaded_file equal to st.file_uploader. Then, if uploaded_file is not None, we'll use some file details, but let's first do a pass here and see. If you come back to the app you will see the file uploader here. Here you upload your PDF file, the PDF that we have to create the chatbot for. So let's come back and start writing our code: if uploaded_file is not None, let's write some file details, guys. I'll create a dictionary called file details that I can print in JSON format on Streamlit: what the file name is, how big the file is, and all those details. Inside file details I'm going to write name, type and size, so size is uploaded file
dot size. I don't know if uploaded_file.size will work, but let's see; if it doesn't work, we'll write a function for it. Let's come back and do a rerun. Let's do Browse files, and I'm going to go inside this docs folder and use this file, Fast Facts: What is climate change. This is the file I'm going to use, but we are not currently printing the details; it is just a dictionary. So we have file details, and now we have to save this file so we can use it for the embeddings and everything else. Let's write the code for that. I'm going to define a file path. You can see this docs folder; we'll save the file in the docs folder, so the path is docs plus uploaded_file.name. Then I will use with open on the file path, as temp_file, and inside this we'll do temp_file.write with uploaded_file.getbuffer, or we can also use read; let's do uploaded_file.read. So temp_file.write of uploaded_file.read. What we are doing here is saving the file in the docs directory. Now we're going to divide the layout into two columns, as we said: col1, col2 equals st.columns. In columns you can give weights, and I want to divide this in a 1:2 ratio, so the first column gets 1 and the second column will be twice as wide. That is the width distribution, and this is how I'm dividing it. In the first column we show the PDF details. Let's see if this works, guys. I'm going to come and run it, and we will also
write the code for column two, so with col2. In column two, what am I going to write? PDF preview? No, PDF preview will not come here, so let me just do a pass for now; we can do the PDF preview in the first column. Let's run this first. You can see the PDF details that we are getting here: Fast Facts: What is climate change, and so on. So this is at least working. Can we also set the layout to wide, so we get the wide layout? Let's see. I'll come over here in the Streamlit app and add st.set_page_config; you can see this page config. Let's try it out and see how it looks when we set layout equals wide. If I make that change, you can see it now looks better. So you have this PDF details heading; let me make it grey, because red is not looking good here. Why grey? Because grey will also look nice in a dark theme. Now you can see the PDF details: the name of the file that we have uploaded, Fast Facts: What is climate change, the type, which is PDF, and the size. That's perfectly fine; this is what I wanted to do in the first column. In the same column I now want to show the file preview. For that we will reuse one of our functions. Let me go to my GitHub repository, where I have previously used a function to preview a PDF file in a Streamlit application. I'm going to come over here; it's in my
multiple Python files, the Streamlit applications that I have created, where I have used it. Let me take it from there: display PDF, here we go. I'm just pasting this display PDF function here. We have a function called display PDF that takes a file as an input parameter. We first encode the file using base64, then decode it to a UTF-8 string, and we embed that PDF in HTML through an iframe. Streamlit supports markdown, so whatever you can write in markdown format it should support. So basically this function uses an iframe to embed the PDF in HTML, and then we use markdown to show that PDF on our Streamlit UI. Let me first do the imports; let me just copy all the imports, because all of them will be required for this video as well. I'll copy them, come back over here and paste. Now let's use the function in column one: after PDF details we can do a file preview, guys. Let's write that. I want to copy this entire markdown line, the text-align center PDF details one, and what I'm going to write here is PDF preview, so you can preview your PDF. Then let's use a variable, pdf_view, which will be display PDF called with the file path that we have over here. Or let's not use PDF
view, and see if we can just call display PDF and what we get, whether this function works or whether we should put it in a variable. If I come back over here to Chat with your PDF, let's see: here we go, we have our PDF preview, guys. We should know which PDF we have uploaded, because with this chatbot we are giving control to the user on the data ingestion part; they can upload the PDF that they want, and this will also preview it. Currently it supports one PDF, but if you are watching this video and you want to extend this project, the code will be available on the GitHub repository. Go ahead, take the code and try it with your own PDFs, algorithms and mechanisms. If you come up with techniques that let you pass multiple PDFs and create the embeddings, please let me know; please drop your progress or findings in the comment box. So this is the PDF preview, and column one is done, guys. Now we will move to column two. In column two we'll start with the embeddings: first we'll create the embeddings with a loader, and once the embeddings are created we'll have a chat interface. So with col2, let's have a spinner first: with st.spinner, and I'll write something like Embeddings are in process. Now we'll have a variable called ingested_data, and for it we'll write a function called data_ingestion, using the same code that we will take over, and then we'll just basically
show a message, so I'll write Embeddings are created successfully, and then I will do a markdown like Chat here. Let me replace the preview text in the copied line with Chat here. We'll write the rest of the code after we first focus on the embeddings. Now let's create this function called data_ingestion, guys. I'm not going to write anything new; I'll just take this function, this entire thing over here, come back to the chatbot app and paste it. I'm going to rename this main and call it data_ingestion. Now this becomes our function, and nothing else will be changed in it. But first we have to define the checkpoint, so let me go back to the code in chatbot.py from the previous video and copy the checkpoint part. This is our model checkpoint and tokenizer; this is how we load the model. We are loading it in the float32 data type, with device map auto, which automatically places the weights on the CPU, and on a GPU as well if you have one. So: persist directory DB, and the data_ingestion function is okay; we don't have to do anything in this function. This will help us create the embeddings. So we have the spinner with Embeddings are in process, and ingested_data equals data_ingestion. Currently, if you see, there is nothing inside DB. Let's run this and see if we are able to do it, guys. That is for data ingestion; now we have to write the function for the embeddings. I'm going to use the other functions here, so let me take them from the GitHub repository; you can also have a look at it. This is what
I'm going to do here. Let's first add cache_resource, because I don't want to create the embeddings every time during the runtime. So let's cache the resource; it's a decorator available in Streamlit that we can use, st.cache_resource. They also have cache_data, which you can use when you're working with CSVs, NumPy arrays and so on. Now just come over here and copy this entire thing, because this is what we need, and I'll explain the code step by step. We can take display conversation as well; let's take the entire thing, and I will remove a couple of functions which are not required. Let me paste it here. We don't need this display PDF, because we have already taken that function, so let me remove it, and this function is also not required in this video. Now I will explain the functions one by one, guys. We have a function called LLM pipeline that creates a pipeline using the Transformers pipeline; you can see this pipeline from Transformers. We define which task we are looking at, which is text-to-text generation, and we define the model, the tokenizer and the maximum length, that is, the maximum length in tokens of the generated response. We are saying keep the max length at 256; you can increase it, there's no problem. Then we set the temperature, for creativity and randomness, at 0.3, and then we have top p. Then we use HuggingFacePipeline to pass that pipeline, and that returns the local LLM. Then comes the function qa_llm. Once we have the embeddings, what we do here is take the LLM pipeline that we are using, then an embedding function from Sentence Transformers, and then we are loading it through
Chroma from the persist directory, giving it the embedding function from Sentence Transformers. Then we use a retriever, with RetrievalQA from LangChain, and we create a chain with the language model, the embeddings and the retriever. That's what we are doing here. This next one, process answer, is just to format the output. And this is very interesting: display conversation. If you have seen my previous video, I have explained these things in detail. Display conversation is for the streamlit-chat library; we have to look at the previous responses and infer from them, so you can ask a follow-up question and it should answer it. That's what we are doing with this. Now, with col2: once we have the embeddings created, we have to provide the chat interface, so I'll also take the chat interface from the previous code. Let's take this user input part from here, come back and paste it. You can see it's very much self-explanatory. It's also available on the Streamlit discussion forum; it's not my code. As I said, the library has been created by someone in the community, and we're just taking the code; streamlit-chat is just a wrapper. You can see it initializes the session state for generated responses and past messages, for the session that you are in when you are interacting with the bot. Then it searches the database for a response based on the user input, and then it keeps updating the session with the chat history. That's what it does. Now let's go back to our Streamlit interface, because I think we are okay with our code. Let me come back to the Streamlit chatbot and do a rerun. Now you can see it says
embeddings are created successfully. You just saw it, because we have one very small file; if we have a bigger file it will take more time to create the embeddings. You can see we have the spinner as a loader, and then embeddings are created successfully. So we have successfully created embeddings using an open-source vector database, or vector store, that runs natively: ChromaDB. Let's check that. If you come here into DB, you can now see the files for the embeddings that have been created: the Parquet files, the index, the metadata, the UUID; you can see everything over here. It means that we have successfully created the embeddings, and we also have a chat interface, and you can see how beautiful this looks. In column one we have the PDF details and PDF preview, and we have a file uploader option, so now users can upload a file and interact with the chatbot. We're also going to deploy this, guys. I will deploy it through a public Docker repository: I'll post the image on Docker Hub, so if you want you can download the Docker image from there and deploy it in your own infrastructure. I'm also going to deploy it myself and give the link for public users in the next video. Now, in the interface, under Chat here, I can chat with it. Let me ask: what is climate change? You can see we got our response, guys: climate change is a natural process where temperature, rainfall, wind and other elements vary over decades or more. So we got our response in the chat. These two are required to keep the session and the previous responses inside a list. So if you see, for what is climate change we get climate change is a natural process. We got our response. We have
a max token setting of 256; if you increase that you will get a lengthier response. You can also ask follow-up questions, but if you have very limited infrastructure and compute power it might crash, and then you have to rerun the program and interact with it again. It depends on how much compute power you have and what kind of machine you have. Let's see this interface when we change the settings and make it a dark theme. Now if you see, in the dark theme it looks even more beautiful. Let me go back to the light theme and ask a follow-up question. If I get an error I will probably not rerun, but you can play with it. Let me ask: how does it affect the people? See, I'm not mentioning anything like climate change in this prompt or query; it will automatically look at your previous responses. When I run this, it will probably either crash or give you the response; if it crashes, it will basically give you a meta offload error. Let's see. It didn't crash, because I have a decent machine, but for you it might. The answer is climate change affects people in various ways. It is automatically capturing the semantics behind the question, and that's the beauty of the LaMini model, guys. To be honest, LaMini does not hallucinate; I've used it, and it does not hallucinate at all. When I say not at all, I might be bragging a little about LaMini, but you can control hallucination with the LaMini language model, because it's a small model, and the smaller model generalizes very well on your data or the documents that you have. That's how it works currently. So you see the response that I got: climate change affects people in various
ways. And this is the chatbot that we have created, guys. You can take this project to your workplace, use it as your hobby project or pet project, or take it as your college final-year project; whatever you want to do with it, you can. You can extend it, and I'll be more than happy if you connect with me when you're extending it further. This is what I wanted to create. Some of you requested an input upload option where the end user uploads the document, you create the embeddings, and then you chat here, and that's what we are doing here, guys. I hope you liked it. The code will be available on the GitHub repository, and you can also watch my previous videos related to it; you can see them over here. I have a lot of videos on large language models: go to the playlists and you will see that I have more than 25 videos on large language models, guys. We'll deploy this in the next video; we'll definitely deploy it through a Docker container or something like that, and we will see whether we use a container or deploy it through some other mechanism as well. I hope you liked the video. If you have any thoughts or feedback, please drop them in the comment box, and you can also reach out to me through our social media channels. If you want to join our community, please join the WhatsApp group through the YouTube banner that we have; you will find the link there. If you haven't subscribed to the channel yet, do subscribe, guys, and please share this video and channel with your friends and peers. Thank you so much for watching. See you in the next one.
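The upload and preview steps described in the captions (collect the file details, save the upload into a docs folder, base64-encode the PDF and embed it in an iframe) can be sketched in plain Python roughly as below. This is a minimal sketch under stated assumptions, not the code shown in the video: the function names save_upload, file_details and build_pdf_iframe, the default iframe width and height, and the use of os.path.basename to sanitise the file name are all hypothetical choices made for illustration.

```python
import base64
import os


def file_details(name: str, mime_type: str, data: bytes) -> dict:
    """Collect the name, type and size that the app shows as JSON (hypothetical helper)."""
    return {"name": name, "type": mime_type, "size": len(data)}


def save_upload(name: str, data: bytes, docs_dir: str = "docs") -> str:
    """Write the uploaded bytes into docs_dir and return the saved path (hypothetical helper)."""
    os.makedirs(docs_dir, exist_ok=True)
    # Assumption: keep only the base name so a crafted file name cannot escape docs_dir.
    file_path = os.path.join(docs_dir, os.path.basename(name))
    with open(file_path, "wb") as temp_file:
        temp_file.write(data)
    return file_path


def build_pdf_iframe(file_path: str, width: str = "100%", height: int = 600) -> str:
    """Return an HTML iframe that embeds the PDF as a base64 data URI (hypothetical helper)."""
    with open(file_path, "rb") as f:
        # base64-encode the raw bytes, then decode to a UTF-8 string for use in HTML.
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return (
        f'<iframe src="data:application/pdf;base64,{encoded}" '
        f'width="{width}" height="{height}" type="application/pdf"></iframe>'
    )
```

In a Streamlit app, the inputs would presumably come from the uploaded file object (its name, type and the bytes returned by read), and the returned HTML string would be rendered with st.markdown and unsafe_allow_html=True, as the captions describe. Some browsers restrict data-URI PDFs inside iframes, so the preview may not render everywhere.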
Info
Channel: AI Anytime
Views: 19,008
Keywords: Chat with your PDF Chatbot: All OPEN SOURCE (Runs on CPU), langchain, ai, chatbot, llm, python
Id: N7wk1E1I0Qo
Length: 35min 46sec (2146 seconds)
Published: Tue Jul 04 2023