End To End Multi Language Invoice Extractor Project Using Google Gemini Pro Free LLM Model

Captions
Hello all, my name is Krish Naik, and welcome to my YouTube channel. So guys, here is yet another amazing end-to-end project for you. In this project we are going to create a multi-language invoice extractor, and we are going to use the Gemini Pro API for it. Gemini Pro has been quite amazing and you can do a lot of things with it. I have prepared more than 10 to 15 different projects related to real-world industries, and trust me, all of them are performing exceptionally well with respect to accuracy.

Here we are going to focus on creating a multi-language invoice extractor app using Gemini Pro. We will write all the code step by step, so please make sure you practice along with me. Once you practice things the right way, you get multiple ideas about what other kinds of projects you can do.

First of all, let me show you the agenda. First, I will show you the multi-language invoice extractor app demo. Then we will start the project by creating the environment, and then we will go ahead with the requirements.
txt, which lists the libraries we need. Then we will start writing our code. This will be an end-to-end project, and we'll build the app step by step. It will take some time, probably around 25 to 30 minutes. As the fifth point, we'll also discuss what additional improvements you can make, so that you can try them from your side. And as usual, guys, I keep a target of likes for every video, so let's target 1,000 likes for this one, because all these videos will be super beneficial for you in companies.

Let me start with the first item, the demo. Here you can see my entire app, and I have uploaded one invoice. This is a GST invoice and it is completely in Hindi. The best thing is that I can ask any question about it using Gemini Pro. Here I have asked, "What is the address in the invoice?", and the address is 1 2 3, SBC Building, DF State. This is just a common invoice I've taken from the internet. You can see we are able to get the entire response, which is quite amazing.

Not only this, I can ask different questions. Let's say I ask, "What is the date in the invoice?" In the invoice the date is written as dinank, and Google Gemini Pro is able to understand that. It's running, and here you can see 12 27 21. So you can extract all the information in this invoice just by putting a prompt over here.

Now, the best thing about this project is that this task is very difficult to automate. We have tried this kind of project with Tesseract OCR and similar tools. Just imagine: Google
Gemini performs exceptionally well compared to all those kinds of tools. So this was the demo. Now we will develop this project completely end to end, starting from scratch.

Guys, here is the project that I've started in VS Code. First of all, go to the terminal, and I will show you all the steps. As I said, the first step is to create a virtual environment. I've already created mine, because creating a virtual environment takes some time. To create it, write conda create -p venv python=3.10 -y, where venv is your environment name. Remember to give a Python version greater than 3.9, because Gemini Pro is suitable for Python versions greater than 3.9; here I'm using 3.10. The -y is there so that it does not ask you for permission during the installation. As soon as you press Enter, the venv environment gets created in the same project folder. I'm not going to execute it again because I've already done this, so just do it from your side with this command.

The second thing is that I go into the .env file and add a Google API key, which is for Gemini Pro. If you don't know Gemini Pro, it is an amazing model provided by Google, and it is available for free: you can hit 60 queries per minute. Here is the API key that I've got. If you want to create your own API key, go to this website: makersuite.
google.com (the API key page), and click "Create API key" for a new project. I have already created mine, so I won't create it again. So these are the first two steps: you need the API key and you need the environment. After that, activate the environment by writing conda activate venv. Once you activate it, you will see that you are in that environment location.

Now let's go to the next step, requirements.txt, and the libraries we require. Here we have streamlit, then google-generativeai, then python-dotenv, then langchain, then PyPDF, which is for loading or reading PDFs, and then chromadb, which is for vector embeddings, since we will also try to create vector stores or vector embeddings. These are the basic things we require.

Now let's start coding the application. I will write the code completely from scratch. First of all, I will load my environment variables, so I write from dotenv import load_dotenv. The reason I'm doing this is so that I can load all my environment keys. If you remember, we also installed python-dotenv, so python-
dotenv is what handles all my environment variables. Now I write load_dotenv(), which loads all the environment variables from the .env file. We'll go step by step so you'll be able to understand, and please make sure that you write the code along with me.

The next thing is that I will use Streamlit, as Streamlit is a good framework for quickly creating an app. And yes, I definitely use ChatGPT for getting some of the code; that is the reason I'm able to upload daily videos. The entire project is not created by ChatGPT, but for things like how to use Streamlit and how to create this website kind of app, I usually use ChatGPT. So this is my Streamlit import.

Next I import os, which is useful for picking up the environment variables. Then from PIL I also import Image; I don't know whether I'll be using it, but let's see. Next I write import google.generativeai as genai, because genai is the library I'm going to access everything through. These are the basic things we are going to load.

Now, whenever we start any application using the Gemini API, we also need to configure the API key, so here I write genai.
configure(api_key=os.getenv("GOOGLE_API_KEY")). Here I'm getting my environment variable, which is nothing but GOOGLE_API_KEY, so whatever value is present in the .env file is what we take. So genai.configure sets the API key.

Now it's time to create our function to load Gemini Pro Vision. Since the invoice extractor works on top of an image, we have to use Gemini Pro Vision. First I load the model: I write model = genai.GenerativeModel and give the model name, gemini-pro-vision. That means we are going to use this specific model.

Next I write def get_gemini_response(input, image, prompt). I will not initialize the model again inside it. The thing is, here I'm giving three parameters. The input tells the assistant what I want it to do. If I say, "You need to act as an invoice extractor, you need to act like an expert who is very good at taking details out of an invoice," that becomes my input. The prompt is the message I want answered, like "What is the address?" that I had written. And the image is the image we are going to pass. With these three pieces of information, I write response = model.generate_content, and pass first the input, second image[0], and then the prompt.
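
The configuration and response function described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: passing the model in as an argument (instead of using a module-level global) and the helper name load_vision_model are assumptions made here so the function can be exercised without network access. The model name gemini-pro-vision is the one named in the video and may have been retired since it was published.

```python
import os


def load_vision_model(model_name="gemini-pro-vision"):
    """Configure the SDK from GOOGLE_API_KEY and return a model handle.

    Hypothetical helper: the video configures the SDK at module level instead.
    """
    # Imported lazily so the rest of this module works without the SDK installed.
    from dotenv import load_dotenv
    import google.generativeai as genai

    load_dotenv()  # copy key=value pairs from a local .env file into os.environ
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
    return genai.GenerativeModel(model_name)


def get_gemini_response(model, input_prompt, image_parts, question):
    """Send [instruction, image, question] as one list and return the reply text."""
    # The three items travel together in a single list, in this order.
    response = model.generate_content([input_prompt, image_parts[0], question])
    return response.text
```

With a real key in .env, a call would look like get_gemini_response(load_vision_model(), instruction, image_parts, "What is the address in the invoice?"), where image_parts is a list holding the image as a dictionary with mime_type and data fields.
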
So when you generate content, you give these three pieces of information in this order: input, image[0], and prompt. Gemini Pro takes all the parameters in the form of a list. Remember, the first parameter is the prompt that tells the model how it needs to behave; I will talk more about it as we go ahead. Finally, we just return response.text. This is how easy it is with Gemini Pro, and that is the reason I'm loving it when I compare it with OpenAI. The best thing is that I can also use it along with LangChain and with different things. I've also created a project where you can chat with multiple documents, and there we'll be using LangChain as well. So this function is done.

Now, guys, we will do our Streamlit setup. I will copy and paste it like this, so here I'm using st.
set_page_config, and for the title I'll say "Multi Language Invoice Extractor". In this multi-language invoice extractor I will also add this information on the page. Here I've given one input box, which is my input prompt, and this is my file uploader. I'm saying "Choose an image", and the image can be jpg, jpeg, or png; this is the image of the invoice, so let me write "image of the invoice" in the message. Once I upload the file, we can do anything we want with it.

The next thing is that I create an image variable and keep it blank initially. Then I write if uploaded_file is not None, which means a file has been uploaded, and then I set image using Image.open to open the uploaded file. I also want to display the image as soon as I upload it, so I use st.image with the caption "Uploaded Image".

Now understand that I got this code directly from ChatGPT. I just wrote, "Create me an app where I can upload image files," and so on. It is very simple. I'm not learning it from somewhere and I'm not even looking at the documentation; ChatGPT or Google Bard provides everything that is required.

So that is my uploaded file. Next I create my submit button, so here I write st.
button, with the label "Tell me about the image", or rather "Tell me about the invoice". That is the message I'm giving in my submit button.

Finally, I also have to design my input prompt. This input prompt is how I want the Gemini Pro LLM to behave, so just see this; it is important, and it will also give you an idea of how an input prompt works and how we can work with any kind of input prompt. I will say: "You are an expert in understanding invoices. We will upload an image as an invoice, and you will have to answer any questions based on the uploaded invoice image." This is just a basic prompt, telling the model what to do. So this is my default input prompt.

Then I write the part for when the submit button is clicked. What happens if I click the submit button? Inside if submit, I first need to get my image data. Understand that even after we load the image, we still have to do some processing and convert the image into bytes. For that I write def input_image_setup(uploaded_file); again, I got this from ChatGPT. You may be wondering what this function does: it takes the uploaded file, converts it into bytes, and it
will give all the image information in bytes. I did not write this code; I just searched in ChatGPT and this is the code I got, and it is quite good. See here: if the uploaded file is not None, we first get all the byte values. Then come the image parts with the things we require, the type and the bytes data, and we return the image parts in this format with two fields, mime_type and data. If the file is not uploaded, that case is handled here as well. I got this completely from ChatGPT; I'm not bragging about myself. ChatGPT is already trained on internet data, so it's just a matter of writing a prompt saying that I require these two pieces of information.

Now, for the image data, I call input_image_setup (let me call it input_image_details, to give it a good name) and pass it my uploaded file. From this I get my image data. Once I have it, I write my response by calling get_gemini_response. The first parameter is my input prompt, the second is my image data (remember, all the information comes in the form of a list), and the third is my input, which is whatever question I type in the input box. After this I get the response, and then I write st.subheader, giving some kind of
subheader, "The Response is", and then st.write to display whatever response we get. So all of this is done, and this is really good.

Now it's time to run the code. We have completed almost everything we wanted to do, and now comes the most interesting part: whether the project will run or not. If it runs, that is absolutely good, because we have written the code for the first time and everything should work fine. So here I write streamlit run app.py and execute it.

It has opened. I have downloaded two invoices, so let's see. First we will try with the normal invoice. Here you can see all the information. I ask, "Who is this invoice billed to?", put it in the input box, and click "Tell me about the invoice". All the information is provided here: "Your Client" and so on, and even the number has been extracted, which is quite amazing. Doing this manually is really a daunting process, guys.

Let's take a small one: "What is the deposit requested?" I click "Tell me about the invoice", and it just says "Your Company". Okay, my prompt is wrong, so it is not able to understand the context. Obviously, if you don't give a proper prompt, you won't get a good response. So I write, "Tell me how much was the deposit requested," and click again; now it should give the proper answer. I think it is somewhere around 169.95; it'll pick up that exact info and
provide all that information. Let's see: 169.99. This is good, guys; trust me, this is very, very close.

"What was the consulting fees?" I think it might get confused between two values, so let me write, "What was the amount of the consulting fee?" I think it should be able to give it. Then we'll try an invoice in another language, like Hindi. This is a good thing, guys: you can automate this entire process. Just imagine what a daunting process it is to handle invoices by hand. So, the amount of the consulting fees was $550. There are some minor mistakes, but other than that it works.

"What was the total discount?" The discount is somewhere around 179.1, and it gives 179.14. If you give a proper prompt, I think you'll be able to get a good response.

This also looks good, so let's try another invoice. Here I write, "What is the HSN number of Lenovo 5125I?" In the invoice, Lenovo is written in Hindi, and I'm writing my question in English, so let's see whether it is able to answer. It can pick up even this small piece of information: here you can see 84713010. Amazing, just amazing. Also, the date here is written as dinank, which is what we say in Hindi, so I ask, "What is the date in the invoice?" You can try anything, with different invoices if you want. I've downloaded these invoices from the internet and you can do the same. Let me see whether we get it. Yes, perfect.

So guys, this was it from my side. I hope you liked this video. If you did, please make sure that you subscribe to the channel. All the information regarding this will be given in
the description of this video. I'll see you in the next video. Have a great day. Thank you, one and all. Take care. Bye-bye.
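
The image-preparation helper and the Streamlit page described in the walkthrough can be sketched roughly as below. This is a sketch under stated assumptions rather than the code shown in the video: render_page and its answer_fn parameter are hypothetical names introduced here, where answer_fn stands for any callable that takes (instruction, image_parts, question) and returns text, such as a wrapper around the Gemini call. Raising FileNotFoundError when no file is uploaded is also an assumption, since the transcript does not say how that case is handled.

```python
INPUT_PROMPT = (
    "You are an expert in understanding invoices. "
    "We will upload an image as an invoice, and you will have to answer "
    "any questions based on the uploaded invoice image."
)


def input_image_setup(uploaded_file):
    """Turn an uploaded file into the list-of-parts shape passed to the model."""
    if uploaded_file is None:
        # Assumption: the transcript does not specify the no-file behaviour.
        raise FileNotFoundError("No file uploaded")
    bytes_data = uploaded_file.getvalue()  # raw bytes of the upload
    return [{"mime_type": uploaded_file.type, "data": bytes_data}]


def render_page(answer_fn):
    """Draw the page; answer_fn is a hypothetical (instruction, parts, question) -> str."""
    # Imported lazily so input_image_setup can be tested without these packages.
    import streamlit as st
    from PIL import Image

    st.set_page_config(page_title="Multi Language Invoice Extractor")
    st.header("Multi Language Invoice Extractor")
    question = st.text_input("Input prompt: ", key="input")
    uploaded_file = st.file_uploader(
        "Choose an image of the invoice...", type=["jpg", "jpeg", "png"]
    )
    if uploaded_file is not None:
        # Show the invoice as soon as it is uploaded.
        st.image(Image.open(uploaded_file), caption="Uploaded Image")

    if st.button("Tell me about the invoice"):
        if uploaded_file is None:
            st.warning("Please upload an invoice image first.")
        else:
            image_parts = input_image_setup(uploaded_file)
            st.subheader("The Response is")
            st.write(answer_fn(INPUT_PROMPT, image_parts, question))
```

Saved in a file such as app.py with a call to render_page at the bottom, this would be started with streamlit run app.py, as in the video.
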
Info
Channel: Krish Naik
Views: 25,920
Keywords: yt:cc=on, machine learnign tutorials, invoice extractor ll project, google gemini pro tutorials, gemini free vision api, generative ai tutorials, krish naik gen ai tutorials
Id: -ny5_RSMV6k
Length: 25min 59sec (1559 seconds)
Published: Wed Dec 27 2023