Experimenting with Website Q&A with Embeddings | AI that answers question from your Website | OpenAI

Captions
Okay, so let's see. Currently, for training the GPT-3 model, basically the OpenAI models, I think Azure has already started with its Azure OpenAI Service, and until now only a few people have been able to get access to it. Once this service is on the market, we can train our custom model to work with, for example, FAQs, or any question-answer pairs that we want to train with, or we can train with our website data. All of those can be trained once this is available to everyone.

But for now we can focus on OpenAI itself. They do have an API, so let's see the docs. They have something called, I think, embeddings, or maybe files, fine-tunes. Yeah, I think fine-tunes is the one: "Creates a job that fine-tunes a specified model from a given dataset. Response includes details of the enqueued job, including job status and the name..." Okay, let's see what fine-tuning means. "Fine-tuning lets you get more out of the models available through the API by providing" higher quality results than prompt design, the ability to train on more examples, token savings, and lower latency. Okay. "Fine-tuning is currently available" on these models. "Prepare training data", I think this is the one. So we have to prepare training data in such a format and then pass it along. Yeah, let's try this.

There was one piece of documentation, a tutorial basically: classification, case study, conditional generation, advanced usage... I think this is the one, question answering. Let's try the question answering part. Who is an expert in the data science field here? I think Ajay and Sale, you both are from the same field. Yes? If you can understand this piece of code, that will be great. So here I think it's just a notebook: "get a Wikipedia page given a title", "get the titles which are..." Okay.

Let's do one thing: today you guys will be working, and let me give you a scenario. The other day I saw an example where we can train this GPT-3 model based on website data, so if we
can retrieve all the URLs, parse through them, read all the HTML, and then train our GPT-3 model, then we can just prompt based on that. There was one example, let's see. Maybe under fine-tunes... there were some tutorials here somewhere. Click Get started, then Tutorials. Yeah, this is the one: Website Q&A. Okay, let's see.

So initially I think they have created an environment. Here they say that to start with the code, clone the full code for this tutorial, or alternatively follow along and copy each section into a Jupyter notebook and run the code step by step, or just read along. A good way to avoid any issues is to set up a new environment and install the required packages by running the commands. Okay, let's follow along.

Run this command, basically: create the environment first. Go to a directory and then create the environment. Create a new directory; you can use the command line for that too. Type mkdir and the folder name, let's say openai: mkdir openai. Work in a separate directory, not on the desktop. It's good practice never to work on the desktop. So cd .., and you can create it here only, because again it's inside your Users folder. So, openai, and now cd into that, and now you can run that environment creation command.

"Error: could not open requirements file." Okay. Where are we activating that? I think it is not activated: "source" is not recognized. Just try to fix that error. Maybe we have to install something with pip. I think you have worked on this; how do we activate this one? So we need that requirements file first. Okay, let's see whether we have something for the requirements. "The primary focus of this tutorial is..., so if you prefer, you can skip the context on how to create a web crawler and just download the source code." But how will we get that requirements.txt file? Okay, it's there on that link: the full code for this tutorial is on GitHub. Click on that link and copy the content of requirements.txt. You can just guide
him, because I'm not aware of where we have to store this. Inside the directory in which we are working. Okay, create a text file in that openai folder and paste it there. It's under Users. Yeah, and save it as requirements.txt. Check the spelling of "requirements", that "e". Remove the .txt from there: because the type is already txt, it should not be part of the name. Rename it and remove the extra .txt. Yeah, okay, so this is fine.

How do we activate the environment as well? Because he was getting an error, right? Yes, just open the command prompt and type the command: dot, forward slash, [inaudible], Scripts, slash, activate, with a capital, right? Sorry, it was a backslash, not a forward slash. Manually go to that path and see if the file names and folder names are correct. Scripts with a capital S, and it is a small a, not a capital A. Go back to the command line and run that. Still it says "." is not recognized. Try with the other slash. Wait, is the spelling Scripts or Script? Just check the spelling. Okay, it's Scripts.

Check the logs. "You are using pip version..." Try upgrading pip and then try again. Maybe we have to manually install that HTML parser first. Let's upgrade pip, and just copy that error, "ModuleNotFoundError: No module named html.parser", and Google it. "You have created a local file named html that masks the standard library package." What does that mean? We have not written any code yet.

Scroll down to the repository of that code. Okay, let's clone that. Go to that GitHub page and go back. Let's copy them: web-qa.py and web-qa.ipynb. Or let's see, you can download everything at once. Yeah, Download ZIP, or you can use the command line as well: git clone. Do a git clone, and do it in a separate directory. cd back, do cd .., and now here you can git clone. Okay, now manually go to that place: openai-cookbook, samples... sorry, apps, and I think the web crawler one, and open the command prompt from this path. Okay, and create the environment and activate it.
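The error message quoted above (a local file named html masking the standard library package) can be checked with a few lines of Python. This is a minimal sketch, not part of the tutorial: module_origin and is_shadowed are hypothetical helper names, and the check assumes the name being tested belongs to the standard library, as html does.

```python
import importlib.util
import os
import sysconfig


def module_origin(name):
    """Return the file Python would load `name` from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when a plain html.py shadows the html package
        # and "html.parser" is requested.
        return None
    return spec.origin if spec is not None else None


def is_shadowed(name):
    """Return True if a standard-library name resolves outside the stdlib directory."""
    origin = module_origin(name)
    if origin is None or not os.path.isabs(origin):
        return False  # not found, built-in, or frozen: no file on disk to blame
    stdlib = os.path.realpath(sysconfig.get_paths()["stdlib"])
    try:
        return os.path.commonpath([stdlib, os.path.realpath(origin)]) != stdlib
    except ValueError:
        return True  # different drives on Windows, so certainly not the stdlib copy


if __name__ == "__main__":
    for name in ("html", "html.parser"):
        print(name, "->", module_origin(name), "| shadowed:", is_shadowed(name))
```

If the printed path for html points into the working folder rather than the Python installation, renaming or deleting that local file is the usual fix; renaming files inside the installation's own html package is likely to cause further import errors.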
We again have that error: no module named html.parser. Run this command: python3 -c, and this. Run that command, just use python only. Okay, there it is. Go to that path. And what does it say? Rename it or delete it. Okay, go to that path. No, no, in your file explorer, just go to the html path. Let's not open that file, just go to the html path.

Okay, you have entered that path wrongly. Remove that .exe from there. Remove everything, you have entered two paths in the same line. Yeah, go there. What was the path? Just check there in the command line: html. Don't go to the file directly. No, no, don't go to the file. Just go to the path up to html, and that init file, just rename that. Rename it to something else, you can give it "myhtml". Okay, Enter. Okay, send me that path, let me make that change.

Okay, now try to run that install command again. "cannot import name escape from html". Just copy that and let's see what that is. I don't know, what do we do now? I'm not sure. So we can try installing this HTML parser separately. Okay, just run those commands: html5... pip install html.parser, html dot parser, p-a-r-s-e-r. Try that pip install again, do that pip install for the requirements. Okay, now install that html module. How do we install it? pip install html. Just install html with pip only.

It is not there, but why is it not installing? Restart all the command lines and upgrade pip, basically. Since you upgraded everything, restart all the command lines and then run that command again; basically close every command line and run again. Just check the pip version: pip -v... use a capital V. Now restart the command line and then try again. Now do pip --version, double dash version. Why is it giving us this error? Okay, let's make sure we rename that file back. Just type pip3, then a space, then --version, basically double dash version. Now the spelling is
wrong. Do we not have Python installed on the system? It should not be... basically it's from Python 3.9. That's okay. All right, I think we are already near the time. No problem, we'll continue this next week, the same thing, from where we have left off today.

How do I run this notebook on my machine? I forgot: what's the command to run this notebook? I usually run a Python file, so we just type python, a space, and the file name with the .py extension. No, there is a different command for running the notebook server. Yeah, I think this is the one: jupyter lab. Okay, jupyter notebook. No... I was running it on a Linux server. Okay, let me install it again. Ah, I need to activate that environment first. What was the command to activate the environment? In our previous video we were going through this tutorial. Okay, so this shows the command. Let's see if I have that ready. Okay, now it's fine. How do I exit from this? Ctrl+D, or use quit. Okay, now I can just run that Jupyter notebook. Let's see if it works. Yeah, it should work.

All right, so I had this particular piece of code, and I gave some condition somewhere to limit the number of URLs. No, I just interrupted it in between. So in the first piece of code, what it does: basically it's a web crawler. It goes to my website and retrieves the list of all the URLs present on my website, and it just prints the list of all the URLs over here. So let's run the first one. I'll interrupt it after 10 or 12 URLs, so we'll be testing based on those only, because OpenAI has a limitation. The limitation is here somewhere... Moderation... yeah, rate limits. So we are already exceeding this, basically. Yeah, 20 per minute, something like that. Yeah, tokens per minute; we are already exceeding that, so we need to limit the number of URLs. Has it stopped? Yeah, I think it stopped automatically. That's fine. Okay, we have a few of the URLs. Now let's go ahead and do something. We have... only
a machine learning engineer can understand these things, so even I'm not sure what exactly is happening here. Let me close the code and remove some of the files that are already there; those are not required. Last time when I ran it, it generated this many files. Basically, it takes each of the URLs that is stored, checks the website, takes out the HTML text, converts that to normal text, and stores it in a Notepad file, basically a text file. Whatever content is there on my website is now stored as a text file, so each URL is now stored as a text file. After that it does some processing, and here it generates some processed files. So here we have something, and it just creates some... I'm not sure what exactly this is, but it converts everything into some processed files.

And at the end, if I just type something, let's ask something, for example something from that processed file: Power Automate. Let's see what reply it gets me. Last time when I ran it, I asked these questions and it was able to answer them. Let's run it here. Okay, a name error here. Before that I have to run this one, create_context, and define answer_question here. Now let's see if I have to run everything from the beginning in order to work with this guy. Let me delete everything, so let me remove these ones.

All right, now let's go from the beginning. Here I have to enter my domain name. Okay, that should be fine for now. So I just have some 10 or 12 URLs. Now let me run this next one, which imports pandas; that will create the scraped.csv file. What's the issue? "The default value of regex will change from True to False..." Let me look at that. Anyway, here you see some output is generated. Now let's run the part that reads from processed/scraped.csv, and that should generate a chart. Yeah, here's some visualization. Let's run this portion as well, and this one, some data frame which
generated something. And now comes the OpenAI key, so let me get the key. Okay, let me see if I have the same key. Okay, so now that should generate our embeddings file. Most of the time we get an error here because it exceeds the rate limit per minute. We did get some output, that's fine. That will call this particular endpoint; the engine is the Ada embedding model. Let's run this one, and that should generate something.

So we have this error now: a rate limit error. Let's wait for some time and then run this again, because we are exceeding the limit on the number of tokens per minute. Let's try again. RateLimitError. Here you see the exact error that is shown: organization... on requests per minute, limit 60 per minute, current 80 per minute. So I have exceeded my limit of 60. Let's wait for some time and then we can continue. I doubt it, but let's see. Do we get a timestamp for when this piece was run? No. Let's run this again. Okay, now we are not getting any error. Yeah, now that is being executed. Now, coming to this portion, let's run this guy, and that should return a fresh output.

So finally, let's run this guy, and now let's ask the question. I'll just ask a very simple question: "What framework..." It will not tell me a definition, because if I have not put a definition on my website then it will not get it. It is only getting data from these few URLs of mine, and we only have very limited blogs. I think this one is blog two, and yeah, we only have three blogs in this whole list of URLs, so it will try to get something from there. Let's run this one. Here it says "I don't know." That's fine. So let's try to ask something else, let's say "DB read data"... "read data from SQL". Let's see if it can answer this question. Here you see it was able to give me some response: "In this post we'll create a chatbot that connects to the local SQL database." So basically that is giving me my introduction of that blog. It does give some response based on the training data that we have
provided. So let's say, "What is SQL?" "I don't know", because we have not mentioned that. "Entity Framework"? Yeah, here it gives me a good answer: "Entity Framework is a modern object-database mapper for .NET. It supports LINQ queries..." It gave me a perfect answer from that blog. So this is what we were trying to achieve last time, and here is the result. Yeah, okay. Is there anything else we can try to complete it?
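The question-answering step shown at the end of the session (rank the stored page texts against the question, build a context, and let the model say "I don't know" when the context does not cover it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: embed is a toy word-count stand-in for the OpenAI embeddings call, PAGES is made-up sample data, and the prompt wording and the max_chars limit are illustrative choices.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the page texts the crawler saved.
PAGES = [
    "Entity Framework is a modern object-database mapper for .NET.",
    "In this post we create a chatbot that connects to a local SQL database.",
    "Power Automate flows can post messages to a Teams channel.",
]


def embed(text):
    """Toy embedding: a word-count vector (assumption: the real notebook
    would call the OpenAI embeddings endpoint here instead)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def create_context(question, pages, max_chars=120):
    """Collect the pages closest to the question until max_chars would be exceeded."""
    q = embed(question)
    ranked = sorted(pages, key=lambda p: cosine_distance(q, embed(p)))
    chosen, used = [], 0
    for page in ranked:
        if used + len(page) > max_chars:
            break
        chosen.append(page)
        used += len(page)
    return "\n\n###\n\n".join(chosen)


def build_prompt(question, pages):
    """Assemble a prompt that tells the model to refuse when the context is not enough."""
    context = create_context(question, pages)
    return (
        "Answer the question based on the context below, and if the "
        "question can't be answered based on the context, say "
        "\"I don't know\"\n\n"
        f"Context: {context}\n\n---\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is Entity Framework?", PAGES))
```

The resulting prompt would then be sent to a completion model. Because only the crawled pages reach the context, a question such as "What is SQL?" has nothing relevant to draw on, which is consistent with the "I don't know" replies seen in the session.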
Info
Channel: Dewiride Technologies
Views: 378
Keywords: open ai, chatgpt, chatbot, faq, qna, question and answers, q&a, embeddings, ai, llm, website, train open ai, train chatgpt
Id: eys4M_QNz5A
Length: 42min 33sec (2553 seconds)
Published: Mon Mar 13 2023