How to Set Up ChatGPT and Whisper API with Voice Recognition for Your AI Assistant

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
what's up today I'm going to show you how I use the chatgpt API whisper API and then the speech recognition in order to create a chat bot let me go ahead and give you a quick demonstration on that real quick hey Vivi I'm listening can you help me decide what I want to eat for dinner tonight of course what type of Cuisine are you in the mood for um something with meat in it great do you have any particular type of meat in mind such as chicken beef pork or something else let's say pork sounds good how about a classic dish like pork chops with mashed potatoes and roasted vegetables or something more exotic like stir-fried pork with rice and vegetables uh that sounds pretty good can you give me a recipe for the first one and so that's a quick example for my Vivi project that I'm working on right now the code is in the GitHub that I have in the description but I'm going to go ahead and show you how you can do that for your own um with voice assistant and whatnot so let's go ahead jump into code my guess is that if you're following this tutorial that you have python installed and then you also have Visual Studio or some type of IDE installed as those are the prerequisites for this so make sure you have those installed okay so what you're going to want to do is create a new file and then name it dot py so I have one called voiceassistant.py that's what I can that's what I'm going to be using as the example file and what I have here is the Vivi class that I've been using to build my chat bot and I'm going to be using it to help me guide um guide us through this process because I don't remember exactly everything that I did so the first thing that we're gonna have to do is we're going to need to import some things into this script in order to do that you're gonna have to do some pip installs so you're gonna have to do pip install open AI you're gonna have to do pip install PYT TSX three and then you're gonna do pip install speech recognition okay so all of those Pips will be down below in the description um and then what we're gonna have to do is um input all those into the script the first thing that we're going to do though is get a prompt completion from open AI so let's go ahead and start with that first before we integrate voice into it so we're going to go ahead import open Ai and look and the function that I used for this one was a response completion um but in order to to do that we have to initialize some things first so let's go ahead and initialize chat gbt in order to initialize chat to upt what we have to do is set up some type of key so we'll just call it key equals um this is where you enter in your Chachi BT key and to do that you have to go to open Ai and create a API key I'm going to show you this just as an example I will be deleting this API key after the video so don't even try to use it because it won't work so that is um what we're going to input into here so you can see the key is there and then um we're gonna go ahead and set it to open AI API key equals key so now we have the openai key set to this and we can use the prompt completion that we're going to be needing for Chachi PT okay so I realized the window was getting a little cramped so I just move that over to another screen okay so once we have the API key Set uh we're gonna now need to be able to do a chat GPT completion so um let's just go ahead and start with what I have here so this is going to be a file that we would have to import a personality into so let's just go ahead and create a personality variable personality variable and then we're going to set it equal to some text files so we're going to call this text file personality.txt and put those inside of the quotes So what this is going to do is it's going to take whatever text is in the personality.txt and read it into chat gbt in order for it to respond in in order for it to respond in whatever fashion you wanted to so um let's just go ahead and create that text file let's just do you are a helpful assistant an assistant period and then we'll go ahead and do file save and then we'll save this as purse personality.txt so now we have the prompt we've got personality.txt and then we're going to be opening it as um the mode and so this messages list here is pretty key in order to set up the tattoo BT assistant in order for it to respond back to you and we'll be using more from that as we go on so okay and so the next thing that we have is something called completion we set it equal to openai.chatcompletion.create and then we have the parameters that we're going to pass into the completion in order to get our response so the model we have is GPT 3.5 turbo and this is the latest edition watched first um the messages is going to be the messages list here and we're going to set it to the role and we're going to set the system rule to what we have inside of the content for personality and then we have a temperature parameter here for 0.8 and that's basically just the randomness of the response that it's going to give you and so once we have this the next thing we have to do is just print out what the response is and so this can be done by using these lines here so we're going to be taking in a response for the completion that it sends back the most recent response with the zero here and then we're going to append the um we're going to append the list of messages this is this append is needed in order for it to kind of store the conversation that you've been having and it's going to store inside of this messages list here all right and so now that we have the response completion and we have the system what we need to do is have some type of user input so we're going to create a variable called user input and we're going to set it equal to um just do input what question do you want to ask so that's going to take in the user input and then what we're going to do is append that list once again so message is that pen and so we're going to do that same thing in order to add this to the list we're going to do roll and then we're going to set the user and then we'll have to do content equals user input so that's going to take in the user input um and then send it over to chat gpt2 and send it over to the open AI API in order to have a completion and then it's going to read out the response here so this code should work so let's go ahead and run it by clicking F5 you can also go to run up here and start debugging okay so it's not finding personality.txt probably because I have a typo here okay so this is weird bug with a vs code that I created the text file but it wasn't grabbing it from the folder so you just have to close out of vs code and then reopen it back up so now that we have that um I renamed it to P dot txt just to make it a little bit easier to make sure I'm not doing typos and let's go ahead and open up that file again so you are a helpful assistant so let's go ahead and run it at five and you'll see down below in the console here what do you want to ask what were your instructions and it's gonna say I'm in AI language so yeah as you can see it gave back a a completion let's go ahead ask something else what is first president of the USA okay George Washington okay cool so that is how you can get a text completion from chat gbt but as you can see that's not really useful because you're answering um only one at a time so what you have to do is you have to Nest this inside of a while loop in order for it to continue going on so let me just show you how you can do that real quick so the key lines that we need for the um for the bot is going to be these ones here the other ones are just initializing things so we're going to go ahead and put these inside of a while loop so that we can keep asking questions so let's go ahead and say wow true it's gonna just keep going forever and ever and ever until you stop so let's go ahead and demonstrate that real quick how are you doing today okay it gives me a response how can I assist you today I am feeling hungry you man it's all about food today for me I don't know why so yeah as you can see it's doing the Chachi PT stuff and it's just giving the response back so we're gonna go ahead and end it here um so this is it this is gonna be your first chat bot uh with the open API key and if you just did this this is chat GPT at its core um so what we're going to do now is now that we've got the chat gbt completion working we're going to integrate it with some speech recognition so that we can actually talk to it so what we're going to need to do is um import a couple more things import speech recognition as Sr and then import p-y-t-s-x three Pi text to speech three okay and we're going to need to initialize a couple of more things so let's go ahead and go down a little bit more um we need to initialize the text-to-speech engine so just go ahead call it engine we're going to do p y ttsx3 dot init so initialize the engine forward and then we're going to need to select a voice so voices equals engine dot get property and then inside here we're gonna do voices so that's just going to get the prop that's just going to get all the voices that we have installed onto our Windows system and then we're gonna do engine.set voice voices okay and so what we did here was was set the voice that we want it to zero is going to be for mail and then one is going to be for a female the the two voices on the system so once you've initialized the text-to-speech the next thing that we have to do is initialize the microphone so R equals um speech recognition recognizer oops recognizer and then all right and then sorry Mike equals sr.microphone and then this device index is going to decide what microphone you're going to be using for the system uh we're gonna set it to zero as that is your default a couple of additional things that I did set for my Vivi project was I did Dynamic energy threshold equals false and then I did energy threshold equals 400. so the reason I set this to false is that it kept listening um it kept listening and never ended and that's because it was automatically adjusting the threshold to my ambient noise and it might have adjusted it too high to where it never stopped listening so that's why I set it to false so that um it just sets it at the beginning and then the energy threshold to 400. Okay so we've got the microphone set up we've got the speech to text engine setup we're going to do this inside of the while loop so that we can continually talk to it um but what we're going to need to do is we're going to need to listen to The Voice so to do that we have to do with Mike as let's call it source um we're going to print out something that says listening or um we're gonna print out something that says listening so that we know that we are that it is listening okay and then we're going to do um the recognizer adjust for ambient noise source and then the duration and the duration we just set at 0.5 and I believe what the duration for this is is how long it's adjusting for ambient noise so I just set it to the very minimum that it recommends and then what we want to do here is we can do a try accept so that so that it only passes if you actually spoke into it so if you didn't do any speaking it's going to fail and go into the accept and then it'll just keep looping until um until it actually registers that you said something so go ahead and set a new variable audio equals R listen source and then um and then we're going to do an accept we're going to continue okay and I actually moved um I actually moved the audio outside of the try so um it's going to try for the audio and then it's going to use well first it's going to listen and it's going to store that audio data into audio and then it's going to try to use uh the recognize Google to see if there's any um words spoken if not it's going to go to the accept and then it's going to loop back and Skip all of this and then listen again for a valid response so so to do so to give a demonstration let's just go ahead and run and I'm just going to blow into the mic when it says listening oh and as you can see it said an empty list here and now it's going to register what I'm saying and then it's going to go ahead and pass and then it's going to go ahead and head on over to this next one so what we're going to need to have here now is just pass in what we said over into the messages directly so um we don't have to do anything other than just delete this input line here and what it's going to do is um recognize the audio store inside of user input and then send that over to chat GPT as the user input for this messages here so and so let's just go ahead and try that so go ahead click Run start debugging hello there I'm just testing something out so I said hello let me let me know if you need any assistance can you give me something to eat for dinner tonight something that is Savory says certainly what are your dietary restrictions or preferences and so here it gave me an entire list um so let's just go ahead and stop it so as you can see it can now listen to what I'm saying and respond back to it um and then the last thing that we have to do is have it just read out the response so this is what we're going to use that pyttsx thing for just reading it out so the last thing we've got to do is generate a voice and so actually we want to have it after the response um what we're going to do here is let's do engine dot say and then we're going to insert into an F string to insert that response inside of here so um man I'm using single quotes and double quotes it's probably bad code but um let's just go ahead and do engine run and wait and let's just go ahead do that again hello there how are you doing today hello as an AI language model I don't have feelings but I'm here to assist you how can I help you today and as you can see it responds back now with that voice that we just gave it so that is basically it to get a chat bot up and running um with voice and speech recognition um the next thing that I'm gonna show is it's not completely necessary but how you can use whisper instead of speech recognition to get this done and then how you can save the conversation okay so we're gonna go ahead and get rid of this user input we won't be needing it anymore uh what we're gonna end up using is uh whisper and to do that we're going to have to first save the autofile as a save the audio file as a WAV file and then we're going to go ahead and read that into the whisper API so uh what we have to do is save that audio file as a WAV file so we're going to go ahead and um let's just save it as speech.wave and there is probably there's probably a way to do it with temporary files but I haven't figured out how and so that's one thing that I'm looking into but we'll do quote quote quote WB quote and then we'll go ahead and do as F and then we'll do F right um audio we're gonna get gets wave data from the from speech recognition and then and so this is going to write it to a WAV file then what we're going to do is we're going to write it into a new variable called Speech let's go ahead and open it um let's do speech.wave and then we're just going to do it as feed bytes then the model that we're going to be using for this is going to be whisper so um let me make sure I didn't use model ID for okay I use model down here okay so we're going to go ahead and do model ID actually we don't need to do that so what we're going to do let's just do completion and since we already have completion down there we'll do W completion for whisper completion so w completion equals open AI Dot audio dot transcribe and then we're going to set in the model so model ID model equals um let's do whisper I believe the model name is whisper one dash one and then we're going to feed in that WAV file with file and then we'll do speech as that's what we have the variable up there for okay and so that's all you need to do in order for it to transcribe this um and then we're going to need to take the response so in this case we're going to you name it as user input equals um W completion text all right and so this will allow us to use the whisper API to complete the um the text the speech to text part of it so with this we should be able to go ahead and run it we already set the API key up here so that's why we don't have to set it again for the um whisper API and so F5 we're going to run it hey there welcome to the video thank you how can I assist you with the video okay so as you can see it replied back but I actually didn't get uh I want to see what what it's registering so I'm gonna go ahead and print the user input here so let's go ahead rerun that hi there how are you doing today it's gonna register what I said here as an AI language model I don't have feelings but I'm functioning properly and ready to assist you how can I help you today and so as you can see whisper is now officially working so um what I like to do what I did in my in my chat Bots is I made this into a function um and then just called it so as you can see this is a lot of code to just uh put in here whereas the other line was just one line of code um so an easy way to make this into a function is let's just go ahead and go up here and Define it as a function so we'll do def uh let's do whisper and so what we end up passing is an audio variable and then all we have to do is paste all of this in here let's go ahead make sure it's tabbed correctly and so now instead we just need to return user input and we should be able to just simply call whisper and pass in that audio variable and it should do the same thing so let's go ahead and user input equals whisper audio okay cool let's go ahead rerun it hello there how are you doing today as an AI language model I don't have emotions but I am functioning well thank you for asking how can I assist you today see there you go and then now you've got whisper running as a function so um this is just a little bit more this makes what um like your main Loop a little bit more readable in my opinion and that ways if you want to do it with the other one you can always have an if if Loop in here to select Whispers so if you just go back and do user input equals r dot recognize Google audio uh you could do something like this if whisper if you use whisper then user input else you can use Google so then all you have to do is just set a use whisper variable up here somewhere uh we'll do we'll do false and this will skip it and then go to the recognize Google so let's go ahead and show you that one F5 hello there this is Google's okay as you can see how may I assist you today this is Google's here and it's not using whisper but if I go ahead and set this now to true hello there how are you doing today hello as an AI language model it's going to go ahead and use whisper as you can see it didn't have all of these different dictionaries here so that's just an easy way how you can switch back and forth between using whisper and then the Google one this is if you want to perhaps save money and you just want to use Google but I find Whispers a little bit more accurate and it's still pretty cheap um six point zero six cents three-fifths of a cent in order to use whisper for like a minute of audio so that's pretty cheap in my opinion and yeah you just have to have some type of variable and and you can switch between them so the last thing that we're going to do is how do you save the responses um into a into a folder or how do you save the conversations for later so to do that I'm just going to go ahead and input some functions that I that I used for my chatbot just to make it just to speed it up a little bit and show you what I did okay so the two functions that I borrowed from my Vivi project was save conversation and then save in progress so um to use these what we're going to have to do is input OS and then we're going to have to import Json so that'll get rid of the little squigglies that we saw and so let me just go ahead and detail um what these two functions do so this first one is a save conversation function this is going to be the initial save for the conversation so that um it doesn't overwrite any other folders so what it's basically going to do is go through the directory where you have your folder stored and check to see if the path exists if not it's going to go ahead and create a new um a new text file that you can go ahead and start writing to so this one is going to return the suffix number and that suffix number needs to get fed into save in progress where it takes in the suffix count from the Save conversation function in order to know which file you're writing to so that's just the way that I separated it into you could probably have it done in one but I found it a little bit easier with two in order to call them from two different locations in the script so um let's go ahead and use them the first thing that we can do is we can we can call Save conversation so so in order to utilize it we got to do suffix equals save conversation and we're going to pass it a save folder name String we're going to take it takes in a path to save the conversation to um so preferably we would want to set that path and so the easiest way to do that would be to just grab the current script location so to do that we're going to go ahead and use this line here so going to grab script location I'm going to use this we're going to use this line right here to grab the script location this is something that I just found online and it just grabs the script uh directory so what you can now do is uh specify a save folder location so we're gonna pass in something safe for the name so let's go ahead and specify a folder name let's just say this one is voice assistant and then we're going to create one called save underscore folder name equals OS dot path dot join where we can now put in the script directory um and then we can do we're gonna save it into a folder called conversations and then we're going to do we're gonna make this an F string so that we can put that into it so folder name cool and then once we have this long thing out of the way this is where we're going to save it to we're going to go ahead and put save folder name into save conversation and so that's going to um I'll show you what it does after we run the script and then what we're going to want to do is at the end of here um I prefer to have it right after the response is printed we're going to call Save in progress and then we're going to pass in that save folder name variable here so oh yeah and then we're going to also need suffix so we're gonna do yeah we're gonna pass in a suffix and then save folder name and then I'll go ahead and show you why I did it that way and what it ends up doing so the easiest way to do that is um start a conversation up hey there how are you doing today as an AI language model I do not have emotions but I am functioning perfectly fine how can I assist you today would you be able to tell me a funny story something that is short and under 50 characters why don't scientists trust atoms because they make up everything can you tell me another one that is less corny sure here's another one why did the tomato turn red because it saw the salad dressing oh my God okay so yeah you get the point um and then let's go let me go ahead and show you what it does so so inside of the folder what we have here is a conversations folder and inside we have that folder name so as you saw we named the folder uh voice assistant so inside of here we have voice assistant and then a conversation where we can now see the conversation so here we have that messages list that we had we have the system and then you got the user the assistant and this is basically the entire conversation all the receipts of the conversation and and you can store all of this locally so that you never ever lose it so that is how you can now incorporate whisper and then save all of those conversations to a folder with the text and have everything inside of it and that is gonna be pretty much it I know we ended up going through um a lot of lines of code here and of course if you do everything in just one script it's gonna get a little bit messy because you have to do all the initial initialization and then you have to Define some functions and then call all them inside of the loops but if you do it inside of a a class or you do more functions um you can get it looking pretty uh concise so this is and that's why this assistance uh python script you only have 41 lines here and then um literally just five lines that you have to adjust but this is everything that I went through in order to get my voice assistant up and running I didn't show you 11 Labs but let me know Down Below in the comments if you want me to show you how you can do the 11 Labs part it's part of my Vivi project if you want to go take a look at that but that is going to be the end of today's little tutorial for getting the chat GPT set up with speech recognition and Whisper so everything I did today is going to be on the GitHub page so go ahead check the description if you just want to see the code and let me know if you want me to do anything else I will be working on my Vivi project and there are some other things that I'm going to be focusing on so uh just let me know what you want to see and I will be sharing my progress for little projects that I'm working on as well so see you later and yeah let me know what you think in the comments below
Info
Channel: Jarods Journey
Views: 909
Rating: undefined out of 5
Keywords:
Id: u4oE49sWI4w
Channel Id: undefined
Length: 31min 17sec (1877 seconds)
Published: Tue Mar 14 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.