Create a Smart Voice Assistant using OpenAI's ChatGPT API, Whisper, Python & Gradio

Video Statistics and Information

Captions
"Who was the main actor in the movie Inception?" "The main actor in the movie Inception is Leonardo DiCaprio." [Music]

In my initial ChatGPT video I showed you how you can create your own voice assistant using ChatGPT, and in my previous video I showed you that ChatGPT now has an API. But there was one thing missing from my previous voice assistant video: giving out the output in the form of audio. So in this video I'll combine everything that you asked for, so you can ask questions through your voice and get the answer back as voice through ChatGPT. Without wasting any further time, let's kick-start the video.

Let me start the activity by showing you the installation section; I'll quickly walk you through the cell. I require Whisper's base model, which is why I install whisper first; I require Gradio to give you the whole chatbot-like interface; I require openai; and I require Google Text-to-Speech. These are the imports I'll make in the import section, but for the imports to work I first have to install the packages, so I'll quickly run the install. While the installation is happening, just a heads-up that you will require a GPU-based instance on Google Colab, not a CPU-based one; on a CPU instance the inference will take really long. My advice is that you need a GPU for this entire activity: if you're doing it locally you require a good GPU, and if you're doing it on the cloud you require a GPU as well.

Now that the installation is done, I'll go to the import section. Everything I've installed is everything I require, but I also need additional libraries like os, json and a few others, which is what I've specified in this cell, so I'll run it as well. With the imports done, I'll start defining the variables I need. Some unwanted warnings creep up, so I hide them using warnings.filterwarnings and run that cell. As in my previous ChatGPT API video, I keep my API key in a JSON file, so I load gpt_secret_key.json into the variable data; data has a key holding the actual API key, and I assign that to openai.api_key and run this cell too.

Given that this is an introductory video, I load the base model of Whisper into a variable called model and run the cell. With the base model up and running, I quickly check whether it uses a CPU or a GPU by calling model.device, and it says type='cuda', which means it's using a GPU; that is a plus point for the entire activity. When I have to give out an output, I require a temporary MP3 file: for Gradio to function correctly, I have to first create a temporary file and then load the new files that are generated. For this purpose I create a temporary MP3 file named temp.mp3, and this file will keep changing over time, so I run this cell as well. Just a heads-up: if you're doing this locally on your machine, there's a chance you might not have the ffmpeg module on your system, so you'll have to install it separately before running this command. Just so everyone is clear, I'll also show you the files in my directory: I have the JSON file containing my OpenAI key, and I have a temp.mp3 file, which I'll keep overwriting with the text-to-speech output.
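For reference, a minimal sketch of this setup could look like the following. The exact pip packages, the JSON key name ("API_KEY"), and the way the placeholder MP3 is created are my assumptions based on the description above; the file names gpt_secret_key.json and temp.mp3 follow the video.

```python
# Install the dependencies first, e.g. in a Colab cell (assumed package names):
#   !pip install -q git+https://github.com/openai/whisper.git gradio openai gTTS

import json
import warnings

import whisper
import openai
from gtts import gTTS

warnings.filterwarnings("ignore")  # hide the unwanted warnings mentioned in the video

# Load the OpenAI API key from a local JSON file, assumed to look like {"API_KEY": "sk-..."}
with open("gpt_secret_key.json") as f:
    data = json.load(f)
openai.api_key = data["API_KEY"]

# Load Whisper's base model and confirm it is running on the GPU
model = whisper.load_model("base")
print(model.device)  # expect something like device(type='cuda', index=0) on a GPU instance

# Create a placeholder MP3 that the text-to-speech step will keep overwriting
with open("temp.mp3", "wb"):
    pass
```

If you run this locally and Whisper complains about a missing ffmpeg binary, install it with your system package manager (for example, apt-get install ffmpeg) before loading any audio.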
Now, this is the ChatGPT API function. I've covered this function in depth, line by line, in my previous video, but here is a small recap: I create a simple list variable containing a dictionary, and the dictionary has two keys, role and content. The first content you have to supply is "you are a helpful assistant", and after that it takes other prompts as well. If a valid input text is received, I append the content I want a response for to this list of dictionaries. Then I call the openai.ChatCompletion.create function, specify the model I want to use, and pass in the messages list. The response that comes back is saved in the chat_completion variable, and I traverse through the list and dictionary inside it to extract the actual reply, which is what the last piece of code does. If I've gone a bit fast, I'll add the link to the previous video so you can go step by step through the logic behind the entire function. For now, I'll import this function into memory.
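A sketch of such a helper, written against the pre-v1.0 openai Python library that was current when this video was published (openai.ChatCompletion.create), might look like this; the function name chatgpt_api and the model name gpt-3.5-turbo are my assumptions, not code taken from the video.

```python
def chatgpt_api(input_text):
    # List of role/content dictionaries, starting with the system prompt
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    # Append the user's question if we actually received text
    if input_text:
        messages.append({"role": "user", "content": input_text})

    chat_completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",   # assumed model name
        messages=messages,
    )

    # Traverse the response structure to pull out the assistant's reply text
    reply = chat_completion.choices[0].message.content
    return reply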
Now we move on to the transcribe function. Here I transcribe the audio, pass the transcribed text to the ChatGPT API function created above, and then pass the response from ChatGPT's API through the Google Text-to-Speech module to generate audio for it, so the whole thing is one interconnected system, all achieved inside this one function. (The text-to-speech piece is what I'll use to convert text to speech, so I'll need it later on.) In the first two lines of code I load an audio file using whisper.load_audio and assign it to a variable called audio, and then pad or trim it to a duration of 30 seconds using whisper.pad_or_trim. Next, I convert the padded or trimmed audio into a log-Mel spectrogram using whisper.log_mel_spectrogram and assign the result to a variable called mel; the log-Mel spectrogram is a common way to represent audio signals for speech-related tasks such as speech recognition or language identification. In the same piece of code I also move the log-Mel spectrogram to the same device as the model using the .to() function. This ensures the data is in the correct format and on the correct device for the model to make predictions, because I want my predictions to happen on the GPU, not on a CPU. I then use the model to detect the spoken language via the detect_language function, which takes the log-Mel spectrogram as input and returns a tuple of predictions and probabilities; the predictions are not used here, and the probabilities are assigned to a variable called probabilities. Finally, I decode the audio signal using the whisper.decode function, which takes the model, the log-Mel spectrogram and the decoding options as input. The decoded result is assigned to a variable called result, and I extract the decoded text from it via the .text attribute and assign that to result_text. Essentially, all the audio coming into this function has now been reduced to text, which is the question I want to ask ChatGPT.

This text is passed to the actual ChatGPT API: I call the ChatGPT API function and store its result in out_result. The result coming back from ChatGPT is again text, and I want it spoken out as audio, so I initialize a Google Text-to-Speech instance, which I call audio_obj, with text equal to out_result, the output of ChatGPT. I specify the language as English, and since I don't want the speech to be slow I set slow equal to False. Whatever audio is generated gets saved into temp.mp3; if you remember, I created that temporary MP3 file earlier, and it will keep being replaced as the output changes. I return the result text, which is what Whisper decoded, I return the ChatGPT output, and I also return the temp.mp3 file, so those are the three outputs. I run the cell to import everything into memory.

Finally, I create an interface using Gradio. The first output is the Whisper model's speech-to-text output; the second output is the ChatGPT output coming through the API this time around; and the third output is the actual MP3 file, which speaks out the result. Those are my three outputs. I define the interface: I give it a title, point it at the transcribe function, set the inputs and outputs, set live equal to True, and call .launch(). Then I run the cell.
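Putting the pieces together with the earlier sketches, the transcribe function and the Gradio interface described here could look roughly like this. The Gradio component details (a microphone input returning a file path, which matches the Gradio 3.x API) and the interface title are my assumptions; the individual Whisper, ChatGPT and gTTS steps follow the description above.

```python
import gradio as gr

def transcribe(audio_path):
    # Speech-to-text with Whisper
    audio = whisper.load_audio(audio_path)
    audio = whisper.pad_or_trim(audio)                       # fit to 30 seconds
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    _, probs = model.detect_language(mel)                    # probabilities are computed but not used further
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    result_text = result.text

    # Ask ChatGPT the transcribed question
    out_result = chatgpt_api(result_text)

    # Text-to-speech: overwrite temp.mp3 with the spoken reply
    audio_obj = gTTS(text=out_result, lang="en", slow=False)
    audio_obj.save("temp.mp3")

    # Three outputs: the transcription, the ChatGPT reply, and the audio file
    return result_text, out_result, "temp.mp3"

ui = gr.Interface(
    title="Voice assistant with Whisper and ChatGPT",  # assumed title
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=["text", "text", "audio"],
    live=True,
)
ui.launch()
```

With live=True, Gradio re-runs the function whenever the input changes, so each new recording overwrites temp.mp3 with fresh speech output.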
The interface loads, and here you have it. Let me ask my first question: who is the current prime minister of India? If you look at the output, it says "who is the current prime minister of India"; that is the output of the speech-to-text engine, our Whisper module. The answer is "The current prime minister of India is Narendra Modi", and since many of you requested the output as audio, here it is again, spoken out by the chatbot itself: "The current prime minister of India is Narendra Modi." I can ask one more question, so I'll clear the screen: who was the main actor in the movie Inception? I asked "who was the main actor in the movie Inception", and the ChatGPT output is perfectly correct, with no error: "The main actor in the movie Inception is Leonardo DiCaprio." Perfect; the entire system is working, all thanks to ChatGPT.

So this is what I wanted to show you. A lot of you were commenting asking whether we could have audio as the output, and here it is: I input audio and I get audio back. Isn't this amazing? I hope you found this video informative. If you like the content I create on my channel, it would be super motivating if you press the Subscribe button and the bell icon to be notified of more videos on data science and machine learning. Thank you so much for watching the video. [Music]
Info
Channel: Bhavesh Bhatt
Views: 11,084
Keywords: artificial intelligence, ai news, whats ai, whatsai, chatgpt, gptchat, what is chatgpt, what is gptchat, gptchat explained, chatgpt explained, how does chatgpt work, how does gptchat work, chatgpt programming, elon musk, openai chatgpt, is chatgpt free, is gptchat free, chatgpt in 5 minutes, gptchat in 5 minutes, chatgpt results, gptchat results, chatgpt examples, gptchat examples, funny chatgpt, chatgpt memes, will ai replace programmers, ai replace developers
Id: gTn1fXYw9sU
Length: 12min 1sec (721 seconds)
Published: Thu Mar 02 2023