Talk to ChatGPT using OpenAI Whisper speech to text and LangChain

Video Statistics and Information

Captions

"Hi ChatGPT, can you tell me what ASR is?" We can send our recorded voice to ChatGPT and get the answer back: "ASR stands for automatic speech recognition," and so on. In this video we want to use OpenAI Whisper to convert our voice to a prompt and send it to ChatGPT, as if you are talking to an AI assistant. Let's dive in.

We change to our project directory, and from there we create a new folder and name it whisper_chatgpt. We change to the new folder, and from inside the empty folder we start Visual Studio Code. In Visual Studio Code we create a text file, requirements, and add some packages. Before installing the packages we need to create a virtual environment: type python3 -m venv and the name of the virtual environment, in this case venv. We activate the virtual environment with source venv/bin/activate. The name of the virtual environment appears before our prompt, so we can go ahead and install the requirements. Depending on the speed of your machine this can take a little while. Then the prompt comes back and we can clear the screen.

To install OpenAI Whisper we navigate to the GitHub page and copy the link to install the latest commit from the repository. Back in Visual Studio Code we paste it in the terminal and install Whisper. After the installation is done, it's time to create a new OpenAI key, so we navigate to OpenAI and create a new secret key: create, copy, and close the pop-up. Back in Visual Studio Code we create a .env file and assign the key to OPENAI_API_KEY. Finally, we create a file app.py to put our main code inside. We close and make room to start with the main coding part.

First we need a user interface. Gradio makes it easy to use the microphone and add audio as input, so we go to the docs, copy the sample code, and use it as our starting point for this project. Back in Visual Studio Code we paste the sample code in our app.py file. It is a very simple script: it takes the input, adds some text like "Hello" to it, and sends it to the output. To test it we simply type gradio and the name of the file, in this case app.py. Gradio starts and runs here on localhost port 7861. We navigate to localhost port 7861, type "YouTube", and get the greeting "Hello YouTube" back, so the user interface is working.

We stop Gradio with Ctrl+C and start customizing our script. We change "Hello" to "Transcribe", do some formatting, and bring each argument of the Gradio Interface to a separate line. Next we change the inputs from text to the Gradio Audio component with source microphone and type filepath. This lets us start the microphone component and record our voice. Next we change the name of our function with F2 from greet to process, and it changes everywhere. We do the same with the argument name and change it with F2 to file_path, as the function receives the file path from the Gradio Audio component. We save the changes, and now it's time to test the new user interface with the microphone as the input. So we open the terminal and start Gradio again. After Gradio has started we navigate to localhost port 7861. This time we see the Gradio Audio component instead of the text input. We can click on "Record from microphone" and ask our question again. After we stop recording and submit our input, we see the file path of our recorded voice in the output. So our function successfully receives the file path of the recorded voice from the Gradio Audio component, and we can go to the next step to convert it to text.

Back in Visual Studio Code we stop Gradio with Ctrl+C. Now we want to improve the process function and use Whisper to convert the audio given by file_path to text. For this we first need some imports: openai for Whisper, and os and dotenv for getting our API key from the .env file. Now comes the main part of the function. We use the open function with rb to open and read the given file path as binary and assign it to audio. Next we get our environment variable OPENAI_API_KEY and assign it to a variable, and finally to openai.api_key. Next we use openai.Audio.transcribe with the whisper-1 model to transcribe the audio. We are interested in the text part of the transcript and return it as the return value of the process function, which will be displayed in the output of the Gradio user interface.

Now it's time to test our code and see the speech-to-text part in action. We save everything, open the terminal, and run Gradio again. After Gradio has started we navigate to the given address and open localhost port 7861. "Hi ChatGPT, can you tell me what ASR is?" The output reads: "Hi ChatGPT, can you tell me what ASR is?" As we see, Whisper does the speech-to-text job pretty well.

Now we come to the final part of our project: using the text returned by Whisper as the prompt for ChatGPT. So we go back to Visual Studio Code and stop Gradio with Ctrl+C. We can use OpenAI directly or use a wrapper like LangChain. Here we want to use LangChain, and from langchain.llms we import the OpenAI wrapper. Then we create an LLM with OpenAI, set the temperature to 1 to let ChatGPT be creative, and assign our API key. Next we use the text part of the transcript as the prompt and return the answer of ChatGPT as the return value of the process function, which will be displayed in the output of the user interface. We save everything, open our terminal, and start Gradio again. Then we navigate to the address and ask our question again: "Hey OpenAI, can you tell me what ASR is?" When we stop and click on submit, our voice is converted to text by Whisper and used as the prompt for ChatGPT, and we get the answer: "ASR stands for automatic speech recognition."

If you need more information about Whisper models and the languages supported, head to the GitHub page of OpenAI Whisper. This example gives you a starting point to create your own AI assistants and talk to your data and databases, especially in combination with OpenAI function calling. Good luck.
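
The steps narrated above could be sketched roughly as follows. This is a reconstruction from the captions, not the code shown in the video. It assumes the library versions current in mid-2023 (openai before 1.0, Gradio 3.x, and the langchain.llms.OpenAI wrapper); newer releases have changed these calls. The helper name build_process and the main function are hypothetical and are introduced only so the speech-to-text and LLM steps can be swapped for stand-ins.

```python
import os
from typing import BinaryIO, Callable


def build_process(
    transcribe: Callable[[BinaryIO], str],
    ask_llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Return a process(file_path) function that chains speech-to-text and the LLM.

    build_process is a hypothetical helper, not part of the video's script.
    """

    def process(file_path: str) -> str:
        # Open the recorded audio as binary, as described in the captions.
        with open(file_path, "rb") as audio:
            prompt = transcribe(audio)
        # Use the transcribed text as the prompt and return the model's answer.
        return ask_llm(prompt)

    return process


def main() -> None:
    # Third-party imports stay local so build_process works without them.
    # Assumed versions: openai<1.0, gradio 3.x, langchain with langchain.llms.
    import gradio as gr
    import openai
    from dotenv import load_dotenv
    from langchain.llms import OpenAI

    # Read OPENAI_API_KEY from the .env file.
    load_dotenv()
    api_key = os.getenv("OPENAI_API_KEY")
    openai.api_key = api_key

    # Temperature 1, as in the captions.
    llm = OpenAI(temperature=1, openai_api_key=api_key)

    def transcribe(audio: BinaryIO) -> str:
        # Keep only the text part of the transcript.
        return openai.Audio.transcribe("whisper-1", audio)["text"]

    demo = gr.Interface(
        fn=build_process(transcribe, lambda prompt: llm(prompt)),
        inputs=gr.Audio(source="microphone", type="filepath"),
        outputs="text",
    )
    demo.launch()
```

Calling main() would start the interface, provided the packages are installed and the .env file holds a valid key. Passing the two steps in as functions is a design choice of this sketch: it lets the file handling be checked with simple stand-ins before any API call is made.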

Info

Channel: business24_ai

Views: 5,132

Id: veTF-9WHW70

Length: 9min 59sec (599 seconds)

Published: Sun Jun 25 2023