Generate AI Voices for Your App with OpenAI and Python

Captions
Hello everyone. In this video I'm going to demonstrate the step-by-step process to easily implement text to speech in your application using OpenAI and Python. By the end of this video you'll have all the information you need to bring your app to life with a realistic AI voice. Adding voice to certain types of applications can be a real game changer with respect to user experience and accessibility. In my opinion, OpenAI offers some of the most realistic computer-generated voices available to developers. The pricing is also very reasonable: standard definition costs 1.2 cents per thousand characters and HD costs 3 cents per thousand.

The first thing I want to do is take a look at the documentation. Go to platform.openai.com/docs, and in the left sidebar scroll down and click Text to Speech. This audio API provides a speech endpoint based on OpenAI's text-to-speech model. It comes with six built-in voices and it can be used for a variety of purposes, like narrating, producing spoken audio in multiple languages, and also real-time audio output. You can scroll down to the different voices here, six different voices, and you can test them out. My personal favorite is Nova, which I just find is the most realistic. The API supports multiple output formats and also supports multiple languages, so if you need to generate speech in a foreign language you just add the text written in that language and the OpenAI model automatically detects it and converts it.

Up here in the quick start section is where we get the code. We're going to be accessing the model in Python, but you can also access it in JavaScript and curl. I'm just going to copy this code. I'll go to my VS Code terminal. I already have a folder, TTS, that I'm going to work in, and I'll create a file called main.py. For now I'll just paste this code inside.

There are a few things we're going to have to do to set up the development environment before we start creating voices. One of these is that we're going to have to install some libraries. We're going to need openai, which is where we're getting the text-to-speech model; we're going to use pygame to automatically play the audio that's generated; and then we're going to use python-dotenv just to handle the API key. Before I install these I'm going to open up a terminal so that I can create and activate a virtual environment. Now I'll install these libraries with pip, and that should be good.

The other thing that I'm going to need in order to use the openai library is an API key. For this you're going to need an account on OpenAI. If you don't have one already, go back to the documentation, click Sign up, and create an account. I already have one, so I'll just log in with my Google account. Once this is done, click Dashboard, and then on the left sidebar click API Keys. Click Create new secret key. You can give it a name, which is optional, and also assign it to a specific project, and then click Create secret key. Once this is done, copy the key.

I'm going to go back to VS Code, click New File, and create a .env file. This is where I'm going to store my API key. I'll call it OPENAI_API_KEY, then an equals sign, and then just paste in my key and save. Then I'll go back to main.py. I'm going to import load_dotenv from the dotenv library, also import the os library, and then I can load my secret API key from the .env file using load_dotenv. This line here, client = OpenAI(), is where we're going to create an instance of the OpenAI client, and I'll have to give it the API key. To do this I'll just use os.getenv and give it the name of my OpenAI key from the .env file. That should be all set up and we're ready to go.

So, just walking through this code here, it's all fairly simple. This speech_file_path essentially creates a file path so that we can save the recording to our current directory with the name speech.mp3. This response = client.audio.speech.create is where we are accessing the OpenAI text-to-speech endpoint. We're giving it the model tts-1. We can also choose HD, and I'm actually going to do that. We can select one of the six voices. By default it's alloy; I'm going to change this to nova. And then this here is our input text, which is just "Today is a wonderful day to build something people love." I'll also add here "It's also nice to learn how to use the OpenAI text to speech." So this creates the audio recording, and this actually streams it to a file.

But right now, if we were to just run this, it is not going to work. If I hover over here, it says the method stream_to_file in this class is deprecated: due to a bug, it doesn't actually stream the content. And this is true. My understanding is that this is more of an issue with Python than it is with the openai library, but there is a quick fix. The message here is asking us to use the with_streaming_response method. What we can do to use this is take client.audio.speech and add .with_streaming_response.create. We're going to put this in a with statement, then we'll say "as response", and we'll take out the "response =" of course. So this should work now: with client.audio.speech.with_streaming_response.create, we have our inputs, as response, and then it calls response.stream_to_file, so this actually saves the file. I'm going to save this, and then we can just run it in the terminal with python main.py. It creates this speech.mp3 file, and now we can play it.

"Today is a wonderful day to build something people love. It's also nice to learn how to use the OpenAI text to speech."

That's not too bad. So again, you have two different model options here. You can use tts-1, which is the standard definition, or tts-1-hd, which is high definition. The high definition of course is more expensive, but it's also slower. So depending on your application, if you're prioritizing a speedy response then you'll probably want to use the standard definition, but if speed is not a factor then high definition might make more sense.

The next thing I want to do is set this up so that it will automatically play the audio. To do this I'm going to use pygame. I'll import pygame, and then after the audio is created I'll initialize a pygame mixer. Then I'll load the audio path with pygame.mixer.music.load, and then I'll play the audio with pygame.mixer.music.play. I'm also going to add this code here so that the mixer doesn't shut down before the audio finishes playing, and I'll import the time library. Maybe I'll add different text here, like "Today we are learning to code text to speech." I'll save, and then I'll run this again from the terminal.

"Today we are learning to code text to speech."

Good. And that was just done automatically, without me having to go to the audio file and press the play button, which is what you would want for your application. Now, depending on your application, you're probably not going to use pygame to play this. If this is a web application you'll likely use a JavaScript library and play it from the client, but this is sufficient for our purposes.

Now finally, what I want to do is wrap all of this code into a function. We'll call this text_to_speech. This function is going to take the text string as input. It will also allow us to choose the model, and we'll choose tts-1 as the default, and also the voice, and we can choose nova as the default. We'll just make the adjustments here: model equals model, voice equals voice, and then input equals text. There we go. Now we can just create and play audio as we're generating text.

We can test this out one final time with the text "Adding voice to your chatbot can really bring it to life and improve the user experience." Then we just call text_to_speech with text, model equals tts-1-hd, and we'll keep the nova voice. Okay, great. Now I will save this and run it again from the terminal.

"Adding voice to your chatbot can really bring it to life and improve the user experience."

Perfect. So now that it's wrapped in a function, you could import this into another module in your application and just generate your audio as you're generating text. The text may come from a variety of different sources. Maybe you're using OpenAI to do text generation for your chatbot. Regardless, now you can give your application a voice that is fairly realistic and really improve the user experience.

Okay, great. I hope that you found that useful. If you have any questions or any comments, please post something in the comments below. Hit the like button and subscribe if you're interested in this type of content. If there are any other videos you'd like me to make, I'd love to hear about that as well. I will see you in the next video. Goodbye for now.
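
The captions describe the final script but do not show it, so here is a minimal sketch of what the walkthrough arrives at, not the author's exact code. It assumes the packages were installed with pip install openai pygame python-dotenv and that a .env file defines OPENAI_API_KEY. The helper build_request, the filename parameter, the use of Path.cwd(), and the polling loop with get_busy are illustrative choices that do not appear in the video.

```python
import os
import time
from pathlib import Path

# The six built-in voices and the two models mentioned in the video.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MODELS = {"tts-1", "tts-1-hd"}


def build_request(text, model="tts-1", voice="nova"):
    """Validate the arguments and return keyword arguments for the speech endpoint.

    Hypothetical helper, added here only to keep the validation separate.
    """
    if not text:
        raise ValueError("text must not be empty")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"model": model, "voice": voice, "input": text}


def text_to_speech(text, model="tts-1", voice="nova", filename="speech.mp3"):
    """Generate speech for `text`, save it as an MP3, play it, and return the path."""
    # Third-party imports live inside the function so that build_request()
    # can be used even when these packages are not installed.
    import pygame
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv()  # reads OPENAI_API_KEY from the .env file
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # Assumption: save into the current working directory.
    speech_file_path = Path.cwd() / filename

    # The streaming variant replaces the deprecated response.stream_to_file usage.
    with client.audio.speech.with_streaming_response.create(
        **build_request(text, model, voice)
    ) as response:
        response.stream_to_file(speech_file_path)

    # Play the file and wait until playback has finished before shutting down.
    pygame.mixer.init()
    pygame.mixer.music.load(str(speech_file_path))
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)
    pygame.mixer.quit()

    return speech_file_path
```

With this saved in main.py, the last test in the video would correspond to calling text_to_speech with the chatbot sentence, model="tts-1-hd", and the default nova voice. Running it needs a valid API key and an audio output device.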
Info
Channel: Aaron Dunn
Views: 300
Id: akzARL8q4LM
Length: 11min 12sec (672 seconds)
Published: Tue Jul 02 2024