AI Text to Speech in 10 Minutes with Python and Watson TTS

Captions

Tired of speaking? Maybe you don't want to do your own speech? Well, there's an app for that. In this video we're going to take a look at how you can take text and convert it to speech, also known as text to speech. Now, you're probably wondering, why the hell is he wearing this hat? Well, in this case we're going to look at how to convert text from different languages, as well as English, to generate speech, so we'll actually be converting French text to French speech.

Let's take a look in a little more detail at what we're going to cover. This video is a bit of a crash course on text-to-speech conversion. The first thing we're going to do is convert a simple Python variable into a speech file, which we'll output to MP3. We'll then take a look at how you can pre-process text documents, so if you've got a bunch of files that you want to convert, we'll be able to convert those to speech as well. Last but not least, we'll take a look at how to convert using different language models: we'll look at all the languages available and convert using one of them. We'll also cover all the setup you need to go through. In this case we're using Watson Text to Speech, so we'll cover all of that as well.

In terms of how we're going to do it, we'll be working inside a Jupyter notebook. We'll capture all of our content, our corpus, using Python. We'll then send that text to the Watson Text to Speech service, which will allow us to perform the conversion, and from there we'll output the result as an audio file. It'll be an MP3 file; you can use a bunch of other formats, but we'll use MP3 in this case. Ready to get to it? Let's do it.

The first thing we're going to do before we start converting text to speech is lay out our Jupyter notebook. There are five key steps that we want to go through inside our Python Jupyter notebook: installing our dependencies, setting up our text-to-speech service (so, authenticating), converting a basic string, reading text from a file and converting that, and using a new language model.

All right, so we've got the basic layout for our Jupyter notebook set up. Again, as we said, we're first going to install our dependencies, then authenticate, convert a string, read from a file and convert that, and last but not least convert with a different language model. That lets you convert with a Chinese language model or an Arabic language model; there's a whole heap of different language support.

Let's go ahead and first install our dependencies. In this case there's one key dependency, and that's ibm_watson. To install it we're just going to use the pip install command. That's now done: you can see we've just used an exclamation mark and then typed pip install ibm_watson. So that's our key dependency installed.

The next thing we need to do is authenticate. Because we're using the Watson Text to Speech service, we need to set up a Text to Speech service first. To do that, go to cloud.ibm.com/catalog and hit Services. If we zoom in: select Services, scroll down to AI / Machine Learning and choose that. From here you can see a whole bunch of different services; the one we're looking for is Text to Speech, this one down here. If we select Text to Speech, you can see that we've got a Lite plan, which gives us 10,000 characters of conversion per month and will be deleted after 30 days of inactivity. When we're just getting started, this is more than enough to convert our text to speech, so let's choose the Lite plan and hit Create.

Our Text to Speech service is now being created; you can see it's called Text to Speech-wk. What we need from the service are our API key and our service URL. To grab them, go to Manage, and you'll have your API key here and your URL here; both are just under the Credentials box. To store them in our Jupyter notebook we'll create two new variables, one for our URL and one for our API key. Now let's copy our URL and API key into these variables. You can see here that we've got our URL and our API key, and those are the core things we're going to need from our Text to Speech service.

Now we want to actually start authenticating. To do that, we need to import some dependencies from the IBM Watson SDK that we just installed. We've imported two key dependencies: the text-to-speech class, which allows us to work with our Text to Speech service, and the IAM authenticator, which allows us to authenticate to that service. Next we want to create a new instance of the text-to-speech class and authenticate against it. That's our service now set up. We've done three key things: we've set up our authenticator and passed through our API key, we've created a new instance of the text-to-speech service and passed through our authenticator, and last but not least we've set our service URL, which is basically where our service sits on the World Wide Web. So that's our service done.

Now we can go and convert some text to speech. We're going to start with a basic conversion and just convert "hello world". When we convert, we pass through a string or a body of text, and this outputs a file that we'll then be able to play. We'll output our speech to the folder that our Jupyter notebook is in. You can see here that this is our Jupyter notebook, and we've also got a text file, which we'll convert in a second as well.

Let's go ahead and convert the string "hello world" first to see how it works. What we've done is open a new speech.mp3 file, so we're going to use this file object to write out our audio. Then we've used our text-to-speech service, which we created up here, and synthesized the words "hello world", so ideally our speech should come out saying "hello world". We've also passed through a couple of keyword parameters: we're saying that we want to output an MP3, and we've chosen the voice, or language model, that we want to use. The text-to-speech service has a bunch of different language models; I've just used a US one, in this case a female US voice by the name of Allison. The last thing we've done is grab the result and write it out to the file. If we take a look inside our folder, we should have a file called speech.mp3, and you can see that it's now available. So we can play that file. [Audio plays: "Hello world."] You can see that we've now generated our speech: we've taken "hello world" and converted it to audio that says "hello world". We can change this, so I might say "hello YouTube" and test that out. [Audio plays: "Hello YouTube."] You can see that we're able to convert our text to speech really, really quickly.

What happens if we want to convert a text file, for example, rather than just typing in a string? Say we wanted to productionize this and pick up text files and convert them in real time. Well, we can do that pretty easily; all we need to do is read in a text file. Here I've got a bit of a speech from Winston Churchill. We can read in this text file and convert it, following a similar approach to what we just did. The only difference is that we're first going to read in our text file.

Okay, so we've now read in our text file. If we take a look, you can see that this is all the text we had in our document. What we did is open our file and read the lines from it. You can see that we've got a \n, a newline indicator, on each one of these lines. Whenever we pass our text to the text-to-speech service we want one single block of text, so we'll do a little bit of pre-processing to convert this entire array of strings into a single block of text. What we've done is replace that newline indicator with a blank, so that just strips it out, and then concatenate all of the lines together. We've used a list comprehension, for line in text, to loop through each of them, and ideally what we get out is a single block of text, which you can see here.

Now we can follow a similar process to what we did up here, so we can just copy this block. Rather than "hello YouTube", all we need to do is pass through this text block, so let's replace that, and we'll call this file churchill instead. Ideally we should now have another file exported called churchill.mp3. This might take a little longer because our text is longer now, so there's a whole lot more to convert, but it shouldn't take too long. Perfect, that looks like it's completed. If we take a look, we do indeed have a churchill.mp3 file, so we can play that. [Audio plays: "We shall go on to the end. We shall fight in France, we shall fight on the seas and oceans, we shall..."] Perfect, you get the idea. We've now read in a text file and converted it to MP3.

The last thing we want to take a look at is using a different language model. Rather than just converting in English, what happens if we want to convert in Chinese or Dutch or Arabic or one of the other amazing languages that the world uses? The great thing about the text-to-speech service is that it supports a number of different languages. If we go to the documentation, you can see there's a whole heap of languages supported: Brazilian, Mandarin, Dutch and a whole bunch of others. In this case we're going to convert using French, because I'm a bit of a Frenchie.

Let's go and do that. The core thing we need to do whenever we're converting in a different language is pass through a different voice. If we take a look at the available voices, you can see that in French we've got a few different ones. We'll convert using this one down here, Renee V3, and we'll just create a new variable for it. Then all we need to do is grab our text block. I've got a lullaby that my grandma actually used to sing to me; it's called "Frère Jacques" (I'm probably butchering that). Let's paste that in. All we need to do is copy our conversion block and paste it in, and this time we need to replace two things. Because our text is contained in its own variable, we're just going to grab that and replace the text here. We'll call this one french conversion. The last thing we're going to do is change our voice, because that's the core thing to change whenever we want to use a different language model. Because we've got our voice contained in a variable here, we can just copy that and set the voice to this variable.

That's now done. If we take a look, we've got a French conversion file as well, and you can see we've now converted. [French audio plays.] You can hear that it's a little fast, and that's because we don't have full stops here: the service converts the text to speech exactly as it's written. If we add full stops, we'll get a bit more of a pause between each of these sentences. That's now converted, so let's take a look at that. [French audio plays.] You can hear there's a little more of a pause between each of the sentences now, so it's replicating how we speak.

That about wraps up how to convert text to speech in a nutshell. To recap, we installed our dependencies, so we installed ibm_watson; we then authenticated against our service, converted a string, converted a file and, last but not least, used a new language model. Keep in mind there's a whole heap of different language models if you want to get started.

Thanks so much for tuning in, guys. Hopefully you found this video useful. If you did, be sure to give it a thumbs up, hit subscribe and tick that bell so you get notified of any future videos. Let me know what text you were able to convert in the comments below, and let me know if this video helped you. Again, if you've got any questions at all, be sure to drop a mention in the comments and I'll get right back to you. Thanks again for tuning in. Peace.
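
The captions describe the notebook code without showing it, so here is a rough sketch of what the walkthrough appears to build, not the video's actual code. It assumes the ibm_watson package exposes TextToSpeechV1 and that IAMAuthenticator comes from ibm_cloud_sdk_core, as in IBM's Python SDK. The helper names (lines_to_text_block, synthesize_to_mp3), the placeholder credentials and the exact voice identifiers are illustrative assumptions; check the voice names against the service's current voice list before relying on them.

```python
from pathlib import Path


def lines_to_text_block(lines):
    """Strip newline characters and join the lines into one block of text."""
    cleaned = [line.replace("\n", "").strip() for line in lines]
    # Joining with a single space is an assumption made for this sketch so that
    # words on adjacent lines do not run together; empty lines are dropped.
    return " ".join(line for line in cleaned if line)


def synthesize_to_mp3(text, out_path, api_key, url, voice="en-US_AllisonV3Voice"):
    """Send text to Watson Text to Speech and write the returned MP3 bytes.

    The default voice identifier is assumed to match the US "Allison" voice
    mentioned in the video.
    """
    # Imported lazily so lines_to_text_block works without the SDK installed.
    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # The three setup steps from the video: authenticator, service instance,
    # then the service URL.
    authenticator = IAMAuthenticator(api_key)
    tts = TextToSpeechV1(authenticator=authenticator)
    tts.set_service_url(url)

    # accept picks the audio format, voice picks the language model.
    response = tts.synthesize(text, accept="audio/mp3", voice=voice).get_result()
    Path(out_path).write_bytes(response.content)


# Example calls (they need real credentials, so they are left commented out):
# synthesize_to_mp3("Hello World", "speech.mp3", API_KEY, URL)
# with open("churchill.txt") as f:            # hypothetical file name
#     block = lines_to_text_block(f.readlines())
# synthesize_to_mp3(block, "churchill.mp3", API_KEY, URL)
# synthesize_to_mp3(french_text, "french.mp3", API_KEY, URL,
#                   voice="fr-FR_ReneeV3Voice")  # assumed ID for "Renee V3"
```

Only the voice argument changes between the English and French conversions, which matches the point made in the video that the voice is the one thing to swap when switching language models.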

Info

Channel: Nicholas Renotte

Views: 86,675

Keywords: python, tts, ai

Id: 8k8S5ruFAUs

Length: 13min 57sec (837 seconds)

Published: Sun Aug 16 2020