Create a Perfect text to speech in python in 10 minutes with Coqui TTS

Captions

Hello everyone, welcome to this new video on this channel. Today we'll do something really special: we will code a complete text-to-speech synthesizer with Python. I think this will be a series of two videos, and in the next video I will deploy it online so you can play with it. So I will show you the process of creating a really good text-to-speech synthesizer, and then the process of deploying it online on a web server and sharing the URL so your friends or whoever can play with it. So let's get started.

In order to create a text-to-speech synthesizer in Python we need some basic modules, and to install those modules you need pip. I hope you have Python installed on your computer, since Python comes with pip. So the thing that you have to do is just install TTS with pip. Right now I have TTS on my computer, so I don't need to install it anymore, but you have to install it first. Then let's create a file called tts.py. The first line of this file will just remind you to install the requirements, and this will be pip install TTS.

Okay, the next step is to import all the modules that we will need. We have to go into the TTS module and import the synthesizer, so we write from TTS.utils.synthesizer import Synthesizer. I hope I wrote it right; I think it's spelled with an s.

Then we need to define some variables, like the path where TTS is actually installed on my computer. Let me check: it's in the Python 3 site-packages folder. In this folder there's a file called .models.json that we need to load, and this will give us some parameters for the models.

After that we need the manager. The manager will help us download the model, actually, and also download some vocoders. So we need to import the model manager: from TTS.utils.manage import ModelManager. Then manager is equal to ModelManager, and we pass it the path.

The next step is to load the model path, the config path and the model item. For that you go to the manager and call the function download_model, and we pass the name of the model, which works like the directory where the model is stored. In this case I think the default directory of the models in TTS, you can see it on GitHub, is actually tts_models, slash the language. We can see it by typing tts --list_models after you've installed it, and we get a list of all the models. You can just take the one you want, so I will just take this one.

After that we need to create a Synthesizer instance. I'll just call it synth: synth is equal to Synthesizer, and for that we pass a lot of parameters. The first one is tts_checkpoint, and we pass the model path that we got previously. You also have to pass tts_config_path, and you pass the config path that we got previously.

Then we can pass some text, for example text is equal to "I am a text created by a computer". We continue by defining outputs as synth.tts, and we pass the text. Then we just call synth.save_wav with the outputs, and we pass a file name, like audio.wav for example. You save it and then you try it by typing python tts.py.

Okay, there's an error on the synthesizer. I think the problem comes from the way I imported it: I misspelled synthesizer, and the module name is written in lowercase. Yes, and then the model works and we get an audio file. Let's play this audio; give me a moment to browse to it.

"I am a text created by a computer."
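The steps described so far can be put together roughly as follows. This is a minimal sketch, not the author's exact file: the helper names `models_file` and `synthesize` are made up for illustration, locating `.models.json` through `TTS.__file__` is an assumption about how the installed package is laid out, and the imports and calls (`ModelManager`, `download_model`, `Synthesizer`, `tts`, `save_wav`) follow the Coqui TTS releases from around the time of the video, so they may differ in other versions. The model name is left as a parameter because the captions do not say which one was picked.

```python
# Requirement (not installed by this script): pip install TTS
from pathlib import Path


def models_file(package_dir):
    """Return the path of the model catalogue inside an installed TTS package.

    Assumption: the catalogue is the hidden file `.models.json` that sits
    directly in the package folder.
    """
    return Path(package_dir) / ".models.json"


def synthesize(text, out_path, model_name):
    """Download `model_name` if needed, read `text` aloud and save a wav file."""
    # Imported lazily so this file can be loaded even when TTS is not installed.
    import TTS
    from TTS.utils.manage import ModelManager
    from TTS.utils.synthesizer import Synthesizer

    # The manager reads the catalogue and downloads models on demand.
    manager = ModelManager(models_file(Path(TTS.__file__).parent))
    model_path, config_path, model_item = manager.download_model(model_name)

    # No vocoder yet: only the acoustic model checkpoint and its config.
    synth = Synthesizer(tts_checkpoint=model_path, tts_config_path=config_path)

    outputs = synth.tts(text)
    synth.save_wav(outputs, out_path)
    return model_item
```

With the package installed, a call such as `synthesize("I am a text created by a computer", "audio.wav", name)`, where `name` is one of the entries printed by `tts --list_models`, should write the audio file.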
It's actually really bad.

"I am a text created by a computer."

Yeah, it's working, but it is actually really bad. We have to add a second model on top of it, I mean we have to add a vocoder to make it sound more human. For that we will use the same model manager to download the vocoder. We call the results voc_path and voc_config_path, plus a third value that we don't need. This is equal to manager.download_model, and in this case we use the model item that we got previously: from this item we get the default vocoder name, default_vocoder.

So now we can pass the vocoder here: vocoder_checkpoint is equal to voc_path, and then vocoder_config, it's the same thing, vocoder_config is equal to voc_config_path. I got the name wrong, so let me correct that. Perfect. So let's try that again. Let me play it.

"I am a text created by a computer."

It sounds better, far better. Okay, so let's test it further. What I will do is paste a really long text; it comes from the last blog post on ulife.ai. I'll just have it read that and see what it gives. It will take time and it will use computer resources. Let me play it for you so you can hear the output.

"... we will show something a bit more technical. We will be preprocessing all of the databases that we have downloaded from the last index. As usual, here is the video link from YouTube, but this article will be more detailed."

It's not bad at all. I think there are some punctuation problems, but I will check again with another text to see if the problem comes from the text or from the synthesizer. Okay, play again.

"Great, it will launch the notebook and you will be able to write and debug your app easily. Firstly, we need to import the packages that we will be using."

Great, I think that's perfect. Okay, so I think that's all for today. It's really simple. I'll post this code on Gist and put the link down in the description. In the next video I will try to optimize the synthesizer and deploy it online so you can test it yourself. So see you in the next video. Bye!
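The vocoder step from the captions can be sketched in the same way. Again this is only an illustration under assumptions: `default_vocoder` and `synthesize_with_vocoder` are hypothetical helper names, the catalogue is assumed to live at `.models.json` inside the installed package, and the `vocoder_checkpoint` and `vocoder_config` keyword arguments follow the Coqui TTS releases from around the time of the video. The sketch also skips the vocoder when the model item names none, which the video does not cover.

```python
# Requirement (not installed by this script): pip install TTS
from pathlib import Path


def default_vocoder(model_item):
    """Return the vocoder name recommended for a model, or None if there is none."""
    return model_item.get("default_vocoder")


def synthesize_with_vocoder(text, out_path, model_name):
    """Like the basic version, but pairs the model with its default vocoder."""
    # Imported lazily so this file can be loaded even when TTS is not installed.
    import TTS
    from TTS.utils.manage import ModelManager
    from TTS.utils.synthesizer import Synthesizer

    # Assumption: the catalogue sits next to the package's __init__.py.
    manager = ModelManager(Path(TTS.__file__).parent / ".models.json")
    model_path, config_path, model_item = manager.download_model(model_name)

    # The same manager downloads the vocoder named in the model item.
    voc_path = voc_config_path = None
    vocoder_name = default_vocoder(model_item)
    if vocoder_name is not None:
        voc_path, voc_config_path, _ = manager.download_model(vocoder_name)

    synth = Synthesizer(
        tts_checkpoint=model_path,
        tts_config_path=config_path,
        vocoder_checkpoint=voc_path,
        vocoder_config=voc_config_path,
    )
    synth.save_wav(synth.tts(text), out_path)
```

Long inputs, like the blog post read in the video, go through the same `tts` call; expect them to take noticeably longer to synthesize.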

Info

Channel: LTPhen von Ulife

Views: 35,404

Keywords: tts, python, text-to-speech, ulife, deep learning

Id: 2kS0SUCpOgo

Length: 12min 39sec (759 seconds)

Published: Thu Sep 15 2022