The BEST, Local Text-to-Speech Generator - AI Voice Cloning (Tortoise TTS)

Captions
Today I'm going over Tortoise TTS, a text-to-speech program that is pretty dang impressive, and personally I prefer it over ElevenLabs. Let's jump into some audio samples to see why. First I'll play the actual audio, and then I'll overlay the text-to-speech audio over it. For this I used Elden Ring, because it's the most recent game I played and the characters have phenomenal voice acting. First is the actual audio sample, then the Tortoise TTS audio sample, and we'll see how they compare.

"Traveler from beyond the fog, I am Melina. I offer you an accord. Have you heard of the Finger Maidens?"

"Traveler from beyond the fog, I am Melina. I offer you an accord. Have you heard of the Finger Maidens?"

So what did you think? I personally think it's pretty fantastic. Of course, in the original audio some parts of the intonation are much, much better and sound way more natural, and you can tell which clip is Tortoise TTS. However, it is really good in my opinion.

Now that we've compared it to the ground truth, the original audio, let's compare it to ElevenLabs. I have a bunch of samples here: "EL" stands for ElevenLabs, and the others are the Tortoise TTS generations. I'm going to play them one by one, and you be the judge of which one sounds better.

"Traveler from beyond the fog." "Traveler from beyond the fog." "I am Melina." "I am Melina." "I offer you an accord." "I offer you an accord." "Have you heard of the Finger Maidens?" "Have you heard of the Finger Maidens?"

So you be the judge there. I used the exact same inference samples in ElevenLabs as I did for Tortoise TTS. The unfortunate thing about ElevenLabs is that you can't train or fine-tune the base model in any way; you can only give it audio samples to run inference from, which is why I think
Tortoise TTS can get much closer to the original voice than ElevenLabs. ElevenLabs is a little bit more crisp, as you can hear in the audio ("Traveler from beyond the fog"); however, Tortoise TTS gets the intonation right and overall replicates the feel of the voice much better. You can even do funny things like this: "I offer you 100 accords."

So how does this work? It's a repository, though not on GitHub: it's hosted on git.ecker, which is based on Git, I believe. It's different from the original Tortoise TTS released by neonbjb, and this one allows for voice training. I already have it all set up; today isn't going to be a setup tutorial, unfortunately, so that will be for later. I've booted it up by running start.bat. Head over to Edge and open 127.0.0.1:7860, which is the Tortoise TTS Gradio page.

There are a bunch of different tabs here: Generate, History, Utilities, Training, and Settings. In Training you can do everything: prepare, generate, and then run the training. Inside Settings you can change which model you want to use, based on the ones you have trained. Here we have Mel, which is Melina. If we go into Generate, I have a voice here for Mel; I'll refresh it and generate a sentence. I have some text from The Great Gatsby, so I'll copy this sentence and throw it into the Gradio page, where we're going to have Melina generate the voice. There are a bunch of settings here: I've set 2 for samples and 32 for iterations, which basically means it's going to be a fast generation, but since I trained a model it will still be relatively good. What I found is that inside the experimental settings, if you adjust things like length penalty and repetition penalty, it actually helps the
audio voice to be a little more succinct and less AI-sounding. Here we go, let's listen to the first generation:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."

That was pretty good. It messed up on "since"; it kind of said "sin" at the end. You can just keep generating audio samples. Of course, you probably need a decent GPU: anything 30-series or above, and I think even a 2080, or maybe a 1080, will let you use this, but I don't know about training.

That's just one voice I have here, trained on Melina's voice. Let's take a look at some of the others I've done. Inside Settings, let's pick Godfrey, another one from Elden Ring, and generate with it. I'll click Generate (I made the page a little smaller so I can see all of it), and here we go:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."

You can hear that raspiness in there, and if you know the Elden Ring character, that is pretty accurate. I only trained this on two minutes of audio, which was pretty wild: I trained it on less than 10 minutes of audio and still got something pretty decent. Let's head over to a couple more. We've got Gideon:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."

That sounds like Gideon. The audio it was trained on has a little bit of reverb in the background, which is why it sounds kind of robotic, but if you train these models on very clean audio you will get a very clean voice. Let's take an example of Eno, which is
another model that I have trained:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."

As you can hear, that is a cleaner girl's voice, whereas the other ones were a little lower and, I guess you could say, more medieval, because of course they're from Elden Ring, which is a fantasy world.

So there you have it: that is Tortoise TTS and some voices that I've trained. As I said, I'm not showing how to do the training yet; that will be reserved for a later video. But I do want to share some projects I've created with Tortoise TTS, starting with two AIs talking to each other. Here I have two Tortoise TTS engines running, plus this one, which is running the local LLM. I'll run it and show you how it works: it runs and creates some dialogue down here. Let's listen to the conversation between the AIs:

"Yes, we had our share of struggles in our time as rivals, but in the end it's just a game of chess. It's not as bloody or brutal as it once was, and those who don't play it won't win. It's just a matter of sometimes coming out on top and finding new ways to strengthen our rivalry, Gideon."

"I agree. It's just a matter of adapting and staying ahead. We may have adjusted to the game of chess, but it doesn't change the fact that the competition is fierce."

So that is my most recent project: having two LLMs talk to each other with TTS voices, which was pretty cool. Another one I've worked on is incorporating it into Vivi, my AI streamer; that's all on YouTube, so you can go take a look. The other one I did was an AI audiobook narrator, which I streamed on YouTube. Let's take a look at that one, and that's going to be the last one for
this video. Here we have two things: an audiobook maker, so you can make your own audiobook, and an audiobook narrator that just narrates the text inside a text file. Here's the text I have, The Great Gatsby, and here is the narrator. We'll run the narration first and see how it goes. Inside the .yaml config we have the voice we're going to use, which is Mel in this case. Let me also turn this down to 32 so it generates pretty quickly. Let's run the narration and hear it:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since. 'Whenever you feel like criticizing anyone,' he told me, 'just remember that all the people in this world haven't had the advantages that you've had.'"

Okay, so there you have it: The Great Gatsby with Melina.

Now let's do the audiobook maker. We're going to do this one with Gideon, so let's change the voice, save that, and run the audiobook maker. A window pops up that allows me to make an audiobook. I'll load the sentences (they're inside this text file), and then I can adjust how long a pause I want between each sentence; let's do a 0.5-second pause. Then I'll play the first sentence to generate a response:

"In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since."

If I like it, I add it to the audiobook, so let's click that. Then we'll generate the next sentence by clicking play:

"Whenever you feel like criticizing anyone, he told me, just remember that all the people in this world haven't had the advantages that you've had."

So, awesome, we've got two sentences, and we're
going to add that to the audiobook. If we take a look in our files, we now have a fully concatenated audio file with those sentences. Let's give the 20 seconds a listen:

[Music] "...my father gave me some advice that I've been turning over in my mind ever since. Whenever you feel like criticizing anyone, he told me, just remember that all the people in this world haven't had the advantages that you've had."

So there we go. I'm not going to go over all of the functionality I coded into it in this video, but that is an example of a narrator and an audiobook maker that I've made. I kind of cheated, because I already had these coded up, so it was simply plug-and-play to swap out the text-to-speech voice generator.

That's all I wanted to go over in today's video. Tortoise TTS has been my most recent excitement because it is so good compared to other TTS engines out there. It's pretty exciting, pretty crazy, pretty spooky, all of the above. That's going to be today's video; I hope you guys enjoyed it, and I will see you again in a future video.
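The audiobook maker in the demo loads sentences from a text file, generates one clip per sentence, and joins the approved clips with a fixed pause (0.5 seconds in the video). The author's code isn't shown, so what follows is only a minimal Python sketch of the splitting and joining steps, using the standard library and assuming each generated clip is an uncompressed PCM WAV and that all clips share the same channel count, sample width, and sample rate. `split_sentences` and `concatenate_with_pauses` are hypothetical names, and the Tortoise TTS generation step itself is left out.

```python
from __future__ import annotations

import io
import re
import wave

# Hypothetical helpers for illustration; not the video author's implementation.


def split_sentences(text: str) -> list[str]:
    """Naively split text on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def concatenate_with_pauses(clips: list[bytes], pause_seconds: float = 0.5) -> bytes:
    """Join WAV clips (given as bytes) into one WAV with silence between clips.

    Assumes all clips share channel count, sample width and sample rate.
    """
    if not clips:
        raise ValueError("need at least one clip")
    if pause_seconds < 0:
        raise ValueError("pause_seconds must be non-negative")

    params = None
    frames: list[bytes] = []
    for clip in clips:
        with wave.open(io.BytesIO(clip), "rb") as reader:
            current = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError("clips must share channels, sample width and rate")
            frames.append(reader.readframes(reader.getnframes()))

    nchannels, sampwidth, framerate = params
    # 8-bit WAV is unsigned (silence = 0x80); wider PCM is signed (silence = 0).
    silence_byte = b"\x80" if sampwidth == 1 else b"\x00"
    pause_frames = round(pause_seconds * framerate)
    silence = silence_byte * (pause_frames * nchannels * sampwidth)

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(nchannels)
        writer.setsampwidth(sampwidth)
        writer.setframerate(framerate)
        # One pause between each pair of consecutive clips, none at the ends.
        writer.writeframes(silence.join(frames))
    return out.getvalue()
```

The returned bytes can be written straight to a `.wav` file. The regex splitter is deliberately naive: abbreviations such as "Mr." would be cut into separate sentences, so real book text may need a proper sentence tokenizer.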
Info
Channel: Jarods Journey
Views: 35,172
Id: dMymrRZDU3c
Length: 11min 39sec (699 seconds)
Published: Mon May 29 2023