FREE AI Voice Tool: Best Opensource AI Text-to-Speech (TTS) - Amphion Better Than Bark!

Video Statistics and Information

Captions

[Music] I may have just found the best open-source text-to-speech model out there. Introducing Amphion, a free method to generate audio, music, and speech with its toolkit. Amphion is a toolkit that can speak, make sounds, and sing. Enjoy. [Music]

Now, wasn't that amazing? This is the true capability of Amphion. Just keep in mind that its purpose is to support reproducible research and help junior researchers as well as engineers get started in the field of audio, music, and speech. It can generate audio for sounds, music, and speech. This is similar to something we've covered on this channel before, Bark, a TTS model with very impressive audio generation capabilities. Amphion is very similar, and it's what we're going to showcase throughout today's video. It's another alternative to Bark: the toolkit is open source, it's completely free, and you can start generating various types of audio. In today's video we'll explore how you can get started with Amphion and showcase some of its true capabilities. So with that thought, stay tuned and let's get straight to it.

If you would like to access our private Discord, where you can get subscriptions to AI tools for free, networking opportunities, collaboration, daily AI news, discussion of various AI topics, and a lot of support, definitely take a look at the Patreon link in the description below. If you would like to book a one-on-one with me, where you can access my consulting services and I can help you grow your business or give you different types of AI solutions, definitely take a look at the calendar link in the description below.

Hey, what is up guys, welcome back to
another YouTube video at World of AI. In today's video we're going to take a look at Amphion. This is a toolkit that helps you create various types of audio, whether that's sound, music, or speech. It's made to support research projects that can easily be replicated, which makes it super helpful for new researchers as well as new users who are exploring audio, music, and speech generation. Amphion is quite unique because it provides visualizations of classic models and architectures, which is something I'll explain as we go further into the video. This can be practically useful for those who are starting out and want a clear understanding of how music is generated using AI.

The main goal of Amphion, as stated in the GitHub repo, is to be a platform for studying how any kind of input can be turned into audio. It's not just for generating certain types of audio; it also helps you understand how the generation works so that you can use it in other cases. It's built to handle various tasks. Text-to-speech is currently supported. Singing voice synthesis is currently in development. Voice conversion is also currently in development. Singing voice conversion is supported. You also have text-to-audio as well as text-to-music. They're currently adding other methods, which they're going to release over time, and this is a great way for you to study and get access to generation for these various types of audio.

One thing I want to mention is that, along with generation, Amphion also comes with various vocoders as well as different evaluation metrics. A vocoder is crucial for creating audio signals, and it ensures various
different types of quality outputs, as well as evaluation metrics that play a vital role in maintaining a consistent standard for the audio you want to generate. You can give it text descriptions, lyrics, notes, or speech, and Amphion will use this high-quality vocoder, which is the component for creating top-notch audio signals, and then you get the output. This is the basic framework they've used to generate various types of audio using AI.

Here's a table that shows the open-source toolkits related to audio generation. These toolkits are compared across different categories: audio, music and singing, speech, and visualization. Visualization is a new feature developed with Amphion, and it's the only toolkit that has this category ticked. Speech is prevalent across most of these other toolkits. We know that Bark is possibly the best for audio generation; it's able to do everything Amphion is able to do, and this is why I really wanted to cover Amphion, because it's capable of doing the various things that Bark is able to do. Bark is really, really good, but its generations can be a little off at times due to hallucination. This is why I wanted to take a look at Amphion as another alternative. It's a substitute, in a way, that can do things Bark can't do, in this case visualization. This wasn't really prevalent in the audio generation field; there weren't models that could do this, and now we have something that can generate visualizations along with audio. Many of these toolkits tried to do this but failed, and this is where we see Amphion becoming a leader in this field, offering more capabilities compared to these other toolkits.

Now, before we go more into
the technical side, I want to showcase how you can get started, for the people who want to start using this now. You can do this through Hugging Face. There are Spaces that they've developed for four different types of audio generation: text-to-speech, text-to-audio, NaturalSpeech, and singing voice conversion. Each of these Spaces has its own specific generation type. I recommend that you try them out and see how they work, and if you like it, you can start generating on your local machine afterwards by downloading the model, which is something I'll showcase later on.

This is fairly easy, and there are two ways to install it locally. You can go to the GitHub repo, where they have an installation guide. If you scroll down a little bit more, you first have to start off by cloning the repository, but before you do that you need to make sure you have an application called Git. Once you have Git, you're able to clone the repository from your command prompt. Once you're there, you simply type in git clone and paste the link for the repo. In this case we already have git clone in the copied command, so press Enter and this will start cloning the repository onto your desktop. This might take a couple of minutes.

Once this is done, we can move on to the next step, which is creating a Python environment. This requires conda, so if you don't have conda you can simply download it; it's very easy to do. Then you just copy this, paste it into your command prompt, and create the conda environment for Amphion. Once the environment is created, you can activate the Amphion environment, and once that is activated you can install the packages that are needed for the environment
and start running it. They have detailed instructions for different types of tasks using Python. You can open up Visual Studio Code and start generating text-to-speech, singing voice conversion, text-to-audio, and vocoders. For example, if you click on text-to-speech, it gives you detailed instructions on how to set it up. Once this loads and we scroll down, it shows a beginner recipe that demonstrates how you can start using the Amphion text-to-speech recipe with Python on your local machine. It's a detailed set of instructions, and it gives you the code as well as how to run it. In my opinion there's an easier way to do this, which I'll showcase in the next step of the video, but I wanted to demonstrate this for the people who want to have it installed locally. There are also different datasets that you can install, so make sure you check that out before you move on to this step. But it's fairly easy. Once you have this downloaded, you just type yes to proceed with the installation. Once the installation is done, you simply need to activate the Amphion environment and then paste this link in, and you're able to start using it by following the next instructions for each of these models. For the method I'm going to show you now, to install this locally you're going to need Pinokio, which I'll showcase in the next step of the video.

Another method to install this is using Text Generation Web UI. I have a tutorial on this up on my channel, which I'll leave a link to in the description below, and it will help you install it with the one-click installer. Once you have it loaded up, you simply need to have the models ready. You take whatever model you want to work with, in this case the text-to-audio model, you copy the model card, go
to Text Generation Web UI once you have it loaded up with Pinokio, go to the Model tab, and under the option to download a custom model or LoRA, you download the model. You then load the model, and then you can start chatting with it and have it generate various types of text-to-speech audio. It's as easy as that. This is the second method to install and use Amphion. Obviously, the best method right now if you want to test it out is to play around with the Spaces, so you can get a better idea of how to generate various things. This is something we're going to take a look at in the next step of the video.

They also have a huge directory in the repo of various demos. You can see that this is the text-to-speech demo for Amphion, and we can see that it's able to generate various things. In this case it's comparing against other toolkits, for example Tortoise. You give it a text and it generates the speech for you. I'm just going to lower the volume a little bit, and we're going to compare these two. Now I'm going to play this audio: "A tiny particle of the consecrated bread contain all the body and blood of Jesus Christ, or a part totally of the body and blood? Does a tiny..." That was generated using Amphion. Now let's see how Tortoise does it: "...consecrated bread contain all the body and blood of Jesus Christ." We can see that both of them do a really good job. Amphion has a different type of accent, and this is something you can set afterwards: you can have it generate again with a different accent. There are various examples, so if you're interested in this, I definitely recommend that you play through all of them. They have a huge repo of demos for each type of audio generation. This is something that you can definitely take
a look at on the repo; you simply need to click through the different types of audio generation. If you want to check out demos for text-to-speech, you click on it and go to the demo directory. You can do the same thing for vocoders as well as singing voice conversion. That's basically it for the demos and the installation.

Now let's take a look at some of the other demos on Hugging Face Spaces. For this demo I'm playing around with Amphion's text-to-audio Hugging Face Space. I gave it a prompt to generate audio of cars crossing a road. Let's see how well it does. In my opinion it's not the best generation out there for the prompt I gave it, but it's somewhat good. It gives you roughly the right generation, but it's very muffled and not the best quality. As I mentioned at the start, this is a work in progress; it's in development, and it will continuously get better as they keep improving it. If you look at the other kinds of generation, they've done a great job in other demos, so don't think it's bad just because of the demo I showcased. There are various other demos that generate really, really good audio from the prompts given. I recommend that you check it out. This is a really good alternative to Bark, and a great way to use their TTS to generate various things.

I'll leave all the links in the description below, for example the research papers, so you can get more information on this, as well as the Hugging Face Spaces, so that you can play around with the various Spaces they have. With that thought, guys, thank you so much for watching this video. I hope you enjoyed it. Make sure you follow us on Twitter if you haven't already; I'm going to be posting the latest AI news over there, so I definitely recommend that you
check this out. If you guys want to join our private Discord, I highly recommend that you check out the Patreon link in the description below. With that thought, guys, make sure you subscribe, like this video, and check out our previous videos so you can stay up to date with the latest AI news. Thank you guys so much for watching, have an amazing day, and I'll see you guys fairly shortly. Peace out, fellas.
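
Editor's sketch of the local install steps described in the captions (clone the repository with Git, create a conda environment, activate it). The captions never show the exact commands, so the repository URL, the environment name, and the Python version below are assumptions. The package-install command is left out because it is not stated in the video. The script only prints the commands rather than running them, since conda activate has to run in your own shell session.

```python
import shlex

# Assumptions (none of these values are given in the captions); check the
# project's installation guide for the real ones before running anything.
REPO_URL = "https://github.com/open-mmlab/Amphion.git"  # assumed repository URL
ENV_NAME = "amphion"                                     # assumed environment name
PYTHON_VERSION = "3.9"                                   # assumed Python version


def build_setup_commands(repo_url, env_name, python_version):
    """Return the setup commands in order, as argument lists, without running them."""
    return [
        # Step 1: clone the repository (requires Git to be installed).
        ["git", "clone", repo_url],
        # Step 2: create a dedicated conda environment; --yes skips the prompt.
        ["conda", "create", "--name", env_name, f"python={python_version}", "--yes"],
        # Step 3: activate the environment in the current shell.
        ["conda", "activate", env_name],
        # Step 4 (not shown): install the packages listed in the repo's guide.
    ]


def render(commands):
    """Turn each argument list into a shell-safe command string."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(build_setup_commands(REPO_URL, ENV_NAME, PYTHON_VERSION)):
        print(line)
```

Copy the printed lines into a command prompt one at a time, then follow the repository's own guide for the package installation step.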

Info

Channel: WorldofAI

Views: 5,062

Keywords: text to speech, ai voice generator, best text to speech software, free ai voice tool, bark ai, bark tts, bark, musiclm, audiocraft, text to speech software, text-to-audio model, transformer-based architecture, high-quality audio output, human speech, multiple languages, music, background noise, sound effects, tts, Amphion Toolkit, Audio Generation, Music Synthesis, Speech Generation, Text-to-Speech, Singing Voice Synthesis, Voice Conversion, Text to Audio, Text to Music

Id: gwrKk649-Pw

Length: 14min 22sec (862 seconds)

Published: Mon Dec 18 2023