Testing RVC's Realtime AI Voice Changer

Video Statistics and Information

Captions
Okay, so this is the realtime voice changer that's built into RVC, and I think it works pretty well.

So I was browsing through the RVC GitHub repository, and it looks like they have their own voice changer, so in today's video we're just going to go ahead and check it out and get it all downloaded and extracted. On the English page you can see that they've got a realtime voice changer GUI, and I believe we just run this bat file. I already have it downloaded, but it's in the releases area. They have a new one, which is October 6th, and I downloaded this one. So I've got it all downloaded and extracted, and it should be as simple as going into it and then running go-realtime-gui.bat.

I've never done this before on RVC. I've always just used w-okada and haven't really been looking at voice changing recently, but let's go ahead and take a look at it and see what we've got. I believe in the latest version of RVC everything is located inside of the assets folder and then inside of weights, and it looks like they distributed some with it. Let me get some headphones on.

All righty, I've got some headphones on. I'm going to reload these devices, I assume, and then the speakers. All right, we've got headphones. We've got an index file, whatever this Kiki is. Response threshold... I don't know what any of these do. Let's just start the audio conversion.

Hello, test, testing audio, testing. Loading... oh, okay. Hello. It is super loud, let me turn it down. Okay, hello. Delay test, test. All right, so it is working. Maybe I need to change the pitch. Hello, hello.

All right, so I'm going to stop the audio conversion real quick there and load in some pth files from RVC models that I've trained. I've got some other models in here. Let's use the Ming one, and I'm not going to do any index, so index zero, pitch settings,
response threshold, RMVPE. I think this is good enough. So if I start audio conversion... hello, testing. Oh, there we go, it's working. Okay, lag. Let me play around with it a little bit and then I'll be back once I find some good settings.

All right, so I think I found some valid settings, or the best settings, for the RVC realtime GUI. The response threshold right here is for the volume of the microphone that it takes in. If I do -60, it's going to capture all of the background noise, so you'll hear a little bit of artifacting, and if I bring it to like -40, the background noise will go away.

The pitch, of course, is the pitch of the voice. If I change it around, that is going to be kind of the same pitch as from w-okada. 12 works for male to female, which is what I'm doing right now.

Next would be the index file. The Kiki file just killed it when I tried increasing the index rate. And then of course you've got loudness, which is just kind of like a static gain, I believe.

And then it looks like, inside the performance settings, if I start changing the performance, it's going to stop the voice conversion, and you can see that there's an algorithmic delay right here. I don't actually know how it calculates the delay, but if I put it to like 0.5 and then start audio conversion, we can see that the algorithmic delay is increased here. If I bring it back down to 0.5, it'll go back to about half a second, as we can see here. Now if we decrease fade length as well, it decreases the algorithmic delay, but now you can hear there is a little bit of a pulsation of the voice, which isn't that nice. So for my GPU, which is a 4090, it seems like sample length at the lowest and fade length at max does a decent job here.

Now, this is different than w-okada with all of the other options, and I'm going to go ahead and jump
into w-okada's most recent one, just to check out what's changed there, but I do know there are different pitch detection algorithms that work on there that might make it a little bit better. So let's go ahead and check that one out.

Okay, so it looks like w-okada is already at version 15, so I went ahead and downloaded the latest one, and that one is at this Hugging Face link. If you click on the Hugging Face link it brings you here, and for me, I downloaded this bottom one, where you can see it's 3.15. If we go to the folder, I have the zip file here, so let me just extract it.

All right, so that took a little bit of time to extract, but I have it here. Oh, it's inside of this folder, so let me remove it. And then let me rename it to realtime voice changer 1.5.3.15, just to keep track of it, and here we are.

So we're just going to start it, and it's going to download a bunch of stuff, so I have to wait until that's done. I'm going to go ahead and do More info, Run anyway. One more download. Okay, here we go, it's all downloaded and done. As you can see, all of this stuff is here. Let's go ahead and allow access, and here we are.

Okay, cool. For the most part everything looks about the same as I remember. If I go to Tsukuyomi-chan here, the only thing that seems to have changed is, well, not too much. But the thing about the w-okada voice changer is that you have access to ONNX models, which should be a little bit faster. So I'm going to close out of RVC's GUI, just in case it's causing any latency issues or anything like that, and load in some personal models. I've got some models here, so let's go ahead and do our dooy model here and upload that. We've got that in there, and let me set up some settings real quick.

And here we go, we've got all of our settings set up. We're using the ONNX version, and so the difference right here is the w-okada one has different
frequency detection algorithms that you can use, like the ONNX, the tiny, and the full, but we're going to use ONNX, and ONNX is supposed to be faster. So let's start it, and here it is, going. You should be able to hear it. One thing that's different is we can go all the way down to one for chunk size, which I don't think we can do, so I'm going to slowly bring it down and we'll see how good it actually is.

So I'm testing audio at 16, and you can hear a little bit of pulsing in the voice. The previous lowest was eight, so let's go to eight. And now you're getting some... okay, so that was a little bit too much, and my res is now back to normal. So 16, you can do 16. Maybe if I bring extra... let's go to eight. Oh, nope, it's not going to work. Okay, so note to self: you can't go below eight... oh, you can't go below 16 right now. I'm on a 4090, so it is, you know, top of the line, so it should be good to go there.

Maybe I need to change to server on the audio, so let me do that real quick. I'll change the sample rate to 48k, I'm going to select the USB Audio Codec, and then for the output I'm going to use the WASAPI headphones, which I have selected here. Maybe this will be faster, who knows. Let me try this one out. Hello, hello, testing, testing. Okay, so I think this is sounding a little bit crisper already, and if you take a look here, this is what's happening on the server. So maybe I can bring down the chunk, so let's try that now. Okay, I've got it here, and here, and it sounds terrible.

All right, so 16, it looks like, is going to be the lowest that you can go on w-okada's voice changer still, and I think all of these settings are about optimal. I mean, I could change the chunk and the extra a little bit, but if you change the extra, yeah, you're still not going to get anything. And if I go down to, let's just try it at one and see what happens, it just ends up sounding like wings on a fly or something like that, that's buzzing by your ear super close
and that's a little bit terrible. But this is, I guess, the voice changer. It's still fantastic as always and has minimal lag on my 4090. To me it just sounds like the performance is a little bit better on 24 than it used to be, and that's maybe because of this rmvpe_onnx right here, which I don't think was present in the ones that I was using previously.

So yeah, this is really quick. The delay on it isn't that bad, and I'm not getting any delayed audio feedback when I'm using my headphones, so I can actually speak and not be confused. If you've tried using the voice changer on a higher delay, it gets a little confusing when you try to speak, because you have a little bit of lag on what is being said, and it's a little bit weird. This is like a phenomenon, and it's caused by that delayed audio feedback, and it just makes it hard to speak sometimes. But I babble on.

So I finally got to testing out the realtime GUI provided by RVC. I think someone did mention that they had one on the RVC repository, but I never looked into it and I kind of forgot about it. So when I was going over it again today, I just thought I would make a video showing it and checking out the state of realtime voice changing for, you know, voice cloning.

So that's going to be it for today's video. This is kind of just more of a showing video. I didn't really do any tutorial in here, I just kind of wanted to free-flow it, and I think I'm going to be doing a few more videos like that. So yeah, I will see you guys in the next one, and see you later.
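
As a rough illustration of two settings mentioned in the captions (this is not RVC's or w-okada's actual code), the sketch below shows how a pitch setting in semitones maps to a frequency ratio, and how a response threshold in decibels could act as a simple noise gate. The function names and the per-frame RMS gating are assumptions made for illustration; only the pitch value of 12 and the threshold values of -60 and -40 come from the video.

```python
import math
from typing import List, Sequence


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift given in equal-tempered semitones.

    12 semitones is one octave, which doubles the frequency.
    """
    return 2.0 ** (semitones / 12.0)


def rms_dbfs(frame: Sequence[float]) -> float:
    """RMS level of a frame of samples in [-1, 1], in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10(mean square) equals 20 * log10(RMS).
    return 10.0 * math.log10(mean_square)


def gate(frame: Sequence[float], threshold_db: float) -> List[float]:
    """Hypothetical noise gate: silence the frame if its level is below the threshold."""
    if rms_dbfs(frame) < threshold_db:
        return [0.0] * len(frame)
    return list(frame)
```

Under these assumptions, a quiet background frame at roughly -50 dBFS passes through when the threshold is -60 but is silenced when the threshold is -40, which is consistent with the behavior described in the video.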
Info
Channel: Jarods Journey
Views: 20,237
Id: TZ2odHFHOp0
Length: 12min 40sec (760 seconds)
Published: Mon Oct 23 2023