AI Voice Cloning for Singing with RVC - Guide and Set-up

Captions

Today I'm going over the AI voice singing software called RVC, also known as Retrieval-based Voice Conversion, and personally I think it's better than So-VITS-SVC. To show this, I've got songs that I've already inferenced, one from RVC and one from SVC, and I'm just going to play them so that you can hear the difference between the two. So let's go ahead and do that right now.

Yeah, all right, so what did you think? Convinced? Well, let's go ahead and get started with the installation, as I'm going to show you how to install it in today's video. And one more thing, sorry: today, guys, everything is going to be run locally, so I'm not going to be showing any Google Colab.

Luckily, installation is super easy for Windows users, and I think maybe even Mac. We're going to head over to the main part of the GitHub, so just click this Retrieval-based-Voice-Conversion-WebUI link, and when you scroll down you'll see something like this. We're going to translate everything to English, so just right-click and Google Translate will do the work for us. Then we're going to head on over to this Releases tab, and what we're going to do is download this right here, which is the complete package where everything is installed. So let's just go ahead and open the link in a new tab. It's going to pull this open here, and then you just go ahead and install it into a folder down here. So just find a folder and install it. I think it goes without saying: install all of these at your own risk, do your own research, and don't just trust me blindly, but from what I can tell this is for the most part safe.

So with this RVC beta you're going to end up with a zip folder, and you're going to need some type of tool to unzip it. What I used was this pre-zip, which comes with the Microsoft Store. Unfortunately I already deleted the RVC beta folder, so I can't really show you how that was done. However, once you have it unzipped, you want to go ahead and double-click into it, and then you have this RVC beta here, so
I'm just going to move it one folder up. I'm going to do copy, then go out, and then paste. You don't technically need to do this, but for my own sake and for convenience I like to do this so that I just have to click into it once. So you don't really have to do that if you don't want to wait for it to copy. Once you have it done, when we click into it we have everything in here. Okay, cool.

Before I go into running everything: if you did want to install this locally using Python, it's pretty easy. If you scroll down and click this English link down here, you can scroll down more and see some installation instructions. I recommend you just run this pip install line here and then run the pip install for the requirements in a virtual environment. I didn't use Poetry or this line at all to do the installation. Then what I did was just move all of these from the zip folder into the folders that I installed with VS Code, but I think that's just a roundabout way of going about it if you wanted to install it manually.

With that out of the way, we're going to jump into here, and this is super easy: you just go to this go-web.bat. But before we run that, I'm going to do Open with Code so that we can see what's actually happening. Here we have a runtime Python executable; it's running the infer Python program and opening a web page on this port here. So we're going to double-click it, and then this command line window is going to open. You're going to see "use language English US" and a couple of other things pop up, and then it's going to open up a browser. You'll see a Windows alert here, Windows Defender Firewall has blocked some of those features, and I'm just going to go ahead and allow access. Now you have it running locally on this URL, and if you don't have a page that opens up, you can always open one up locally. Here, I'll even show it in an incognito
browser. If you just type in localhost:7897, it's going to pop up this web page that we have here. So here we are: we have the page with all of the tabs. We're only going to focus on Model Inference, Accompaniment and Vocal Separation, and Train. These other ones I'm not going to go over, and there should be an FAQ section here, but it looks like it's broken.

The first thing is we need to separate vocals from audio. I recommend you use UVR and export it out to a WAV file, as these don't allow you to export to a WAV file, unfortunately. If you want to see how to do that, go to the So-VITS-SVC video I have for Google Colab, where I show you how to install UVR, Ultimate Vocal Remover. If you don't want to go to that video, just go to this GitHub page and install it with this main download link here. Okay, done.

But for those of you who don't want to install that, this one does the job decently well to separate both of them. I don't think it really matters which one you select here; I tested both and they both sound the same. HP5 is what we're going to use, and I'm not too sure what they're talking about here. All we have to do is select the input audio folder path. I created a folder called songs inside of the RVC beta folder, and inside of this is a song. So all you have to do is click into the folder, go into the address bar up here, click Copy address, and then just paste that into here. Then we're going to specify the same folder, the same place, for instrumentals and vocals, and all we're going to do is click this gigantic Convert button. So click Convert, and then it's going to take some time, depending on your GPU and your CPU, to do the conversion. Let's just wait for that to finish up.

Here we are: file, success. And here we go, we've got the two files here, a reformatted WAV, and you can hear the instrumentals and all that. Okay, cool, I'm not gonna
play any more of that. Now you've got a vocal file saved, so the next thing you're going to want to do is train a voice, and this is probably the most important part, the one all you guys are waiting for. All we're going to do is head on over to this Train tab. Before we actually go into training: if you don't have any training data, check out the So-VITS-SVC video where I go in depth on how you can record your own voice and create samples. We're just going to be reusing those same samples in this video.

Inside of that RVC beta folder I created a folder called voice, then a new folder called me, and inside of this me folder I have all of the raw files. You have to make sure that they are audio files. You cannot have a folder inside of here, because if you do, it won't grab the audio that is inside of that folder. So make sure everything is a WAV file or an audio file directly inside of whatever folder you're going to link it to.

We're going to close that for now and enter an experiment name. I'm just going to call this myself. Because we're going to be using V2, we're going to want to select 40k for our target sample rate. For this we're going to select true, as singing must, voice cannot. Then we're going to use V2 here. For the CPU threads, you have to adjust for your CPU. I'm going to leave it at 32, but I recommend something like 16 or 8 if you have a smaller CPU. And in here is going to be the input training folder path, so that's where those audio samples come in handy. Let's open that folder once again and copy that address: go to the address bar, click me, Copy address, and paste. So we've got that pasted in there. We're going to leave this as zero, and then we're going to process the data. It's going to take a little bit of time to process the data, and here we have a
bunch of lines, and yes, the data has been processed. The next thing we're going to do is head on down to this step 2b part, where we're going to select what type of pitch algorithm we're going to use. It's going to automatically get your GPU information here, so just leave all of that, and we're going to select harvest. Harvest is going to be the best quality, they say, but the slowest processing, and since we're going for quality we're going to be using harvest. So go ahead and do feature extraction here. I believe this is equivalent to pre-HuBERT inside of So-VITS-SVC.

Then, now that this pre-HuBERT is done, we're going to go into this fill training section down here. What I like to do is modify how frequently I want to save. I'm going to train for 200 epochs, so I'm going to do 200, and I'm going to save every 50. This is just my preference; I don't want to save every epoch because I want to save some space, so we're just going to do 50. For this I'm going to do max size, because this is what my GPU can handle, but you can adjust this based on what you have. I'm going to select no here to save disk space, we're going to select no here to save more space, and then we're going to leave this at no here. This is where your pre-trained models come in; they should already be loaded in here. Once you have all that, we're going to go ahead and train, so let's just click One-click training, and it's going to start.

So I made it a little bit smaller so that we could finish and I could just show you how it exports. We've got five and five here. One-click training. In order to see how it's going, you can head on over to this command line window that popped up and see how the process is going. This is how you can monitor what's happening and whether you're getting any errors. So if it's not
training, or for whatever reason it's not continuing, you might be able to find some important information in here. Here we see myself training: epoch one, epoch finished here, and here we have it, five epochs done and the program is now closed. So, success.

To see where our files are: if we go into the folder now, inside of weights you'll have a myself PyTorch file, and inside of logs you will have some additional files, so if you ran into any issues you could just continue training. To continue training, all you would have to do is input the same experiment name, make sure you set all the settings back to the same values, make sure you've got your training folder path and all of these, and then click Train model to continue training your model. Here you can see that it has loaded in this checkpoint, epoch 5, and since this one is already finished, it's just going to finish out. Cool.

Now that we've got our model trained, we can head on over to Model Inference, where we'll use the model. What we want to do here is refresh the timbre list and index path, and here you have the path, so myself.pth, the PyTorch file. If you trained it for longer, with a larger epoch count, you'll probably see something along the lines of myself 130 or myself 150, and so on and so forth. But since I only did one epoch and one training interval, I just have myself, so that is what I have here. Leave this speaker ID at zero.

This is going to be for transposing your pitch. If you're converting from a female singer to a male singer, you probably want to go negative 12 so that you're one octave lower. If you're converting male to male, you'll probably do zero, and vice versa, if you're converting male to female you'll probably do 12. You get the point. The other parts are going to be pretty self-explanatory as well. We're going to enter the path of the audio file to be processed.
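As a side note on the transpose setting: the values mentioned above are semitones, and 12 semitones make one octave, which halves or doubles the frequency. Here is a minimal Python sketch of that arithmetic, assuming standard equal temperament. The function names are made up for illustration and are not part of RVC's interface.

```python
def transpose_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift of `semitones` (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def shift_pitch_hz(f0_hz: float, semitones: float) -> float:
    """Apply a semitone shift to a pitch given in Hz."""
    return f0_hz * transpose_ratio(semitones)


if __name__ == "__main__":
    # The three cases mentioned in the video: -12, 0, and +12 semitones.
    cases = [("female to male", -12), ("same range", 0), ("male to female", 12)]
    for label, semis in cases:
        print(f"{label}: {semis:+d} semitones -> x{transpose_ratio(semis):.2f}")
```

Values between the extremes work the same way, so a smaller shift such as 5 or 7 semitones may suit singers whose ranges are closer together.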
And this is going to be the extracted audio file. Remember how we created that songs folder and we got the vocal and instrumental files? Well, we're going to copy the path of this instrumental file, sorry, of this vocal file, into that area. To make it easy on myself, I like to rename it something simple like vocal, just in case something happens. To copy it, you can just right-click and do Copy as path. If you're on Windows 10 or somewhere else, you can right-click it, go to Properties, copy this Location area into there, and then copy and paste the name after. So that might look like this: copy, paste, go back, copy, then type a backslash and paste. That's how you would do it if you don't have that Copy as path feature.

So here we have it. We're going to use harvest to have the better pitch, and then we're going to leave this at three. We're also going to select our auto-detect index path; for this, click the V2 one. Then we can leave everything down here the same. Now all we have to do is click Convert and it'll start converting. If you're wondering what this is down here, this is for batch conversion, which we're not going over in today's video. I haven't played around with it, but you can try it out and see how you like it. We're just going to wait for this to finish.

All right, and here we go. Let's just see how it plays. This is a super quickly trained model, so I don't know how it's going to sound.

Okay, so not bad for about two minutes of training. Of course, if you train it longer you'll get the result that I had earlier, which is much more clear and a little bit more crisp. To save this, it's a little bit different than So-VITS: you have to click this area over here and click Download, and then you can save it in whatever folder you want. So you could save it in like
the songs area; you could rename it and, boom, save. If you're wondering where they're actually stored inside of the folder: if you close out of the web browser, it deletes everything. If you're wondering where it generates it, it's going to be inside of this capital TEMP folder. If you just go in there, you'll see that there's a WAV file in here, and if we did a second generation, we'd see a second WAV file pop up inside of this folder. So here you go, you see that second one pop up.

Alrighty, so that is the bulk of how you get everything up and running. Of course, you may run into some issues, so let's head on over to the wiki of the RVC page so that we can take a look at the FAQ section. Head on over to the GitHub page, click on this Wiki area, and then scroll to FAQ, frequently asked questions. Now you can see a bunch of FAQ questions that might answer some of your questions, so check here before you ask any questions. You can also select some of these other areas inside of the wiki, but be aware that some of them are in Chinese, so you'll have to right-click and translate to English. You can even see the steps here as well if you want a written explanation of how it's done. And that is going to pretty much sum it up.

All right, so hopefully you found that helpful, and have some fun with RVC. Like I said, if you want to do any of the audio curation or data curation for training your voice, go check out that So-VITS-SVC video that I did on Google Colab, as I have an extensive tutorial on how you can split the audio and do all of that sorting. So go check that out if you need help with some data. That's going to be today's video, so see you guys later, and until next time.
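One practical point from the training section earlier is that the training folder has to contain the audio files directly, because audio inside a subfolder is not picked up. Below is a minimal Python sketch of a check you could run before training. The function name, the extension list, and the example path are assumptions for illustration, not anything RVC ships.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever you actually recorded.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a", ".ogg"}


def check_training_folder(folder):
    """Return (audio_files, problems) for a flat training folder."""
    root = Path(folder)
    audio_files, problems = [], []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Audio nested inside a subfolder would not be picked up.
            problems.append(f"subfolder will be skipped: {entry.name}")
        elif entry.suffix.lower() in AUDIO_EXTENSIONS:
            audio_files.append(entry.name)
        else:
            problems.append(f"not an audio file: {entry.name}")
    return audio_files, problems


if __name__ == "__main__":
    # Hypothetical path matching the "voice/me" folder from the video.
    folder = Path("voice") / "me"
    if folder.is_dir():
        files, issues = check_training_folder(folder)
        print(f"{len(files)} audio file(s) found")
        for issue in issues:
            print("warning:", issue)
```

The check only looks at the top level of the folder on purpose, since that mirrors the behaviour described in the video.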

Info

Channel: Jarods Journey

Views: 214,539

Id: hB7zFyP99CY

Length: 16min 29sec (989 seconds)

Published: Tue May 30 2023