In this tutorial I will be teaching you how to install RVC, or Retrieval-based Voice Conversion, to clone the voice of any person you want. I will show you how to install RVC, train your own voice model, run inference on audio files, and run the voice changer in real time.

To follow this tutorial you only need two things. First, you need a Windows 10 or Windows 11 install. Second, you need an Nvidia GPU. If you meet those two requirements, let's get started.

The first thing you need to do is go to this RVC pack GitHub repository. The link will be in the description of this video. After you have opened the repository, you need to download it to your PC. You can do this by either cloning it or just downloading the zip archive. We are going to use the second option today. Go ahead and click on Code and then Download ZIP. This will start downloading the .zip archive. Once it has downloaded, extract it to the folder you want to install RVC to. For me, I will just put it on the desktop. Please make sure that the location you chose has enough space. I would expect you to need at least 5 GB of free space for the installation to run successfully.

Now that you have decided on the location to install RVC to, it's time to actually install it. Open the extracted folder and launch the launch RVC .bat file. This will prompt you for administrator privileges, so just click Yes. Now a small console will open. Please read everything it says carefully, as it is very important. Once you have read the warning and made sure that no other files or folders are in the same folder as the installer, just type yes do as I say, and it will start installing RVC. This process takes around 10 minutes, so relax and wait for the installation to finish. I will just fast forward this bit.

Now you can see this window. That means RVC is installed. You'll see two possible modes to select from in this menu. The first one is train and
inference, and the second one is real-time inference. The first mode is used to train the model and then run inference on existing audio files. The second mode, on the other hand, takes input from the microphone and processes the voice change in real time. I'll show you how to use both of these modes. The timestamps will be in the description, so you can skip to the part that interests you the most.

Let's begin with training our own model. We choose mode number one. It will start loading RVC in this console, and once you can see this URL, all you have to do is copy and paste it into the browser. However, if your setup is somewhat broken, just like mine, you might encounter this error of the page not being accessible. The fix is pretty simple: just replace the host address in the URL with localhost and press Enter, and the web UI for RVC will load successfully.

Now we are approaching the most interesting and the hardest part of the video: creating a dataset for model training. The thing is, we need at least 10 minutes of high-quality speech to create a realistic-sounding voice clone. You can get that by downloading or recording YouTube videos or streams. Anything that contains the person's voice will do the job. Before we continue, go ahead and create such a dataset, so I can explain how to remove background noise or background music, if present, so your data is as clean as possible for training.

Great, so now I expect you to have a dataset. The first thing we're going to do is go into this vocals/accompaniment separation and reverberation removal tab. Here you can select the folder where your dataset is located, or just process the files one by one. The text at the top describes every model you can use, so read that and apply whichever processing steps you want. As a rule of thumb, I usually recommend running at least the HP2 all vocals model to isolate any noise, even if your dataset is perfectly clean. You can select an output folder for the
vocals and noise/music over here. Once you have done all the processing you want, go to the folder you set as the output folder and get the clean data.

Now we can go to the Train tab and start actually training the model. In the experiment name, write what you want to call your model. For example, if you want to train a model of Elon Musk, you can type elon_musk. In the target sample rate, set 32k. This might sound counterintuitive, but setting the target sample rate to the highest value doesn't actually result in a better voice clone. If you don't see an option for 32k, it is a bug. To fix it, click on version one and then go back to version two. The 32k sample rate should now appear. Select it and let's go to the next option.

The next option is pitch guidance. Pitch guidance should always be turned on, even if you don't plan on using your model to create AI music covers, as it improves the overall believability of the voice. The version should always be set to two, as it is much better than version one. The number of CPU processes should be set to around half of your CPU cores. However, this does not affect performance too much, so you don't really have to worry about it.

Now that we are done with the base settings, we can proceed with the dataset preparation. Go ahead and paste the path to your dataset in this field. This should be the folder with all of the audio files to train the voice from. The speaker or singer ID should not be changed from the default, so just leave it at zero. Now click on this big Process data button and wait for the processing to finish.

Once it has finished, we can go to the next step. Here you need to specify the GPU to run the training on. If you have two different GPUs, like I have, just select the most powerful one. If you only have a single GPU, you don't have to do anything here at all. The rmvpe_gpu option should also not be changed, as it is the best option here. Now click on the
feature extraction button and wait for the process to finish.

Now we come to the part where we select our training-specific settings. Here everything should be adapted to your own dataset. As a rule of thumb, I always set the save frequency to 50 and the total training epochs to 200. However, you are free to experiment with these values. The batch size per GPU depends on how much VRAM your GPU has. In my case, I have 24 GB of VRAM, so I can max it out at 40. For your GPU, you'll have to see what maximum value you can reach without running out of memory. Set "save only the latest checkpoint" to yes. Set cache to yes if your dataset is under 10 minutes; otherwise just leave it at no. And finally, set "save a small final model at each save point" to yes. Do not change the pretrained base models G and D, and set the GPU index to the most powerful GPU if you have multiple GPUs, as we already did above.

Now we have everything set up. Click on Train model and wait for the model to finish training. Grab a coffee or something, because this is going to take some time. After we have trained the model, we have to train the feature index. Just click here and wait for this process to finish. This is going to be much faster than training the model.

Now we are finally finished with the training part, and we can start running inference with our model. First we'll run inference on existing audio files, and then I'll show you how to run inference in real time. Let's go to the Model inference tab and click on "Refresh voice list and index path" to make sure our newly trained model shows up in the inference voice list. You will have multiple versions of the model here. Try them out and decide for yourself which one you like the most. At this point you might also want to select the index file in this drop-down menu. Now, the transpose option is a very subjective setting. What it does is basically make the voice higher or lower in
pitch. As a rule of thumb, you can leave it at zero if the gender you're converting from and to is the same. For male-to-female conversion, put 12 here, and for female-to-male conversion, put -12. However, like I said, feel free to use other values that might work better for your use case. Now put the path to your audio file in this field and we are almost done. You can change some of the settings in here, but the defaults usually work fine. The only setting I would recommend playing with is the search feature ratio. For my voice, anything over 0.5 produces noticeable artifacts. You are free to experiment with the settings as you wish. Let's press Convert now and wait for the inference to finish.

Let's hear the before and after speech.

"Hello everyone, and this is a showcase of how RVC, or Retrieval-based Voice Conversion, changes your voice from the original to the target."

"Hello everyone, and this is a showcase of how RVC, or Retrieval-based Voice Conversion, changes your voice from the original to the target."

"Hello everyone, and this is a showcase of how RVC, or Retrieval-based Voice Conversion, changes your voice from the original to the target."

Great, so that works. Now you know how to process audio files to change the voice in them.

Now let's get started with changing the voice in real time. For that, close the current console and go back to the folder where RVC is installed. Open the launch RVC .bat file again and this time select mode number two, real-time inference. This will open a simple GUI where we can perform real-time conversion. The first thing you need to do is load the model and the index file. To do that, we simply click on the select .pth file button and navigate to the folder where our weights are saved. Usually this is the RVC folder, then assets, then weights. Here, select the model version you want to use. Now we have to repeat the same process for the index file. However, the index file is in a different location. To find it, go to the RVC folder, then logs, and then the model name you chose. Here, select the index file that starts with the "added" prefix, and let's proceed.

After selecting the model and index file, we need to select the input and output devices. You can use virtual cables to do this. However, for a simple demonstration, I will select my microphone as the input and my headphones as the output. Make sure to select the same type of driver for both or you will get an error, for example MME input and MME output.

Now let's move over to the settings. They are mostly the same ones found in the web UI, so just take the explanation from there. However, there are two important additional options: response threshold and sample length. Response threshold is the slider that lets you filter out unwanted noise. However, as a side effect, it might make your generated voice cut out. I recommend starting with this value at -60 and then slowly working your way up until you find a sweet spot. Sample length, in simple terms, is the quality of the generated voice. The higher the setting, the better the quality of the generated voice should be. Keep in mind that increasing this value also increases the processing time, making it difficult to use in a real-time application. So, just like with the response threshold, you will have to find a sweet spot for your voice.

Now click on the start audio conversion button, and after a couple of seconds, once you talk, you should hear your voice back. And now, if everything works correctly, we should hear ourselves. And as you can see, this is
working perfectly fine. Now let me show you how the sample length affects the inference time. If I set the sample length to 0.08, stop the audio conversion and start it again, we can see that the quality of the voice is much worse, but it is processed much faster. However, if we set it to the max, 2.4, and start the conversion again, it's going to take much more time to process, but the quality will be much better.

Now, as I said already, you can experiment with all of these values. You are free to find the best ones for your voice. I'm going to stop the voice conversion, because it is kind of annoying to talk with it talking back to me. So yeah, just find the values that work for you, stick with them, and do whatever you want. The important part is that if you change any settings, you have to press stop audio conversion and then start audio conversion again for them to take effect.

And this is it, guys. Now you know everything you need to know about RVC and how to create a good-sounding voice with it. Hopefully you found this tutorial helpful. Leave a like if you liked it, dislike if you disliked it. Thank you so much for watching. This was

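The dataset guideline from the video (at least 10 minutes of clean speech) can be checked with a small script before you start training. This is only a minimal sketch and not part of RVC: it assumes your dataset is a folder of uncompressed PCM .wav files, and the function names and the `MIN_DATASET_SECONDS` constant are made up for illustration.

```python
import wave
from pathlib import Path

# The "at least 10 minutes" guideline from the video, in seconds.
MIN_DATASET_SECONDS = 10 * 60


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def dataset_duration_seconds(folder):
    """Sum the durations of all .wav files directly inside `folder`."""
    wav_paths = sorted(Path(folder).glob("*.wav"))
    return sum(wav_duration_seconds(p) for p in wav_paths)


def is_long_enough(folder, minimum=MIN_DATASET_SECONDS):
    """True if the folder holds at least `minimum` seconds of audio."""
    return dataset_duration_seconds(folder) >= minimum
```

To use it, call `dataset_duration_seconds` with the path to your dataset folder and divide the result by 60 to get minutes. The standard-library `wave` module only reads uncompressed WAV, so other formats such as MP3 would need converting first.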