RVC: Local Training Tutorial (How to make your own custom vocal a.i.)

Video Statistics and Information

So you want to make your own custom AI vocal model using RVC? Well, you're in luck, because in today's video I'll be showing you how to remove vocals using the Ultimate Vocal Remover feature, how to prepare and adjust your dataset so it's ready for training, how to train your AI model, and also how to combine multiple models together using the checkpoint system, all inside of RVC. We'll also look at a website where people can train custom RVC and SVC models for you, as long as you have a proper dataset, if you can't figure all this out yourself. If you don't have RVC installed already, head to my installation tutorial: it's just a simple 7-Zip file that you download and extract, then you run a .bat file and you're good to go. Watch that video, then come back here and we can get started with the training.

When you open up the interface, the first thing you might notice is that there is no Ultimate Vocal Remover 5 section. It's actually this section here, which is in Chinese; all you need to do is right-click and choose Translate to English if you're in Google Chrome, and it should give you all this information. "Using the UVR5 model" refers to Ultimate Vocal Remover. In this video I'll be using my own song, "Code Lyoko," as an example, because that's the one I have express permission to use. If you are training a model the legal way, you should have better access to acapellas anyway and shouldn't need to rip them using this model, but we're going to go over it anyway.

There are a few different boxes here that we need to focus on. The model box is which model we're going to use. There are three main models for removing vocals. In my opinion HP3 works the best, and I think that's because it removes all vocals and not just the lead vocals like HP5, but you can experiment with them all yourself. To get started, all you need to do is click to upload or drag a file here; like I said, I'll be using my song "Lyoko." One important part is that you need to specify the output main vocal folder. I just made two folders inside of the RVC beta folder named non-vocal and vocals. Then I can right-click those folders, click Copy as path, and paste those paths into the interface. After that, all you need to do is press Convert and it will do the work for you. My stupid ass forgot to clear the input audio folder path; if you get this error, just remove that path and convert again. It should take about 22 seconds, and when it's done we'll take a look at the instrumental and the acapella and how well they separated. Before we play them, I'm going to play the real song for you so you get an idea of how it sounds.

[Music]

There are two different vocals you can hear distinctly in that song, so let's see how well it did removing them. One thing I noticed is that it often saves the instrumental as the vocal file and the vocals as the instrumental file. I'm not sure why, but they're still super easy to find and separate. By the way, how cleanly it can rip the vocals from a song has a lot to do with how the beat sounds, how many layers of vocals there are, and how much reverb and effects are on the vocals. If the vocals are super reverby and full of effects like those were, you can use the DeEcho or DeReverb models to remove echo, delay, and reverb from the vocals. But like I said before, this is the least important part, because when you're preparing a dataset you more than likely have legitimate access to actual studio vocals and clean audio, so it's more important to focus on the actual preparation of the dataset.

So what do I mean by preparation of the dataset? Basically, a good dataset is going to contain at least 80 to 100 clips of clean, high-quality audio, each about 10 seconds long. There are many ways to split audio into 10-second clips. I personally use a program like FL Studio or Adobe Audition to do it automatically for me by detecting the silences, but there are plenty of other ways to do it, including online. In my previous videos I suggested a program called Audio Slicer, which is totally free, and I'll have it linked below, but in my opinion it's much easier to use something like Adobe Audition, FL Studio, or any other DAW to slice the audio into 10-second clips yourself.

The dataset I built for training is based on my own vocal songs, so I was able to get extremely clean acapellas and cut them all into 5- to 10-second clips.

[Vocal clip: "good to have a friend at the end of the tone, reach out to ask and you're never alone"]

As you can hear, there's no Auto-Tune, no delay, and no reverb or other effects on the vocals. These are clean studio acapellas from my own songs, so this training will come out super clean and the audio quality of this dataset should be incredible.

Let me give you a quick example of preparing a dataset inside of FL Studio. We'll take the vocals we extracted earlier and look at the 10-second mark here. All we need to do is cut this up so that no piece is longer than 10 seconds. Try to make the cutting points clean if you can, so you're not cutting directly on a vocal. It doesn't strictly need to be clean, but I think the training comes out better that way. As you can see, it's super simple inside of any DAW to chop the audio into 10-second segments like so, and if you're using FL Studio, all you need to do is export all playlist tracks and that will export your dataset for you once you have it set up.

Now that we have a decent dataset prepared, let's head into training. Step 1 is to name your experiment, which will be the name of your model; we're going to name this Petro. Then we set 40k as the sample rate. The true/false option determines whether or not you want to train the pitch, which of course you do if you're making a singing model. I'm not sure about V1 versus V2, so I just leave it at V1. The number of CPU threads is up to you; I have a 16-thread CPU, and I believe it sets that by default. "Path to training folder" just means the folder containing your training audio, so we do the same thing as earlier: Copy as path, then paste it into the training folder field. Only a single speaker is supported for now, so I'm not sure why it asks you to specify the singer/speaker, since there will only be one.

The output message does not tell us whether it was a success, but if you check the command window of the .bat file you ran earlier, it should say "end preprocess" with a lot of successes, like so. If we head to our main RVC folder and go to logs, we will see there is now a folder called Petro, named after the experiment, with all the audio properly preprocessed. If your interface is freezing inside of Google Chrome, you can either turn off hardware acceleration and see if that helps, or run it in Internet Explorer, which seems to do the job just as well.

The next step, step 2b, uses the CPU to extract the pitch and then the GPU to extract the features. You must specify which GPU to use; mine was already listed as 0, so I typed that in. Then I select Harvest, because it gives the best quality, though it takes the longest. This process shouldn't take that long, and when it's over it should say "all feature done."

The settings for the actual training are extremely simple compared to every other process I've ever used, so a huge shout-out to whoever put together this RVC GitHub; we love what you're doing. Let me go over what a few of these settings mean. An epoch is one full pass through your training data, and most models need at least 500 to 1,000 epochs to reach good quality. Save frequency is how often you want to save a checkpoint; I'm going to save every 50 epochs just to save space. Then there's an option to save only the latest checkpoint file to reduce disk usage, and of course we're going to set that to Yes because we want to save space, but if you don't need to, you can turn it off. Next is caching all training sets to GPU memory. Small datasets work best for this, and my dataset is under 10 minutes, so it should work fine. For the batch size, I know that people with more powerful GPUs have been able to push it up to five or six, but I'm keeping it around four because I know my GPU can safely run that. It should already give you pretrained models for both G and D. I'll set the total to 500 epochs and begin training the model. If you have any issues with memory or not enough processing power, just switch this to No and click Train Model again.

When training is finished, you should have a .pth file named something along the lines of your folder's name followed by the number of steps, and you should also have index files. You will need those index files when you go to load your model. If it did not produce a proper file for you, or you ran into errors during the process, there are a variety of things that could be the issue. The first thing I would do is copy and paste my error into ChatGPT and try to troubleshoot it myself. The next thing I would do is head to the comment section of this video or any of the popular AI Discords and ask about the error or see what the community thinks; maybe someone has had the same issue and can help you. To be fair, it might just be that your GPU isn't powerful enough, or that you forgot to run it off your root drive; I've noticed a lot of different issues when running it off an external drive.

That's where this website comes in for anybody looking to train their own custom vocal model who couldn't get this working: all you need to do is click Custom RVC Model or Custom so-vits Model and you can order a model. However, they do say that the quality of the model is highly dependent on your dataset, so if you did not make a proper dataset like we discussed in this video, there's no point in doing this.

That's going to do it for this video. Thank you guys so much for watching; I hope this helped you learn to train with RVC so you can make your own custom AI models. So much love, many more videos coming in the future, and peace.
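The video slices vocals by hand in FL Studio or Adobe Audition, or with the free Audio Slicer tool. For anyone who would rather script the rule described in the dataset-preparation section (clips of at most about 10 seconds, cut in the quiet gaps rather than on a vocal), here is a minimal standard-library Python sketch. It is not one of those tools and not part of RVC; the function names (`find_cut_points`, `slice_wav`, `summarize`), the 5-second minimum clip length, the 20 ms analysis frame, and the restriction to 16-bit mono WAV input are all assumptions made for illustration. The 80-clip warning mirrors the "80 to 100 clips" guideline from the video.

```python
# Hypothetical dataset-slicing helpers (not part of RVC).
# Assumes 16-bit PCM mono WAV input; stereo files must be mixed down first.
import math
import os
import sys
import wave
from array import array


def frame_rms(samples, start, length):
    """Root-mean-square level of samples[start:start + length]."""
    chunk = samples[start:start + length]
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def find_cut_points(samples, sample_rate, max_len_s=10.0, min_len_s=5.0, frame_ms=20):
    """Return contiguous (start, end) sample ranges, each at most max_len_s long.

    Each cut is placed in the quietest short frame found between min_len_s
    and max_len_s after the current start, so cuts tend to land in the gaps
    between sung phrases instead of on a word.
    """
    max_len = int(max_len_s * sample_rate)
    if max_len <= 0:
        raise ValueError("max_len_s must cover at least one sample")
    frame = max(1, int(sample_rate * frame_ms / 1000))
    min_len = max(int(min_len_s * sample_rate), frame)  # guarantees forward progress
    segments, start, total = [], 0, len(samples)
    while start < total:
        if total - start <= max_len:
            segments.append((start, total))  # the remaining tail already fits
            break
        best_cut, best_rms = start + max_len, float("inf")  # fallback: hard cut at the limit
        pos = start + min_len
        while pos + frame <= start + max_len:
            rms = frame_rms(samples, pos, frame)
            if rms < best_rms:
                best_cut, best_rms = pos + frame // 2, rms  # cut in the middle of the frame
            pos += frame
        segments.append((start, best_cut))
        start = best_cut
    return segments


def slice_wav(in_path, out_dir, **cut_options):
    """Split a 16-bit mono WAV into numbered clips; return their durations in seconds."""
    with wave.open(in_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        rate = src.getframerate()
        samples = array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(in_path))[0]
    durations = []
    for i, (a, b) in enumerate(find_cut_points(samples, rate, **cut_options)):
        clip = samples[a:b]
        if sys.byteorder == "big":
            clip.byteswap()  # back to little-endian before writing
        with wave.open(os.path.join(out_dir, f"{base}_{i:03d}.wav"), "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(clip.tobytes())
        durations.append((b - a) / rate)
    return durations


def summarize(durations, min_clips=80):
    """Print clip count and total length, and flag datasets below min_clips."""
    total = sum(durations)
    longest = max(durations, default=0.0)
    print(f"{len(durations)} clips, {total / 60:.1f} min total, longest {longest:.2f} s")
    if len(durations) < min_clips:
        print(f"Fewer than {min_clips} clips: consider adding more clean vocals.")
```

For example, `summarize(slice_wav("vocals/lyoko.wav", "dataset/petro"))` would write `lyoko_000.wav`, `lyoko_001.wav`, and so on into a hypothetical `dataset/petro` folder, which could then be pasted into the "Path to training folder" field. Picking the quietest frame in each window, rather than requiring a fixed silence threshold, keeps every clip under the limit even when a song has no true silence; the trade-off is that a cut can occasionally land on a soft note, so it is still worth listening through the clips.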
Channel: p3tro
Views: 65,580
Keywords: rvc, rvc ai, rvc local training tutorial, rvc local installation tutorial, vocal ai tutorial, how to make vocal ai, how to make vocal covers, how to use rvc, how to train a custom vocal model rvc, how to train a rvc model, rvc ai tutorial, rvc tutorial, how to rvc, how to use rvc local training, how to train with rvc, rvc ai vocal tutorial, ai vocal tutorial, how to make ai vocals, how to make ai music, how to rvc ai tutorial, so-vits-svc, svc tutorial, so-vits-svc training
Id: Q8du7n0vgfU
Length: 8min 12sec (492 seconds)
Published: Thu Jun 08 2023