How to Make the PERFECT Dataset for RVC AI Voice Training

Video Statistics and Information

Captions

I'm going to show you how I create high-quality datasets for my AI training. This is important in order to get a good-sounding model, so without further ado, let's go ahead and jump into the video.

All right, so you need some prerequisites for this: Python, Git, and VS Code. There's a link down below in the description to a video that shows you how to install them, and everything else I'm going to show you how to get in today's video. One last thing to keep in mind is hardware. I was able to get this running on the i7-8650U in my laptop, but it was really slow, so how fast it runs is going to vary from person to person based on your hardware.

All right, let's get into the installation part of the video. We're going to be using some custom code that I coded up for this video, audio_splitter_whisper, so this is where Git, Python, and all that is going to come in handy. First, open up a new File Explorer window, then right-click or Shift+right-click and choose Open in Terminal. Once we have the terminal open, we're going to go over to the GitHub page, click on the Code button up at the top, and then click the copy button. Back in the terminal window, we're going to type git clone and then paste the URL in there. Like I said, you need to have Git installed in order for this to work (the link to that video is down below in the description), and this should be all good to go.

Now that we have that cloned, we need to install some other things, so go ahead and close out of this window. We're going to need to install Ultimate Vocal Remover, so scroll down, click this main download link, and save it somewhere. I already have it downloaded. The next thing we're going to download is ffmpeg, and a link to this page is going to be in the description as well. What you want to do is click this ffmpeg-git-full.7z and save it into a folder as well. Cool.
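As a quick sanity check before moving on, here is a minimal sketch (not part of the video or the repository) that looks for the prerequisites mentioned above on your PATH. The executable names are assumptions, in particular `code` as the VS Code launcher, and may differ on your system.

```python
import shutil

# Executable names are assumptions: "code" is the usual VS Code launcher,
# and "python" may be "python3" or "py" depending on how it was installed.
PREREQUISITES = ["python", "git", "code"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Return one human-readable status line per tool."""
    return [
        f"{name}: {'found at ' + path if path else 'NOT FOUND'}"
        for name, path in found.items()
    ]


if __name__ == "__main__":
    print("\n".join(report(find_tools(PREREQUISITES))))
```

Anything reported as NOT FOUND is either not installed or not on PATH, which is worth fixing before running git clone.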
So we should have ffmpeg, we should have the UVR setup, and we should have the GitHub repository cloned. What we're going to do now is install UVR, so head on over to where you downloaded it and double-click the setup.exe. This window is going to pop up. Go ahead and accept the agreement, click Next, choose to create a desktop shortcut if you want, click Next, and then click Install, and it's going to install on your computer. I already have it installed, so I'm not going to do it. If you're wondering where it installs, go to %appdata%. Once you're in here, click on AppData, go into Local, scroll down until you see Programs, double-click into Programs, and you'll see the Ultimate Vocal Remover folder here. Once you're in there, you can scroll all the way down to UVR.exe and launch it from here.

Next, in the folder where we have ffmpeg, what we're going to do is extract it. You need some type of 7z extractor to do this. If you don't have one installed, you can right-click, choose Open with, and search the Microsoft Store. I use Breezip, but you can use whatever you want (WinRAR, 9 Zip); it doesn't really matter. Once you have one of those unzippers installed, right-click, choose Open with, and click on it. In this window, click Extract, and it's going to extract into that folder. So here it is, it popped up. If we go back into the folder, you can see this build right here. Double-click into it, double-click into it one more time, and you have all of these files here. Now you want to double-click into bin, and what we're going to do is copy all of these: right-click, Copy, then go out, go out, go out, and we're going to put them inside of the folder that we cloned, which is this audio_splitter_whisper. So inside of here
you're going to paste, and so now you have ffmpeg installed in here and we should be good to go.

Okay, we're almost done with installation. The last thing we're going to do is open up Visual Studio Code. Let's go ahead and open it, then open the audio_splitter_whisper folder, and from the address bar drag and drop it into VS Code. And here we are. Which script you run depends on your hardware: if you have CUDA-compatible hardware, you run the CUDA script; if you only have a CPU, you run the CPU script. I'm going to run this CUDA script right here. Once you're inside of it, first click on the bottom-right corner, where you should see some type of Python version. If you don't, go ahead and type Python and you should be able to see all the different Pythons that you have. We're using this 3.10.11 64-bit one. Then go to the top of the window, go to Run, and click Start Debugging. We're going to click Python File right here (make sure you're in that setup CUDA script), and it's going to run. This window might pop up in the bottom-right corner; we're just going to select No for now. This is going to install, and it might take a while, as it's going to be downloading a lot of files, so just wait for that to finish up and we'll be back when it's done.

All right, once it's done installing, we're going to activate the virtual environment just to check a couple of things. To do that, we type the name of the virtual environment, venv, and press Tab, then type S and press Tab, then type a and press Tab, and then press Enter. If you run into any issues here, most likely your PowerShell execution policy needs to be changed, so let's go ahead and change that. In the search bar, type PowerShell and click Run as administrator. Go ahead and
click Yes, and you're going to be in this window right here. We're going to change the policy from Restricted to RemoteSigned. This means that local scripts can run, and anything signed by a trusted publisher can run as well. It basically decreases the security on your system, so do this at your own risk. Once we're in here, we type Set-ExecutionPolicy followed by RemoteSigned, and then choose Yes to All. Once that is done, nothing is going to print out, and you're good to go. Let's pop out of there, and then you should be able to rerun that command: type venv, Tab, S, Tab, a, Tab, and then Enter. And here we are. That's just about it for the installation process.

Let's talk a little bit about data. We're going to want at least 10 minutes of audio data. If you don't have 10 minutes, it might still be able to train, but you're going to want 10 at a minimum. If you train a model with 10 minutes and you deem it's not good enough, you can always add more data to it after the fact, but I would stick with 10 first, just to train a model in a reasonable time and see how it sounds. If it sounds good, perfect; if you want to add more data samples, you always can. And if you're curious, I trained my models on about three hours of clean audio, and it produced some pretty decent models. But try 10 minutes first; I don't know if three hours is absolutely necessary.

All right, let's jump into the data processing part of the video. Hop into that audio_splitter_whisper folder that you have, and we're going to create a new folder in here: right-click, go to New, and click on Folder. We're going to call this data. Inside of this data folder, you want to have your audio file. In this case, I'm going to use my most recent YouTube video, How to Get AI Voice Models. I programmed the script to take any video file or any audio file, so that
shouldn't be any problem. But once you have your file, we're going to want to run it through UVR to remove any background noise.

All right, go ahead and open UVR. I'm using this Kim Vocal model, but whatever is auto-selected here should be fine for you. What we're going to do is process this file, and the easiest way to do that is to drag it into Select Input, which puts the path here, and then drag the folder address into the output. So here's the output, here's the input. Let's say you have two files. What if you want to do both? Then you can just drag the folder address into the input and click on the input, and you'll see which files are going to be processed. I'm only going to process the one, so if I re-drag this into the input and click it, it's going to show me the one.

Once you have your input and output selected, we're going to leave the batch size at Default and leave the audio volume compensation on Auto, and we're going to select WAV here. Like I said, I'm using Kim Vocal 1 and GPU Conversion, and you're going to want to select Vocals Only. However, for demonstration purposes, I'm going to unselect that so I can show you what happens with the instrumental split as well. You could start processing here, but there are a couple of additional settings I want you to adjust. Click this wrench right here, go into Choose Options, click Advanced MDX-Net Options, and then click Denoise Output. That's going to denoise the output a little bit. Then close the window, and now we can click Start Processing. Depending on how fast your hardware is, it's going to take some time to process, and you'll see the status bar progressing right here. Once that is finished, inside of the main folder you're going to get two files, a vocals file and an instrumental file. So here's the instrumental. [Music] You can hear the song in there, and
then you can hear my voice in the background, because the song is super low. Let's take a listen to the pre-split clip and then the post-split clip. "How to get AI voice models for the AI voice changer. So if you don't have the hardware to train or you..." "How to get AI voice models for the AI voice changer. So if you don't have the hardware to train or you simply just don't want to train it..." From that sample, you can hear that the background music is mostly cut out of it. You can still kind of hear it a little bit, but awesome. You would run UVR on all of your data before you process it.

Once we have that, we can head back over to our Python script and start getting things running. In the top-left corner, we're going to go to Explorer and open it up, and then go into split_audio.py. Once we're in here, go to the bottom-right corner and click on this version number, 3.10. We're going to change our interpreter to our venv, so click that, and you'll see that all the squigglies went away. Once we have that, we're going to set up our configuration file, which is this file right here. Before we do anything in there, go to the configuration file, go to Rename, and call it conf.yaml. In here, we select the language that our audio samples are in. We're going to be doing English. The only ones supported, I believe, are these models right now for WhisperX, but this works beautifully for Japanese as well, as I've tried. The model setting we're going to leave as is, and the last part is diarization, which I'm going to show later in the video. Like I said earlier, my laptop with its i7-8650U was able to run large-v2, for those of you who know Whisper. If you modified anything, go to File and then Save, and then we're going to go back into split_audio.py. Once we are in here, we're going to go into Run, go to
Start Debugging. Once we're in here, we're going to go over to that data folder that we created, and if we double-click into it you won't see anything, but that's where your audio files should be. So I'm going to click on data and then Select Folder.

All right, it looks like I ran into an error: "[WinError 2] The system cannot find the file specified". Let's exit out. It looks like I actually need to activate my virtual environment, so type in venv, then Scripts, then a for activate, and run that. Then let's rerun this: go to Run, Start Debugging, select that data folder one more time, click Select Folder, and we should be good to go, hopefully. Yes, here we go. You should see this pop up, and don't worry about this error code here; it doesn't really mean anything, and it will run just fine. It's going to perform transcription and alignment, and this is going to take time depending on how much audio you have. When it finishes, it's going to save all of these segments.

Let's open the folder. Here is that data folder, and two new folders have been created: wav_files and output. wav_files is going to contain the WAV files if what you put in was an MP4 or another video file, so if it's empty, that's okay. Inside of output is your now-segmented audio data. If we go into output and then into the folder named after the video (in this case, remember that my file is called vocals.wav), you should see all of these different segments in here. That is pretty much it. If you have a bunch of different videos in here, you would just put all of those segments into one folder and follow the steps that I show in the RVC tutorial videos on how to add datasets into the training. So that is pretty much it for the vocal samples. However, there is one thing: inside of here there are actually two speakers.
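To check whether the segments add up to the roughly 10 minutes recommended earlier, a small sketch like the following can total the length of the WAV files in a segment folder. It is not part of the author's repository; the folder path is hypothetical, and the files are assumed to be uncompressed PCM WAV, which is what Python's standard `wave` module reads.

```python
import wave
from pathlib import Path

MIN_SECONDS = 10 * 60  # the 10-minute minimum suggested in the video


def wav_duration_seconds(path):
    """Length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(folder):
    """Sum the lengths of all .wav files directly inside folder."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


if __name__ == "__main__":
    # Hypothetical path: point this at your own segment folder.
    folder = Path("data/output/vocals")
    if folder.is_dir():
        total = total_duration_seconds(folder)
        status = "enough" if total >= MIN_SECONDS else "below the 10-minute minimum"
        print(f"{total / 60:.1f} minutes of audio ({status})")
```

The total only counts files directly inside the given folder, so run it once per segment folder if you have several.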
However, it's all put into one, and that can cause issues with training, so we don't want that. If we take a listen to this one: "...Halloween voice model sounds..." You can hear two voices in there, so that's not what we want. Now, you could manually go through and delete those; however, that is going to take a lot of time, and there's a faster way to do this. That is where this diarize option comes in handy, so let's set that up now.

All right, let's head on over to the WhisperX GitHub repository and scroll down until we see Speaker Diarization. What we need to do is accept some user agreements for the following four models. Click here and it's going to bring you over to this page, and then we're going to sign up. Once you finish signing up, you're going to need to confirm your email address, so go ahead and do that in your email. Once you've confirmed, you should have this New token button you can click, so let's create a new token. Let's just call this voice, with the role set to read, and generate the token. Inside of here, you can see your Hugging Face token. I'll delete this one after the video so that you guys can't use it. Now that we have our account, hop back over to this GitHub page: right-click Segmentation and open it in a new tab, right-click Voice Activity and open it in a new tab, right-click Detection, and then right-click Speaker Diarization, and you should have four new tabs open. Let's click on the first one. What we're going to do here is agree to this form right here. I'm going to say J Micah, then I put my GitHub repository here, and I just put speaker diarization. Click Agree and access repository. Once you have all of these done, you will be able to actually use the speaker diarization. So exit out of all four of those, go over to your Hugging Face token, and copy
the token to the clipboard. Head on over into VS Code, delete this, enter your Hugging Face token, and set diarize to true, so it should be blue. Then go to File, Save, and let's head back over into split_audio.py and run this. Just so we don't get confused, I'm going to delete all of these inside of my data folder so that we only have vocals.wav. Re-run the program, select the data folder, and click Select Folder.

All righty, here we are. This one I programmed to have a different folder structure, so if we go into that vocals folder, now you have a speaker 0 and a speaker 1. Inside of 0 is going to be the Sakura Miko voice, and inside of the other one is going to be mine. That merged audio file you heard earlier, number 47, is still going to be in one of the two folders, so it's not perfect yet. Here we'll find it inside of my voice folder, at audio 46. Go ahead and play it. That file I would delete; I wouldn't want that. What I could do here is rename this folder to my name so that I don't get confused. If you go to the other folder, here are the four audio files with the voice of Miko. So yeah, this is pretty cool, because now you can have multiple speakers and have it split your audio. If it's an interview or whatever, it'll split it into the two speakers, and that makes it much easier to curate the data for multiple speakers in one audio file.

And I forgot, there's one more thing. There's this audio shortener that will cut files to 10 seconds or less, and this might be necessary to avoid out-of-memory issues. This one is called audio shortener. Go into audio shortener, and press F5 or
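The author's audio shortener script itself is not shown in the captions. As a rough illustration of the idea only, the sketch below splits a PCM WAV file into consecutive chunks of at most 10 seconds using Python's standard `wave` module. The function name, the output naming scheme, and the choice to cut at fixed intervals rather than at pauses are all assumptions, not the repository's actual behavior.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, max_seconds=10.0):
    """Split src into chunks of at most max_seconds; return the new file paths."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * max_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            # Hypothetical naming scheme: <original stem>_<chunk number>.wav
            target = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as writer:
                writer.setparams(params)  # same channels, sample width, rate
                writer.writeframes(frames)  # header is corrected on close
            written.append(target)
            index += 1
    return written
```

For example, `split_wav("data/output/vocals/audio_46.wav", "data/short")` (both paths hypothetical) would write the chunks into a new folder and leave the original file untouched. Fixed-interval cuts can land in the middle of a word, so listen to the results before training on them.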

Info

Channel: Jarods Journey

Views: 100,530

Id: 9lsSSPnF67Q

Length: 18min 17sec (1097 seconds)

Published: Sat Jul 01 2023