Train RVC Custom Voice Model for Any Voice [No GPU Required]

Video Statistics and Information

Captions
Hello everyone. In the previous video we saw how to create an AI voice clone using a pre-trained model. In this video we'll see how to create our own model to replicate someone's voice — specifically, Neil Tyson's voice. I hope you all know him: he's an astrophysicist, I'm a great fan of his, and I'm a fan of his voice because he has quite a deep, commanding voice. So let's train a model on his voice, and at the end I'll clone my own voice into his and we'll hear how it sounds.

Since we're going to train on Neil Tyson's voice, we first need a sample voice file of his — a database of this person's voice that we can train the model on and then use for replication. When you're collecting someone's voice, always remember three things. First, look for the person's speeches; don't download songs of that person or anything like that, because a speech will give you a better result. Second, find clear audio. Avoid recordings of public speeches with lots of noise or background music. Yes, there is software that can remove noise and music, but that processing slightly alters the original voice, so the output will not be as accurate. Third, find either one long audio or several different audios, because we want a sample in which he uses many different kinds of words and syllables — we want the model to learn how he pronounces different sounds.

So, three things: a speech, a clear recording, and a long audio or many audios. One more hint for finding clear audio: don't go straight to YouTube, because YouTube uploads usually carry background noise or music mixed in by the content creator. Prefer official sites or SoundCloud for better audio.

Keeping this in mind, let's find an audio sample of Neil Tyson for our model training. Searching for his speech as an MP3 turns up what looks like an official site — an English-speech channel — so let's go with it. Playing it back, this audio is very clear. Even though there is some music at the beginning and the end, that's fine: we'll use software to drop those two parts and keep only the main speech. The second thing is that it's long enough, which means he likely uses many different words and pronunciations, so our model can train properly and give an accurate result.

Let's download it. This site already has an MP3 download option; if a site doesn't, there are many services you can use to download audio or video, such as savefrom.net or ssyoutube. Since this site already offers the MP3, I'm simply downloading it. I've already created a folder called "database" and I'm storing it there. Done — now we have downloaded the audio.
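As an aside, when a source only offers video and no direct MP3 button, a downloader such as yt-dlp can extract the audio instead of those converter sites. Here is a minimal sketch of its Python options — this assumes yt-dlp and ffmpeg are installed, and the URL and output folder below are placeholders, not the real source:

```python
# Sketch: extracting MP3 audio with yt-dlp (assumes `pip install yt-dlp`
# and ffmpeg on PATH). The URL in the comment is a placeholder.
ydl_opts = {
    "format": "bestaudio/best",               # take the best audio-only stream
    "outtmpl": "database/%(title)s.%(ext)s",  # save into our "database" folder
    "postprocessors": [{
        "key": "FFmpegExtractAudio",          # re-encode the stream to MP3
        "preferredcodec": "mp3",
        "preferredquality": "192",
    }],
}
print(sorted(ydl_opts))  # ['format', 'outtmpl', 'postprocessors']

# The actual download call would be:
#   import yt_dlp
#   with yt_dlp.YoutubeDL(ydl_opts) as ydl:
#       ydl.download(["https://example.com/some-speech-page"])
```

Either way, the result should land in the same database folder used in the rest of this walkthrough.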
But as I said, the first and last parts of this audio contain some music and also some clapping, so we need to crop them out. For that I'm using the Audacity software. You can use any software you like for the removal, but I use Audacity because later we'll also be splitting the audio, and that can be done in Audacity too. So if you want, just download and install Audacity and use the same tool.

I open Audacity and drag and drop the audio in. Playing from the start, up to a certain point there is music and a clapping sound, so I select that region with the selection tool (cropping a little extra is no issue), right-click, choose Split, and after splitting I delete that piece. At the end there is some applause as well, so I select, split, and delete that too. Now my audio contains only the speaking part — no clapping, no external music, no background noise, just clear speech.

Now there is one thing to consider. I have around 16 and a half minutes of audio. If I were sure it covers many different words — different vowel sounds, consonant sounds, and syllables — I could keep it as a single 16-minute file. But I can't sit through all 16 and a half minutes and note down whether every kind of pronunciation appears. If you're not sure, it's always better to split the audio into smaller parts — say, one-minute pieces, giving 16 or 17 files — which will improve accuracy. A second benefit is that splitting into smaller parts also speeds up the training process a little, so I'd recommend cropping the audio into smaller parts.

Here's how to split it, again using Audacity. Select the track you want to split, go to Analyze, and find Label Sounds. Two settings matter here: the minimum silence duration and the minimum label duration. Set the minimum silence duration to 10 milliseconds, which means it will consider a split point wherever there is a 10-millisecond gap — in other words, almost everywhere. Then set the minimum label duration to the smallest chunk length you want; since I'm dividing the 16-and-a-half-minute audio into one-minute pieces, I keep it at one minute. You can use this or any other software and split into 16 parts, 10 parts — it's up to you. Click Apply, and the audio is divided into smaller parts: 17 of them, since the clip is 16 minutes 30 seconds.

Now click File → Export → Export Multiple. It asks where to store the files, so I choose a directory: inside our "database" folder I create a new folder and I'm storing the pieces there. We need to select "Split files based on: Labels", because we're splitting according to the labels; nothing else needs to change. Click Export, and it splits the audio into the parts and stores them one by one. If I open that folder, there are 17 files, and opening any one of them shows a one-minute audio clip. So now we have a dataset for training this model.

Okay, let's see how to train it. We can train in two ways: locally on our own system — a laptop — or online. In this video we'll train the model online; in the next video we'll see how to train it offline with whatever minimal GPU we have. For the online route, first go to the notebook link — I'll put it in the description. Once you're there you'll see the code cells, which you can run directly, but for safety I suggest you click "Save a copy in Drive" so the notebook is saved in your own Google Drive. The notebook is hosted in someone's drive; if it's ever deleted, goes missing, or gets moved in the future, that link will stop working, so make your own copy to keep using it later.

Now, to start, we need to connect the runtime. Just click Reconnect and it will start to connect — it will say "connecting" and then it will just connect.
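Before moving on: the one-minute split we just did with Audacity's Label Sounds can also be scripted. This is just a dependency-free sketch — the `chunk_ranges` helper and the file paths are made up for illustration, and the pydub export step is only indicated in comments:

```python
# Sketch of the one-minute split, done as a script instead of in Audacity.
# Chunk boundaries are computed in pure Python; actually cutting the file
# would need an audio library such as pydub (see the comment at the end).

def chunk_ranges(total_ms: int, chunk_ms: int = 60_000) -> list[tuple[int, int]]:
    """Return (start, end) millisecond ranges covering the whole clip."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]

# A 16 min 30 s clip (990,000 ms) yields 17 chunks: 16 full minutes + a 30 s tail.
ranges = chunk_ranges(990_000)
print(len(ranges))   # 17
print(ranges[-1])    # (960000, 990000)

# With pydub installed (pip install pydub, ffmpeg on PATH), exporting would be:
#   from pydub import AudioSegment
#   audio = AudioSegment.from_mp3("database/speech.mp3")
#   for i, (a, b) in enumerate(chunk_ranges(len(audio))):
#       audio[a:b].export(f"database/neil/{i:02d}.mp3", format="mp3")
```

Note this naive version splits at fixed 60-second marks, possibly mid-word; Audacity's silence detection is gentler because it cuts at gaps in the speech.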
Once it's connected, you can see the virtual RAM and the disk here. We're using Google Colab's GPU and disk for this whole process, so we don't need any GPU of our own.

Now come to the cell that says "Install to Google Drive" and click its run button. It will ask to connect to Google Drive: click "Connect to Google Drive", choose your Google account, and click Allow. What this piece of code does is mount your Google Drive and create a folder inside it containing all the necessary things for this training process, so that we can train and then store our model and index files in your Drive. In short, it connects to your Google Drive and installs the basic necessities for training a model.

Once the install-to-Drive step finishes, it shows a success note, which means the code ran without errors. You can now see all your Drive's files and folders from the notebook — if I open my Google Drive and check, they match exactly. It has also created a new folder called "project-main"; if I look inside, it contains all the files and folders needed to run RVC and train our model. Even after training, our model and index files will be stored here, and this is where we'll retrieve them from. That's what this first part does.

Now we come to the second part: pre-processing the data. The first thing to give is a model name. You can use any name you want — since I'm training Neil Tyson's model I'm calling it "NeilModel" — but don't include any spaces; keep it a single word.

Next it asks for the dataset folder. Our dataset is stored locally on my laptop, but the notebook needs a path it can reach, so I'll upload the files to Colab and then give that path. In Google Colab you cannot upload a whole folder at once, only individual files, so first create a folder — I'm naming it "NeilTyson"; again, keep the folder name a single word with no spaces — then click upload, go to the dataset folder we created, press Ctrl+A to select all the sound files, and open. Once the upload finishes, I can go inside that folder and see all the sound files we uploaded.

Now we give the path of this folder to the notebook: click the three dots next to the folder, click "Copy path", and paste it into the dataset-folder field. That's the model name and the dataset path set, so run the cell and it starts pre-processing your data. (If you created a folder name with a space in it, this step will fail, saying it can't find your dataset — so remember to keep names to a single word.) This cell also shows success, which means it ran correctly. Next up is feature extraction.
For feature extraction there are four algorithms we can choose from. Since we're on this online platform precisely because Google gives a free GPU, there's no need to change anything: keep GPU selected and run it. This performs the feature extraction for your model. When it's done you'll get a success sign; if instead it shows some error, put it in the comments and I'll explain how to solve it — but 99% of the time it works without any problem if you follow exactly the same steps, which is why I'm going slowly, step by step.

Now we need to train the index. There's nothing to change here; just run it. What this code does is create the index file. In the last video we saw that to clone a voice we need two things: a model file and an index file. This step creates the index file for your model, and it takes only seconds to run. If you go to your Drive, open "project-main", and go inside "logs", you'll see a folder with the same name you gave the model; inside it there is an index file — actually two index files, and we'll see later which one to choose.

Now we need to train the model itself — that is, create the model file — so let's fill in these fields. First it asks for an authentication token, which is for TensorBoard. TensorBoard gives a live view of the training workflow: while the model is training, we can visualize dynamically how it is progressing, how much accuracy it's getting, and whether the output is getting close to the real voice. To connect it, click the link shown, then click through to create an account — you can sign in with your Google account, so I'm just logging in with Google. Accept the terms and conditions and create the account. It asks about multi-factor authentication: just skip it and click "Got it". It then asks your purpose for using it: you can say you're a developer building for fun, that it's your own device on your own network, and that it's for yourself, then click Continue. Now you can see your auth token — that's the string the notebook is asking for, so copy it and paste it into the token field.

Next it asks for the model name: give exactly the same name you used in the pre-processing step, so just copy that and paste it here.

Now there are two more settings: save frequency and epochs. An epoch is simply one training pass — the epoch count is how many times you want to train the model. The more epochs you run, the more accurate and the clearer the output voice will be, but the longer the training takes. For an ideal result keep it around 250 to 300; if you want an even clearer output you can increase it up to the maximum of 1,000 available here. Since this is a tutorial video, I'm keeping 250.

The save frequency is how often you want to store the model. I'm running 250 epochs with a save frequency of 50, which means the model is saved every 50 epochs — at 50, 100, 150, 200, and 250 — so my trained model gets stored five times. There are several reasons for saving like this. One: suppose training is at around epoch 220 or 230 and the session gets disconnected because of some network issue — without checkpoints the whole run is spoiled and I'd have to train again from the beginning. With save frequency 50, the model was already stored at 50, 100, 150, and 200, so even after a disconnect around 220 I can still use the 200-epoch model. On the other hand, each saved model occupies space, so if you're training for a long time keep the save frequency at a maximum of 50; anywhere from 5 to 50 is reasonable.

Then there is the cache option. Check it only if your dataset is less than 10 minutes of audio; if it's longer, leave it unchecked, otherwise the CPU memory will fill up. Our audio is around 17 minutes, so I uncheck it.

That's everything, so let's start training. You can see it starting with epoch one, showing the start time and the time taken: the first epoch takes 45 seconds, the second takes about 30 seconds, and like that it will run through all 250 epochs.

Taking 30 seconds per epoch as the average, let's calculate: 30 seconds × 250 epochs = 7,500 seconds; dividing by 60 gives 125 minutes, which is about 2.08 hours — call it a bit over two hours of waiting for the result. In the end, my 250 epochs actually finished in around an hour and a half.

Now let's see where to find the index and model files. Go to your Drive, open "project-main", and go inside the "logs" folder. We named the model "NeilModel", so there's a folder with that name (if you create two or three models, each gets its own folder here). Inside, you can see the two index files: one named "added_…NeilModel.index" and one named "trained_…NeilModel.index". The "added" one is the index file we want to use for our model, so download it — it's around 92 MB — and when it asks where to save, I'm storing it outside, next to my other files.

Then we need to download the model file as well. Go back to the "project-main" folder, open the "assets" folder, and inside it there is another folder called "weights". This holds all our trained models.
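As a quick aside, the timing arithmetic and the save-frequency behaviour described above can be sketched together in a few lines. This is only a back-of-the-envelope model using the numbers from this run, not RVC's actual checkpointing code:

```python
# Rough model of the training-time estimate and the save-frequency logic.
# Numbers match the video: ~30 s per epoch, 250 epochs, save every 50.
SECONDS_PER_EPOCH = 30
TOTAL_EPOCHS = 250
SAVE_EVERY = 50

eta_minutes = SECONDS_PER_EPOCH * TOTAL_EPOCHS / 60
print(eta_minutes)  # 125.0  -> roughly two hours, as computed above

# Epochs at which a checkpoint lands on disk:
saved = [e for e in range(1, TOTAL_EPOCHS + 1) if e % SAVE_EVERY == 0]
print(saved)        # [50, 100, 150, 200, 250]

# If the session drops at epoch 220, the newest usable checkpoint is 200:
crashed_at = 220
usable = max(e for e in saved if e <= crashed_at)
print(usable)       # 200
```

This is exactly why a disconnect late in a run is annoying but not fatal when the save frequency is set sensibly.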
Since we gave the save frequency as 50, it stored the 50-, 100-, 150-, 200-, and 250-epoch models. These saves also happen dynamically: while the training process is still running, once the first 50 epochs are done it stores the 50-epoch model, once 100 are done it stores the 100-epoch model, and so on. So even mid-training you can download an intermediate model and check how the result sounds at each stage. Anyway, since the whole 250-epoch training is already complete here, I'm taking the final model — the 250-epoch one — and saving it in the same place.

So now we have both the model file and the index file. Let's clone my voice into Neil Tyson's voice using the model we trained. First copy the index file, go to the RVC beta folder, open the "logs" folder inside it, and paste it there — just as we did in the last video. Then take the model file, go inside the "weights" folder in the RVC beta folder, and paste it there. If you don't know where I'm getting this RVC beta folder from, go and watch the previous video (link in the description), where we downloaded RVC beta, unzipped it, and set up this folder.

Now that both files are placed in the proper locations, find "go-web.bat", press Enter, and run it; it takes maybe 30 seconds to load. In the interface, the dropdown menu now shows our model file, and the index file appears here too. Next I need to give the audio I'm going to convert. I already recorded a test file — "this is a test file that we are running for the 250-epoch model, so let's see how our model trains" — that's my voice, which I'm going to clone into Neil's voice. I copy its path and paste it in. Then we need to set the parameters; if you don't know how, watch my last video, where I explained each and every parameter, so here I'm going through it a little fast.

Okay, let's convert my voice. Since it's a small file — just a 10-second audio, not a whole song — the conversion is very fast, maybe 15 seconds or so. And there it is. Playing the result, you can hear a great difference. This is my voice: "this is a test file that we are running…" — and this is the Neil Tyson version: "this is a test file that we are running for the 250-epoch model." Even though we trained the model for only 250 epochs, it already resembles around 70 to 80% of Neil Tyson's voice. If you increase the epochs to around 500 or 1,000, it will give a very accurate result — we're getting this much from just 250.

So that's the way we can clone anyone's voice using this RVC model. In this video we trained the model on an online platform, Google Colab — but using Google Colab has some problems.
The first drawback is that it needs a stable internet connection. With no internet you cannot train on Google Colab at all, and "stable" matters too: if your connection drops and comes back mid-run, the training process stops and you have to start from the beginning. That's exactly why I suggested using the save frequency to keep saving the model as you go — otherwise a disconnection part-way through spoils the whole run.

The second problem is the human-verification prompt. Here we trained only a 250-epoch model, which took around an hour and a half — no problem. But if you train for 1,000 epochs it takes around 7 to 8 hours, and when something runs in Google Colab for more than about two hours, Colab will at some point ask you to verify that you're a human. You don't know when it will ask, so you need to stay focused: if you don't confirm within a minute or two, it disconnects your session, which disconnects the training process, and all that work is wasted again.

The third: if you train many models continuously without any gap, Google Colab will detect unusual traffic in the usage of its virtual RAM. They aren't giving the free GPU for this purpose — Colab's free resources are meant for education and research — so if they find unusual usage they'll ban your account for maybe 4 hours, and if you keep doing the same thing the ban time keeps increasing: 4 hours, 8 hours, 12 hours, up to days and even months. So my suggestion: first of all, don't train many models back to back — train one model, leave a gap, then train another. If you really want to train many models continuously, create four or five Google accounts and use one account per model; do that and there's no problem at all. Even then, the two other issues — connectivity and verification — remain.

If you train on your own system instead, you need no internet connection and no human verification; you can train your model without any of those problems. The only problem with training on our own laptop is GPU power — without a GPU we can't train, and we need at least a minimum level of GPU. Even then it's slower: my laptop has a 4 GB GPU, and even that takes around 4 hours to train these 250 epochs, whereas Google Colab is always faster because it provides around 16 GB of GPU memory. But if you have a system with a GPU, you can train your own model on it, and in the next video we'll see exactly how.

One more thing we skipped in this video: we gave the auth token for something called TensorBoard, which visualizes the training process. That's a slightly bigger concept of its own, so in a coming video we'll see how to use TensorBoard to watch training and decide how many epochs we need to run to get an accurate result.

So stay tuned for the next video by subscribing to the channel, so that you won't miss it.
Info
Channel: Learn with Dev
Views: 25,357
Keywords: AI voice cover, modi singing, AI voice cloning, AI voice cover for free, AI voice cover tutorial, AI voice cover free tutorial, sidhu moose wala ai voice cover songs, how to make ai cover songs with your voice, how to make ai voice song cover, how to cover ai voice, how to make ai voice cover, Free AI cover songs, RVC voice conversion, RVC local install, How to install RVC GUI in laptop, RVC Train Custom Model, Train Custom AI voice Model, RVC Model, RVC Custom Model
Id: 79K9k8OSpIA
Length: 32min 5sec (1925 seconds)
Published: Wed Oct 25 2023