(COLAB PRO ONLY) AI Voice Cloning with RVC in GOOGLE COLAB - Guide and Setup

Captions

What's up! Today we're going over AI voice training with RVC in Colab. The prerequisite for this is a set of voice samples. I'm not going to go over how to get those today, because in a previous video I showed how you can record your own voice for this, and I'm only going to show you how to do it with your own voice. If you want to procure samples from somewhere else, you're free to do so. With that out of the way, let's jump into the folders so we can get everything set up for Colab.

Assuming you have your voice samples, you're going to need to zip them. Go into wherever you keep your voice samples. For today's tutorial I'm only going to use five vocal files. You'll probably want to train with more, but I'm using five because Colab is very slow and I want to keep this video reasonably short. To zip them up, select the first file, hold Shift and select the last one, then right-click, go to Send to, and choose Compressed (zipped) folder. That creates a zip file; I'm going to call it me, and the one I actually have here is me_test.zip, so we'll leave it at that.

Once you have that, hop over into Google Drive so we can upload it. Before that, we need to create a new folder: go to New, then New folder, name it dataset, and click Create. Once it's created, double-click into it, and in here we'll add the files we need. Click New, go to File upload, navigate to the folder where your zip is (in my case me_test.zip), and click Open. It uploads to Google Drive, so just wait for it to finish. As you can see, mine is already uploaded because I only have five samples, so we can keep everything nice and quick. The next
thing we want to do is upload a vocal sample of whatever audio we want to convert. Let's say we downloaded a video from YouTube; I'm not going to go over how to do that, since there are plenty of tutorials on it. Let's rename it to something more manageable: click Rename, and I'm just going to call it vocals, so it becomes vocals.mp4. Then drag and drop it into the Google Drive folder right here, or go to New and then File upload; it's the same process. So now we have our voice samples and the vocals we want to extract from the audio we want to sing over, and that should be all the audio and data we need for RVC.

With that out of the way, let's hop over to GitHub, where we'll find the Google Colab notebook. Here we are in the RVC GitHub repository (link down below in the description). If you scroll down from the top, you'll see this Colab button; I'm going to open it in a new tab. Here we go: everything is in Chinese, so if you want it in English, right-click the top bar and click Translate to English. This works at least for Chrome on Windows 10, so it depends on which browser you have. Now it's translated to English for me.

We don't need to run the first cell, but we do need to run this one. Click the install dependencies cell, and a warning pops up; click Run anyway. It's just warning you that this notebook wasn't created by Google, so we're going to trust it and run it anyway. It installs some dependencies, and while it's doing that, we're going to check to make
sure we're connected. Click the down arrow in the top right of the screen and go to View resources, and you should see system RAM, GPU RAM, and disk. Here we want to make sure our runtime type has a GPU hardware accelerator. Click Change runtime type down below, go to Hardware accelerator, make sure GPU is selected and the GPU type is T4, because that's the free one, and we're running Python. Then click Save. That should be enough, so exit out of resources by clicking up here and we should be good to go.

As you can see, a bunch of Python libraries are installing here, so we'll let that keep going. Scroll down a little and run the other cells. They run sequentially, so it doesn't really matter how many cells we queue up. We'll click the next one here. We don't need to run this git pull one. We will run the aria2 one, the one after it, the voice separation model cell, and download hubert_base. Then we run the mount Google Drive cell, and we stop at this cell until a new window pops up.

Once all the cells before it have finished, a popup asks you to allow this notebook to access your Google Drive files. We're going to connect it to Google Drive so it can access the files we uploaded; this step is necessary. Click Connect to Google Drive, pick your account, scroll down, and click Allow. If that worked, you'll see a little green check mark next to the cell, with the time it took right below it. Then we'll make sure that we have our Google
Drive mounted by clicking the folder icon on the left to open Files. We want to see a folder called drive, and if you expand it, you should see MyDrive inside. We'll close that for now by clicking the folder icon again and move on to our dataset.

Hop back into Google Drive: here's the dataset folder we created and the file name I have in it. I click Rename, select the whole name, and copy it; of course, you can just type it out if it's a simple name. Back in Colab, delete what's in the dataset path field and paste the name in (Ctrl+V, or right-click and Paste), so it points at the file inside dataset. Then we load all our data by running this cell. Just make sure the file is named correctly, and you should see output like this as it copies everything over.

Next, run this cell, which renames duplicate files in the dataset. As long as you don't have any duplicate names, you should be fine. Then we start the web app by running this cell. Once it finishes starting up, it gives us a link to a web application. If you've seen the previous video, this is basically the equivalent of your command-line window. Click the "Running on public URL" link, and it brings us to the GUI, the graphical user interface. Here we have the Model inference, Vocal separation, and Train tabs. We're going to train, so first we change the experiment name to something like me, so I went ahead
and entered me here. We're going to leave the target sample rate at 40k, and click Yes here. The training folder field will already have something in it, so delete that, then head back to Google Colab, where we can see our dataset folder. Right-click it, click Copy path, go back to the GUI, and paste. Here we have /content/dataset. Now we process our data. You'll see it running here, and you should see output like this ending in "end preprocess". If you go back to Google Colab, you should also see "end preprocess" along with all of these successes, which means we did everything correctly. If you get an error, read through it to figure out what's happening, and make sure your path is correct.

Back in the GUI, scroll down to the pitch extraction area. "0 Tesla T4" is the GPU that Google Colab runs on, and we set it to the maximum number of CPU threads. We'll leave the pitch extraction algorithm at harvest and then click Feature extraction. In my experience, feature extraction can take anywhere from 30 minutes to an hour. It can take a very long time depending on the CPU it runs on, and on Google Colab it takes longer the more samples you have, so you may just have to click it and let it run.

If your runtime disconnects, go to Runtime, then Manage resources, and make sure the session is still active. If it's not, you'll need to start all over again. This one should be fine, though: as you can see, it's updating and sending requests, but that's okay. We'll
wait until this is finished, and I'll show you what happens when it is. Okay, here you can see all of these lines. Let's scroll back up and see what happened. If you see these lines appearing, it's working; if you don't see anything happening, nothing is happening. So make sure output is being written here, or you might just have to click Feature extraction again. Looking back at the output, all of these mean the feature extraction processing has finished, and this "all-feature-done" line should be here as well.

To make sure everything processed fine, head over to the Files panel on the left. Open the Retrieval-based-Voice-Conversion-WebUI folder, go into logs, then into whatever you named your experiment (I called it me), and make sure the 3_feature256 folder is there. If you don't see it, your feature extraction is still running, even if you're not seeing anything in the output window. Like I said, it can be very, very slow, so just be patient and it will finish.

Back in Gradio, we're going to train. I recommend leaving these at their default values, 5, 20, and 7, but for the sake of speed I'm setting them to 2 and 2 so I can process everything fast for the video. This option is whether to save only the latest checkpoint to save disk space; we're going to click No, and No on the next one as well. Everything else is fine to leave, so click One-click training. Here you have the output message window. We're going to go back to the Colab notebook just to watch that everything is going fine. If you
see anything that says error, that's usually a sign things are going badly, but we're not seeing any of that, so let's just wait until it finishes. All right, here it is: you can see that it finished one epoch, then finished epoch 2, and both are done, so it ends the process. Technically, we now have a trained model. If you go over to the Files panel on the left, you'll see new things have appeared. Make sure you have the trained_IVF file, the little .index file we'll need. Then collapse logs, go to weights, and make sure there's a file named after your experiment with a .pth extension (me.pth in my case). That's the PyTorch file you'll use for inference, and it means we've successfully trained our model. As you can see, the messages box also says all processes have been completed, so we're done with training.

In your case, you may want to train longer. To do that, repeat the same steps with the same experiment name (me): process data, feature extraction, and then let's say we set it to 4 epochs. Click One-click training again, and it goes through all of this once more. Our fourth epoch just finished, so our training is done and we're good to go.

Now let's head over to vocal separation so we can get a vocal file separated. I personally recommend separating your vocals locally with UVR, because it's a little easier, but regardless, let's do it in the Gradio web UI. Head back over to the Files panel, collapse Retrieval-based-Voice-Conversion-WebUI, go into drive, click MyDrive, go into dataset, and here's your vocals.mp4 file. Now
we're going to copy the path to that vocals.mp4 and paste it into the input audio folder path field. For the model, let's pick HP5, and for the vocals output folder, paste the same location. Actually, this should just be the dataset folder, so make sure all of these point to dataset. You don't need the path to the exact audio file; just make sure it's the folder. Then click Convert. While it processes, head over to the Colab output to make sure everything is going fine. Here we have FFmpeg in action. If you see any error codes, something is wrong, but it looks like everything is going smoothly, and it finished at 100%.

We get an error at the end because that folder also contains me_test.zip, which of course it can't convert. Let's make sure we actually have the vocals file: like I said before, right-click and click Refresh, and we do indeed have the vocal file we want. Let's rename it to something simpler: select everything before .wav, replace it with vocals so the name is just vocals.wav, and press Enter. Perfect. Now right-click it and click Copy path.

Back in Gradio, we've split the vocals, so head over to Model inference. If you open the model dropdown, you won't see anything yet, so click Refresh timbre list and your PyTorch file will be there, so go ahead and click on that
me.pth file. Here is where you change the pitch of the voice: if you're going from a female singer to a male voice, use -12 (one octave down); if it's male to male, leave it at 0; if you're going male to female, use 12, and so on. We'll leave it at 0 because this is male to male. Then paste the file path into the field for the path of the audio file to process, and choose harvest.

For the search database index field, go back into Google Colab, scroll up, click Retrieval-based-Voice-Conversion-WebUI again, scroll down to logs, click your experiment name (me), then right-click the trained_IVF file and click Copy path. Go back to the Gradio window, delete what's in the field, paste that in, and click Convert.

One thing I noticed: with harvest, the GUI may time out and an error message may pop up here. In this case, ours finished with success. The popup I was talking about tends to appear if your audio file is a bit longer; mine is 1 minute 47 seconds. Let me try this with a longer file real quick. And here it is: "Something went wrong. Connection errored out." You'll see two errors here, but don't worry, it's still processing behind the scenes. This happens after about 60 seconds in the GUI. What we want to do is go to Google Colab, open the RVC folder, go into TEMP, and click on this gradio folder. Then wait until the second folder appears. Here it is: the second folder popped up, and now we've got two folders with an audio file in each. This means
that HuBERT has finally finished running; it was processing behind the scenes all along. Here we have one that finished at 11:03 and one that finished at 11:07. To save these, we have to download them: right-click each one, click Download, and pick a folder to save them in. If you're having trouble downloading in Chrome, make sure the download prompt down here is changed to Allow: click Allow, then Done, so your files can download. Let's hop into that folder and listen to the crappy audio we have, since we only trained this for two and four epochs, so brace your ears.

[Music]

Okay, you can tell it was processed, but since it was trained for so little with so few voice samples, it sounds absolutely horrendous and quite demonic. So you'll want to use more vocal samples than I did and train for longer than what I showed you. But that is how you get a model trained, how you get vocals split, and how you use your trained model to run inference over those vocals.

Now we want to save the model to Google Drive so we don't lose it. This is probably the most important part; you don't want your hours to go to waste. Go into logs on the left-hand side, open me, and look for the latest epoch, or rather the latest step, which is 32. The cell we need is right below the command-line output that was running above, so scroll back down to it, which is the
"manually back up trained models to Google Cloud disk" cell (Google Cloud disk here means Google Drive). We give it our experiment name, so here it's me, and we note the most recent model .pth number, which is 32 in our case. Click the cell's run button. As you'll notice, it isn't actually running, because the web app cell is still occupying the runtime. So scroll up and click Interrupt execution. That interrupts the web application and kills the server, so make sure you're not doing anything in it at the time. Once you see the backup cell finish running, head back to Google Drive. The backup won't be in our dataset folder; it'll be in MyDrive, so scroll down to where you can see your files. Here we have me 32, which is exactly what we trained.

Now let's say you want to use the model on a different day. In that case, you have to run all the previous steps again, minus the dataset part: you don't need to reload the dataset or run the rename duplicate files cell, but you do need to restart the web application. Before you restart it, though, first restore the .pth from Google Drive. Fill in what we saved: it was me 32, so the name is me and the step is 32.
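The backup and restore cells described here amount to copying a few files between the Colab VM and Google Drive. The following is a minimal Python sketch of that idea, not the notebook's actual code. It assumes the layout seen in the video (the D and G checkpoints plus the .index file under logs/<experiment>, and the inference model at weights/<experiment>.pth); the G_<step>.pth and D_<step>.pth file names, the <experiment>_<step> backup folder, and all function names are hypothetical.

```python
import shutil
from pathlib import Path

# ASSUMPTION: layout inferred from the video, not from the notebook source.
# G_<step>.pth / D_<step>.pth and the "<exp>_<step>" backup folder are hypothetical names.

def _backup_folder(backup_dir, exp_name, step):
    return Path(backup_dir) / f"{exp_name}_{step}"

def _pairs(repo_dir, backup_dir, exp_name, step):
    """Yield (path inside the RVC folder, matching path inside the backup folder)."""
    logs = Path(repo_dir) / "logs" / exp_name
    dest = _backup_folder(backup_dir, exp_name, step)
    for name in (f"G_{step}.pth", f"D_{step}.pth"):
        yield logs / name, dest / name
    weights = Path(repo_dir) / "weights" / f"{exp_name}.pth"
    yield weights, dest / weights.name

def _copy(src, dst):
    dst.parent.mkdir(parents=True, exist_ok=True)
    # shutil.copy gives the copy a fresh modification time, so you can
    # confirm a restore by checking the modified time, as in the video.
    shutil.copy(src, dst)
    return dst

def backup_checkpoint(repo_dir, backup_dir, exp_name, step):
    """Copy one training step (plus any .index file) from the VM to Drive."""
    copied = [_copy(local, saved)
              for local, saved in _pairs(repo_dir, backup_dir, exp_name, step)]
    dest = _backup_folder(backup_dir, exp_name, step)
    for index in (Path(repo_dir) / "logs" / exp_name).glob("*.index"):
        copied.append(_copy(index, dest / index.name))
    return copied

def restore_checkpoint(repo_dir, backup_dir, exp_name, step):
    """Copy a backed-up step from Drive into a fresh Colab session."""
    restored = [_copy(saved, local)
                for local, saved in _pairs(repo_dir, backup_dir, exp_name, step)]
    logs = Path(repo_dir) / "logs" / exp_name
    for index in _backup_folder(backup_dir, exp_name, step).glob("*.index"):
        restored.append(_copy(index, logs / index.name))
    return restored
```

For the run in the video, this would be something like backup_checkpoint("/content/Retrieval-based-Voice-Conversion-WebUI", "/content/drive/MyDrive", "me", 32), and then restore_checkpoint with the same arguments in a new session before restarting the web app. Both paths are assumptions: /content/drive/MyDrive is where the mount cell normally exposes Drive, and the repository folder name is taken from the Files panel shown in the video.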
It looks like we should be good to go. It's actually this me 32 right here, so click this cell's run button. Just so you know, to show that it's working I'm using a different model here, one I played around with yesterday. Right-click and Refresh a couple of times, and you'll see we've got the D file here and the G file here, and if we go into weights, we've got our .pth as well. Since it has the same name, we can check the modified time to confirm it was just restored: right now it's 11:16. So we've got those models in, and you're ready to go back up to the start web app cell and run it again. You'll have to close the previous window and open the new link that appears in this command-line output, and then you can do all the previous steps.

One question I get asked quite a bit about the previous video is whether you have to run all the previous cells when you come back on a different day. Yes, you have to run all of these cells again to be safe. Run all the cells, do everything I showed in this video, and you should be able to train your model, split the audio, and use it for inference.

Alrighty, that's how you can use RVC in Google Colab. Hopefully you can get it up and running; I know people have a lot of issues with Google Colab, and it can be hit or miss. Make sure your feature extraction has finished, and watch out for training samples that are too long: you want to split up your training samples, which I go over in the so-vits-svc cloning video, so check that one out if you like. And if you want to use UVR locally, you can learn that from that previous
video. So that's how you can get it running on Google Colab. Hopefully you found this useful. If you leave a question down below in the comments, I'll do my best to help you out. Hopefully I'll see you guys next time; see you later!
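Most of the troubleshooting in this walkthrough comes down to checking that certain files exist before moving on. The following is a small Python sketch of that check, assuming the folder names shown in the video's Files panel (logs/<experiment>/3_feature256, a trained_*.index file in the same folder, and weights/<experiment>.pth). The function name, the messages, and the trained_* glob pattern are hypothetical and only reflect what appears on screen.

```python
from pathlib import Path

# ASSUMPTION: folder and file names follow the video's Files panel; the
# "trained_*.index" pattern is inferred from the trained_IVF file shown there.
def check_rvc_outputs(repo_dir, exp_name):
    """Return a list of problems; an empty list means every expected output exists."""
    repo = Path(repo_dir)
    logs = repo / "logs" / exp_name
    problems = []

    # Feature extraction writes into 3_feature256; missing or empty means unfinished.
    features = logs / "3_feature256"
    if not features.is_dir() or not any(features.iterdir()):
        problems.append("3_feature256 missing or empty: feature extraction not finished")

    # The retrieval index is needed for the search database index field at inference.
    indexes = list(logs.glob("trained_*.index")) if logs.is_dir() else []
    if not indexes:
        problems.append(f"no trained_*.index file in logs/{exp_name}")

    # The .pth under weights is the model you pick in Model inference.
    weights = repo / "weights" / f"{exp_name}.pth"
    if not weights.is_file():
        problems.append(f"weights/{exp_name}.pth missing: training has not produced a model")

    return problems
```

In a Colab cell you could run this as: for problem in check_rvc_outputs("/content/Retrieval-based-Voice-Conversion-WebUI", "me"): print(problem). The repository path is an assumption based on the folder name shown in the video.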

Info

Channel: Jarods Journey

Views: 114,538

Id: 9wu6LSue_dU

Length: 22min 46sec (1366 seconds)

Published: Mon Jun 05 2023