Get the BEST AI Voice Models by Analyzing Tensorboard for RVC

Captions
How do you know if you're overtraining or undertraining your AI voice model? The short answer is that you want a graph that looks like this, where the slope is constantly descending and then flattens out. As you can see, there are all these different hills and valleys, which indicate different stopping points and different continuing points. I'm going to go over all of that in today's video.

Before we get started, there are some prerequisites. You need to have Python 3.9 or 3.10 installed on your computer, and you need to have run RVC successfully on your device, as we're assuming that you have already trained a model and just want to figure out how to train models better. Links are down below in the description for all of those tools, where you can find out how to train a voice model or download Python. You also need VS Code, which is going to be very helpful. With that, let's go ahead and jump into it.

The first thing we're going to do is open up VS Code. Then open a File Explorer window where you have RVC installed, and drag and drop it into VS Code. A note: this can all be done inside of a PowerShell window, but we're going to do everything in VS Code today. Once you're in here, type in tensorboard, and we're going to get this error here. This is what happens if you don't have TensorBoard installed, and that's okay.

What we want to do is create a new Python environment. To do that, type in python -m venv venv and press Enter. That's going to create a virtual environment, and we're going to click No on this window here. Once we have that, type in venv and press Tab, type in Sc and press Tab, and then type in activate and press Tab. Tab just autofills each part of the path to the activate script in venv\Scripts. Then press Enter. Once you're in here,
all we're going to do now is run pip install tensorboard. Once you have that typed in, press Enter. You're going to see a bunch of stuff occur, and once it's done it should be good to go. Type cls into the console to clear it. Now we want to make sure we have this logs folder here, which is the directory that TensorBoard needs to look into.

For Colab users, you want to download your logs files, and you have to do this immediately after training your model. Go to Code, and inside of here paste this line. Then on the left-hand side, navigate to your logs folder, right-click it, and choose Copy path. Delete this last part, paste that path in here, and then run the cell. Hide the RVC folder, right-click and choose Refresh, and then right-click and download this folder. Save it into a directory you know, extract all, and navigate all the way into the extracted folder. Then copy your trained voice from inside this logs folder into the RVC beta logs folder: go into RVC beta, go into logs, then right-click and paste your logs there. You need to make sure that you have downloaded RVC beta as shown in the RVC local installation video.

Now type in tensorboard, then two dashes and logdir, and then simply logs, so the full command is tensorboard --logdir logs. Once that's done, press Enter, and it's going to launch on this URL here. You could manually type this into the browser, but we're just going to Ctrl-click it, and it's going to open up in this window here.

I went ahead and relaunched TensorBoard with the models that I've trained, and you can see all of them on the left-hand side here. If you don't see anything, either your log directory is incorrect or you have no training data inside of that logs folder. If it's just taking some time, you can click the restart
button up here. Once you're in TensorBoard, go to Scalars on the menu bar up here, and then click Toggle All Runs. That deselects all of the other runs so that we can select the run we want to look at. Scroll down until you see loss, and once you're in loss, click Next Page until you can no longer click it. We want to make sure that we see the g/total graph. Once you're here, you can select the voice that you want to look at. Today we're going to look at two, the TensorBoard test and the Marine, and we click the check mark for what we want to see.

Here is one of the TensorBoard training runs that I had, but first we want to increase our smoothing to 1. Here it is at zero, with all of the ups and downs, all the peaks and valleys, and here it is at one. To zoom in, you hold Alt and use the mouse wheel, and you can also drag the graph around while you're holding Alt. As you can see right here, there was a drop in the graph, which means that the model is learning a little bit more and getting better. That occurred where you see this valley right here. If we zoom out a little bit and go to this point, we can see that this occurred at step 5.6K.

But I'm getting ahead of myself here, so let's increase smoothing up to 1 and then click this Fit Domain to Data button, and we can see the graph again. This is a pretty good graph. It's still decreasing, so there's probably some room to continue training here, but all of these models are going to be generally pretty good.

Look at this model right here. We're going to click this full-screen button, and it's going to bring it into full screen. Here we have the finished trained model, and I would advise you not to get too caught up in trying to grab the perfect step or the perfect value. All of these models that are highlighted right here
along this flat line are going to sound really good, and for the most part the differences between them are going to be indistinguishable.

One quick thing in RVC, if you're wondering how I have all of these different epochs and steps: the e means epoch, so this is epoch 105 at step 7455. When you go to train your model, you need to make sure that the option to save a final model to the weights folder at each save is set to Yes. What this does is save a weight at every interval of your save frequency. Let's say I set it to 1: it's going to create a new weight every epoch. If you put it at 5, it's going to create one every 5 epochs. Feel free to play around with this and use it at your own discretion, so that you can grab the proper step that you see in the logs.

However, let's take a listen between two points so that we can illustrate what small differences there might be. To do this, we're going to decrease smoothing down to zero and take a look at the graph. Here we have this point at 10K, which you can see in the black bar above, and then we have this point at 70.8K. I advise you to put on some headphones, because without headphones the differences in the sound are going to be really hard to hear. I'm going to highlight some points between the 10K and the 70.8K models. I should also say that the audio samples are in Japanese, so just really pay attention to any distortion that occurs in the models.

[Music]

Okay, so that is a short comparison between the 10K and the 70.8K. There are very minute differences between the two voices. The 10K is less distorted in the staticky, distorted parts than the 70.8K, and in some areas it also sounds a little bit fuller, a little bit better than the 70.8K. However, this changes when you bring the octave down from 12 to 6.
Converting with 12, the octave is a little high, a little bit outside of the range of the training data that I had provided, but when we bring it down to 6, that is a little bit more within the range of the voice. Let's take a listen to some samples.

[Music]

As you could hear, the voice was a little bit deeper, and in my opinion the 70.8K sounds a little bit more accurate, a little bit better to me than the 10K. Like I said, these are very slight and very minute details. It's nothing that would make or break the model. If you get caught up in trying to find the best model for your training based on these peaks and valleys, you're going to be here forever trying to analyze millions of different things, and that's going to be a time sink. So my recommendation is to find a model within this area that is relatively low and just use that model instead. As I showed earlier, look at the graph with the smoothing at max, then bring the smoothing back down and find a value in here that is at its lowest. But as I just illustrated, this high point right here resulted in a model that still sounded really good, and I wouldn't really be able to tell the difference if I wasn't listening with headphones.

Okay, so to review a couple of things with TensorBoard: you want to make sure that your loss graph is decreasing until it flatlines, and then you want to grab some models from here and see which ones sound the best subjectively. The numbers on here don't always mean that a model is going to be better or worse than others, because in that flat area the voice models all start to sound pretty good, in my opinion. Once you hit that flat area, I recommend you just stop training so that you're not wasting any more time continuing to train the model for these minimal gains. As I illustrated, the voice can sound really good at the beginning and at the end, but this
is a good guideline to check and see what's happening with your voice. Is it overtraining, with the graph going up? Or has it flatlined, and is therefore generally pretty good?

What I recommend is that you don't use the voice changer to judge the quality of your voice. Use RVC to judge the quality of your model, so that you can use it on an input file, which makes the comparison a little bit more consistent. If you're still getting robotic qualities, or it's not sounding that good, you want to analyze your dataset and make sure it is of high enough quality. I'm going to leave that to another video, to investigate the quality of data and how it affects your model. Do you need to train longer on good-quality data, or can you get away with training less with good-quality data? If your data is noisy and messy, is training longer going to help the model at all? Those are some of the questions I'm going to answer in a future video.

So that is going to be that. Hopefully you learned something today, guys, and found something useful. If it helped you out, please like, subscribe, all that fun stuff, and I'll see you guys later.
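To make the smoothing and flat-area advice in the captions concrete, here is a minimal Python sketch. It is not from the video and not part of RVC or TensorBoard. The function names smooth and first_flat_step, the window and tolerance parameters, and the loss values are all hypothetical, chosen only for illustration. It assumes the smoothing slider behaves like an exponential moving average, which may not match TensorBoard's exact formula, and it treats "flat" as "improving by less than a small relative amount over the next few logged points".

```python
from typing import List, Optional


def smooth(values: List[float], weight: float) -> List[float]:
    """Debiased exponential moving average (assumed stand-in for a smoothing slider).

    weight = 0 returns the raw values; values close to 1 smooth heavily.
    weight must be in the range [0, 1).
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    last = 0.0
    for count, value in enumerate(values, start=1):
        last = last * weight + (1.0 - weight) * value
        # Divide by (1 - weight**count) so early points are not pulled toward 0.
        smoothed.append(last / (1.0 - weight ** count))
    return smoothed


def first_flat_step(
    steps: List[int],
    values: List[float],
    window: int = 3,
    tolerance: float = 0.01,
) -> Optional[int]:
    """Return the first step after which the loss stops improving meaningfully.

    "Flat" here means the value `window` points later is less than
    `tolerance` (relative) below the current value. A rising curve also
    counts, since it is no longer improving. Returns None if no such step.
    """
    for i in range(len(values) - window):
        improvement = values[i] - values[i + window]
        if improvement < tolerance * abs(values[i]):
            return steps[i]
    return None


if __name__ == "__main__":
    # Made-up loss values for illustration only, not taken from any real run.
    steps = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]
    raw_loss = [50.0, 42.0, 36.0, 33.0, 30.0, 29.5, 28.0, 28.4, 27.9, 28.1]

    smoothed_loss = smooth(raw_loss, weight=0.6)
    print("smoothed:", [round(v, 2) for v in smoothed_loss])
    print("first flat step:", first_flat_step(steps, smoothed_loss))
```

As the captions point out, the number is only a starting point: checkpoints anywhere along the flat area tend to sound similar, so it is worth listening to a few of them on the same input file before picking one.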
Info
Channel: Jarods Journey
Views: 19,426
Id: P0M7PAsG1fk
Length: 11min 40sec (700 seconds)
Published: Thu Jul 20 2023