AI Voice Cloning for Singing using so-vits-svc-4.0 with Google Colab/Nvidia Card

Video Statistics and Information

Captions
Hey, what's going on today, YouTube. Sorry for the lack of uploads — I've been sick for about a week and my voice just wasn't having it; I was coughing a lot and I might cough sometimes during this video as well. Today I'm going to be going over so-vits-svc 4.0, which is basically an AI model that can sing songs using a voice that you train it with — it can reproduce a song's vocals in that voice. I think the easiest way to start is to show you some clips on YouTube, and then I have a voice model trained on my own voice. It's okay right now — not the best — but it can get the meaning across.

So let's jump into some examples. For whatever reason there are a ton of Kanye ones on YouTube. I don't actually listen to Kanye, so I can't tell whether or not it's accurate, but let's give some of these a listen and you can be the judge of how good it sounds. Here's one right here by Lukey dejano, and it's "Shake It Off". [clip plays: "...got nothing in my brain, and that's what people say..."] Okay, that was actually pretty good. I don't know how Kanye actually sounds, but that was pretty dope in my opinion. Let's choose one more that I haven't listened to yet. Scrolling down, I saw a quite popular Japanese song, and it's Donald Trump singing it — I think everyone knows how Donald Trump sounds, so we'll play this one and see what happens. [clip plays] [Music]

All right, so I've shown you some of this in action — so what the heck is it? Let's head on over to the GitHub repository. The one I'm going to be going over today is the so-vits-svc fork; it's the one that I use, and it's based off of the main GitHub repository. Let's go through a couple of things here. This is a fork with a better graphical user interface, and basically how it works is: you give it a bunch of data samples of some voice — speech, for instance — and you train a model on it. Then all you have to do is load in a vocal sample of a song and feed that in with the trained model, and it will do an inference based on the song you gave it. It will use all the intonation and all of the accents inside of that song, and it will use the voice model that you trained to produce the voice output for the song. That is how I understand it works — I'm no expert on it.

It's a pretty simple installation if you have some Python experience, and you can even do this in Google Colab; it will just be a little bit slower. Basically you set up your environment with these pip installs, then you get all of your data samples and put them into a folder (I'll show all of this later), and then you just run four lines: svc pre-resample, svc pre-config, svc pre-hubert, and then an svc train line. This is all pretty easy to do — it looks a little complicated, but this front-end part, where you're just giving it data, is actually pretty easy because we're not actually designing anything.
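To make that four-command workflow concrete, here is a minimal sketch of how you could script it. It assumes the `svc` command-line tool from so-vits-svc-fork is installed and that your clips are already laid out in dataset_raw/; the command names follow the fork's README, but double-check them against your installed version.

```python
# Minimal sketch of the so-vits-svc-fork training workflow described above.
# Assumes `pip install -U so-vits-svc-fork` has been run and that your clips
# are already in dataset_raw/<speaker>/ -- command names follow the fork's
# README, but verify them on your own version.
import subprocess

steps = [
    ["svc", "pre-resample"],  # resample/normalize the clips in dataset_raw/
    ["svc", "pre-config"],    # generate configs/44k/config.json
    ["svc", "pre-hubert"],    # extract HuBERT content and F0 features
    ["svc", "train"],         # start training; checkpoints land in logs/44k/
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```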
A couple of notes for this type of model: the audio samples have to be under 10 seconds. If you feed it anything over 10 seconds you'll probably run out of VRAM, and VRAM is what's needed to train these models — if you have less than about 10 gigabytes of VRAM you won't be able to train this model on your own computer, but you can always use Google Colab, so I'll show both sides of it. You can read more of the README if you want to, but I think the best way is to just jump into the code and see how it works. Today's video is not going to be a fully comprehensive guide on how to get this set up, just an overview of how you could get it set up — a full, detailed tutorial will be coming later.

First, you need some training data. What I did was create my own training data, and you have to upload it into your Google Drive. Here is a Google Drive, and what you do is create a folder called so-vits-svc-fork, and inside of it you create a folder called dataset. Inside that dataset folder you put all of your data samples. Here is a folder that is me, and I have 100 samples of my voice in there that I recorded of me just saying some sentences. If we play one: "The birch canoe slid on the smooth planks." That is one voice sample right there, and these are sentences that are supposed to be phonetically balanced, or something along those lines — but 100 samples is what I have.

In the Colab, once you have all of that set up inside of your Google Drive, you need to set up a runtime. You just set up a runtime, make sure you have a good GPU selected, and then click the little Connect button in the top-right corner (you can't see it because I'm blocking it). It will connect to a machine on Google's servers and let you use some of their compute power. If you don't have more than 10 or 12 gigabytes of VRAM on your own computer, you'll have to do it this way. Once you have all your data in there, you can run this first cell — that's just an authorization saying that this notebook was not created by Google, it was created by somebody else, so only run it if you trust it. I just clicked OK, and as you can see we're running on a Tesla T4 from Google, which is more than enough for training this model.

The other thing you're going to have to do is mount the Google Drive. This gives the notebook access to your Google files — again, only do this if you trust it. I'm going to say "Connect to Google Drive" and authorize it right here; I already went ahead and authorized it. While that's running, another thing you need to know is that the left side is your file browser — think of it as the Windows file browser, it's the same idea. Inside of it you have files, and in there you'll have your drive. This "drive" is your Google Drive, and it's the top folder for it; if you click one more level into MyDrive, you'll see everything that's inside of it. Here we have everything in my Google Drive, and as you can see we've got that so-vits-svc-fork folder, and inside it the "me" dataset.
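Since clips longer than 10 seconds are the main thing that will blow up your VRAM, here is a small, purely illustrative check you could run on your samples before uploading them. The folder path and the "me" speaker name are placeholders matching the layout described above, and it assumes plain PCM WAV files.

```python
# Flag any training clip longer than the ~10 second limit mentioned above.
# The dataset path and speaker folder are placeholders -- adjust them to
# wherever you keep your clips. Assumes plain PCM WAV files readable by the
# standard-library wave module.
import wave
from pathlib import Path

dataset = Path("so-vits-svc-fork/dataset/me")  # hypothetical local copy
for clip in sorted(dataset.glob("*.wav")):
    with wave.open(str(clip), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if seconds > 10:
        print(f"{clip.name}: {seconds:.1f}s -- consider splitting this one")
```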
I'm actually going to delete this config.json — I don't need it anymore — and then close all of this so we can see everything. Alrighty, so here is where it installs dependencies: you just run this cell and it installs everything, which takes a little bit of time, though on Google's servers it's pretty quick, so I'll be back when that's finished. Alrighty, we're back, and it took about two minutes to run.

Once that's done you can head to the next cell, which creates the dataset directory that the fork needs; that's really quick. Inside of it you'll want to change the dataset name to whatever you have — I have mine set as "me", so my dataset name is "me" — and as you can see, here is the folder structure it wants you to have for the dataset. Let's run this, and it copies the dataset over to the little file browser on the left-hand side. If we minimize the folder view and reopen it, you'll see a new folder pop up called dataset_raw, and there is the dataset.

All right, next we just run the pre-resample cell, which pre-resamples the data, and then we'll click pre-config as well — that creates a configuration file that we can modify before training. Once that's done you have your configuration file, and we're going to change it before we copy it. Minimize and reopen the file browser and now we have a new folder called configs, inside of it one called 44k, and in there a config.json. When you double-click it, another pane pops up on the right-hand side, and that's what we're going to adjust.

Here we have some confusing numbers, but there are only a few really important ones you need to focus on for training — at least to my knowledge at the moment — which are your batch size, your number of epochs, the log interval, and the evaluation interval. Let me explain these to the best of my knowledge, and feel free to correct me in the comments below if you know more about it or if my explanation is wrong. The things you need to understand are the epoch count, the batch size, and the log and evaluation intervals, and the only ones you'll probably change are the number of epochs and the batch size.

People say a good rule of thumb is to train for 25,000 steps. What is a step? A step is basically one batch. In my case I have 100 samples; if I set my batch size to 100, it runs through all 100 samples in one batch, so that is one step. An epoch is the number of batches it takes to run through all of the data. Since I have 100 samples and my batch size is set to 100, I finish an epoch in one step, because each batch covers 100 samples and I only have 100 samples. If that were my setup and I wanted to run 25,000 steps, I would set my epoch value to 25,000 — that would give 25,000 steps, keeping in mind that I have 100 samples.
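If you'd rather tweak those fields from a script than in the Colab editor, here is a small sketch. The file path follows the configs/44k/config.json layout described above, and the key names (a "train" section holding batch_size, epochs, log_interval and eval_interval) are my assumption about how this fork lays out its config, so check your own generated file before trusting it.

```python
# Sketch: adjust the training settings discussed above in configs/44k/config.json.
# The nesting (a "train" section with these keys) is an assumption based on my
# reading of the generated file -- verify against your own config first.
import json
from pathlib import Path

config_path = Path("configs/44k/config.json")
config = json.loads(config_path.read_text())

config["train"]["batch_size"] = 50       # lower this if you have little VRAM
config["train"]["epochs"] = 25_000       # see the step/epoch math further down
config["train"]["log_interval"] = 200    # how often to log training progress
config["train"]["eval_interval"] = 800   # how often to evaluate/save

config_path.write_text(json.dumps(config, indent=2))
```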
What if I want a smaller batch size? The larger the batch size, the more intensive it is on your computer and the more VRAM you need, so they say you can lower the batch size if you have less VRAM — though there's a minimum amount of VRAM you still have to have, and I think that is 10 gigabytes. Say I set my batch size to 50: it now takes two steps to run through 100 samples of data, so I need to set my epoch value to half of 25,000, which is 12,500. If I go along the same lines and do a batch size of 25, I need half of 12,500, which is 6,250. You could keep halving it this way, but eventually you'd end up at a batch size with a decimal point — you can't do a batch size of 12.5 — which we don't want.

So basically there is a formula you can use to calculate the number of steps: your sample count divided by your batch size, multiplied by your epochs, equals the total number of steps. If you want 25,000 steps, move the variables over to the other side and you can solve for the epoch count with a bit of algebra. Now, if you don't want to do any of that algebra — which is okay — you can just leave the epoch value at the default of 10,001 and set your batch size to a small number if you have low VRAM; 16 works. In this case I'm going to set it to 50 and see how that runs.
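Here is that arithmetic written out as a tiny sketch, using ceiling division for the steps per epoch; the numbers are just the ones from my example above.

```python
# Work out how many epochs give a target number of training steps.
# steps_per_epoch is the number of batches needed to cover the whole dataset,
# rounded up to a whole number (e.g. 100 samples / batch of 6 -> 17 steps).
import math

def epochs_for(target_steps: int, num_samples: int, batch_size: int) -> int:
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(target_steps / steps_per_epoch)

print(epochs_for(25_000, 100, 100))  # 25000 epochs (1 step per epoch)
print(epochs_for(25_000, 100, 50))   # 12500 epochs (2 steps per epoch)
print(epochs_for(25_000, 100, 25))   # 6250 epochs  (4 steps per epoch)
```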
That's your config.json. Go ahead and save it — just press Ctrl+S — then close the JSON, and copy the configs folder by clicking the copy-configs cell. Next is pre-hubert. I actually don't know what F0 stands for, I'll have to go take a look, but for the F0 method we're going to use the crepe model — I hear it generates slightly more human-like, or less robotic, voices. We run this cell and it goes through pre-hubert... this is weird, it's not giving me what it had before; it should list out a bunch of things that occurred. We'll run it again — and I guess maybe that's just what it does on the Colab — so let's just go ahead and run train. This last cell starts the training portion of the model, so we'll see if it runs. As you can see, it didn't actually run, because there was something wrong with that pre-hubert step, so we're going to lower my batch size to 16 — because we're on a Colab and I don't know if we can do 50 — and run pre-hubert again to see if we get a different result. All right, that doesn't seem to be working correctly either, so we'll set the F0 method back to dio and try running it that way. Okay, I tried one other thing, which is to pass -n 2 and then lower the batch size. We're supposed to get a lot more output here — it should go all the way to 100% — so I don't even think this training run is going to work. If it doesn't run here on the Colab it may just be a glitch happening; I'm not sure what changed, because I was able to run this last week.

Okay, so lo and behold, it actually is training here — I thought it wasn't going to train. Let me show you what it's supposed to look like: here I have it in VS Code, and when it does pre-hubert it should reach 100% for the model — but maybe that's just my limited knowledge of it. As you can see it is running here at — what did I set it at? — a batch size of six, so it's slowly moving through it. It's got 17 steps right here, so that's 17 steps it has to do per epoch. You get the 17 by dividing 100 by 6 and rounding up to the nearest whole integer — 16.1 rounds up to 17 — so you have 17 steps inside of your epoch, and it trains the model as it gets going. Good to know that in order to train it on Colab you still have to run with -n 2. That is how you get the Colab running for a voice model, and it will take longer depending on how many samples you have and so on.

With Colab out of the way, I'm going to show how you can run it if you have a graphics card with enough VRAM to support it. The same way you did it in Colab, you can do the exact same thing using VS Code. I'm not going to go through every step, but basically you clone the so-vits-svc-fork repository, following the installation section — that's what I did inside of my VS Code terminal — and you run all of these pip installs. I recommend doing it inside of a conda environment or a virtual environment, and if you need to set up a virtual environment in VS Code it's pretty easy: press Ctrl+Shift+P, choose "Python: Create Environment", and pick one of the environments there — that makes sure you don't have any package conflicts or anything like that. Then you do the exact same thing: get all your data into the right spots. Inside of this so-vits-svc-fork folder all you need is a folder called dataset_raw, and here we have the "me" folder. Then you run those commands: pre-resample, then pre-config — pre-config leaves you with a configuration file that you configure the exact same way, and I'm running a batch size of 50 here — then pre-hubert, and then you run train. When you train it, it should keep running; I'll probably train my model for a bit longer, and I might need to add some more data samples and just run it entirely again, but I'm not going to train it on video because it won't let me record at the same time. Let's jump straight into the demonstration part with my voice.

If we're in the folder, we can Shift+right-click, click "Open terminal", and just type in svcg — SVC GUI — and it opens up this graphical user interface. What we're going to do here is go over to Browse and search for the model path. The model path is inside of so-vits-svc-fork, under logs/44k, and then your latest G model; in my case it's going to be the one at 10,001.
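If you'd rather not eyeball the logs folder for the newest generator checkpoint, here is a small, purely illustrative sketch; the logs/44k location and the G_<step>.pth naming follow what's shown in the video, but treat both as assumptions about your own setup.

```python
# Pick the most recent generator checkpoint to load into the GUI.
# The logs/44k location and the G_<step>.pth naming are assumptions based on
# the folder shown in the video -- adjust for your own setup.
from pathlib import Path

log_dir = Path("so-vits-svc-fork/logs/44k")
checkpoints = sorted(
    log_dir.glob("G_*.pth"),
    key=lambda p: int(p.stem.split("_")[1]),  # sort by step number
)
if checkpoints:
    print("Latest generator checkpoint:", checkpoints[-1])
else:
    print("No G_*.pth files found in", log_dir)
```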
So I selected that. The configuration is going to be in the same folder, logs/44k, as config.json, and then your audio path is the audio file — the song whose vocals you want to turn into your voice. Here I have some songs that I extracted the vocals from, so we have Plain White T's "Hey There Delilah"; here are the vocals, so we'll play that and drop it in there. We're going to set the transpose/octave — let's just do zero and see how that sounds — turn off auto-play, set the pitch method to crepe, and then just click Infer. I'm going to use my GPU by selecting the Use GPU option, and then I infer. It takes a couple of seconds; if I open up this window here you can see it's running through the inference, and it will tell me when it finishes. This will vary depending on your GPU, so just wait for it to finish — and here we go, it's done, and the output is going to be inside of that folder. Let's take a listen. [the converted vocals play: "Hey there, Delilah, what's it like in New York City, a thousand miles away..."] [Music]

All right, we'll stop it there. As you can see it's not perfect — there are hiccups and it does sound a little robotic — and that improves with more training and more data. There's no exact science to it, but usually throwing more data at it and training for longer generally gives better results. This also works in other languages: if we play this one, it will be Japanese. [clip plays] [Music] Okay, so yeah, that one kind of broke.
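For completeness, the GUI isn't the only way to run inference — the fork also exposes a command-line entry point. This is only a rough sketch: I'm assuming the `svc infer` subcommand takes the vocal file as its argument and falls back to the latest model under logs/44k and the copied config when you don't point it at a specific checkpoint, so check `svc infer --help` on your install before relying on it.

```python
# Rough sketch of running inference from a script instead of the svcg GUI.
# Assumption: `svc infer <vocals.wav>` exists in so-vits-svc-fork and uses the
# latest checkpoint in logs/44k plus configs/44k/config.json by default --
# run `svc infer --help` to confirm the options on your version.
import subprocess

vocals = "hey_there_delilah_vocals.wav"  # hypothetical extracted-vocals file
subprocess.run(["svc", "infer", vocals], check=True)
```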
That's all I wanted to go over today. I didn't do a full comprehensive tutorial on it, but if you'd like to see that, let me know down in the comments below. This is something I just got really interested in over the last couple of weeks, and I haven't really been working on the other stuff, but I do have some things I need to work on for Vivi, so expect some videos for Vivi, the voice assistant, coming out soon. I'll be gone next week, so I won't have too much content next week either, but that's going to be today's video. See you around later, and if you have any questions, let me know down in the comments below.

Info
Channel: Jarods Journey
Views: 25,194
Id: 7hoFqNJ0fcs
Length: 22min 21sec (1341 seconds)
Published: Wed Apr 26 2023