(Tutorial) Use YOUR Voice in AI Cover Songs with Replay and RVC

Video Statistics and Information

Captions
Today we're going to use the update of Replay along with RVC to make AI cover songs using your voice.

Welcome back to the channel, where we explore the creative uses of AI. A couple of weeks ago I made a video about this software called Replay. It got a lot of attention, and there were a lot of complaints too, because there were some issues with it. I want to address some of those issues, talk about the update, and then address one of the most common questions I got in the comments, which was: how can I use this with my own voice? Well, the answer is you have to have a model of your own voice. Okay, where does that come from? You may have heard me talk in this series of videos about RVC, and maybe even seen a video on it. We're going to go over how to install and use that software on your own computer to make clones of your own voice that you can then use with Replay and other software out there that uses RVC models.

Now, a couple of caveats right off the bat that we need to clear up. Both Replay and RVC require you to have an Nvidia GPU in your system for best results. You may be able to find some workaround software out there that allows you to use it with AMD and Intel, but there are no guarantees here at all unless you're at least using Nvidia. On the system that we're going to be using today, I'm using an RTX 2070, which is not a super strong GPU. I have another system where I run an RTX 3090, which is a stronger GPU and does things faster, but this process all still works on my 2070 just fine.

First we're going to address the Replay update and some of the common issues that people talked about. Then we're going to talk about how to get RVC on your computer, then create training data for the model that you're going to create with RVC, and then how to bring that model back into Replay so that you can create cover songs using your own voice. Let's get started.

Over on Replay, if you've never downloaded it before, you can do that here. This, of course, is the Windows option,
but if you click other platforms you'll see that you can get it for Mac, Windows, and Linux. In my case I would just click Download Replay. It'll download to my computer, and then I can just double-click to install it.

Now here's where people get into trouble. The first thing people freak out about is this notice that they might get on their Windows machine saying, "Hey man, we don't know about this. This is unrecognized. You better run for your life." This happens a lot with software in development, and there really isn't anything to worry about. What you would need to do is click on More info and click on Run anyway. Then just go through the installation process as you would with any other piece of software. I'm just installing to the default locations. Be aware that this program does download some fairly large AI models. You're going to want at least 10 or 15 gigs of free space for this program, RVC, and the things you're going to do. The more the better, of course. Once it's installed, you can just click Finish with this checked and it will run the program for the first time.

Now this is another place where people run into trouble. This process can take a long time, and you might think that there's a problem. You might really think there's a problem. You might insist that there is a problem. What I'm telling you is, most of the time, if you just wait it out, things will continue. What happens is, when you click on Download, it tells you that you need to download some AI models, for a total of about 3.21 GB. Depending on your internet speed it may take a while, but there are points in this download where things seem to absolutely stop. Most of the time it's going to say something like "decompressing file" and then a number out of 47, and the decompression can take quite a while for some of these files, to the extent where you will insist that something has hung and that the installation has failed. It got me several times, but if I just waited, maybe walked away and came back five or six minutes
later, it completed the process. It may not do that for you. It doesn't do it for everybody, but it does do it for a lot of people, and they think that there is a problem. That's the first issue: you sit around and wait for quite a while for these models to install.

Just a little feedback here for the Replay team that they didn't ask for: part of this installation process should give you the choice of which models you want to download. For example, some of the models that I think are the biggest are the ones that allow you to generate music from a text prompt, and quite honestly the quality of that stuff isn't good enough for me to want to download it. You can test it online for free in several places, and it just isn't that impressive compared to things like Suno and other AI generation stuff that's really state of the art and doing amazing things. I wish I didn't have to sit through this process, and I'll bet a lot of other people do too, because the main purpose of this software is to create AI covers, not to generate music from a text prompt that you probably won't be able to use with this software or anything else for that matter.

Now, right here on file 47 of 47 is where a lot of people think that something's gone wrong. It says 100%, the "decompressing file" notification is there, and it will sit there sometimes five or six minutes, and you'll think that there's something wrong. I would just say to wait it out. But even when you do, the problems don't necessarily end there. I should point out that if you have Replay installed already and you're doing this update, you're still going to be forced to go through this model downloading process. Just be ready for that. Just so you know, as I'm recording this video and going through this process, I've been sitting here at least four minutes waiting for this to complete. Note that right here it does say "this might take a while." Maybe they need to elaborate a little bit.

Okay, so it's completed, and then you get this screen, which is the same screen you got at the beginning,
and you go, "Wait a second, I've got to go through this all again?" If it works like it did for me the last time, you don't. Just sit there a second, and it just came up. Or you can close out of that install and then rerun it, and here it is. It's just a glitch.

Once it loads, really the major difference I can see in the interface is just that these buttons here, multi model merge and favorites, have been moved to the top instead of along the side. I'm not quite sure what else has changed there.

Another thing to clarify is that there are versions of this software floating around, and you'll see videos where, once you install it, you've got a whole bunch of models already built in, and you're wondering, "Hey, how come they're not there? Where are they?" Because of copyright issues, they had to stop including them in the actual downloads. Now you need to go to weights.gg to download them, and you can do that just by clicking right here. It'll open up a browser window where you can search by name, and you can preview what they sound like as a speaking sample for a weights.gg voice model. You'll notice that the dialect isn't there.

I get a lot of grief sometimes for doing imitations of the voices that I'm cloning, and you're actually supposed to do that to give the model less work to do. If I was to go into Squidward, for example, talking like this, it has a lot more work to do than if I go in there talking like this. If you have the ability to mimic a voice, you should use it.

You'll need to create a free account at weights.gg
to download models and things like that, but that's very easy to do. You can use your Google account or whatever you'd like.

Now, what you download is going to be a zip file. First, of course, you'll need to extract the model from the zip file, and then you'll also want to rename it. Let's just say Extract All. I have a models folder where I put all these things. I'm going to create a folder for this so I can keep it separate. Open up the folder, Select Folder, Extract. Once the files are extracted, I'm going to look for the .pth file, which is the actual voice model, and rename it to the name of the voice. Then I'm going to just drag that .pth file right in here where it says "select or drop custom RVC model." Give it a second to do its thing. If all goes well, you'll get a little green message up there that says "model successfully loaded." But sometimes you get this, and you don't know why. A lot of trial and error tells me you just load it again. For example, it won't show up in the list right now. Let's try to load it again, and now I see it in the list. My point is, just keep trying and eventually it's going to show up here.

I can choose this right here. I've got my song in here. "I took off to New York..." That's the original voice. You click on Create Song. "...to New York in the summer breeze, I did it my way, no one to..." I'm going to run that again, just to demonstrate changing the octave, because that singer usually sings in a higher register, and I'm going to click Create Song again. "I took off to New York in the summer..." Nope, that's too high.

One other trick I want to discuss here, because it was asked about: when that happens, when one voice is too low and one voice is too high, we need to adjust everything, right? In this case, at 12 up, his voice is too high. I'm going to take it down to about seven. That's five semitones down from the 12, which means we need to make a similar adjustment to the instrumentation, and we want to make that as small an adjustment as possible to get to
the same pitch. The quickest way there is five semitones down again. We'll go down five, and then we will click Create Song. "I took off to New York in the summer breeze, I did it my way, no one..." Not a great Phil Collins sample, actually, but that's the mechanics of getting a sample in there.

Now, the great news is, if you have a bunch of models, once you create this audio extraction one time it's a lot quicker to change out for another voice, because it doesn't have to do all the separation of the vocals from the music anymore. If I was just to click this one here (let's put everything back where it belongs) and click on Create Song, it's a lot quicker just to change the voice than it is to do all the extractions. That just took a few seconds. "I took off to New York in the summer breeze..." All right, that's how that all works.

Now, how do you get your voice in there? Well, you have to build a model. What are we going to use to build a model? We're going to use free software that you can install on your Nvidia-powered system, called RVC, and to make that process so much easier we're going to use my favorite app, called Pinokio. Let me just clarify: there are a lot of one-click installs for RVC floating around out there. This is not the only way to do it. This is just the way I like to do it, because Pinokio manages all of my different AI software environments nicely. I provided the link in the description, of course. Once you get here, you'll see that you've got choices for downloading in terms of platform: you have Windows, you have Mac, and you have Linux. In my case I'm on Windows, so I'm going to click Download for Windows. It creates a zip file, and within that is an .exe file, and we're just going to run that. Once again we're going to get this notification that says, "I don't know if you should be doing this," and you click More info, and you're like, "I'm going to do this," and you click Run anyway. It even says that right here on their page. Once it's installed, you're going to be
presented with this settings screen. I'm just going to leave everything exactly how it is and click on Save. Okay, now it's all installed, but there's nothing here. You've got to go find something, you silly goose, and there's so much to find, but for now let's just get RVC. Here's how you do it. Click on Discover up here. You can probably just scroll down and find it, but it's a lot quicker if you just type RVC, and there it appears. Just click right here and click on Download. If you've never downloaded any AI software before, you're going to be presented with all of this stuff that you need to download onto your computer to get it ready to run this and other software. For the most part, this right here is a one-time thing, and I've already done this on another computer, so I'm not going to do that here. I'm going to switch to my other installation of Pinokio and show you what you would do from this point.

When you click on Download, you're going to be presented with this dialog box, which is basically asking: what's the name of the folder you want to put this thing in? RVC would be good, and then click on Download. Once that's downloaded, you go back to the Pinokio main screen. You'll see RVC there and an Install button. Go through that process, which may take a while to download all the models, and then you'll be able to run it.

Now, if you have any problems with Pinokio on any level, which a lot of people do, to be quite honest, remember there are numerous places where you can find, download, and install the RVC software. I would just invite you to do a search for that. Find the most recent version you can, because any link I leave here could be outdated by the time you see this, and then follow their install instructions. Once it's up and running, you can follow this basic procedure.

Once you get RVC loaded and running, you're going to see a lot of stuff, and it may look overwhelming and a little bit confusing. We're going to just talk about what you need to do to
clone voices, and I invite you to explore the other functionality: things like separation of stems, which is all built into the program here, and of course training the voices, which we're about to do, and other kinds of processing of the checkpoints that I never, ever use. I really come in here and do exactly what I'm about to show you.

The first thing we need, however, is data to feed this model. In the case of creating a voice, you want to make sure that the source material that you use for the voice you're cloning is great: it has no background noise, no reverb, no music behind it, no sound effects, no nothing. Ideally you want to go into the studio, and that's what we're going to do. You want anywhere between 10 and 20 minutes of this audio to create your data set. I'm going to go into the booth and create a fresh data set right now.

Okay, so I'm going to record about 10 minutes of me just talking. I may use elevated pitch, I may sing a little bit, but since I'm not a singer, creating a model of me singing for 10 minutes would make it sound horrible. What I'm going to do instead is just create a speaking model with maybe some singing in it, to give the model some idea of the various ranges that occur while I'm talking. Okay, so here I am. I'm now talking just for 10 minutes. That's a long time to just talk, but I've done it before. I should pick a topic. Maybe I could talk about, you know, AI and people's reaction to it and what they're saying about it.

Now we have RVC running, and we have our training data, that audio file, in a folder over here. The first thing we're going to do is go to the Train tab and enter the experiment name, which is really the name of the voice that we're training. In this case I'm going to say "Bob Doyle High Energy." That's me. We're going to do the target sample rate of 48k, because why not go for the best? Whether the model has pitch guidance: yes, we do need that to be true, because we're testing singing. Version: two. Number
of CPU processes we will leave at 16. That, of course, is going to depend on your CPU. Mine should handle this just fine. You may need to play with this value a little bit if you're getting errors.

Here it says "enter the path of the training folder," and that's the folder where you have the audio file or files that are being trained on. In this case we just created one long file, as opposed to several smaller files, which you could also do, but what we want is the path of the directory where they are. I'm going to click up here in the address bar for that directory, click on Copy address as text, and then click this right here and paste it.

"Please specify the speaker/singer ID": we only have one voice in this audio recording, so it will be the only one there is, which is the default value of zero. "Enter the GPU index": if you have more than one GPU, you can choose one or more of them here. I only have the one, at value zero, so I leave that where it is. This right here is talking about the extraction algorithm when it comes to pitch extraction for the voice. We're going to go with the default of rmvpe_gpu, and we'll leave this just as it is.

Save frequency (save every epoch): an epoch is kind of a slice of training, and we're going to do 250 epochs total for this. I'm going to save every 50 epochs so that I'll have basically five files to choose from. If the final model is perhaps overtrained or doesn't sound quite right, I can go back to earlier iterations of the model and see if maybe one of those works, because sometimes lower step values work better than higher step values. So it's good to have some saves that you can check on. Total training epochs: like I said, I'm going to go with 250. I've had great success with anywhere from 150 all the way up to 250 epochs. After that I don't really notice too much of a difference. The batch size per GPU: I will be able to crank this all the way up to 40 because of my 3090, but you will have to determine for yourself what your
processor will handle. "Save only the latest checkpoint file to save disk space": I'm going to say no. "Cache all training sets to GPU": I'm also going to say no. "Save a small final model to the weights folder at each save point": we're going to click yes, because those are the files that we're going to actually be using and bringing into Replay. Down here we can leave everything as it is, move on over to One-click training, and click it.

We can get a high-level overview of what's happening here under Output Information. If you scroll through it, you can basically see the log of what's going on: breaking the audio up into smaller files and working its modeling magic. If you want to track the progress more easily, you can open up the terminal window, which will be down at the bottom in the taskbar. It takes a little bit of time for things to ramp up, and then it will start training the epochs, and you can track that more easily here. Right now it just says Epoch 1, with the date and the time. It's taking about 11 to 12 seconds per epoch, which is about what I'm used to with this particular setup and GPU. Again, it might take much longer for you, or it might go much faster, depending on your configuration, but for me this should be done in about 20 minutes.

Okay, that actually did take longer than 20 minutes, because I forgot I was training more epochs than I normally do. In any case, it's done. As it says right here: "Bob Doyle High Energy, saving final checkpoint, success." Let's test it.

Now, back in Replay, all we have to do is drag that .pth file into Replay. "Successfully added Bob Doyle High Energy." Now we're ready to use this voice with any file that we have. Let's use Suno to create a song real quick for us to replace the vocals on: "male singer-songwriter ballad about love and loss." Create. [Music] Okay, we're going to just download that and drag and drop it there. Let's just listen. Make sure it's still there.
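
The pitch trick used in this walkthrough comes down to arithmetic on semitones: a shift of 12 semitones is one octave, so a vocal shift and an instrumental shift that differ by exactly 12 end up in the same key. Below is a minimal Python sketch of that idea. The function names are hypothetical and are not part of Replay or RVC; the sketch only reproduces the pairings mentioned in the video (vocals 7 up with the instrumental 5 down, and vocals 8 down with the instrumental 4 up).

```python
def frequency_ratio(semitones: int) -> float:
    """Frequency multiplier for a shift of the given number of semitones
    (equal temperament: 12 semitones doubles the frequency)."""
    return 2.0 ** (semitones / 12.0)


def matching_instrumental_shift(vocal_shift: int) -> int:
    """Smallest instrumental shift (in semitones) that lands in the same key
    as the shifted vocal, allowing the two to sit whole octaves apart.

    Hypothetical helper for illustration only.
    """
    remainder = vocal_shift % 12  # Python's % gives 0..11 here, even for negatives
    # Anything above 6 is closer going down than going up.
    # A remainder of exactly 6 is a tie; this sketch picks the upward shift.
    return remainder - 12 if remainder > 6 else remainder


print(matching_instrumental_shift(7))   # -5
print(matching_instrumental_shift(-8))  # 4
```

A vocal shift that is already a whole number of octaves (12 up or 12 down) needs no instrumental change at all, which is why the plain octave settings are the first thing to try.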
Great. Bob Doyle High Energy is the voice. We're going to leave all of this here, even though that voice is pretty high. Well, we'll give it a try, because I can always lower it again. Let's click Create Song. Separating track. Okay, again, the original: "...but she didn't, I was just another face..." All right, and now the converted one: "I saw her standing [Music] there..." It's pretty high for my [Music] voice.

Okay, let's try it at 12 below. I think that's going to be too low, but let's at least give it a shot, and then we can play with adjusting the instrumental pitch as well. "...but she didn't see me, I was just another face..." It is a little low, so let's pitch this up four (two, three, four), and we'll pitch this up four too, as opposed to going down eight, which would make the piano, I think, too low. Let's see what happens. "I saw her standing there..." I just don't know that that really sounds like me very much. I think the problem is really more the conversion: the actual quality of the voice itself at the beginning and the way that it's pronouncing the words.

Just for grins, let's do this file that we had Suno create. Let's just listen once. "...the night under the fading lights, promises whispered..." Okay, it's a weird song, terrible on every level, but let's see how it converts my voice. Since it's more of a spoken mid-range thing, it might do a better job. We need Bob Doyle High Energy, make sure everything is at zero, and we'll hit Create Song. Okay, here we [Music] go. Okay, I've got to lower it an octave and try it again, because that's clearly too high for my voice. This won't take as [Music] long. Yes. Okay, yes, it's highly autotuned, it's weird, it's affected, but that is my voice, correctly cloned, on just a very, very weird track.

But even though this thing was fraught with weirdness, I hope you understand the simplicity of the basic process, which is to use RVC to clone your voice and drag it in here. If this is the kind of thing you like, then why not subscribe to the channel? If you subscribe now, I will not look for
you. I will not pursue you. But if you do not, I will look for you, I will find you, and [Music] I...
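
The training settings in the walkthrough (250 total epochs, a save every 50 epochs, roughly 11 to 12 seconds per epoch) can be sanity-checked with a little arithmetic. The sketch below is plain Python with hypothetical helper names. It does not call RVC, and the time estimate ignores preprocessing and startup, so treat it as a rough lower bound.

```python
def checkpoint_epochs(total_epochs: int, save_every: int) -> list[int]:
    """Epoch numbers at which a checkpoint would be saved."""
    return list(range(save_every, total_epochs + 1, save_every))


def estimated_minutes(total_epochs: int, seconds_per_epoch: float) -> float:
    """Rough training time in minutes, counting only the epochs themselves."""
    return total_epochs * seconds_per_epoch / 60.0


# Settings mentioned in the video: 250 epochs, saving every 50.
saves = checkpoint_epochs(250, 50)
print(saves)  # [50, 100, 150, 200, 250]

# Observed speed in the video: about 11 to 12 seconds per epoch.
print(round(estimated_minutes(250, 11)), round(estimated_minutes(250, 12)))  # 46 50
```

At that speed, 250 epochs works out to roughly 46 to 50 minutes, which fits the remark in the video that training ran longer than the 20 minutes first guessed.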
Info
Channel: Bob Doyle Media
Views: 7,423
Keywords: ai, artificial intelligence, chatgpt, creative ai, synthetic media, bob doyle, replay ai covers, replay ai tutorial, replay and rvc, clone your own voice, how to clone your own voice with RVC, RVC voice cloning tutorial, create AI cover songs with your voice, how to create ai cover songs
Id: Vy82s1NsKY4
Length: 19min 29sec (1169 seconds)
Published: Wed Apr 10 2024