Can we CLONE my voice using ML?

Captions

What's happening, guys? Welcome back to another live stream. We are going to be going through and seeing whether or not we can use machine learning and deep learning and all that good stuff to clone my voice. We shall see if we can get this up and running. Now, we're going to be using an awesome GitHub package, and this is partially in response to a question that I saw on the community tab as well. I just realized I didn't have noise cancellation on. Somebody asked, "How do I find a bunch of awesome machine learning use cases and cool models to try out?" Well, the best resource is GitHub, and this brings us to what we are going to be taking a look at today: Real-Time Voice Cloning.

But before we do that, let's say a big shout-out to everyone who's on the live stream. Hey, how you doing, Davida? How you doing, Leonard? How you doing, Alpaslan? How you doing, Christoph? How you doing, R.G. Ruben? What is happening, guys? How are you all today? Hope you are... voice distortion? Oh no, you're saying it's not good, huh? Okay, I don't know, we shall see. Let me know if the audio sounds a little bit weird. Oh no, what is happening with the audio? How bad is it? I don't know, I can't hear any feedback, so I'm trying to work out what it is. Now my voice is distorting. There's only one thing that I've changed, and I've got a feeling that it could be... all right, give me a sec, hold on, I'll be right back.

How's that? Is that better? We back? "Disconnect your earphones"? The audio is not coming from my earphones, though. It's better? Thumbs up, yes or no? Boom. [Music] Live debugging on the fly, guys. Talk about tech under pressure, gosh. All right, clearly there's still some very slight crackling. That is such a pain. So what happened was I got this new powerline connector to try to stream over a wired connection, and there's still a little bit of static. I've got no idea what this could be. The reverb looks so fun. I'm so curious to hear this back
after this. How bad is it? Is it watchable or is it not watchable? Perfect now? Okay, all right, I'm gonna run with it for now. I need to go back and watch it after this to see what on earth is happening, or at least do a recording and play it back. We shall find out very soon. Okay, all right, live debugging on the fly, I love it. Community support, you guys rock.

Also, I was at the gym today and I found the most motivational quote I've seen in a long time. I don't know if you can see this, it might be super small, but once that gets into focus: "Run faster, eat pasta." How good is that? Anyway, I was like, "Code faster, eat pasta," and that's my personal favorite. Hey Saga, what's happening, man? Welcome to the live stream. Guys, for those of you on the live stream, Saga in the chat is actually one of my colleagues. He's actually departing onto a brand new opportunity, but he is a tech wizard. I don't consider myself that.

All right, cool, what are we doing? We are going to get into real-time voice cloning, so let's jump into it. Also, remember how I told you guys the game plan is to be one percent better every day? Check out the live chat here. Pretty awesome, right? Okay, so the game plan is that we are going to give this Real-Time Voice Cloning package a crack to see if we can get this up and running. Now, how I found this: this is answering somebody else's question that I saw on the community tab, where they were asking, "How do I find good machine learning packages, and packages with awesome machine learning models and whatnot?" So this is literally my thought process. I didn't do it for this, but this is what I normally do. I will go to Papers with Code if I know what type of model I want to use, and from here I will search. So let's say, for example,
we need a transformer model. I'll type in "transformer", and from here we'll see that there's a bunch of papers. So if I want to read a paper, I can do that, and if I want code, then normally these are ranked by the number of stars. The number of stars on a GitHub repository is indicative of how good that particular library or package is. So you can actually go to that GitHub repo just by clicking, and it is going to give you an example of how to do that. So if we click that GitHub repo, which is the Hugging Face Transformers one, we can go to the code. This is probably going to be a good indication that we are going to have a good model to work with.

The other place that I'll normally go to is straight into GitHub. So in this particular case, for this voice cloning tutorial, live stream, whatever, I'd actually just jumped straight into GitHub and typed in "voice cloner". This is inside the Hugging Face repo, so let's go back out of that, because we want to search all of GitHub. "Voice cloner", and you can see this is the one that we're going to be trying out: "Clone a voice in 5 seconds to generate arbitrary speech in real-time." This is going to be interesting. Five seconds; let's see how long it takes us to get this set up. I've never seen this before as well, so you are going to see me attempting to set something up in real time. I've already got the repo here; I'll drop it in the chat so y'all have it.

Now, what we are going to be trying to do is set up this voice cloning library, so let's have a quick read. "This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. This was my master's thesis." CorentinJ, mad props for making this public and open source. This is some really, really cool tech. There are so many applications of NLP, I think, out
there. So when it comes to investing your time in ML capabilities that are going to be most profitable for you as well as most valuable in the ML space, honestly, NLP and unstructured data is going to be an absolute game changer. Image stuff is still very good, but NLP is really where it's at at the moment.

Okay, let's take a look. "SV2TTS is a deep learning framework in three stages. In the first stage, one creates a digital representation of a voice from a few seconds of audio. In the second and third stages, this representation is used as reference to generate speech given arbitrary text." "Video demonstration (click the picture)." I don't really want to... all right, let's watch it. I don't know if you can hear this, you probably can't. Whatever, let's just jump straight into it. Okay, so we've got "Papers implemented", "News", "Setup". All right, this is what we want.

And this is what I do in real time with packages, right? I'll actually just walk through a GitHub repo and work out how to set it up. Sometimes, if I've got specific code that I want to go into, I'll jump straight to the specific model, or find where the model actually is, and then dig into that from there. Just a little point to note.

All right, so: "Both Windows and Linux are supported. A GPU is recommended for training and for inference speed, but is not mandatory. Python 3.7 is recommended. Python 3.5 or greater should work, but you'll probably have to tweak the dependencies' versions. I recommend setting up a virtual environment using venv, but this is optional. Install ffmpeg." I think I've already got that set up. "Install PyTorch." Oh gosh, this is gonna smash my live stream. We'll see whether or not downloading stuff impacts the live stream. "Pick the latest stable version, your operating system, your package manager (pip by default) and finally pick any of the proposed CUDA versions if you have a GPU, otherwise pick CPU. Run the given command."
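The prerequisites read out from the README above (a supported Python version, ffmpeg, PyTorch) can be checked before installing anything else. What follows is a minimal sketch that was not run on the stream. The function name check_prerequisites and the dictionary keys are made up for illustration, and it only looks for an ffmpeg executable on the PATH and an importable torch package; it does not verify versions of either.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 5)          # README: "Python 3.5 or greater should work"
RECOMMENDED_PYTHON = (3, 7)  # README: "Python 3.7 is recommended"


def check_prerequisites():
    """Return simple yes/no checks for the setup steps (hypothetical helper)."""
    version = sys.version_info[:2]
    return {
        # Is the running interpreter at least the minimum version?
        "python_supported": version >= MIN_PYTHON,
        # Is it exactly the recommended minor version?
        "python_recommended": version == RECOMMENDED_PYTHON,
        # Is an ffmpeg executable reachable through PATH?
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Can the torch package be found (without importing it)?
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run inside the activated virtual environment, a "no" next to torch_installed or ffmpeg_on_path points at the step that still needs doing.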
"Install the remaining requirements with pip install." Okay, this is the pre-setup. So I am going to go on ahead and kick this off. I'm just going to go to my D drive, where I do all my YouTube stuff. I haven't done a lot in here as of late. I'm going to create a new folder and give it a date; this is literally what I do when I'm creating a new project for you guys. 12-7-2021, and we are going to call this, tab into it again, "voice cloner". Then I'll open up a terminal and go to the D drive, cd YouTube, cd 12-7, and then what do we need to do? Let's quickly jump back. "Both Windows..." so I've got Python already here, 3.9.7 in case anyone wants to know the version.

What are we doing? We're going to first up create an environment. So python venv, let's call it voicecloner. "No such file." Hold on, why am I forgetting this? Oh god, I'm having a complete brain fart creating a virtual environment. You can clearly tell that I haven't gone and created a virtual environment lately. Here we go, it's python -m venv voicecloner. Oh, 2022. Oh gosh, yeah, my bad. It's been a long day, guys. All right, we're just going to say we started it last year. Source activate... gosh, I really haven't done this in a while. .\voicecloner\ [Music] it's Scripts\activate, my bad, I typed that incorrectly. voicecloner, boom. Okay, so that's our virtual environment activated.

Now what we need to do is go and install all of our dependencies. So what's he asking for here? ffmpeg, I'm pretty sure I've already got that done. PyTorch, so PyTorch is going to be the biggie that we've got to install. Does he have a recommended version inside of these requirements? Let's actually clone this down first. So I'm going to grab this and run git clone, paste that in. "I'm a time traveler." Hell yeah, one of
my many mysterious skills, alongside chugging beer. All right, this is coming down now. Give this a little time to clone. I love it, 2021, I'm an absolute muppet, aren't I? Alrighty, that's our repo cloned, so we've got Real-Time-Voice-Cloning. So we can cd into that: cd Real-Time-Voice-Cloning. Yeah, all right, we've got a bunch of stuff, so let's take a look at this. Where's our requirements file? So this specifies the stuff that we're going to be asked to install, right? So the inflect package; librosa, this is an audio analytics package; matplotlib, we've all seen that before, it's for visualization; numpy, great for working with arrays; Pillow, what is that for? Images. PyQt5, I believe that's a GUI framework for Python; scikit-learn, machine learning; scipy is a scientific, data science library; sounddevice, haven't heard of that one before; SoundFile, tqdm, haven't heard of those before; umap-learn; Unidecode; urllib is for handling requests; visdom and webrtcvad. Haven't heard of some of those, but that's fine, we shall explore, guys.

Okay, so what do we need to do now? We now need to go and install our dependencies. So the first thing... actually wait, he said install PyTorch first. So I've created a virtual environment. Install ffmpeg, I've already got that done. Install Python... just for you guys here, installing ffmpeg: if you just go to the ffmpeg website you can actually step through. "Is it really real time if you can see 2021?" This is all pre-recorded, guys. So you can go and download the source code; I think there's an installer as well. So depending on what platform you're running on: if you're on Linux you can install the Linux version, if you're on Windows you can install the Windows version, if you're on a Mac you can install the Mac version as well. We don't need that, because I'm pretty sure I've got it installed. Let's quickly double check. So, properties... how do I get to environment variables? Pretty sure I had
this added to my path. ffmpeg, yeah, there we go. So I've already got ffmpeg in my C drive. How am I going to zoom in on that? Right, so you can see it there, that's ffmpeg. Okay, so we can close that, close that, close that, close that. What are we doing next? Let's take a look. What do we need now? PyTorch. All right, off to the PyTorch website, and we can hit Install and then choose our versions as per usual. So we're going to choose Stable, we're going to choose Windows, I'm going to use pip because I like pip, we are going to choose Python, and then CUDA. Oh gosh, what version of CUDA am I running? Probably none of those, but we will soon find out. Where's CUDA installed? Let's actually check out environment variables again.

If you don't know how to install CUDA, then what you need to do is go to... actually, I've got a video on CUDA. What am I saying, there's a video of me installing CUDA, if you watch any of the transformer videos. I'm trying to think, there's definitely one. If you want a proper tutorial on installing CUDA and cuDNN, let me know and I will do that for you. Okay, so we're not going to do CUDA now, but I'm pretty sure I've got CUDA installed. Let's just double check what version by going to my path. I am running CUDA 11.2; it supports CUDA 11.3. I'm gonna download that one and hope it works. We shall see. So we are going to copy this command here. I'm just going to throw it inside of here so I can just run pip install; cut that out so we don't screw that up. And we're going to go back to our terminal and install the CUDA version. That's going to take 10 minutes... five minutes, five minutes twenty-three.

Let's see, question from Alan: "Hey Nicholas, once your channel grows even more, have you considered making this your full-time hobby?" I think it is my full-time hobby. Did you mean my full-time job, Alan? Honestly, I love doing this,
like, I was actually thinking about it today as I was sitting on the couch at my lunch break, and I'm like, I actually really, really enjoy live streaming, and making videos as well, but live streaming particularly. I don't know why, I just like the interactivity and presenting. But yeah, it is my full-time hobby. It's probably more than a full-time hobby at the moment. Oh gosh, this is slowing down massively. Let's go to the chat while this is downloading, because we can't really do much right now.

Alrighty, question: "Which kind of laptop would you pick or suggest" (this is right in the way) "for ML or ML/DL kind of work, except a MacBook?" I was actually looking at this recently. Somebody mentioned the TensorBook in one of the comments that I saw recently, so I'd probably be taking a look at that. Pretty much anything that's got a mobile GPU I'd be looking at. Don't expect it to be as fast as a GPU on a desktop, like a full-blown standalone GPU, but anything that's reasonably high spec, I'd be taking a look at. I typically use my MacBook and use Colab separately, but really anything with a mobile GPU is gonna serve you well. So if you're getting something with a 3080 Ti, that should be pretty good. Otherwise, just get any laptop and use cloud services. Particularly if you're going to do deep learning, just use cloud services with GPUs attached; that tends to work well as well.

Yeah, Alan, I figured you meant the word "job". So I absolutely love my job, and this isn't me just saying it because there's maybe some people that I work with online. I absolutely love my job, and I can't imagine a day, at least at the moment, where I'd want to purely drop my job and just do YouTube. And the reason for that, right, is I learn so much by being on the job
because I actually get to work with some ridiculously intelligent people, and I get to meet a lot of people working in this space and share what's possible, just like I'm sharing with you right now. Who knows, maybe one day things will change and maybe something amazing will come of it and maybe we will do this full-time, you never know. But for now I'm pretty much spending all of my free time YouTubing and streaming and learning about this stuff. So I love it. Right now it's working out pretty well, and we'll see where it goes, but I'm having fun; that's the most important thing. I think the key thing, whatever you do in life, whether or not you're building code or working on projects: remember, you've gotta enjoy it, because long after the shiny bit has passed you need to still have something that draws you in and that you're attracted to. In spite of data science continuously changing, I love analytics, I love building sophisticated models, I love coding. I think some part of me is always going to want to do that, pretty much for the rest of my life. Whether or not that's still data science or building apps, I think I'm probably going to be doing something in that space.

Looks like somebody's getting some recommendations on laptops in the chat. "How to start learning ML? I'm watching video tutorials and theory." We're probably going to address that a little bit later. All right, cool, so this is still downloading. Let's keep watching, we're getting close now. It's got what, about a minute left? We're gonna run through this for 50 seconds. It's so funny, I went to all the effort of getting this powerline adapter to hopefully get the stream running with better latency, and I wonder if that was the driver of the audio problem, because that was the only thing that I changed in my setup. Who knows. It shall be fun. All
right, that looks like it's downloaded now. Downloading torchvision, torchaudio, almost there. "Must be the NBN." Ah, don't get me started on that. Anyone that asks me about internet, I have this huge rage about it, because living in Australia, we do not get fast internet. I was getting ridiculously fast internet in the hotel in Singapore. I'm like, what? This is faster than my home setup, it's ridiculous. I probably could have live streamed from the hotel.

What am I doing? So we are currently getting all the dependencies installed to start working with the voice cloner library, so we're actually getting this set up. All right, cool. "So you are using pip." That's fine, doesn't matter. So let's just take a look at pip list. All right, so we've got PyTorch, we've got torchaudio, got torchvision and a whole bunch of other stuff. We're already inside of our Real-Time-Voice-Cloning folder, so pip install... what do we need to do now? I believe we wanted to install their dependencies, so we're going to install with -r requirements. Let's just double check we've got the requirements file in here. [Music] We do: requirements. So we are going to go ahead and install those dependencies. Let me move this over to this side so we can actually see what we're doing. Bring this there, bring this there. Okay, it looks a little bit better. pip install -r requirements.txt. Oh gosh, this could be a fun animation in and of itself.

So the game plan is, I mean, we'll see how long... we've been streaming for 27 minutes already. Well, the game plan is to try to get this up and running. We'll see how long it takes and whether or not we can get this up and working, but the first time never quite goes right. Fingers crossed. "Internet is pretty fast in Romania." I'm coming! Corianis, you're in Perth and the internet's like... yeah, I know that feeling. It is absolutely brutal, particularly when you see some streamers like
downloading a game in 30 seconds. I'm still trying to download the latest Forza. I bought it and I'm a quarter of a download through, and I'm like, I'm not waiting for this. I've got the shortest attention span ever. All right, cool, so what's happening? All right, so step two: "(Optional) Download pretrained models." Can you see that? Yeah? All right, cool. I wonder if we should move that chat later on. So: "Pretrained models are now downloaded automatically. If this doesn't work for you, you can manually download them." "Before you download any dataset, you can begin by testing your configuration with python demo_cli.py." Just double check... what are we doing? Let's open this up. All right, so we've got it in here: python demo_cli.py. Crossing my fingers. Something's popping up. Okay, something looks like it's downloading. "Found 1 GPUs available", so that looks like it's picked up my CUDA installation; looks like PyTorch is working. So you can see here it's downloading the PyTorch tensors, so that's downloading the pretrained models. All right, we're gonna wait for that to download. "If all tests pass, you're good to go."

Is this downloading? Is it freezing? That's still downloading. Hold on, there's something going wrong there. Right, so: "AssertionError: Download for synthesizer.pt failed. You may download models manually instead." Do we need that? Okay, where did they want them placed? All right, let me walk you through what's just happened. It looks like this is downloading the encoder, the synthesizer and the vocoder. It looked like it had downloaded the encoder successfully. Where are these actually stored? So, in the path, saved models, so it should be in... let's go into our D drive, YouTube. I'm just going based on what I think is the way that this is working. So it's under saved_models, default. All right, so it's got the encoder and it's got... that seems very small for a vocoder. Okay, so what
we need to do is go and download them manually and drop them in here. Let's try this. Okay, so where are we going? We are going over here; open that in a new tab. "I've uploaded the models." So we've got the encoder, I think that one's okay. I don't think the vocoder is downloaded, it's two kilobytes, there's no way. We need these two, the vocoder and the synthesizer. "Please ensure that the files are extracted to these locations within your local copy of the repository." Okay, so let's download that. This is the vocoder downloading from Google Drive, and then we need the synthesizer as well, so let's download that. Oh god, it's huge. Okay, so these are downloading. 20 minutes. "Change the file path, solve this problem." I mean, we're downloading it anyway now, Alpaslan. So you're saying changing the file path, is that in the configuration, to actually go and change the file path? Who knows, we'll come back. Let's see if we can at least get step three done. Encoder size is 16.6, so I think that one looks okay. I'm gonna assume so unless we get a specific error.

"You need music for the live stream." What do you think, like background music? All right, it's downloading now; we're gonna have to wait a little while. So this is downloading the vocoder PyTorch weights and the synthesizer PyTorch weights. Once the vocoder downloads we'll see how long it's actually going to take. We might need to do a part two. "Play some melody songs." Have you seen that streamer that's got a flamethrower in the back and has lasers? That could be another alternative: every time we successfully get some code running, I can just smash that button and flames go flying out. All right, we've got the vocoder. I'm going to delete this one, because I'm pretty sure that is not the one. How big is this? Yeah, 52 megs. All right, so I've at least got that. Somebody said check the encoder size. Let's actually do that. Yeah, 16.3, so we're
good there. All right, cool, we've just got to wait for the synthesizer and then we're going to be good. All right, in the meantime let's double check what else we have to do. [Music] "Before you download any dataset, you can begin by testing your configuration with python demo_cli.py. If all tests pass, you're good to go." So that's really all we've got to do, and then we can run python demo_toolbox.py. What happens if we just run that now? We'll run python demo_toolbox.py and close this, we don't need that anymore. So I'm just running python demo_toolbox.py, just this command, to see if we get anything. Doesn't look like it. Okay, what happened? I ran python demo_toolbox.py and it's still waiting on the synthesizer, so we're gonna have to wait 13 minutes.

You know what, in the meantime what we can do is check on our... oh no, why did I close that? We can check on the model that we started training yesterday. Do you want me to do that? So the speech translator: I did a little bit of work on it last night after I checked out. Let's do it, we'll get our five lines of code done today anyway. I don't know if you tuned into the daily data live stream yesterday, but we were working on an encoder. While this is downloading, let's just keep that open so we can check our downloads. Yep, that's cool. God, it's still downloading so slow. So I was actually training the transformer language translation model, and we had a bunch of issues, namely because I think the encoders hadn't been trained on the entire data set. So I'm just going to double check that this is working now, and that should be at least the entire transformer series of videos up and running. So let's connect this, hit allow. 11 minutes. All right, cool, we can run that. So what I did yesterday is a little bit of magic: I actually went and saved the vectorizer. So I went and trained this; it took an hour and 12 minutes to train the
two vectorizers, but I managed to get them trained yesterday, so we're actually going to be able to leverage those, and I'm going to share those weights with you as well. And I went and trained the actual language translation model using the new vectorizers as well, so fingers crossed we are looking much better now. And I wrote a bunch of code to actually load the pre-trained vectorizer weights. So if I run that, run that, run that, and then over here we've got pickle, so this is what's loading those vectorizer weights from Google Drive. You would have seen that we brought up Stack Overflow when we were trying to work out how to pickle, or how to save down weights for vectorizer layers. It's literally done using the pickle library. What we're doing is we actually just dump the config and the weights as a dictionary, and then we're able to bring them back up and load them into the model that way. So over here we're running with open and then just the file path. This is probably way too small for you guys to see. We're just loading the file path from Google Drive, and then we're able to load those up this way. And then what we can do is use TextVectorization.from_config to load up the config, and then there's this weird bug that forces you to run adapt, like a pre-training, before you can load the weights fully. So then we can run set_weights and that will load the weights, and then we should be cooking with fire. So if I run these, and then just test out those vectorization components...

How are we doing with this download? Nine minutes. I don't know if we're going to get to do that in time; we might need to continue voice cloning tomorrow. All right, let's take a look at some questions while this is all downloading. All right, where did we end up? So: "Hey, I was watching the five hour video and got to
the annotation section. Is there a way to create image annotations automatically without drawing bounding boxes for each image?" Yeah, so there are automatic image annotation libraries out there. Go and have a look; just Google "auto image annotation", there's stuff that can actually help you out. "How much should I score in a master's in data science for a good job?" Honestly, focus on getting experience. The score is obviously very important, but making sure that you get experience in data science is just as important, if not more important. Muhammad, the voice cloner, yes, that is an open source package. "Share your streaming setup." Have you guys seen the video that I did on my battle station breakdown? That is pretty much my entire streaming setup. I'm going to link it down here. Look at me, self-promoting. There you go, check it out. Nick Vod, hey, I love you too.

Alrighty, cool, let's get back to this. So we have now downloaded our data set, this looks okay. Let's check our vectorizer is vectorizing, and you can see that we're no longer getting too many ones, because one represented unknown. So if you cast your mind back, if I type in en_vectorizer.get_vocabulary... nope, I spelt that wrong. Also, I dug into this data set a little bit. The OPUS data set that this is trained on is a medical, not speech to text, medical language translation data set, so just keep that in mind. So unknown is token one there, you can see that. Now, I've already gone and trained it, so do we need our data set? We probably don't, do we? That's fine, we'll set it up anyway. If this works, that means all three of the transformer tutorials are good to go. I said they were good to go on the community tab, but I clearly was incorrect, because we've got a little bit of additional work to do. Let's define the model; we can compile it. I don't need to fit it, because I've been running this
pretty much all day. Accuracy is a bad metric to look at, so loss is the better metric, and that looks like it's performing a lot better. Okay, fingers crossed. What we can then do is load our weights. I got up to epoch 30. Okay, and then we can get an input sentence. I was looking at this and I'm like, why are there these weird sentences in the data set? "In the pregnant rat, the AUC for calculated free drug at this dose was approximately eight times the human AUC at a 20 milligram dose." So now we can run it through our greedy algorithm again; I did a brief live stream on that as well. Let's see what this looks like if I print out the translation. How far are we on the download? Four minutes left.

I don't know what's happening here. We're gonna have to debug this an absolute ton. So this was the original error that we had, right? It's just predicting unknown. No good at all. I'm gonna have to dig into that a whole ton more. I was hoping this would work, but that was clearly not the issue. I'm probably just gonna run through Mr. Chollet's tutorial and then debug from there. Still spitting out unknowns. Not happy, even though loss was decreasing. This is the fun with deep learning, guys: there's a ton of steps that you've got to go through to debug it.

Alrighty, I've got to wrap up this live stream soon anyway, because I promised this was originally only going to be five lines of code a day. We've clearly expanded out, but hopefully you guys are still enjoying these live streams. So tomorrow I think we're gonna finish the voice cloner. Before we wrap up, though, let's quickly check and see what you guys are chatting about and if you need anything from me. "Can you land a job in AI at 18 or 19?" Depends on what experience you've got. You very much do need to be super experienced to be able to get in at that stage. "Did the voice cloner work?" So right now it's still downloading, so you can
see that we have three minutes left, but I'll effectively have that synthesizer downloaded for our test tomorrow. So we just need to get those synthesizer weights thrown in here, then we'll be able to kick it off. "When I make the agent for a game, where do we control the mouse and buttons?" So, presuming, Nick Vod, you're speaking about reinforcement learning there, it'd be inside of the action space, and you would be doing the transformation of the action space to a mouse or button click inside of your step function, so def step. "Is it possible to put my voice in some song without singing?" I have no idea, maybe we'll try it out. Yeah, so we've really just got to get this synthesizer downloaded. "Could you get a good job in AI with just a Bachelor of Science in CS?" Stay tuned for my AMA, because I'm actually going to answer that in a little bit more detail. That is going to be coming a little bit later this week, depending on how we go. Not exactly a super code-focused or super output-focused live stream, but hopefully you've enjoyed it nonetheless. Again, thanks so much for tuning in, guys. I love you all. Peace. I'll see you in the next one, I'll see you tomorrow.
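The download trouble on the stream (a vocoder file of about two kilobytes and a failed synthesizer download) is the kind of problem a quick file-size check catches before rerunning the demo. The sketch below is a minimal example and was not part of the stream. It assumes the weights sit under saved_models/default as encoder.pt, synthesizer.pt and vocoder.pt; only synthesizer.pt is named in the error message, so the other two file names are assumptions, as are the helper name check_model_files and the 1 MB threshold, which was picked only because the sizes mentioned on stream were roughly 16 MB and 52 MB.

```python
from pathlib import Path

# Assumed file names; only synthesizer.pt appears in the error shown on stream.
EXPECTED_FILES = ("encoder.pt", "synthesizer.pt", "vocoder.pt")
# Assumed threshold: anything under ~1 MB is treated as a truncated download.
MIN_BYTES = 1_000_000


def check_model_files(model_dir, min_bytes=MIN_BYTES):
    """Map each expected file name to 'ok', 'too small' or 'missing'."""
    model_dir = Path(model_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = model_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif path.stat().st_size < min_bytes:
            report[name] = "too small"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Assumed location, taken from the folder browsed on stream.
    for name, status in check_model_files("saved_models/default").items():
        print(f"{name}: {status}")
```

Anything reported as missing or too small would need to be downloaded manually and dropped into that folder, which is what the stream ends up doing.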

Info

Channel: Nicholas Renotte

Views: 12,858

Keywords: data science, machine learning, deep learning, python

Id: qVD1NBLoOJw

Length: 46min 52sec (2812 seconds)

Published: Tue Jul 12 2022