Speech Recognition using Deep Learning Part 1

Video Statistics and Information

Captions
What's up guys, this will be part 1 of my DeepSpeech tutorials. In this video we're just going to get it up and running, run some basic inference on your CPU, and test it out to make sure it works the way we expect it to. I'll have all the commands and everything you need on my blog, and I'll also link you to Mozilla's repo where they actually implement DeepSpeech, if you want to dig deeper into the code.

So what is DeepSpeech? From that GitHub repo: DeepSpeech is an open-source speech-to-text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

With that out of the way, we can get started. We're going to use Conda to manage our virtual environment here. If you've never used Conda or don't know how to install it, check out my blog; I have a tutorial on installing it. We'll create an environment called DS, for DeepSpeech, use the latest version of Python, and activate it. The first thing we want to do inside it is pip install deepspeech.

Now that we have that installed, we're going to need an audio sample directory and a directory containing all of our models. You can see I've already pulled these down and untarred them both; the commands to do that are right on my blog. You'll copy and paste a curl command to pull down the models (it takes about ten minutes), untar the archive with the tar command, and then do the exact same thing to get the audio files. (A sketch of these setup commands appears below.)

After you've done that and you have the audio and deepspeech-0.6.1-models directories, you should explore them a little to see what we're working with. In the audio directory, all we have is a few WAV files containing some audio that DeepSpeech can run inference against; we'll also use our own voice later in the video. In the models directory, the important things to pay attention to are the language model binary and the acoustic model: the model they trained was converted over to a .pbmm, essentially because it runs faster. You don't have to be concerned with the details right now, since that .pbmm file is really all you'll need.

Let's make sure we're back in the root of this directory for the next command to work; you can pull it right off the blog and basically copy and paste everything. But first, let's listen to some of the audio. The file I'm passing in is the 2830 WAV, so let's play that really fast. I'm using a CLI tool called cmus, but you can use any tool you want to play it back. Let's see what he has to say: "experience proves this." You'll notice DeepSpeech does an okay job at figuring out what he said, but we'll also test it against our own voice, and you'll see that DeepSpeech generally does a pretty good job; eventually we'll go over training it to make it better, too. For the next thing, let's just run it and see what DeepSpeech thinks he said. First we have to get out of the audio directory.
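Before running inference, here is a rough sketch of the setup steps described so far. The release URLs and archive names are assumptions based on Mozilla's DeepSpeech 0.6.1 GitHub release page, so prefer the exact commands from the blog if they differ:

```sh
# Create and activate a Conda environment for DeepSpeech.
conda create -n DS python
conda activate DS

# Install the DeepSpeech inference package, which also provides the `deepspeech` CLI.
pip install deepspeech

# Pull down the pre-trained 0.6.1 models and the sample audio
# (URLs assumed from Mozilla's GitHub release page).
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/audio-0.6.1.tar.gz

# Untar both archives; this should produce deepspeech-0.6.1-models/ and audio/.
tar xvf deepspeech-0.6.1-models.tar.gz
tar xvf audio-0.6.1.tar.gz
```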
Make sure you're in the root of the directory for this next command to work, because it uses relative paths. Let's talk about the command for a second: when we pip installed deepspeech, we got this deepspeech command. We point it at the .pbmm model I just showed you, and the reason we're using the .pbmm is that it's faster at inference than the .pb. Then we pass in the WAV file from the audio directory. Now let's run it. You can ignore all the output down to the transcription, which is what it thought he said: "experience proofs less." That's pretty cool; it's pretty close to "proves this," so not too bad, but it could be better.

The next thing we want to do is check it out on our own voice. To do that I recommend installing a package called SoX, and those instructions will be on my blog as well. If you're on Ubuntu or some Debian-based distro you can install it with your package manager, on Arch Linux you can get it from the AUR, and on Mac you can brew install it. The reason we're installing SoX is that it gives us a rec (record) command, so we can record our voice with the specific parameters that make it easiest for DeepSpeech to understand, and I'll go over those next.

Here's what we'll do: we'll record our voice with the rec command. The -r flag is for rate, and we're doing 16k, so 16 kilohertz, with a mono channel, and we'll store the recording in myrecording.wav. Let's get started: "the quick brown fox jumps over the lazy dog." You can just Ctrl-C when you're done and you'll be out of the recording. Now you should see myrecording.wav, and we can run DeepSpeech against it: press up a couple of times to get your old command back, swap out the audio argument so it points to our new myrecording.wav, and run it.

You can see it did a lot better this time. That's not really a fair comparison, though, because DeepSpeech is actually pretty good anyway: "the quick brown fox jumps over the lazy dog" — got it. Depending on where you are, how you're positioned on the mic, accents, and all that kind of stuff, results will vary, and training it on your own speech, or training it in general on other data, can improve this as well, but as a base model it's pretty good.

We should also talk really quickly about what you want to see when you pass a .wav to DeepSpeech. Essentially, DeepSpeech only does really well with 16 kilohertz, mono-channel audio. It can work with other formats — you could pass a different rate and other settings — but it won't run inference as well. If we run mediainfo (another command you can install) on myrecording.wav, we can confirm it was actually recorded the right way. So if you want to record your voice into a .wav some other way than the rec command we used earlier, you can check that it's in a good format using mediainfo: you essentially want it to have a bit rate of 256 kb/s, be mono (one channel), and have a sampling rate of 16 kilohertz.
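Putting those steps together, here is a rough sketch of the inference, recording, and verification commands. The model and sample filenames (output_graph.pbmm, 2830-3980-0043.wav) are assumed from the 0.6.1 release layout, so adjust them to match whatever you actually extracted:

```sh
# Run inference on one of the bundled samples, from the directory that
# contains both deepspeech-0.6.1-models/ and audio/.
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm \
           --audio audio/2830-3980-0043.wav

# Install SoX to get the `rec` command.
sudo apt install sox       # Ubuntu / Debian-based distros
# brew install sox         # macOS

# Record your own voice: 16 kHz sample rate, single (mono) channel.
# Stop the recording with Ctrl-C.
rec -r 16k -c 1 myrecording.wav

# Run inference against your own recording.
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm \
           --audio myrecording.wav

# Verify the WAV is in the format DeepSpeech expects:
# ~256 kb/s bit rate, 1 channel, 16 kHz sampling rate.
mediainfo myrecording.wav
```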
Alright, and that's pretty much it for this video. In the next video we'll go over how to run inference on a GPU, and in the following videos we'll cover things like training it and building it into something like a server, so you can send commands to it.
Info
Channel: chris@machine
Views: 17,783
Keywords: deep learning, deepspeech, machine learning, DeepSpeech, Tensorflow, speech recognition
Id: BltcZmpo1dI
Length: 9min 15sec (555 seconds)
Published: Fri Apr 17 2020