Real-time Speech to Text with DeepSpeech - Getting Started on Windows and Transcribe Microphone Free

Captions

Hello everyone, in this video I'm going to talk about DeepSpeech. All right, so in this video I'm going to talk about DeepSpeech, an open-source speech-to-text engine by Mozilla based on deep learning, which allows us to convert speech audio files into text, and I'm going to show you how you can get yourself up and running on Windows. So without further ado, let's get started.

The first thing we need is a model. While creating your own model is possible, it is very computationally expensive, so for this video, and in general for most of your projects, you should start with a pre-trained model that you can find in the DeepSpeech repository on GitHub, under Releases. At the time of this video we are looking for release 0.7.3, and we go down and download two files, which are models.scorer and models.pbmm. Depending on your connection this can take a few minutes, so I recommend you do that first.

All right, so while our models are downloading, the next thing we should do is install DeepSpeech. What I prefer to do is create a new folder in the Documents folder called, for example, deepspeech, and then cd into it. The next thing we should do is make sure that we have Python 3.6 installed, and the best thing to do is just type python in the terminal and then check the version here. This is crucial because Python 3.7 and Python 3.8 are not compatible with DeepSpeech as of now, so if you don't have the right version, the best thing to do is go to the Python download page, search for Python version 3.6.7 or 3.6.9, for example, and download and install it.

OK, after we check the version we are ready to create a new virtual environment, and the command to do that is "python -m venv ." where the dot means create the virtual environment here. Right, so it seems like it created the environment. Now we have to activate that environment, and the command to do that is "Scripts\activate". As you can see, we now have deepspeech at the beginning of our prompt line. That means that we are in that environment. Now we are going to run "pip install deepspeech", and then be patient because it usually takes a while.

All right, so as you can see we have DeepSpeech installed now, and this is enough to convert audio files into text. But in this video I wanted to show you how you can use your microphone to convert speech to text in real time, and to do that we first need to install another thing, which can be found in the DeepSpeech-examples repository, also on GitHub. In particular we want the mic_vad_streaming folder, and we want to copy it into the deepspeech folder we created earlier. Right, so as you can see, if I type dir I now have the mic_vad_streaming folder. I'm now going to cd into it and install all the required dependencies with "pip install -r requirements.txt". Wait for a bit, because it takes a while.

All right, so now that the dependencies are installed we can try the example, and to do that we type this command: python, then the name of the script, mic_vad_streaming.py, then -m and the path of the .pbmm model we downloaded earlier, space, -s and then the path of the scorer we found before. Then we can start: "Hello everyone. Goodbye." As you can see, my speech was converted into text.

We can experiment with this a lot, and I am going to show you some cool tricks you can do at this point. If we open the mic_vad_streaming script and go down to about this line right here, 194, here we have the conversion between the stream and the text. One very interesting thing is that if we change finishStream to finishStreamWithMetadata, save it and execute it again, the script will now give us time information for each token. This is super cool, I'm going to show you: "Hello everyone." As you can see, for each letter it says the time at which that letter was detected. This is super useful for many applications; for example, if you want to add subtitles to videos, you now have the exact moment at which you should put that letter. I'm going to show you a nice little project in one of the next videos, but for now you can experiment with this as much as you want.

Right, so before leaving you I'm going to show you another very cool thing related to DeepSpeech, which is the Common Voice project. Most of the cloud speech APIs, such as the one from Google, use big datasets that are proprietary and not free to use for everybody. This makes creating open-source speech-to-text engines such as DeepSpeech very difficult. So Mozilla created this project, Common Voice, in which you can donate your voice and also validate voices recorded by other people. This is super useful, and if you have some spare minutes during the day I highly suggest you do that. Another very useful thing is that, because they are creating this huge dataset and they created this platform, they also offer the same thing for many languages. I personally contributed quite a bit to the Italian dataset, because as of now there are no good Italian datasets, and if you are not a native English speaker I highly suggest you contribute to Common Voice in your native language, because it really helps the project and eventually we will get open-source speech-to-text engines for everybody.

An open-source speech-to-text engine is a great thing to have, because as of now we only get good performance from cloud services such as the Google speech API, which only runs in the cloud and is proprietary. So two good things about DeepSpeech are that it is not proprietary, so you can do whatever you want with it, and that you can also run it locally, which is very good for many applications. The thing is, the performance of DeepSpeech is not as good as its proprietary counterparts as of now, so please contribute to those great projects if you have a bit of time, because eventually we will all benefit from them.

So thank you very much, I hope you liked this video. If you did, please consider subscribing to the channel and leaving a like, because it really helps, and I hope to see you next time.
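
The per-letter timing described in the video can be turned into word-level timestamps, which is closer to what subtitles need. The following is only a minimal sketch, assuming each token carries a `text` character and a `start_time` in seconds, as the metadata in the 0.7 Python bindings appears to. `Token` and `words_with_start_times` are hypothetical names made up for this illustration and are not part of DeepSpeech, and the sample timings are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for one recognized character and its timing."""
    text: str
    start_time: float  # seconds from the start of the audio (assumed unit)


def words_with_start_times(tokens: Iterable[Token]) -> List[Tuple[str, float]]:
    """Group per-character tokens into (word, start_time) pairs.

    A space token ends the current word; each word takes the start time
    of its first character.
    """
    words: List[Tuple[str, float]] = []
    current = ""
    current_start = 0.0
    for token in tokens:
        if token.text == " ":
            # A space closes the word collected so far, if any.
            if current:
                words.append((current, current_start))
                current = ""
            continue
        if not current:
            # First character of a new word: remember when it started.
            current_start = token.start_time
        current += token.text
    # Flush the last word, which has no trailing space.
    if current:
        words.append((current, current_start))
    return words


if __name__ == "__main__":
    # Invented timings for "hello everyone", one character every 0.1 s.
    sample = [Token(ch, round(0.5 + 0.1 * i, 2))
              for i, ch in enumerate("hello everyone")]
    for word, start in words_with_start_times(sample):
        print(f"{start:6.2f}s  {word}")
```

With the real library, the token list would presumably come from the object returned by finishStreamWithMetadata (for example its first candidate transcript), but that wiring is an assumption and is left out so the sketch runs without a model or microphone.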

Info

Channel: Federico Terzi

Views: 127,018

Keywords: deep, speech, deepspeech, to, text, speech-to-text, local, free, open, source, opensource, mozilla, tutorial, guide, demo, model, pretrained, getting started, english, voice, audio, common, project, transcription, locally, python, install, how, installation, getting, started

Id: c_0Q3T0XYTA

Length: 7min 19sec (439 seconds)

Published: Thu Jun 11 2020