Transcribe Audio Files with OpenAI Whisper

Captions
What is going on, guys? Welcome back. In this quick video today, we're going to learn how to easily transcribe audio files using OpenAI Whisper in Python. So let us get right into it. [Music]

All right, so this video is going to be very short, because the process is extremely simple and we only need a few lines of code to make this work. Essentially, the idea is that we have an audio file. For example, here I have this video_sound.mp3, which is basically just the audio data for one of my videos. We can play this here real quick: "What is going on, guys? Welcome back..." I don't know if you can hear that right now ("...to learn about the Python package that..."), but it's essentially just one of my videos as an audio file.

What we want to do now is transcribe this professionally. We want to get the text, for example to create subtitles, to do some machine learning work, or to create a search engine for my videos: maybe I want to transcribe all of my videos so that you can look for a term and then find where it occurs in a certain video. That would be one use case. But we want to do that professionally, and we don't want to do it with a lot of mistakes. Now, it's not going to be perfect, because there are certain words that are just not going to be recognized. For example, certain package names and certain names of applications and technologies that are not necessarily in the dictionary are going to be a little bit more difficult to transcribe. But essentially this is going to be a very high quality transcription.

So what we're going to do is create a new Python file. I'm going to call this main.py. We're also going to open up the command line and install a package called openai-whisper. This is one of those models that you can use without tokens, so you can just install the package and use it without an API key or anything.

What we're going to do here now is say import whisper, and then model = whisper.load_model, and we're going to load the base model. The result of the transcription is just going to be model.transcribe, and we pass the video_sound.mp3 file. Then we just want to open a new file, transcription.txt, in writing mode as f, and call f.write. We're going to write the result into that file, but not the whole result, just the text that is produced. And that is the whole magic.

I think this is running locally, so we're not using some model in the cloud from OpenAI. I think this basically installs the model locally. I don't know if you have to have certain GPUs that are a little bit more modern. I also ran this on my laptop, which is now five years old and has an AMD GPU, so you definitely don't have to have Nvidia. It worked perfectly fine.

Now let's see if it works here as well. I think we're going to skip the whole process because it takes... oh, actually it doesn't take too long, so we can wait for this. Or was this just a download? I think this was just a download, so it will probably take some time. We can skip to the part where the transcription is done.

All right, so it seems like the transcription is now done. We can open up this transcription.txt file and see here: "What's going on, guys? Welcome back. In this video today, we're going to learn about the Python package that allows us to automatically find..." and so on. So this is very accurate. This is exactly what I'm saying in the video, but I'm sure we are going to find some mistakes. "...modules and packages that are used in a project, automatically create a requirements.txt file. Let's get right into it. All right, so for this video today I'm going to be working on Linux..." and so on.

I think the package that we used here was pipreqs, so p-i-p-r-e-q-s. I think this is definitely not going to be transcribed properly, so let's see if it occurs somewhere. Ah, "pip Rex", there you go. So this doesn't work, because of course pipreqs, the package name p-i-p-r-e-q-s, is not going to be recognized by OpenAI Whisper, because it's just not a commonly used thing. But "pip Rex" is what gets recognized, because "Rex" is a word and pip is known as a package manager. So you will probably have to do some manual adjustments here, or you can just ignore it.

Other than these special names, I think everything should be transcribed correctly. I'm not going to look through all of the text now, but I think this is a very, very high quality transcription. Compare this to something like basic speech recognition: you can use the Python speech recognition module, go through the file, split it up, and transcribe the individual pieces. First of all, it's a hassle, because you can't use too much data at once when you transcribe like that. And then again, every single word, or probably every other word, is going to be incorrectly transcribed.

So this is a very high quality transcription, and all of this is running locally on your machine for free. You don't even have to use an API key, and you don't have to use tokens or anything. I think you have to have some decent hardware. As I said, I'm running this now on a pretty strong computer, but I also ran it on my laptop, which is not as strong. It's not even strong enough to record properly. So I think that most of you guys will be able to run this to some degree.

All right, so that's it for today's video. I hope you enjoyed it and I hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below. And of course, don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you so much for watching. See you in the next video, and bye.
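The steps described in the captions can be sketched roughly as follows. This is a minimal sketch, not the code shown on screen: it assumes the package was installed with pip install openai-whisper and that ffmpeg is available on the system. The function names (transcribe_file, fix_terms, save_transcript, main) and the correction table are hypothetical, added only for illustration. The fix_terms helper is one possible way to handle the manual adjustments mentioned for names such as pipreqs.

```python
import re


def transcribe_file(audio_path, model_name="base"):
    """Transcribe an audio file locally and return only the text."""
    # Imported here so the helpers below can be used without the package.
    import whisper  # installed via: pip install openai-whisper

    model = whisper.load_model(model_name)   # downloads the model on first use
    result = model.transcribe(audio_path)    # dict; the text is under "text"
    return result["text"]


def fix_terms(text, corrections):
    """Replace known mis-transcriptions (hypothetical helper, case-insensitive)."""
    for wrong, right in corrections.items():
        pattern = re.escape(wrong)
        # A function replacement keeps `right` literal (no backslash escapes).
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


def save_transcript(text, out_path="transcription.txt"):
    """Write the transcript text to a file."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)


def main():
    # File names follow the video; the correction table is an assumption.
    text = transcribe_file("video_sound.mp3")
    text = fix_terms(text, {"pip Rex": "pipreqs"})
    save_transcript(text)

# Call main() to run the whole pipeline on video_sound.mp3.
```

Keeping the correction step separate from the transcription step means the list of special names can grow over time without re-running the model.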
Info
Channel: NeuralNine
Views: 26,814
Keywords: python, openai, whisper, python openai whisper, python whisper, transcribe audio, python transcribe audio, openai whisper transcribe audio, python openai whisper transcribe audio, audio to text, python audio to text
Id: UAdX0cGuC28
Length: 6min 14sec (374 seconds)
Published: Sat May 20 2023